linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
@ 2019-03-01 14:03 Will Deacon
  2019-03-01 14:03 ` [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking Will Deacon
                   ` (20 more replies)
  0 siblings, 21 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

Hi everybody,

This is a non-RFC posting of the RFC previously posted here:

  https://lwn.net/ml/linux-kernel/20190222185026.10973-1-will.deacon@arm.com/

There have been some significant changes since then, including:

  * Reduced mmiowb_spin_{lock,unlock}() overhead
  * Support for the arch code to provide the mmiowb_state storage
  * Moved PowerPC over to the generic algorithm
  * Now compiles on x86 and some obscure ia64 configurations
  * A bunch of minor tweaks

As before, this is based on a couple of other I/O-related series that I have
pending:

  https://lkml.org/lkml/2019/2/22/484	// Queued by paulmck, pending review
  https://lkml.org/lkml/2019/2/22/500	// Queued in arm64

so I've pushed the whole lot out on a branch for your comfort and convenience:

  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git io

Feedback welcome,

Will

Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Tony Luck <tony.luck@intel.com>

--->8

Will Deacon (20):
  asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  arch: Use asm-generic header for asm/mmiowb.h
  mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
  ARM/io: Remove useless definition of mmiowb()
  arm64/io: Remove useless definition of mmiowb()
  x86/io: Remove useless definition of mmiowb()
  nds32/io: Remove useless definition of mmiowb()
  m68k/io: Remove useless definition of mmiowb()
  sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  powerpc/mmiowb: Hook up mmiowb() implementation to asm-generic code
  riscv/mmiowb: Hook up mmiowb() implementation to asm-generic code
  Documentation: Kill all references to mmiowb()
  drivers: Remove useless trailing comments from mmiowb() invocations
  drivers: Remove explicit invocations of mmiowb()
  scsi/qla1280: Remove stale comment about mmiowb()
  i40iw: Redefine i40iw_mmiowb() to do nothing
  net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
  arch: Remove dummy mmiowb() definitions from arch code

 Documentation/driver-api/device-io.rst             |  45 ---------
 Documentation/driver-api/pci/p2pdma.rst            |   4 -
 Documentation/memory-barriers.txt                  | 103 +--------------------
 arch/alpha/include/asm/Kbuild                      |   1 +
 arch/alpha/include/asm/io.h                        |   2 -
 arch/arc/include/asm/Kbuild                        |   1 +
 arch/arm/include/asm/Kbuild                        |   1 +
 arch/arm/include/asm/io.h                          |   2 -
 arch/arm64/include/asm/Kbuild                      |   1 +
 arch/arm64/include/asm/io.h                        |   2 -
 arch/c6x/include/asm/Kbuild                        |   1 +
 arch/csky/include/asm/Kbuild                       |   1 +
 arch/h8300/include/asm/Kbuild                      |   1 +
 arch/hexagon/include/asm/Kbuild                    |   1 +
 arch/hexagon/include/asm/io.h                      |   2 -
 arch/ia64/include/asm/io.h                         |  17 ----
 arch/ia64/include/asm/mmiowb.h                     |  25 +++++
 arch/ia64/include/asm/spinlock.h                   |   2 +
 arch/m68k/include/asm/Kbuild                       |   1 +
 arch/m68k/include/asm/io_mm.h                      |   2 -
 arch/microblaze/include/asm/Kbuild                 |   1 +
 arch/mips/include/asm/io.h                         |   3 -
 arch/mips/include/asm/mmiowb.h                     |  11 +++
 arch/mips/include/asm/spinlock.h                   |  15 +++
 arch/nds32/include/asm/Kbuild                      |   1 +
 arch/nds32/include/asm/io.h                        |   2 -
 arch/nios2/include/asm/Kbuild                      |   1 +
 arch/openrisc/include/asm/Kbuild                   |   1 +
 arch/parisc/include/asm/Kbuild                     |   1 +
 arch/parisc/include/asm/io.h                       |   2 -
 arch/powerpc/Kconfig                               |   1 +
 arch/powerpc/include/asm/io.h                      |  33 +------
 arch/powerpc/include/asm/mmiowb.h                  |  18 ++++
 arch/powerpc/include/asm/paca.h                    |   6 +-
 arch/powerpc/include/asm/spinlock.h                |  17 ----
 arch/powerpc/xmon/xmon.c                           |   5 +-
 arch/riscv/Kconfig                                 |   1 +
 arch/riscv/include/asm/io.h                        |  15 +--
 arch/riscv/include/asm/mmiowb.h                    |  14 +++
 arch/s390/include/asm/Kbuild                       |   1 +
 arch/sh/include/asm/io.h                           |   3 -
 arch/sh/include/asm/mmiowb.h                       |  12 +++
 arch/sh/include/asm/spinlock-llsc.h                |   2 +
 arch/sparc/include/asm/Kbuild                      |   1 +
 arch/sparc/include/asm/io_64.h                     |   2 -
 arch/um/include/asm/Kbuild                         |   1 +
 arch/unicore32/include/asm/Kbuild                  |   1 +
 arch/x86/include/asm/Kbuild                        |   1 +
 arch/x86/include/asm/io.h                          |   2 -
 arch/xtensa/include/asm/Kbuild                     |   1 +
 drivers/crypto/cavium/nitrox/nitrox_reqmgr.c       |   4 -
 drivers/dma/txx9dmac.c                             |   3 -
 drivers/firewire/ohci.c                            |   1 -
 drivers/gpu/drm/i915/intel_hdmi.c                  |  10 --
 drivers/ide/tx4939ide.c                            |   2 -
 drivers/infiniband/hw/hfi1/chip.c                  |   3 -
 drivers/infiniband/hw/hfi1/pio.c                   |   1 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c         |   2 -
 drivers/infiniband/hw/i40iw/i40iw_osdep.h          |   2 +-
 drivers/infiniband/hw/mlx4/qp.c                    |   6 --
 drivers/infiniband/hw/mlx5/qp.c                    |   1 -
 drivers/infiniband/hw/mthca/mthca_cmd.c            |   6 --
 drivers/infiniband/hw/mthca/mthca_cq.c             |   5 -
 drivers/infiniband/hw/mthca/mthca_qp.c             |  17 ----
 drivers/infiniband/hw/mthca/mthca_srq.c            |   6 --
 drivers/infiniband/hw/qedr/verbs.c                 |  12 ---
 drivers/infiniband/hw/qib/qib_iba6120.c            |   4 -
 drivers/infiniband/hw/qib/qib_iba7220.c            |   3 -
 drivers/infiniband/hw/qib/qib_iba7322.c            |   3 -
 drivers/infiniband/hw/qib/qib_sd7220.c             |   4 -
 drivers/media/pci/dt3155/dt3155.c                  |   8 --
 drivers/memstick/host/jmb38x_ms.c                  |   4 -
 drivers/misc/ioc4.c                                |   2 -
 drivers/misc/mei/hw-me.c                           |   3 -
 drivers/misc/tifm_7xx1.c                           |   1 -
 drivers/mmc/host/alcor.c                           |   1 -
 drivers/mmc/host/sdhci.c                           |  13 ---
 drivers/mmc/host/tifm_sd.c                         |   3 -
 drivers/mmc/host/via-sdmmc.c                       |  10 --
 drivers/mtd/nand/raw/r852.c                        |   2 -
 drivers/mtd/nand/raw/txx9ndfmc.c                   |   1 -
 drivers/net/ethernet/aeroflex/greth.c              |   1 -
 drivers/net/ethernet/alacritech/slicoss.c          |   4 -
 drivers/net/ethernet/amazon/ena/ena_com.c          |   1 -
 drivers/net/ethernet/atheros/atlx/atl1.c           |   1 -
 drivers/net/ethernet/atheros/atlx/atl2.c           |   1 -
 drivers/net/ethernet/broadcom/bnx2.c               |   4 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c    |   2 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h    |   4 -
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c    |   1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   |  29 ------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c     |   1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c  |   2 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c   |   4 -
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |   3 -
 drivers/net/ethernet/broadcom/tg3.c                |   6 --
 .../net/ethernet/cavium/liquidio/cn66xx_device.c   |  10 --
 .../net/ethernet/cavium/liquidio/octeon_device.c   |   1 -
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c |   4 -
 .../net/ethernet/cavium/liquidio/request_manager.c |   1 -
 drivers/net/ethernet/intel/e1000/e1000_main.c      |   5 -
 drivers/net/ethernet/intel/e1000e/netdev.c         |   7 --
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c       |   2 -
 drivers/net/ethernet/intel/fm10k/fm10k_main.c      |   5 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |   5 -
 drivers/net/ethernet/intel/iavf/iavf_txrx.c        |   5 -
 drivers/net/ethernet/intel/ice/ice_txrx.c          |   5 -
 drivers/net/ethernet/intel/igb/igb_main.c          |   5 -
 drivers/net/ethernet/intel/igbvf/netdev.c          |   4 -
 drivers/net/ethernet/intel/igc/igc_main.c          |   5 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   5 -
 drivers/net/ethernet/marvell/sky2.c                |   4 -
 drivers/net/ethernet/mellanox/mlx4/catas.c         |   4 -
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |  13 ---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |   1 -
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c   |   2 -
 drivers/net/ethernet/neterion/s2io.c               |   2 -
 drivers/net/ethernet/neterion/vxge/vxge-main.c     |   5 -
 drivers/net/ethernet/neterion/vxge/vxge-traffic.c  |   4 -
 drivers/net/ethernet/qlogic/qed/qed_int.c          |  13 ---
 drivers/net/ethernet/qlogic/qed/qed_spq.c          |   3 -
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c    |   8 --
 drivers/net/ethernet/qlogic/qede/qede_fp.c         |   8 --
 drivers/net/ethernet/qlogic/qla3xxx.c              |   1 -
 drivers/net/ethernet/qlogic/qlge/qlge.h            |   1 -
 drivers/net/ethernet/qlogic/qlge/qlge_main.c       |   1 -
 drivers/net/ethernet/realtek/r8169.c               |   5 -
 drivers/net/ethernet/renesas/ravb_main.c           |   9 --
 drivers/net/ethernet/renesas/ravb_ptp.c            |   3 -
 drivers/net/ethernet/renesas/sh_eth.c              |   1 -
 drivers/net/ethernet/sfc/falcon/io.h               |   2 -
 drivers/net/ethernet/sfc/io.h                      |   2 -
 drivers/net/ethernet/silan/sc92031.c               |  15 ---
 drivers/net/ethernet/via/via-rhine.c               |   3 -
 drivers/net/ethernet/wiznet/w5100.c                |   6 --
 drivers/net/ethernet/wiznet/w5300.c                |  15 ---
 drivers/net/wireless/ath/ath5k/base.c              |   4 -
 drivers/net/wireless/ath/ath5k/mac80211-ops.c      |   2 -
 drivers/net/wireless/broadcom/b43/main.c           |   7 --
 drivers/net/wireless/broadcom/b43/sysfs.c          |   1 -
 drivers/net/wireless/broadcom/b43legacy/ilt.c      |   2 -
 drivers/net/wireless/broadcom/b43legacy/main.c     |  20 ----
 drivers/net/wireless/broadcom/b43legacy/phy.c      |   1 -
 drivers/net/wireless/broadcom/b43legacy/pio.h      |   1 -
 drivers/net/wireless/broadcom/b43legacy/radio.c    |   4 -
 drivers/net/wireless/broadcom/b43legacy/sysfs.c    |   1 -
 drivers/net/wireless/intel/iwlegacy/common.h       |   7 --
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c    |   1 -
 drivers/ntb/hw/idt/ntb_hw_idt.c                    |   7 --
 drivers/ntb/test/ntb_perf.c                        |   3 -
 drivers/scsi/bfa/bfa.h                             |   3 +-
 drivers/scsi/bfa/bfa_hw_cb.c                       |   2 -
 drivers/scsi/bfa/bfa_hw_ct.c                       |   2 -
 drivers/scsi/bnx2fc/bnx2fc_hwi.c                   |   2 -
 drivers/scsi/bnx2i/bnx2i_hwi.c                     |   3 -
 drivers/scsi/megaraid/megaraid_sas_base.c          |   1 -
 drivers/scsi/megaraid/megaraid_sas_fusion.c        |   1 -
 drivers/scsi/mpt3sas/mpt3sas_base.c                |   1 -
 drivers/scsi/qedf/qedf_io.c                        |   1 -
 drivers/scsi/qedi/qedi_fw.c                        |   1 -
 drivers/scsi/qla1280.c                             |  15 ---
 drivers/ssb/pci.c                                  |   1 -
 drivers/ssb/pcmcia.c                               |   4 -
 drivers/staging/comedi/drivers/mite.c              |   3 -
 drivers/staging/comedi/drivers/ni_660x.c           |   2 -
 drivers/staging/comedi/drivers/ni_mio_common.c     |   1 -
 drivers/staging/comedi/drivers/ni_pcidio.c         |   2 -
 drivers/staging/comedi/drivers/ni_tio.c            |   1 -
 drivers/staging/comedi/drivers/s626.c              |   2 -
 drivers/tty/serial/men_z135_uart.c                 |   1 -
 drivers/tty/serial/serial_txx9.c                   |   1 -
 drivers/usb/early/xhci-dbc.c                       |   4 -
 drivers/usb/host/xhci-dbgcap.c                     |   2 -
 include/asm-generic/io.h                           |   7 +-
 include/asm-generic/mmiowb.h                       |  63 +++++++++++++
 include/asm-generic/mmiowb_types.h                 |  12 +++
 include/linux/qed/qed_if.h                         |   2 -
 include/linux/spinlock.h                           |  11 ++-
 kernel/Kconfig.locks                               |   7 ++
 kernel/locking/spinlock.c                          |   7 ++
 kernel/locking/spinlock_debug.c                    |   6 +-
 sound/soc/txx9/txx9aclc-ac97.c                     |   1 -
 182 files changed, 247 insertions(+), 783 deletions(-)
 create mode 100644 arch/ia64/include/asm/mmiowb.h
 create mode 100644 arch/mips/include/asm/mmiowb.h
 create mode 100644 arch/powerpc/include/asm/mmiowb.h
 create mode 100644 arch/riscv/include/asm/mmiowb.h
 create mode 100644 arch/sh/include/asm/mmiowb.h
 create mode 100644 include/asm-generic/mmiowb.h
 create mode 100644 include/asm-generic/mmiowb_types.h

-- 
2.11.0



* [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-03  1:43   ` Nicholas Piggin
  2019-03-01 14:03 ` [PATCH 02/20] arch: Use asm-generic header for asm/mmiowb.h Will Deacon
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

In preparation for removing all explicit mmiowb() calls from driver
code, implement a tracking system in asm-generic based loosely on the
PowerPC implementation. This allows architectures with a non-empty
mmiowb() definition to have the barrier automatically inserted in
spin_unlock() following a critical section containing an I/O write.
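
To illustrate the tracking described above, here is a minimal user-space
sketch in C. It is a model, not kernel code: it assumes a single CPU held in
a plain global rather than per-CPU storage, it counts barriers instead of
issuing them, and the names model_io_write(), barriers_issued and
count_barriers() are made up for illustration. Only the three mmiowb_*()
helpers follow the logic of the patch.

```c
/* User-space model of the mmiowb() tracking logic; not kernel code. */
#include <assert.h>
#include <stdint.h>

struct mmiowb_state {
	uint16_t nesting_count;		/* number of spinlocks currently held */
	uint16_t mmiowb_pending;	/* non-zero after an I/O write under a lock */
};

/* Assumption: one modelled CPU; the kernel uses a per-CPU variable. */
static struct mmiowb_state cpu_state;
static unsigned int barriers_issued;

/* Stand-in for the architecture's barrier: just count the calls. */
static void mmiowb(void)
{
	barriers_issued++;
}

static void mmiowb_set_pending(void)
{
	/* Outside any lock nesting_count is 0, so nothing becomes pending. */
	cpu_state.mmiowb_pending = cpu_state.nesting_count;
}

static void mmiowb_spin_lock(void)
{
	cpu_state.nesting_count++;
}

static void mmiowb_spin_unlock(void)
{
	assert(cpu_state.nesting_count > 0);	/* model-only sanity check */

	if (cpu_state.mmiowb_pending) {
		cpu_state.mmiowb_pending = 0;
		mmiowb();
	}

	cpu_state.nesting_count--;
}

/* Hypothetical I/O write accessor: only the tracking side is modelled. */
static void model_io_write(void)
{
	mmiowb_set_pending();
}

/*
 * Take 'depth' nested locks, optionally perform one I/O write, release
 * all locks, and report how many barriers the unlock path issued.
 */
static unsigned int count_barriers(unsigned int depth, int do_io_write)
{
	unsigned int before = barriers_issued;
	unsigned int i;

	for (i = 0; i < depth; i++)
		mmiowb_spin_lock();
	if (do_io_write)
		model_io_write();
	for (i = 0; i < depth; i++)
		mmiowb_spin_unlock();

	return barriers_issued - before;
}
```

In this model, count_barriers(1, 1) reports one barrier and
count_barriers(1, 0) reports none. With nested locks the barrier is issued
once, at the innermost unlock, and a write performed outside any lock never
marks a barrier as pending.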

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/mmiowb.h       | 63 ++++++++++++++++++++++++++++++++++++++
 include/asm-generic/mmiowb_types.h | 12 ++++++++
 kernel/Kconfig.locks               |  7 +++++
 kernel/locking/spinlock.c          |  7 +++++
 4 files changed, 89 insertions(+)
 create mode 100644 include/asm-generic/mmiowb.h
 create mode 100644 include/asm-generic/mmiowb_types.h

diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
new file mode 100644
index 000000000000..9439ff037b2d
--- /dev/null
+++ b/include/asm-generic/mmiowb.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_GENERIC_MMIOWB_H
+#define __ASM_GENERIC_MMIOWB_H
+
+/*
+ * Generic implementation of mmiowb() tracking for spinlocks.
+ *
+ * If your architecture doesn't ensure that writes to an I/O peripheral
+ * within two spinlocked sections on two different CPUs are seen by the
+ * peripheral in the order corresponding to the lock handover, then you
+ * need to follow these FIVE easy steps:
+ *
+ * 	1. Implement mmiowb() (and arch_mmiowb_state() if you're fancy)
+ *	   in asm/mmiowb.h, then #include this file
+ *	2. Ensure your I/O write accessors call mmiowb_set_pending()
+ *	3. Select ARCH_HAS_MMIOWB
+ *	4. Untangle the resulting mess of header files
+ *	5. Complain to your architects
+ */
+#ifdef CONFIG_MMIOWB
+
+#include <linux/compiler.h>
+#include <asm-generic/mmiowb_types.h>
+
+#ifndef arch_mmiowb_state
+#include <asm/percpu.h>
+#include <asm/smp.h>
+
+DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
+#define __mmiowb_state()	this_cpu_ptr(&__mmiowb_state)
+#else
+#define __mmiowb_state()	arch_mmiowb_state()
+#endif	/* arch_mmiowb_state */
+
+static inline void mmiowb_set_pending(void)
+{
+	struct mmiowb_state *ms = __mmiowb_state();
+	ms->mmiowb_pending = ms->nesting_count;
+}
+
+static inline void mmiowb_spin_lock(void)
+{
+	struct mmiowb_state *ms = __mmiowb_state();
+	ms->nesting_count++;
+}
+
+static inline void mmiowb_spin_unlock(void)
+{
+	struct mmiowb_state *ms = __mmiowb_state();
+
+	if (unlikely(ms->mmiowb_pending)) {
+		ms->mmiowb_pending = 0;
+		mmiowb();
+	}
+
+	ms->nesting_count--;
+}
+#else
+#define mmiowb_set_pending()		do { } while (0)
+#define mmiowb_spin_lock()		do { } while (0)
+#define mmiowb_spin_unlock()		do { } while (0)
+#endif	/* CONFIG_MMIOWB */
+#endif	/* __ASM_GENERIC_MMIOWB_H */
diff --git a/include/asm-generic/mmiowb_types.h b/include/asm-generic/mmiowb_types.h
new file mode 100644
index 000000000000..8eb0095655e7
--- /dev/null
+++ b/include/asm-generic/mmiowb_types.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_GENERIC_MMIOWB_TYPES_H
+#define __ASM_GENERIC_MMIOWB_TYPES_H
+
+#include <linux/types.h>
+
+struct mmiowb_state {
+	u16	nesting_count;
+	u16	mmiowb_pending;
+};
+
+#endif	/* __ASM_GENERIC_MMIOWB_TYPES_H */
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 84d882f3e299..82fa481ecb78 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -248,3 +248,10 @@ config ARCH_USE_QUEUED_RWLOCKS
 config QUEUED_RWLOCKS
 	def_bool y if ARCH_USE_QUEUED_RWLOCKS
 	depends on SMP
+
+config ARCH_HAS_MMIOWB
+	bool
+
+config MMIOWB
+	def_bool y if ARCH_HAS_MMIOWB
+	depends on SMP
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 936f3d14dd6b..0ff08380f531 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -22,6 +22,13 @@
 #include <linux/debug_locks.h>
 #include <linux/export.h>
 
+#ifdef CONFIG_MMIOWB
+#ifndef arch_mmiowb_state
+DEFINE_PER_CPU(struct mmiowb_state, __mmiowb_state);
+EXPORT_PER_CPU_SYMBOL(__mmiowb_state);
+#endif
+#endif
+
 /*
  * If lockdep is enabled then we use the non-preemption spin-ops
  * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
-- 
2.11.0



* [PATCH 02/20] arch: Use asm-generic header for asm/mmiowb.h
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
  2019-03-01 14:03 ` [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 03/20] mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors Will Deacon
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Masahiro Yamada

Hook up asm-generic/mmiowb.h to Kbuild for all architectures so that we
can subsequently include asm/mmiowb.h from core code.

Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/alpha/include/asm/Kbuild      | 1 +
 arch/arc/include/asm/Kbuild        | 1 +
 arch/arm/include/asm/Kbuild        | 1 +
 arch/arm64/include/asm/Kbuild      | 1 +
 arch/c6x/include/asm/Kbuild        | 1 +
 arch/csky/include/asm/Kbuild       | 1 +
 arch/h8300/include/asm/Kbuild      | 1 +
 arch/hexagon/include/asm/Kbuild    | 1 +
 arch/ia64/include/asm/Kbuild       | 1 +
 arch/m68k/include/asm/Kbuild       | 1 +
 arch/microblaze/include/asm/Kbuild | 1 +
 arch/mips/include/asm/Kbuild       | 1 +
 arch/nds32/include/asm/Kbuild      | 1 +
 arch/nios2/include/asm/Kbuild      | 1 +
 arch/openrisc/include/asm/Kbuild   | 1 +
 arch/parisc/include/asm/Kbuild     | 1 +
 arch/powerpc/include/asm/Kbuild    | 1 +
 arch/riscv/include/asm/Kbuild      | 1 +
 arch/s390/include/asm/Kbuild       | 1 +
 arch/sh/include/asm/Kbuild         | 1 +
 arch/sparc/include/asm/Kbuild      | 1 +
 arch/um/include/asm/Kbuild         | 1 +
 arch/unicore32/include/asm/Kbuild  | 1 +
 arch/x86/include/asm/Kbuild        | 1 +
 arch/xtensa/include/asm/Kbuild     | 1 +
 25 files changed, 25 insertions(+)

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index dc0ab28baca1..2b4b2fe715bb 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -8,6 +8,7 @@ generic-y += fb.h
 generic-y += irq_work.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += sections.h
 generic-y += trace_clock.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index caa270261521..138f564afc74 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -14,6 +14,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += percpu.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 1d66db9c9db5..ee6d155ca372 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -9,6 +9,7 @@ generic-y += kdebug.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += preempt.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 1e17ea5c372b..3dae4fd028cf 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -13,6 +13,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild
index 63b4a1705182..fda53087eabc 100644
--- a/arch/c6x/include/asm/Kbuild
+++ b/arch/c6x/include/asm/Kbuild
@@ -22,6 +22,7 @@ generic-y += kprobes.h
 generic-y += local.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += mmu.h
 generic-y += mmu_context.h
 generic-y += pci.h
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 2a0abe8f2a35..95f4e550db8a 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -28,6 +28,7 @@ generic-y += linkage.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += module.h
 generic-y += mutex.h
 generic-y += pci.h
diff --git a/arch/h8300/include/asm/Kbuild b/arch/h8300/include/asm/Kbuild
index 961c1dc064e1..86ad35d93861 100644
--- a/arch/h8300/include/asm/Kbuild
+++ b/arch/h8300/include/asm/Kbuild
@@ -29,6 +29,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += mmu.h
 generic-y += mmu_context.h
 generic-y += module.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index b25fd42aa0f4..ff9134a01a13 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -23,6 +23,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 43e21fe3499c..3273d7aedfa0 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -4,6 +4,7 @@ generic-y += exec.h
 generic-y += irq_work.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += trace_clock.h
 generic-y += vtime.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index 95f8f631c4df..2add40f9fdef 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -17,6 +17,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += sections.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 791cc8d54d0a..a1138bb6c5df 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -22,6 +22,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index f15d5db5dd67..5653b1e47dd0 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -13,6 +13,7 @@ generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += percpu.h
diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild
index 64ceff7ab99b..688b6ed26227 100644
--- a/arch/nds32/include/asm/Kbuild
+++ b/arch/nds32/include/asm/Kbuild
@@ -31,6 +31,7 @@ generic-y += limits.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += parport.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 8fde4fa2c34f..139357327a77 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -26,6 +26,7 @@ generic-y += kprobes.h
 generic-y += local.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += module.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index 1f04844b6b82..a2883689039a 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -24,6 +24,7 @@ generic-y += kprobes.h
 generic-y += local.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += module.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 0b1e354c8c24..eb8b20f95751 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += seccomp.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 77ff7fb24823..57bd1f6660f4 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -8,6 +8,7 @@ generic-y += irq_regs.h
 generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
+generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += vtime.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index cccd12cf27d4..221cd2ec78a4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -21,6 +21,7 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += mutex.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index e3239772887a..68ac5b05c125 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -20,6 +20,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += trace_clock.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index a6ef3fee5f85..e3f6926c4b86 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -13,6 +13,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index b82f64e28f55..b29a4551a8f2 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -14,6 +14,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += module.h
 generic-y += msi.h
 generic-y += preempt.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 00bcbe2326d9..b506ad06aefc 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += irq_work.h
 generic-y += kdebug.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += param.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/arch/unicore32/include/asm/Kbuild b/arch/unicore32/include/asm/Kbuild
index 1d1544b6ca74..288ccd741ff2 100644
--- a/arch/unicore32/include/asm/Kbuild
+++ b/arch/unicore32/include/asm/Kbuild
@@ -21,6 +21,7 @@ generic-y += kprobes.h
 generic-y += local.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += module.h
 generic-y += parport.h
 generic-y += percpu.h
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index a0ab9ab61c75..eebd05942e6c 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -11,3 +11,4 @@ generic-y += early_ioremap.h
 generic-y += export.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index e255683cd520..e5bb2514ade0 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -20,6 +20,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += preempt.h
-- 
2.11.0



* [PATCH 03/20] mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
  2019-03-01 14:03 ` [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking Will Deacon
  2019-03-01 14:03 ` [PATCH 02/20] arch: Use asm-generic header for asm/mmiowb.h Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-03  1:47   ` Nicholas Piggin
  2019-03-01 14:03 ` [PATCH 04/20] ARM/io: Remove useless definition of mmiowb() Will Deacon
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

Removing explicit calls to mmiowb() from driver code means that we must
now call into the generic mmiowb_spin_{lock,unlock}() functions from the
core spinlock code. In order to elide barriers following critical
sections without any I/O writes, we also hook into the asm-generic I/O
routines.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
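Note: the behaviour described above can be modelled in userspace roughly
as follows. This is a simplified, single-CPU sketch and not the code from
patch 01; the struct layout, the barriers_issued counter and every model_*
name are assumptions made purely for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU tracking state. */
struct mmiowb_state_model {
	unsigned int nesting_count;	/* spinlocks currently held */
	unsigned int mmiowb_pending;	/* non-zero: I/O write seen under a lock */
};

static struct mmiowb_state_model ms;
static unsigned int barriers_issued;	/* counts modelled mmiowb() calls */

/* Stand-in for the real barrier; it only counts invocations. */
static void model_mmiowb(void)
{
	barriers_issued++;
}

/* Models the hook on the lock path: note that a lock is now held. */
static void model_spin_lock(void)
{
	ms.nesting_count++;
}

/* Models the hook in the I/O write accessor (__io_aw() in the diff). */
static void model_set_pending(void)
{
	if (ms.nesting_count)
		ms.mmiowb_pending = 1;
}

/* Models the hook on the unlock path: barrier only if a write is pending. */
static void model_spin_unlock(void)
{
	assert(ms.nesting_count > 0);

	if (ms.mmiowb_pending) {
		ms.mmiowb_pending = 0;
		model_mmiowb();
	}

	ms.nesting_count--;
}
```

In this model a critical section without I/O writes issues no barrier,
while one containing a write issues exactly one at unlock time.
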
 include/asm-generic/io.h        |  3 ++-
 include/linux/spinlock.h        | 11 ++++++++++-
 kernel/locking/spinlock_debug.c |  6 +++++-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 303871651f8a..bc490a746602 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -19,6 +19,7 @@
 #include <asm-generic/iomap.h>
 #endif
 
+#include <asm/mmiowb.h>
 #include <asm-generic/pci_iomap.h>
 
 #ifndef mmiowb
@@ -49,7 +50,7 @@
 
 /* serialize device access against a spin_unlock, usually handled there. */
 #ifndef __io_aw
-#define __io_aw()      barrier()
+#define __io_aw()      mmiowb_set_pending()
 #endif
 
 #ifndef __io_pbw
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index e089157dcf97..4298b1b31d9b 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -57,6 +57,7 @@
 #include <linux/stringify.h>
 #include <linux/bottom_half.h>
 #include <asm/barrier.h>
+#include <asm/mmiowb.h>
 
 
 /*
@@ -177,6 +178,7 @@ do {								\
 static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock)
 {
 	__acquire(lock);
+	mmiowb_spin_lock();
 	arch_spin_lock(&lock->raw_lock);
 }
 
@@ -188,16 +190,23 @@ static inline void
 do_raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long *flags) __acquires(lock)
 {
 	__acquire(lock);
+	mmiowb_spin_lock();
 	arch_spin_lock_flags(&lock->raw_lock, *flags);
 }
 
 static inline int do_raw_spin_trylock(raw_spinlock_t *lock)
 {
-	return arch_spin_trylock(&(lock)->raw_lock);
+	int ret = arch_spin_trylock(&(lock)->raw_lock);
+
+	if (ret)
+		mmiowb_spin_lock();
+
+	return ret;
 }
 
 static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
 {
+	mmiowb_spin_unlock();
 	arch_spin_unlock(&lock->raw_lock);
 	__release(lock);
 }
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 9aa0fccd5d43..654484b6e70c 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -109,6 +109,7 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
  */
 void do_raw_spin_lock(raw_spinlock_t *lock)
 {
+	mmiowb_spin_lock();
 	debug_spin_lock_before(lock);
 	arch_spin_lock(&lock->raw_lock);
 	debug_spin_lock_after(lock);
@@ -118,8 +119,10 @@ int do_raw_spin_trylock(raw_spinlock_t *lock)
 {
 	int ret = arch_spin_trylock(&lock->raw_lock);
 
-	if (ret)
+	if (ret) {
+		mmiowb_spin_lock();
 		debug_spin_lock_after(lock);
+	}
 #ifndef CONFIG_SMP
 	/*
 	 * Must not happen on UP:
@@ -131,6 +134,7 @@ int do_raw_spin_trylock(raw_spinlock_t *lock)
 
 void do_raw_spin_unlock(raw_spinlock_t *lock)
 {
+	mmiowb_spin_unlock();
 	debug_spin_unlock(lock);
 	arch_spin_unlock(&lock->raw_lock);
 }
-- 
2.11.0



* [PATCH 04/20] ARM/io: Remove useless definition of mmiowb()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (2 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 03/20] mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 05/20] arm64/io: " Will Deacon
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

ARM includes asm-generic/io.h, which provides a dummy definition of
mmiowb() if one isn't already provided by the architecture.

Remove the useless definition.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 6b51826ab3d1..7e22c81398c4 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -281,8 +281,6 @@ extern void _memcpy_fromio(void *, const volatile void __iomem *, size_t);
 extern void _memcpy_toio(volatile void __iomem *, const void *, size_t);
 extern void _memset_io(volatile void __iomem *, int, size_t);
 
-#define mmiowb()
-
 /*
  *  Memory access primitives
  *  ------------------------
-- 
2.11.0



* [PATCH 05/20] arm64/io: Remove useless definition of mmiowb()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (3 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 04/20] ARM/io: Remove useless definition of mmiowb() Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 06/20] x86/io: " Will Deacon
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

arm64 includes asm-generic/io.h, which provides a dummy definition of
mmiowb() if one isn't already provided by the architecture.

Remove the useless definition.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 8bb7210ac286..b807cb9b517d 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -124,8 +124,6 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
 #define __io_par(v)		__iormb(v)
 #define __iowmb()		wmb()
 
-#define mmiowb()		do { } while (0)
-
 /*
  * Relaxed I/O memory access primitives. These follow the Device memory
  * ordering rules but do not guarantee any ordering relative to Normal memory
-- 
2.11.0



* [PATCH 06/20] x86/io: Remove useless definition of mmiowb()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (4 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 05/20] arm64/io: " Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 07/20] nds32/io: " Will Deacon
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

x86 maps mmiowb() to barrier(), but this is superfluous because a
compiler barrier is already implied by spin_unlock(). Since x86 also
includes asm-generic/io.h in its asm/io.h file, we can remove the
definition entirely and pick up the dummy definition from core code.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/x86/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 686247db3106..a06a9f8294ea 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -90,8 +90,6 @@ build_mmio_write(__writel, "l", unsigned int, "r", )
 #define __raw_writew __writew
 #define __raw_writel __writel
 
-#define mmiowb() barrier()
-
 #ifdef CONFIG_X86_64
 
 build_mmio_read(readq, "q", u64, "=r", :"memory")
-- 
2.11.0



* [PATCH 07/20] nds32/io: Remove useless definition of mmiowb()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (5 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 06/20] x86/io: " Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 08/20] m68k/io: " Will Deacon
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

mmiowb() only makes sense for SMP platforms, so we can remove it
entirely for nds32.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/nds32/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/nds32/include/asm/io.h b/arch/nds32/include/asm/io.h
index 71cd226d6863..5ef8ae5ba833 100644
--- a/arch/nds32/include/asm/io.h
+++ b/arch/nds32/include/asm/io.h
@@ -55,8 +55,6 @@ static inline u32 __raw_readl(const volatile void __iomem *addr)
 #define __iormb()               rmb()
 #define __iowmb()               wmb()
 
-#define mmiowb()        __asm__ __volatile__ ("msync all" : : : "memory");
-
 /*
  * {read,write}{b,w,l,q}_relaxed() are like the regular version, but
  * are not guaranteed to provide ordering against spinlocks or memory
-- 
2.11.0



* [PATCH 08/20] m68k/io: Remove useless definition of mmiowb()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (6 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 07/20] nds32/io: " Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 09/20] sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() Will Deacon
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

m68k includes asm-generic/io.h, which provides a dummy definition of
mmiowb() if one isn't already provided by the architecture.

Remove the useless definition.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/m68k/include/asm/io_mm.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/m68k/include/asm/io_mm.h b/arch/m68k/include/asm/io_mm.h
index 782b78f8a048..6c03ca5bc436 100644
--- a/arch/m68k/include/asm/io_mm.h
+++ b/arch/m68k/include/asm/io_mm.h
@@ -377,8 +377,6 @@ static inline void isa_delay(void)
 #define writesw(port, buf, nr)    raw_outsw((port), (u16 *)(buf), (nr))
 #define writesl(port, buf, nr)    raw_outsl((port), (u32 *)(buf), (nr))
 
-#define mmiowb()
-
 #ifndef CONFIG_SUN3
 #define IO_SPACE_LIMIT 0xffff
 #else
-- 
2.11.0



* [PATCH 09/20] sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (7 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 08/20] m68k/io: " Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 10/20] mips/mmiowb: " Will Deacon
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

The mmiowb() macro is horribly difficult to use and drivers will continue
to work most of the time if they omit a call when it is required.

Rather than rely on driver authors getting this right, push mmiowb() into
arch_spin_unlock() for sh. If this is deemed to be a performance issue,
a subsequent optimisation could make use of ARCH_HAS_MMIOWB to elide
the barrier in cases where no I/O writes were performed inside the
critical section.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
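Note: the cost trade-off mentioned above can be sketched with a small
userspace model. It is illustrative only; model_mmiowb(), run_sections()
and the barriers_issued counter are hypothetical names, and the real
barrier is replaced by a counter.

```c
#include <assert.h>

static unsigned int barriers_issued;	/* counts modelled mmiowb() calls */

/* Stand-in for mmiowb(); it only counts how often it is reached. */
static void model_mmiowb(void)
{
	barriers_issued++;
}

/* Model of an unlock path that always issues the barrier first. */
static void model_unlock_unconditional(void)
{
	model_mmiowb();
	/* ...the store releasing the lock would follow here... */
}

/*
 * Run n critical sections that perform no I/O writes at all and
 * return how many barriers were issued anyway.
 */
static unsigned int run_sections(unsigned int n)
{
	unsigned int i;

	barriers_issued = 0;
	for (i = 0; i < n; i++)
		model_unlock_unconditional();

	return barriers_issued;
}
```

Every unlock pays for one barrier here whether or not the critical
section wrote to a device, which is the overhead a later
ARCH_HAS_MMIOWB-based optimisation would aim to remove.
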
 arch/sh/include/asm/Kbuild          |  1 -
 arch/sh/include/asm/io.h            |  3 ---
 arch/sh/include/asm/mmiowb.h        | 12 ++++++++++++
 arch/sh/include/asm/spinlock-llsc.h |  2 ++
 4 files changed, 14 insertions(+), 4 deletions(-)
 create mode 100644 arch/sh/include/asm/mmiowb.h

diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index e3f6926c4b86..a6ef3fee5f85 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -13,7 +13,6 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/sh/include/asm/io.h b/arch/sh/include/asm/io.h
index 4f7f235f15f8..c28e37a344ad 100644
--- a/arch/sh/include/asm/io.h
+++ b/arch/sh/include/asm/io.h
@@ -229,9 +229,6 @@ __BUILD_IOPORT_STRING(q, u64)
 
 #define IO_SPACE_LIMIT 0xffffffff
 
-/* synco on SH-4A, otherwise a nop */
-#define mmiowb()		wmb()
-
 /* We really want to try and get these to memcpy etc */
 void memcpy_fromio(void *, const volatile void __iomem *, unsigned long);
 void memcpy_toio(volatile void __iomem *, const void *, unsigned long);
diff --git a/arch/sh/include/asm/mmiowb.h b/arch/sh/include/asm/mmiowb.h
new file mode 100644
index 000000000000..535d59735f1d
--- /dev/null
+++ b/arch/sh/include/asm/mmiowb.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_SH_MMIOWB_H
+#define __ASM_SH_MMIOWB_H
+
+#include <asm/barrier.h>
+
+/* synco on SH-4A, otherwise a nop */
+#define mmiowb()			wmb()
+
+#include <asm-generic/mmiowb.h>
+
+#endif	/* __ASM_SH_MMIOWB_H */
diff --git a/arch/sh/include/asm/spinlock-llsc.h b/arch/sh/include/asm/spinlock-llsc.h
index 786ee0fde3b0..7fd929cd2e7a 100644
--- a/arch/sh/include/asm/spinlock-llsc.h
+++ b/arch/sh/include/asm/spinlock-llsc.h
@@ -47,6 +47,8 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	unsigned long tmp;
 
+	/* This could be optimised with ARCH_HAS_MMIOWB */
+	mmiowb();
 	__asm__ __volatile__ (
 		"mov		#1, %0 ! arch_spin_unlock	\n\t"
 		"mov.l		%0, @%1				\n\t"
-- 
2.11.0



* [PATCH 10/20] mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (8 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 09/20] sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 22:16   ` Paul Burton
  2019-03-01 14:03 ` [PATCH 11/20] ia64/mmiowb: " Will Deacon
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

The mmiowb() macro is horribly difficult to use and drivers will continue
to work most of the time if they omit a call when it is required.

Rather than rely on driver authors getting this right, push mmiowb() into
arch_spin_unlock() for mips. If this is deemed to be a performance issue,
a subsequent optimisation could make use of ARCH_HAS_MMIOWB to elide
the barrier in cases where no I/O writes were performed inside the
critical section.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/mips/include/asm/Kbuild     |  1 -
 arch/mips/include/asm/io.h       |  3 ---
 arch/mips/include/asm/mmiowb.h   | 11 +++++++++++
 arch/mips/include/asm/spinlock.h | 15 +++++++++++++++
 4 files changed, 26 insertions(+), 4 deletions(-)
 create mode 100644 arch/mips/include/asm/mmiowb.h

diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 5653b1e47dd0..f15d5db5dd67 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -13,7 +13,6 @@ generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += percpu.h
diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index 845fbbc7a2e3..29997e42480e 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -102,9 +102,6 @@ static inline void set_io_port_base(unsigned long base)
 #define iobarrier_w() wmb()
 #define iobarrier_sync() iob()
 
-/* Some callers use this older API instead.  */
-#define mmiowb() iobarrier_w()
-
 /*
  *     virt_to_phys    -       map virtual addresses to physical
  *     @address: address to remap
diff --git a/arch/mips/include/asm/mmiowb.h b/arch/mips/include/asm/mmiowb.h
new file mode 100644
index 000000000000..a40824e3ef8e
--- /dev/null
+++ b/arch/mips/include/asm/mmiowb.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_MMIOWB_H
+#define _ASM_MMIOWB_H
+
+#include <asm/io.h>
+
+#define mmiowb()	iobarrier_w()
+
+#include <asm-generic/mmiowb.h>
+
+#endif	/* _ASM_MMIOWB_H */
diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index ee81297d9117..8a88eb265516 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -11,6 +11,21 @@
 
 #include <asm/processor.h>
 #include <asm/qrwlock.h>
+
+#include <asm-generic/qspinlock_types.h>
+
+#define	queued_spin_unlock queued_spin_unlock
+/**
+ * queued_spin_unlock - release a queued spinlock
+ * @lock : Pointer to queued spinlock structure
+ */
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+	/* This could be optimised with ARCH_HAS_MMIOWB */
+	mmiowb();
+	smp_store_release(&lock->locked, 0);
+}
+
 #include <asm/qspinlock.h>
 
 #endif /* _ASM_SPINLOCK_H */
-- 
2.11.0



* [PATCH 11/20] ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (9 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 10/20] mips/mmiowb: " Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 12/20] powerpc/mmiowb: Hook up mmiowb() implementation to asm-generic code Will Deacon
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

The mmiowb() macro is horribly difficult to use and drivers will continue
to work most of the time if they omit a call when it is required.

Rather than rely on driver authors getting this right, push mmiowb() into
arch_spin_unlock() for ia64. If this is deemed to be a performance issue,
a subsequent optimisation could make use of ARCH_HAS_MMIOWB to elide
the barrier in cases where no I/O writes were performed inside the
critical section.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/ia64/include/asm/Kbuild     |  1 -
 arch/ia64/include/asm/io.h       | 17 -----------------
 arch/ia64/include/asm/mmiowb.h   | 25 +++++++++++++++++++++++++
 arch/ia64/include/asm/spinlock.h |  2 ++
 4 files changed, 27 insertions(+), 18 deletions(-)
 create mode 100644 arch/ia64/include/asm/mmiowb.h

diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 3273d7aedfa0..43e21fe3499c 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -4,7 +4,6 @@ generic-y += exec.h
 generic-y += irq_work.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += trace_clock.h
 generic-y += vtime.h
diff --git a/arch/ia64/include/asm/io.h b/arch/ia64/include/asm/io.h
index 1e6fef69bb01..a511d62d447a 100644
--- a/arch/ia64/include/asm/io.h
+++ b/arch/ia64/include/asm/io.h
@@ -113,20 +113,6 @@ extern int valid_mmap_phys_addr_range (unsigned long pfn, size_t count);
  */
 #define __ia64_mf_a()	ia64_mfa()
 
-/**
- * ___ia64_mmiowb - I/O write barrier
- *
- * Ensure ordering of I/O space writes.  This will make sure that writes
- * following the barrier will arrive after all previous writes.  For most
- * ia64 platforms, this is a simple 'mf.a' instruction.
- *
- * See Documentation/driver-api/device-io.rst for more information.
- */
-static inline void ___ia64_mmiowb(void)
-{
-	ia64_mfa();
-}
-
 static inline void*
 __ia64_mk_io_addr (unsigned long port)
 {
@@ -161,7 +147,6 @@ __ia64_mk_io_addr (unsigned long port)
 #define __ia64_writew	___ia64_writew
 #define __ia64_writel	___ia64_writel
 #define __ia64_writeq	___ia64_writeq
-#define __ia64_mmiowb	___ia64_mmiowb
 
 /*
  * For the in/out routines, we need to do "mf.a" _after_ doing the I/O access to ensure
@@ -296,7 +281,6 @@ __outsl (unsigned long port, const void *src, unsigned long count)
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
-#define __mmiowb	platform_mmiowb
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -310,7 +294,6 @@ __outsl (unsigned long port, const void *src, unsigned long count)
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
-#define mmiowb()	__mmiowb()
 
 /*
  * The address passed to these functions are ioremap()ped already.
diff --git a/arch/ia64/include/asm/mmiowb.h b/arch/ia64/include/asm/mmiowb.h
new file mode 100644
index 000000000000..297b85ac84a0
--- /dev/null
+++ b/arch/ia64/include/asm/mmiowb.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_IA64_MMIOWB_H
+#define _ASM_IA64_MMIOWB_H
+
+#include <asm/machvec.h>
+
+/**
+ * ___ia64_mmiowb - I/O write barrier
+ *
+ * Ensure ordering of I/O space writes.  This will make sure that writes
+ * following the barrier will arrive after all previous writes.  For most
+ * ia64 platforms, this is a simple 'mf.a' instruction.
+ */
+static inline void ___ia64_mmiowb(void)
+{
+	ia64_mfa();
+}
+
+#define __ia64_mmiowb	___ia64_mmiowb
+#define mmiowb()	platform_mmiowb()
+
+#include <asm-generic/mmiowb.h>
+
+#endif	/* _ASM_IA64_MMIOWB_H */
diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index afd0b3121b4c..5f620e66384e 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -73,6 +73,8 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
 	unsigned short	*p = (unsigned short *)&lock->lock + 1, tmp;
 
+	/* This could be optimised with ARCH_HAS_MMIOWB */
+	mmiowb();
 	asm volatile ("ld2.bias %0=[%1]" : "=r"(tmp) : "r"(p));
 	WRITE_ONCE(*p, (tmp + 2) & ~1);
 }
-- 
2.11.0



* [PATCH 12/20] powerpc/mmiowb: Hook up mmiowb() implementation to asm-generic code
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (10 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 11/20] ia64/mmiowb: " Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-02 12:46   ` Michael Ellerman
  2019-03-01 14:03 ` [PATCH 13/20] riscv/mmiowb: " Will Deacon
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

In a bid to kill off explicit mmiowb() usage in driver code, hook up
the asm-generic mmiowb() tracking code but provide a definition of
arch_mmiowb_state() so that the tracking data can remain in the paca
as it does at present.

This replaces the existing (flawed) implementation.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
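Note: the override mechanism described above can be sketched in userspace
as below. This is an assumption-laden model of the preprocessor hook, not
the asm-generic header itself; paca_model, generic_state and the *_model
names are all hypothetical.

```c
#include <assert.h>

struct mmiowb_state_model {
	unsigned int nesting_count;
	unsigned int mmiowb_pending;
};

/* "Arch" side: a per-CPU area that already exists and embeds the state. */
struct paca_model {
	int other_fields;
	struct mmiowb_state_model mmiowb_state;
};

static struct paca_model local_paca_model;

/* Defining this macro is what tells the generic side where the state lives. */
#define arch_mmiowb_state_model()	(&local_paca_model.mmiowb_state)

/* "Generic" side: use private storage unless the arch supplied an accessor. */
#ifndef arch_mmiowb_state_model
static struct mmiowb_state_model generic_state;
#define get_mmiowb_state_model()	(&generic_state)
#else
#define get_mmiowb_state_model()	arch_mmiowb_state_model()
#endif

/* Generic helper: it never needs to know which storage was selected. */
static void model_set_pending(void)
{
	get_mmiowb_state_model()->mmiowb_pending = 1;
}
```

The generic helper ends up writing into the arch-owned structure, so the
tracking data stays where the architecture already keeps its per-CPU state.
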
 arch/powerpc/Kconfig                |  1 +
 arch/powerpc/include/asm/Kbuild     |  1 -
 arch/powerpc/include/asm/io.h       | 33 +++------------------------------
 arch/powerpc/include/asm/mmiowb.h   | 20 ++++++++++++++++++++
 arch/powerpc/include/asm/paca.h     |  6 +++++-
 arch/powerpc/include/asm/spinlock.h | 17 -----------------
 arch/powerpc/xmon/xmon.c            |  5 ++++-
 7 files changed, 33 insertions(+), 50 deletions(-)
 create mode 100644 arch/powerpc/include/asm/mmiowb.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2890d36eb531..6979304475fd 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
+	select ARCH_HAS_MMIOWB			if PPC64
 	select ARCH_HAS_PHYS_TO_DMA
 	select ARCH_HAS_PMEM_API                if PPC64
 	select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 57bd1f6660f4..77ff7fb24823 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -8,7 +8,6 @@ generic-y += irq_regs.h
 generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += vtime.h
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 7f19fbd3ba55..828100476ba6 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -34,14 +34,11 @@ extern struct pci_dev *isa_bridge_pcidev;
 #include <asm/byteorder.h>
 #include <asm/synch.h>
 #include <asm/delay.h>
+#include <asm/mmiowb.h>
 #include <asm/mmu.h>
 #include <asm/ppc_asm.h>
 #include <asm/pgtable.h>
 
-#ifdef CONFIG_PPC64
-#include <asm/paca.h>
-#endif
-
 #define SIO_CONFIG_RA	0x398
 #define SIO_CONFIG_RD	0x399
 
@@ -107,12 +104,6 @@ extern bool isa_io_special;
  *
  */
 
-#ifdef CONFIG_PPC64
-#define IO_SET_SYNC_FLAG()	do { local_paca->io_sync = 1; } while(0)
-#else
-#define IO_SET_SYNC_FLAG()
-#endif
-
 #define DEF_MMIO_IN_X(name, size, insn)				\
 static inline u##size name(const volatile u##size __iomem *addr)	\
 {									\
@@ -127,7 +118,7 @@ static inline void name(volatile u##size __iomem *addr, u##size val)	\
 {									\
 	__asm__ __volatile__("sync;"#insn" %1,%y0"			\
 		: "=Z" (*addr) : "r" (val) : "memory");			\
-	IO_SET_SYNC_FLAG();						\
+	mmiowb_set_pending();						\
 }
 
 #define DEF_MMIO_IN_D(name, size, insn)				\
@@ -144,7 +135,7 @@ static inline void name(volatile u##size __iomem *addr, u##size val)	\
 {									\
 	__asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0"			\
 		: "=m" (*addr) : "r" (val) : "memory");			\
-	IO_SET_SYNC_FLAG();						\
+	mmiowb_set_pending();						\
 }
 
 DEF_MMIO_IN_D(in_8,     8, lbz);
@@ -652,24 +643,6 @@ static inline void name at					\
 
 #include <asm-generic/iomap.h>
 
-#ifdef CONFIG_PPC32
-#define mmiowb()
-#else
-/*
- * Enforce synchronisation of stores vs. spin_unlock
- * (this does it explicitly, though our implementation of spin_unlock
- * does it implicitely too)
- */
-static inline void mmiowb(void)
-{
-	unsigned long tmp;
-
-	__asm__ __volatile__("sync; li %0,0; stb %0,%1(13)"
-	: "=&r" (tmp) : "i" (offsetof(struct paca_struct, io_sync))
-	: "memory");
-}
-#endif /* !CONFIG_PPC32 */
-
 static inline void iosync(void)
 {
         __asm__ __volatile__ ("sync" : : : "memory");
diff --git a/arch/powerpc/include/asm/mmiowb.h b/arch/powerpc/include/asm/mmiowb.h
new file mode 100644
index 000000000000..b10180613507
--- /dev/null
+++ b/arch/powerpc/include/asm/mmiowb.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_MMIOWB_H
+#define _ASM_POWERPC_MMIOWB_H
+
+#ifdef CONFIG_MMIOWB
+
+#include <linux/compiler.h>
+#include <asm/barrier.h>
+#include <asm/paca.h>
+
+#define arch_mmiowb_state()	(&local_paca->mmiowb_state)
+#define mmiowb()		mb()
+
+#else
+#define mmiowb()		do { } while (0)
+#endif /* CONFIG_MMIOWB */
+
+#include <asm-generic/mmiowb.h>
+
+#endif	/* _ASM_POWERPC_MMIOWB_H */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index e843bc5d1a0f..134e912d403f 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -34,6 +34,8 @@
 #include <asm/cpuidle.h>
 #include <asm/atomic.h>
 
+#include <asm-generic/mmiowb_types.h>
+
 register struct paca_struct *local_paca asm("r13");
 
 #if defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_SMP)
@@ -171,7 +173,6 @@ struct paca_struct {
 	u16 trap_save;			/* Used when bad stack is encountered */
 	u8 irq_soft_mask;		/* mask for irq soft masking */
 	u8 irq_happened;		/* irq happened while soft-disabled */
-	u8 io_sync;			/* writel() needs spin_unlock sync */
 	u8 irq_work_pending;		/* IRQ_WORK interrupt while soft-disable */
 	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -264,6 +265,9 @@ struct paca_struct {
 #ifdef CONFIG_STACKPROTECTOR
 	unsigned long canary;
 #endif
+#ifdef CONFIG_MMIOWB
+	struct mmiowb_state mmiowb_state;
+#endif
 } ____cacheline_aligned;
 
 extern void copy_mm_to_paca(struct mm_struct *mm);
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 685c72310f5d..15b39c407c4e 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -39,19 +39,6 @@
 #define LOCK_TOKEN	1
 #endif
 
-#if defined(CONFIG_PPC64) && defined(CONFIG_SMP)
-#define CLEAR_IO_SYNC	(get_paca()->io_sync = 0)
-#define SYNC_IO		do {						\
-				if (unlikely(get_paca()->io_sync)) {	\
-					mb();				\
-					get_paca()->io_sync = 0;	\
-				}					\
-			} while (0)
-#else
-#define CLEAR_IO_SYNC
-#define SYNC_IO
-#endif
-
 #ifdef CONFIG_PPC_PSERIES
 #define vcpu_is_preempted vcpu_is_preempted
 static inline bool vcpu_is_preempted(int cpu)
@@ -99,7 +86,6 @@ static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock)
 
 static inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
-	CLEAR_IO_SYNC;
 	return __arch_spin_trylock(lock) == 0;
 }
 
@@ -130,7 +116,6 @@ extern void __rw_yield(arch_rwlock_t *lock);
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
-	CLEAR_IO_SYNC;
 	while (1) {
 		if (likely(__arch_spin_trylock(lock) == 0))
 			break;
@@ -148,7 +133,6 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
 {
 	unsigned long flags_dis;
 
-	CLEAR_IO_SYNC;
 	while (1) {
 		if (likely(__arch_spin_trylock(lock) == 0))
 			break;
@@ -167,7 +151,6 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
 
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-	SYNC_IO;
 	__asm__ __volatile__("# arch_spin_unlock\n\t"
 				PPC_RELEASE_BARRIER: : :"memory");
 	lock->slock = 0;
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 757b8499aba2..de8e4693b176 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2429,7 +2429,10 @@ static void dump_one_paca(int cpu)
 	DUMP(p, trap_save, "%#-*x");
 	DUMP(p, irq_soft_mask, "%#-*x");
 	DUMP(p, irq_happened, "%#-*x");
-	DUMP(p, io_sync, "%#-*x");
+#ifdef CONFIG_MMIOWB
+	DUMP(p, mmiowb_state.nesting_count, "%#-*x");
+	DUMP(p, mmiowb_state.mmiowb_pending, "%#-*x");
+#endif
 	DUMP(p, irq_work_pending, "%#-*x");
 	DUMP(p, nap_state_lost, "%#-*x");
 	DUMP(p, sprg_vdso, "%#-*llx");
-- 
2.11.0



* [PATCH 13/20] riscv/mmiowb: Hook up mmiowb() implementation to asm-generic code
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (11 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 12/20] powerpc/mmiowb: Hook up mmiowb() implementation to asm-generic code Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 21:13   ` Palmer Dabbelt
  2019-03-01 14:03 ` [PATCH 14/20] Documentation: Kill all references to mmiowb() Will Deacon
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

In a bid to kill off explicit mmiowb() usage in driver code, hook up
the asm-generic mmiowb() tracking code for riscv, so that an mmiowb()
is automatically issued from spin_unlock() if an I/O write was performed
in the critical section.
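
As a rough illustration of the tracking described above, here is a
single-threaded user-space model in C. It is only a sketch and not the
asm-generic code: the model_* names, the global standing in for per-CPU
state, the field widths and the barrier counter are all assumptions made
for the example. The nesting_count and mmiowb_pending field names are the
ones dumped by xmon in the PowerPC patch. The rule that a write outside
any lock leaves nothing pending is likewise an assumption of the model.

```c
#include <stdint.h>

/* Model of the per-CPU tracking state (field widths are assumed). */
struct mmiowb_state {
	uint16_t nesting_count;   /* how many modelled spinlocks are held */
	uint16_t mmiowb_pending;  /* non-zero if an I/O write needs a barrier */
};

static struct mmiowb_state model_state;  /* one "CPU" only in this model */
static unsigned int model_barriers;      /* counts modelled mmiowb() calls */

/* Stand-in for the arch barrier, e.g. "fence o,w" on riscv. */
static void model_mmiowb(void)
{
	model_barriers++;
}

/* Called after an I/O write; only has an effect while a lock is held. */
static void model_mmiowb_set_pending(void)
{
	model_state.mmiowb_pending = model_state.nesting_count;
}

static void model_spin_lock(void)
{
	model_state.nesting_count++;
}

/* Issue the barrier at most once, and only if a write was recorded. */
static void model_spin_unlock(void)
{
	if (model_state.mmiowb_pending) {
		model_state.mmiowb_pending = 0;
		model_mmiowb();
	}
	model_state.nesting_count--;
}

/* Modelled write accessor: the store, then the "after write" hook. */
static void model_writel(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
	model_mmiowb_set_pending();
}
```

In this model a critical section with no I/O write costs no barrier, and
two nested locks around a single write cost exactly one, issued by the
inner unlock.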

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/riscv/Kconfig              |  1 +
 arch/riscv/include/asm/Kbuild   |  1 -
 arch/riscv/include/asm/io.h     | 15 ++-------------
 arch/riscv/include/asm/mmiowb.h | 14 ++++++++++++++
 4 files changed, 17 insertions(+), 14 deletions(-)
 create mode 100644 arch/riscv/include/asm/mmiowb.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 515fc3cc9687..08f4415203c5 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -49,6 +49,7 @@ config RISCV
 	select RISCV_TIMER
 	select GENERIC_IRQ_MULTI_HANDLER
 	select ARCH_HAS_PTE_SPECIAL
+	select ARCH_HAS_MMIOWB
 
 config MMU
 	def_bool y
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 221cd2ec78a4..cccd12cf27d4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -21,7 +21,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += mutex.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h
index 1d9c1376dc64..744fd92e77bc 100644
--- a/arch/riscv/include/asm/io.h
+++ b/arch/riscv/include/asm/io.h
@@ -20,6 +20,7 @@
 #define _ASM_RISCV_IO_H
 
 #include <linux/types.h>
+#include <asm/mmiowb.h>
 
 extern void __iomem *ioremap(phys_addr_t offset, unsigned long size);
 
@@ -100,18 +101,6 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
 #endif
 
 /*
- * FIXME: I'm flip-flopping on whether or not we should keep this or enforce
- * the ordering with I/O on spinlocks like PowerPC does.  The worry is that
- * drivers won't get this correct, but I also don't want to introduce a fence
- * into the lock code that otherwise only uses AMOs (and is essentially defined
- * by the ISA to be correct).   For now I'm leaving this here: "o,w" is
- * sufficient to ensure that all writes to the device have completed before the
- * write to the spinlock is allowed to commit.  I surmised this from reading
- * "ACQUIRES VS I/O ACCESSES" in memory-barriers.txt.
- */
-#define mmiowb()	__asm__ __volatile__ ("fence o,w" : : : "memory");
-
-/*
  * Unordered I/O memory access primitives.  These are even more relaxed than
  * the relaxed versions, as they don't even order accesses between successive
  * operations to the I/O regions.
@@ -165,7 +154,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
 #define __io_br()	do {} while (0)
 #define __io_ar(v)	__asm__ __volatile__ ("fence i,r" : : : "memory");
 #define __io_bw()	__asm__ __volatile__ ("fence w,o" : : : "memory");
-#define __io_aw()	do {} while (0)
+#define __io_aw()	mmiowb_set_pending()
 
 #define readb(c)	({ u8  __v; __io_br(); __v = readb_cpu(c); __io_ar(__v); __v; })
 #define readw(c)	({ u16 __v; __io_br(); __v = readw_cpu(c); __io_ar(__v); __v; })
diff --git a/arch/riscv/include/asm/mmiowb.h b/arch/riscv/include/asm/mmiowb.h
new file mode 100644
index 000000000000..5d7e3a2b4e3b
--- /dev/null
+++ b/arch/riscv/include/asm/mmiowb.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_RISCV_MMIOWB_H
+#define _ASM_RISCV_MMIOWB_H
+
+/*
+ * "o,w" is sufficient to ensure that all writes to the device have completed
+ * before the write to the spinlock is allowed to commit.
+ */
+#define mmiowb()	__asm__ __volatile__ ("fence o,w" : : : "memory");
+
+#include <asm-generic/mmiowb.h>
+
+#endif	/* _ASM_RISCV_MMIOWB_H */
-- 
2.11.0



* [PATCH 14/20] Documentation: Kill all references to mmiowb()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (12 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 13/20] riscv/mmiowb: " Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 15/20] drivers: Remove useless trailing comments from mmiowb() invocations Will Deacon
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

The guarantees provided by mmiowb() are now provided implicitly by
spin_unlock(), so we can remove all references to this most confusing
of barriers from our Documentation.

Good riddance.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 Documentation/driver-api/device-io.rst  |  45 --------------
 Documentation/driver-api/pci/p2pdma.rst |   4 --
 Documentation/memory-barriers.txt       | 103 ++------------------------------
 3 files changed, 4 insertions(+), 148 deletions(-)

diff --git a/Documentation/driver-api/device-io.rst b/Documentation/driver-api/device-io.rst
index b00b23903078..0e389378f71d 100644
--- a/Documentation/driver-api/device-io.rst
+++ b/Documentation/driver-api/device-io.rst
@@ -103,51 +103,6 @@ continuing execution::
         ha->flags.ints_enabled = 0;
     }
 
-In addition to write posting, on some large multiprocessing systems
-(e.g. SGI Challenge, Origin and Altix machines) posted writes won't be
-strongly ordered coming from different CPUs. Thus it's important to
-properly protect parts of your driver that do memory-mapped writes with
-locks and use the :c:func:`mmiowb()` to make sure they arrive in the
-order intended. Issuing a regular readX() will also ensure write ordering,
-but should only be used when the 
-driver has to be sure that the write has actually arrived at the device
-(not that it's simply ordered with respect to other writes), since a
-full readX() is a relatively expensive operation.
-
-Generally, one should use :c:func:`mmiowb()` prior to releasing a spinlock
-that protects regions using :c:func:`writeb()` or similar functions that
-aren't surrounded by readb() calls, which will ensure ordering
-and flushing. The following pseudocode illustrates what might occur if
-write ordering isn't guaranteed via :c:func:`mmiowb()` or one of the
-readX() functions::
-
-    CPU A:  spin_lock_irqsave(&dev_lock, flags)
-    CPU A:  ...
-    CPU A:  writel(newval, ring_ptr);
-    CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-            ...
-    CPU B:  spin_lock_irqsave(&dev_lock, flags)
-    CPU B:  writel(newval2, ring_ptr);
-    CPU B:  ...
-    CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-
-In the case above, newval2 could be written to ring_ptr before newval.
-Fixing it is easy though::
-
-    CPU A:  spin_lock_irqsave(&dev_lock, flags)
-    CPU A:  ...
-    CPU A:  writel(newval, ring_ptr);
-    CPU A:  mmiowb(); /* ensure no other writes beat us to the device */
-    CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-            ...
-    CPU B:  spin_lock_irqsave(&dev_lock, flags)
-    CPU B:  writel(newval2, ring_ptr);
-    CPU B:  ...
-    CPU B:  mmiowb();
-    CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-
-See tg3.c for a real world example of how to use :c:func:`mmiowb()`
-
 PCI ordering rules also guarantee that PIO read responses arrive after any
 outstanding DMA writes from that bus, since for some devices the result of
 a readb() call may signal to the driver that a DMA transaction is
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
index 6d85b5a2598d..44deb52beeb4 100644
--- a/Documentation/driver-api/pci/p2pdma.rst
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -132,10 +132,6 @@ precludes passing these pages to userspace.
 P2P memory is also technically IO memory but should never have any side
 effects behind it. Thus, the order of loads and stores should not be important
 and ioreadX(), iowriteX() and friends should not be necessary.
-However, as the memory is not cache coherent, if access ever needs to
-be protected by a spinlock then :c:func:`mmiowb()` must be used before
-unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
-Documentation/memory-barriers.txt)
 
 
 P2P DMA Support Library
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 158947ae78c2..6d6eff413462 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1937,21 +1937,6 @@ There are some more advanced barrier functions:
      information on consistent memory.
 
 
-MMIO WRITE BARRIER
-------------------
-
-The Linux kernel also has a special barrier for use with memory-mapped I/O
-writes:
-
-	mmiowb();
-
-This is a variation on the mandatory write barrier that causes writes to weakly
-ordered I/O regions to be partially ordered.  Its effects may go beyond the
-CPU->Hardware interface and actually affect the hardware at some level.
-
-See the subsection "Acquires vs I/O accesses" for more information.
-
-
 ===============================
 IMPLICIT KERNEL MEMORY BARRIERS
 ===============================
@@ -2317,75 +2302,6 @@ But it won't see any of:
 	*E, *F or *G following RELEASE Q
 
 
-
-ACQUIRES VS I/O ACCESSES
-------------------------
-
-Under certain circumstances (especially involving NUMA), I/O accesses within
-two spinlocked sections on two different CPUs may be seen as interleaved by the
-PCI bridge, because the PCI bridge does not necessarily participate in the
-cache-coherence protocol, and is therefore incapable of issuing the required
-read memory barriers.
-
-For example:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	spin_lock(Q)
-	writel(0, ADDR)
-	writel(1, DATA);
-	spin_unlock(Q);
-					spin_lock(Q);
-					writel(4, ADDR);
-					writel(5, DATA);
-					spin_unlock(Q);
-
-may be seen by the PCI bridge as follows:
-
-	STORE *ADDR = 0, STORE *ADDR = 4, STORE *DATA = 1, STORE *DATA = 5
-
-which would probably cause the hardware to malfunction.
-
-
-What is necessary here is to intervene with an mmiowb() before dropping the
-spinlock, for example:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	spin_lock(Q)
-	writel(0, ADDR)
-	writel(1, DATA);
-	mmiowb();
-	spin_unlock(Q);
-					spin_lock(Q);
-					writel(4, ADDR);
-					writel(5, DATA);
-					mmiowb();
-					spin_unlock(Q);
-
-this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
-before either of the stores issued on CPU 2.
-
-
-Furthermore, following a store by a load from the same device obviates the need
-for the mmiowb(), because the load forces the store to complete before the load
-is performed:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	spin_lock(Q)
-	writel(0, ADDR)
-	a = readl(DATA);
-	spin_unlock(Q);
-					spin_lock(Q);
-					writel(4, ADDR);
-					b = readl(DATA);
-					spin_unlock(Q);
-
-
-See Documentation/driver-api/device-io.rst for more information.
-
-
 =================================
 WHERE ARE MEMORY BARRIERS NEEDED?
 =================================
@@ -2532,16 +2448,9 @@ the device to malfunction.
 Inside of the Linux kernel, I/O should be done through the appropriate accessor
 routines - such as inb() or writel() - which know how to make such accesses
 appropriately sequential.  While this, for the most part, renders the explicit
-use of memory barriers unnecessary, there are a couple of situations where they
-might be needed:
-
- (1) On some systems, I/O stores are not strongly ordered across all CPUs, and
-     so for _all_ general drivers locks should be used and mmiowb() must be
-     issued prior to unlocking the critical section.
-
- (2) If the accessor functions are used to refer to an I/O memory window with
-     relaxed memory access properties, then _mandatory_ memory barriers are
-     required to enforce ordering.
+use of memory barriers unnecessary, if the accessor functions are used to refer
+to an I/O memory window with relaxed memory access properties, then _mandatory_
+memory barriers are required to enforce ordering.
 
 See Documentation/driver-api/device-io.rst for more information.
 
@@ -2586,8 +2495,7 @@ explicit barriers are used.
 
 Normally this won't be a problem because the I/O accesses done inside such
 sections will include synchronous load operations on strictly ordered I/O
-registers that form implicit I/O barriers.  If this isn't sufficient then an
-mmiowb() may need to be used explicitly.
+registers that form implicit I/O barriers.
 
 
 A similar situation may occur between an interrupt routine and two routines
@@ -2687,9 +2595,6 @@ guarantees:
 All of these accessors assume that the underlying peripheral is little-endian,
 and will therefore perform byte-swapping operations on big-endian architectures.
 
-Composing I/O ordering barriers with SMP ordering barriers and LOCK/UNLOCK
-operations is a dangerous sport which may require the use of mmiowb(). See the
-subsection "Acquires vs I/O accesses" for more information.
 
 ========================================
 ASSUMED MINIMUM EXECUTION ORDERING MODEL
-- 
2.11.0



* [PATCH 15/20] drivers: Remove useless trailing comments from mmiowb() invocations
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (13 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 14/20] Documentation: Kill all references to mmiowb() Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 16/20] drivers: Remove explicit invocations of mmiowb() Will Deacon
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

In preparation for using coccinelle to remove all mmiowb() instances
from drivers, remove all trailing comments since they won't be picked up
by spatch later on and will end up being preserved in the code.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/infiniband/hw/hfi1/chip.c                | 2 +-
 drivers/infiniband/hw/qedr/verbs.c               | 2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  | 2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
 drivers/scsi/bnx2i/bnx2i_hwi.c                   | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index b443642eac02..955bad21a519 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -8352,7 +8352,7 @@ static inline void clear_recv_intr(struct hfi1_ctxtdata *rcd)
 	struct hfi1_devdata *dd = rcd->dd;
 	u32 addr = CCE_INT_CLEAR + (8 * rcd->ireg);
 
-	mmiowb();	/* make sure everything before is written */
+	mmiowb();
 	write_csr(dd, addr, rcd->imask);
 	/* force the above write on the chip and get a value back */
 	(void)read_csr(dd, addr);
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index e1ccf32b1c3d..23353e0e4bd4 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -3744,7 +3744,7 @@ int qedr_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 
 		if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
 			writel(qp->rq.iwarp_db2_data.raw, qp->rq.iwarp_db2);
-			mmiowb();	/* for second doorbell */
+			mmiowb();
 		}
 
 		wr = wr->next;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 2462e7aa0c5d..1ed068509337 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -527,7 +527,7 @@ static inline void bnx2x_update_rx_prod(struct bnx2x *bp,
 		REG_WR_RELAXED(bp, fp->ustorm_rx_prods_offset + i * 4,
 			       ((u32 *)&rx_prods)[i]);
 
-	mmiowb(); /* keep prod updates ordered */
+	mmiowb();
 
 	DP(NETIF_MSG_RX_STATUS,
 	   "queue[%d]:  wrote  bd_prod %u  cqe_prod %u  sge_prod %u\n",
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 3b5b47e98c73..64bc6d6fcd65 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -5244,7 +5244,7 @@ static void bnx2x_update_eq_prod(struct bnx2x *bp, u16 prod)
 {
 	/* No memory barriers */
 	storm_memset_eq_prod(bp, prod, BP_FUNC(bp));
-	mmiowb(); /* keep prod updates ordered */
+	mmiowb();
 }
 
 static int  bnx2x_cnic_handle_cfc_del(struct bnx2x *bp, u32 cid,
diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c
index fae6f71e677d..d56a78f411cd 100644
--- a/drivers/scsi/bnx2i/bnx2i_hwi.c
+++ b/drivers/scsi/bnx2i/bnx2i_hwi.c
@@ -280,7 +280,7 @@ static void bnx2i_ring_sq_dbell(struct bnx2i_conn *bnx2i_conn, int count)
 	} else
 		writew(count, ep->qp.ctx_base + CNIC_SEND_DOORBELL);
 
-	mmiowb(); /* flush posted PCI writes */
+	mmiowb();
 }
 
 
-- 
2.11.0



* [PATCH 16/20] drivers: Remove explicit invocations of mmiowb()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (14 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 15/20] drivers: Remove useless trailing comments from mmiowb() invocations Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 17/20] scsi/qla1280: Remove stale comment about mmiowb() Will Deacon
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

mmiowb() is now implied by spin_unlock() on architectures that require
it, so there is no reason to call it from driver code. This patch was
generated using coccinelle:

	@mmiowb@
	@@
	- mmiowb();

and invoked as:

$ for d in drivers include/linux/qed sound; do \
spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done
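
To show what the removal means for a typical call site, the following
user-space C sketch records the events a hypothetical doorbell routine
generates before and after the semantic patch. Everything in it is an
assumption for illustration: the tr_* helpers, the trace letters and the
doorbell functions are invented, and the unlock is modelled as issuing the
barrier itself whenever a write was recorded under the lock.

```c
#include <stddef.h>

static char tr_buf[16];      /* event trace: L=lock W=write B=barrier U=unlock */
static size_t tr_len;
static int tr_write_pending; /* set by a modelled I/O write, cleared by unlock */

static void tr_record(char ev)
{
	if (tr_len < sizeof(tr_buf) - 1) {
		tr_buf[tr_len++] = ev;
		tr_buf[tr_len] = '\0';
	}
}

static void tr_reset(void)
{
	tr_len = 0;
	tr_buf[0] = '\0';
	tr_write_pending = 0;
}

static void tr_mmiowb(void)    { tr_record('B'); }
static void tr_spin_lock(void) { tr_record('L'); }

static void tr_writeq(void)
{
	tr_record('W');
	tr_write_pending = 1;
}

/* Modelled unlock: issues the barrier itself if a write is pending. */
static void tr_spin_unlock(void)
{
	if (tr_write_pending) {
		tr_write_pending = 0;
		tr_mmiowb();
	}
	tr_record('U');
}

/* Call site as it looked before the semantic patch. */
static const char *doorbell_before(void)
{
	tr_reset();
	tr_spin_lock();
	tr_writeq();
	tr_mmiowb();       /* explicit call, now redundant */
	tr_spin_unlock();
	return tr_buf;
}

/* Same call site after "- mmiowb();" has been applied. */
static const char *doorbell_after(void)
{
	tr_reset();
	tr_spin_lock();
	tr_writeq();
	tr_spin_unlock();
	return tr_buf;
}
```

In the model, the old call site ends up with two barriers between the
write and the release of the lock, while the patched one keeps a single
barrier in the same position.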

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/crypto/cavium/nitrox/nitrox_reqmgr.c       |  4 ---
 drivers/dma/txx9dmac.c                             |  3 ---
 drivers/firewire/ohci.c                            |  1 -
 drivers/gpu/drm/i915/intel_hdmi.c                  | 10 --------
 drivers/ide/tx4939ide.c                            |  2 --
 drivers/infiniband/hw/hfi1/chip.c                  |  3 ---
 drivers/infiniband/hw/hfi1/pio.c                   |  1 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c         |  2 --
 drivers/infiniband/hw/mlx4/qp.c                    |  6 -----
 drivers/infiniband/hw/mlx5/qp.c                    |  1 -
 drivers/infiniband/hw/mthca/mthca_cmd.c            |  6 -----
 drivers/infiniband/hw/mthca/mthca_cq.c             |  5 ----
 drivers/infiniband/hw/mthca/mthca_qp.c             | 17 -------------
 drivers/infiniband/hw/mthca/mthca_srq.c            |  6 -----
 drivers/infiniband/hw/qedr/verbs.c                 | 12 ---------
 drivers/infiniband/hw/qib/qib_iba6120.c            |  4 ---
 drivers/infiniband/hw/qib/qib_iba7220.c            |  3 ---
 drivers/infiniband/hw/qib/qib_iba7322.c            |  3 ---
 drivers/infiniband/hw/qib/qib_sd7220.c             |  4 ---
 drivers/media/pci/dt3155/dt3155.c                  |  8 ------
 drivers/memstick/host/jmb38x_ms.c                  |  4 ---
 drivers/misc/ioc4.c                                |  2 --
 drivers/misc/mei/hw-me.c                           |  3 ---
 drivers/misc/tifm_7xx1.c                           |  1 -
 drivers/mmc/host/alcor.c                           |  1 -
 drivers/mmc/host/sdhci.c                           | 13 ----------
 drivers/mmc/host/tifm_sd.c                         |  3 ---
 drivers/mmc/host/via-sdmmc.c                       | 10 --------
 drivers/mtd/nand/raw/r852.c                        |  2 --
 drivers/mtd/nand/raw/txx9ndfmc.c                   |  1 -
 drivers/net/ethernet/aeroflex/greth.c              |  1 -
 drivers/net/ethernet/alacritech/slicoss.c          |  4 ---
 drivers/net/ethernet/amazon/ena/ena_com.c          |  1 -
 drivers/net/ethernet/atheros/atlx/atl1.c           |  1 -
 drivers/net/ethernet/atheros/atlx/atl2.c           |  1 -
 drivers/net/ethernet/broadcom/bnx2.c               |  4 ---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c    |  2 --
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h    |  4 ---
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c    |  1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   | 29 ----------------------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c     |  1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c  |  2 --
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c   |  4 ---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  3 ---
 drivers/net/ethernet/broadcom/tg3.c                |  6 -----
 .../net/ethernet/cavium/liquidio/cn66xx_device.c   | 10 --------
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  1 -
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c |  4 ---
 .../net/ethernet/cavium/liquidio/request_manager.c |  1 -
 drivers/net/ethernet/intel/e1000/e1000_main.c      |  5 ----
 drivers/net/ethernet/intel/e1000e/netdev.c         |  7 ------
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c       |  2 --
 drivers/net/ethernet/intel/fm10k/fm10k_main.c      |  5 ----
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |  5 ----
 drivers/net/ethernet/intel/iavf/iavf_txrx.c        |  5 ----
 drivers/net/ethernet/intel/ice/ice_txrx.c          |  5 ----
 drivers/net/ethernet/intel/igb/igb_main.c          |  5 ----
 drivers/net/ethernet/intel/igbvf/netdev.c          |  4 ---
 drivers/net/ethernet/intel/igc/igc_main.c          |  5 ----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |  5 ----
 drivers/net/ethernet/marvell/sky2.c                |  4 ---
 drivers/net/ethernet/mellanox/mlx4/catas.c         |  4 ---
 drivers/net/ethernet/mellanox/mlx4/cmd.c           | 13 ----------
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |  1 -
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c   |  2 --
 drivers/net/ethernet/neterion/s2io.c               |  2 --
 drivers/net/ethernet/neterion/vxge/vxge-main.c     |  5 ----
 drivers/net/ethernet/neterion/vxge/vxge-traffic.c  |  4 ---
 drivers/net/ethernet/qlogic/qed/qed_int.c          | 13 ----------
 drivers/net/ethernet/qlogic/qed/qed_spq.c          |  3 ---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c    |  8 ------
 drivers/net/ethernet/qlogic/qede/qede_fp.c         |  8 ------
 drivers/net/ethernet/qlogic/qla3xxx.c              |  1 -
 drivers/net/ethernet/qlogic/qlge/qlge.h            |  1 -
 drivers/net/ethernet/qlogic/qlge/qlge_main.c       |  1 -
 drivers/net/ethernet/realtek/r8169.c               |  5 ----
 drivers/net/ethernet/renesas/ravb_main.c           |  9 -------
 drivers/net/ethernet/renesas/ravb_ptp.c            |  3 ---
 drivers/net/ethernet/renesas/sh_eth.c              |  1 -
 drivers/net/ethernet/sfc/falcon/io.h               |  2 --
 drivers/net/ethernet/sfc/io.h                      |  2 --
 drivers/net/ethernet/silan/sc92031.c               | 14 -----------
 drivers/net/ethernet/via/via-rhine.c               |  3 ---
 drivers/net/ethernet/wiznet/w5100.c                |  6 -----
 drivers/net/ethernet/wiznet/w5300.c                | 15 -----------
 drivers/net/wireless/ath/ath5k/base.c              |  4 ---
 drivers/net/wireless/ath/ath5k/mac80211-ops.c      |  2 --
 drivers/net/wireless/broadcom/b43/main.c           |  7 ------
 drivers/net/wireless/broadcom/b43/sysfs.c          |  1 -
 drivers/net/wireless/broadcom/b43legacy/ilt.c      |  2 --
 drivers/net/wireless/broadcom/b43legacy/main.c     | 20 ---------------
 drivers/net/wireless/broadcom/b43legacy/phy.c      |  1 -
 drivers/net/wireless/broadcom/b43legacy/pio.h      |  1 -
 drivers/net/wireless/broadcom/b43legacy/radio.c    |  4 ---
 drivers/net/wireless/broadcom/b43legacy/sysfs.c    |  1 -
 drivers/net/wireless/intel/iwlegacy/common.h       |  7 ------
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c    |  1 -
 drivers/ntb/hw/idt/ntb_hw_idt.c                    |  7 ------
 drivers/ntb/test/ntb_perf.c                        |  3 ---
 drivers/scsi/bfa/bfa.h                             |  3 +--
 drivers/scsi/bfa/bfa_hw_cb.c                       |  2 --
 drivers/scsi/bfa/bfa_hw_ct.c                       |  2 --
 drivers/scsi/bnx2fc/bnx2fc_hwi.c                   |  2 --
 drivers/scsi/bnx2i/bnx2i_hwi.c                     |  3 ---
 drivers/scsi/megaraid/megaraid_sas_base.c          |  1 -
 drivers/scsi/megaraid/megaraid_sas_fusion.c        |  1 -
 drivers/scsi/mpt3sas/mpt3sas_base.c                |  1 -
 drivers/scsi/qedf/qedf_io.c                        |  1 -
 drivers/scsi/qedi/qedi_fw.c                        |  1 -
 drivers/scsi/qla1280.c                             |  5 ----
 drivers/ssb/pci.c                                  |  1 -
 drivers/ssb/pcmcia.c                               |  4 ---
 drivers/staging/comedi/drivers/mite.c              |  3 ---
 drivers/staging/comedi/drivers/ni_660x.c           |  2 --
 drivers/staging/comedi/drivers/ni_mio_common.c     |  1 -
 drivers/staging/comedi/drivers/ni_pcidio.c         |  2 --
 drivers/staging/comedi/drivers/ni_tio.c            |  1 -
 drivers/staging/comedi/drivers/s626.c              |  2 --
 drivers/tty/serial/men_z135_uart.c                 |  1 -
 drivers/tty/serial/serial_txx9.c                   |  1 -
 drivers/usb/early/xhci-dbc.c                       |  4 ---
 drivers/usb/host/xhci-dbgcap.c                     |  2 --
 include/linux/qed/qed_if.h                         |  2 --
 sound/soc/txx9/txx9aclc-ac97.c                     |  1 -
 124 files changed, 1 insertion(+), 513 deletions(-)

diff --git a/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c b/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c
index 4c97478d44bd..5826c2c98a50 100644
--- a/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c
+++ b/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c
@@ -303,8 +303,6 @@ static void post_se_instr(struct nitrox_softreq *sr,
 
 	/* Ring doorbell with count 1 */
 	writeq(1, cmdq->dbell_csr_addr);
-	/* orders the doorbell rings */
-	mmiowb();
 
 	cmdq->write_idx = incr_index(idx, 1, ndev->qlen);
 
@@ -599,8 +597,6 @@ void pkt_slc_resp_tasklet(unsigned long data)
 	 * MSI-X interrupt generates if Completion count > Threshold
 	 */
 	writeq(slc_cnts.value, cmdq->compl_cnt_csr_addr);
-	/* order the writes */
-	mmiowb();
 
 	if (atomic_read(&cmdq->backlog_count))
 		schedule_work(&cmdq->backlog_qflush);
diff --git a/drivers/dma/txx9dmac.c b/drivers/dma/txx9dmac.c
index eb45af71d3a3..e8d0881b64d8 100644
--- a/drivers/dma/txx9dmac.c
+++ b/drivers/dma/txx9dmac.c
@@ -327,7 +327,6 @@ static void txx9dmac_reset_chan(struct txx9dmac_chan *dc)
 	channel_writel(dc, SAIR, 0);
 	channel_writel(dc, DAIR, 0);
 	channel_writel(dc, CCR, 0);
-	mmiowb();
 }
 
 /* Called with dc->lock held and bh disabled */
@@ -954,7 +953,6 @@ static void txx9dmac_chain_dynamic(struct txx9dmac_chan *dc,
 	dma_sync_single_for_device(chan2parent(&dc->chan),
 				   prev->txd.phys, ddev->descsize,
 				   DMA_TO_DEVICE);
-	mmiowb();
 	if (!(channel_readl(dc, CSR) & TXX9_DMA_CSR_CHNEN) &&
 	    channel_read_CHAR(dc) == prev->txd.phys)
 		/* Restart chain DMA */
@@ -1080,7 +1078,6 @@ static void txx9dmac_free_chan_resources(struct dma_chan *chan)
 static void txx9dmac_off(struct txx9dmac_dev *ddev)
 {
 	dma_writel(ddev, MCR, 0);
-	mmiowb();
 }
 
 static int __init txx9dmac_chan_probe(struct platform_device *pdev)
diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 45c048751f3b..7183ab34269e 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -2939,7 +2939,6 @@ static void set_multichannel_mask(struct fw_ohci *ohci, u64 channels)
 	reg_write(ohci, OHCI1394_IRMultiChanMaskLoClear, ~lo);
 	reg_write(ohci, OHCI1394_IRMultiChanMaskHiSet, hi);
 	reg_write(ohci, OHCI1394_IRMultiChanMaskLoSet, lo);
-	mmiowb();
 	ohci->mc_channels = channels;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_hdmi.c b/drivers/gpu/drm/i915/intel_hdmi.c
index 07e803a604bd..ca09c74b6769 100644
--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -183,7 +183,6 @@ static void g4x_write_infoframe(struct intel_encoder *encoder,
 
 	I915_WRITE(VIDEO_DIP_CTL, val);
 
-	mmiowb();
 	for (i = 0; i < len; i += 4) {
 		I915_WRITE(VIDEO_DIP_DATA, *data);
 		data++;
@@ -191,7 +190,6 @@ static void g4x_write_infoframe(struct intel_encoder *encoder,
 	/* Write every possible data byte to force correct ECC calculation. */
 	for (; i < VIDEO_DIP_DATA_SIZE; i += 4)
 		I915_WRITE(VIDEO_DIP_DATA, 0);
-	mmiowb();
 
 	val |= g4x_infoframe_enable(type);
 	val &= ~VIDEO_DIP_FREQ_MASK;
@@ -238,7 +236,6 @@ static void ibx_write_infoframe(struct intel_encoder *encoder,
 
 	I915_WRITE(reg, val);
 
-	mmiowb();
 	for (i = 0; i < len; i += 4) {
 		I915_WRITE(TVIDEO_DIP_DATA(intel_crtc->pipe), *data);
 		data++;
@@ -246,7 +243,6 @@ static void ibx_write_infoframe(struct intel_encoder *encoder,
 	/* Write every possible data byte to force correct ECC calculation. */
 	for (; i < VIDEO_DIP_DATA_SIZE; i += 4)
 		I915_WRITE(TVIDEO_DIP_DATA(intel_crtc->pipe), 0);
-	mmiowb();
 
 	val |= g4x_infoframe_enable(type);
 	val &= ~VIDEO_DIP_FREQ_MASK;
@@ -299,7 +295,6 @@ static void cpt_write_infoframe(struct intel_encoder *encoder,
 
 	I915_WRITE(reg, val);
 
-	mmiowb();
 	for (i = 0; i < len; i += 4) {
 		I915_WRITE(TVIDEO_DIP_DATA(intel_crtc->pipe), *data);
 		data++;
@@ -307,7 +302,6 @@ static void cpt_write_infoframe(struct intel_encoder *encoder,
 	/* Write every possible data byte to force correct ECC calculation. */
 	for (; i < VIDEO_DIP_DATA_SIZE; i += 4)
 		I915_WRITE(TVIDEO_DIP_DATA(intel_crtc->pipe), 0);
-	mmiowb();
 
 	val |= g4x_infoframe_enable(type);
 	val &= ~VIDEO_DIP_FREQ_MASK;
@@ -353,7 +347,6 @@ static void vlv_write_infoframe(struct intel_encoder *encoder,
 
 	I915_WRITE(reg, val);
 
-	mmiowb();
 	for (i = 0; i < len; i += 4) {
 		I915_WRITE(VLV_TVIDEO_DIP_DATA(intel_crtc->pipe), *data);
 		data++;
@@ -361,7 +354,6 @@ static void vlv_write_infoframe(struct intel_encoder *encoder,
 	/* Write every possible data byte to force correct ECC calculation. */
 	for (; i < VIDEO_DIP_DATA_SIZE; i += 4)
 		I915_WRITE(VLV_TVIDEO_DIP_DATA(intel_crtc->pipe), 0);
-	mmiowb();
 
 	val |= g4x_infoframe_enable(type);
 	val &= ~VIDEO_DIP_FREQ_MASK;
@@ -407,7 +399,6 @@ static void hsw_write_infoframe(struct intel_encoder *encoder,
 	val &= ~hsw_infoframe_enable(type);
 	I915_WRITE(ctl_reg, val);
 
-	mmiowb();
 	for (i = 0; i < len; i += 4) {
 		I915_WRITE(hsw_dip_data_reg(dev_priv, cpu_transcoder,
 					    type, i >> 2), *data);
@@ -417,7 +408,6 @@ static void hsw_write_infoframe(struct intel_encoder *encoder,
 	for (; i < data_size; i += 4)
 		I915_WRITE(hsw_dip_data_reg(dev_priv, cpu_transcoder,
 					    type, i >> 2), 0);
-	mmiowb();
 
 	val |= hsw_infoframe_enable(type);
 	I915_WRITE(ctl_reg, val);
diff --git a/drivers/ide/tx4939ide.c b/drivers/ide/tx4939ide.c
index 67d4a7d4acc8..88d132edc4e3 100644
--- a/drivers/ide/tx4939ide.c
+++ b/drivers/ide/tx4939ide.c
@@ -156,7 +156,6 @@ static u16 tx4939ide_check_error_ints(ide_hwif_t *hwif)
 		u16 sysctl = tx4939ide_readw(base, TX4939IDE_Sys_Ctl);
 
 		tx4939ide_writew(sysctl | 0x4000, base, TX4939IDE_Sys_Ctl);
-		mmiowb();
 		/* wait 12GBUSCLK (typ. 60ns @ GBUS200MHz, max 270ns) */
 		ndelay(270);
 		tx4939ide_writew(sysctl, base, TX4939IDE_Sys_Ctl);
@@ -396,7 +395,6 @@ static void tx4939ide_init_hwif(ide_hwif_t *hwif)
 
 	/* Soft Reset */
 	tx4939ide_writew(0x8000, base, TX4939IDE_Sys_Ctl);
-	mmiowb();
 	/* at least 20 GBUSCLK (typ. 100ns @ GBUS200MHz, max 450ns) */
 	ndelay(450);
 	tx4939ide_writew(0x0000, base, TX4939IDE_Sys_Ctl);
diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 955bad21a519..297112f64012 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -8352,7 +8352,6 @@ static inline void clear_recv_intr(struct hfi1_ctxtdata *rcd)
 	struct hfi1_devdata *dd = rcd->dd;
 	u32 addr = CCE_INT_CLEAR + (8 * rcd->ireg);
 
-	mmiowb();
 	write_csr(dd, addr, rcd->imask);
 	/* force the above write on the chip and get a value back */
 	(void)read_csr(dd, addr);
@@ -11790,12 +11789,10 @@ void update_usrhead(struct hfi1_ctxtdata *rcd, u32 hd, u32 updegr, u32 egrhd,
 			<< RCV_EGR_INDEX_HEAD_HEAD_SHIFT;
 		write_uctxt_csr(dd, ctxt, RCV_EGR_INDEX_HEAD, reg);
 	}
-	mmiowb();
 	reg = ((u64)rcv_intr_count << RCV_HDR_HEAD_COUNTER_SHIFT) |
 		(((u64)hd & RCV_HDR_HEAD_HEAD_MASK)
 			<< RCV_HDR_HEAD_HEAD_SHIFT);
 	write_uctxt_csr(dd, ctxt, RCV_HDR_HEAD, reg);
-	mmiowb();
 }
 
 u32 hdrqempty(struct hfi1_ctxtdata *rcd)
diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c
index 04126d7e318d..4a371ed43211 100644
--- a/drivers/infiniband/hw/hfi1/pio.c
+++ b/drivers/infiniband/hw/hfi1/pio.c
@@ -1578,7 +1578,6 @@ void hfi1_sc_wantpiobuf_intr(struct send_context *sc, u32 needint)
 		sc_del_credit_return_intr(sc);
 	trace_hfi1_wantpiointr(sc, needint, sc->credit_ctrl);
 	if (needint) {
-		mmiowb();
 		sc_return_credits(sc);
 	}
 }
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index b74c742b000c..072d8c176b44 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -1745,8 +1745,6 @@ static int hns_roce_v1_post_mbox(struct hns_roce_dev *hr_dev, u64 in_param,
 
 	writel(val, hcr + 5);
 
-	mmiowb();
-
 	return 0;
 }
 
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 971e9a9ebdaf..0f02222c5601 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -3735,12 +3735,6 @@ static int _mlx4_ib_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 		writel_relaxed(qp->doorbell_qpn,
 			to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
 
-		/*
-		 * Make sure doorbells don't leak out of SQ spinlock
-		 * and reach the HCA out of order.
-		 */
-		mmiowb();
-
 		stamp_send_wqe(qp, ind + qp->sq_spare_wqes - 1);
 
 		qp->sq_next_wqe = ind;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 7db778d96ef5..dec7af23a736 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -5021,7 +5021,6 @@ static int _mlx5_ib_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 		/* Make sure doorbells don't leak out of SQ spinlock
 		 * and reach the HCA out of order.
 		 */
-		mmiowb();
 		bf->offset ^= bf->buf_size;
 	}
 
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 83aa47eb81a9..bdf5ed38de22 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -292,12 +292,6 @@ static int mthca_cmd_post(struct mthca_dev *dev,
 		err = mthca_cmd_post_hcr(dev, in_param, out_param, in_modifier,
 					 op_modifier, op, token, event);
 
-	/*
-	 * Make sure that our HCR writes don't get mixed in with
-	 * writes from another CPU starting a FW command.
-	 */
-	mmiowb();
-
 	mutex_unlock(&dev->cmd.hcr_mutex);
 	return err;
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index a6531ffe29a6..877a6daffa98 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -211,11 +211,6 @@ static inline void update_cons_index(struct mthca_dev *dev, struct mthca_cq *cq,
 		mthca_write64(MTHCA_TAVOR_CQ_DB_INC_CI | cq->cqn, incr - 1,
 			      dev->kar + MTHCA_CQ_DOORBELL,
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
-		/*
-		 * Make sure doorbells don't leak out of CQ spinlock
-		 * and reach the HCA out of order:
-		 */
-		mmiowb();
 	}
 }
 
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 4e5b5cc17f1d..988ff1a541f1 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1804,11 +1804,6 @@ int mthca_tavor_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 			      (qp->qpn << 8) | size0,
 			      dev->kar + MTHCA_SEND_DOORBELL,
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
-		/*
-		 * Make sure doorbells don't leak out of SQ spinlock
-		 * and reach the HCA out of order:
-		 */
-		mmiowb();
 	}
 
 	qp->sq.next_ind = ind;
@@ -1919,12 +1914,6 @@ int mthca_tavor_post_receive(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 	qp->rq.next_ind = ind;
 	qp->rq.head    += nreq;
 
-	/*
-	 * Make sure doorbells don't leak out of RQ spinlock and reach
-	 * the HCA out of order:
-	 */
-	mmiowb();
-
 	spin_unlock_irqrestore(&qp->rq.lock, flags);
 	return err;
 }
@@ -2159,12 +2148,6 @@ int mthca_arbel_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 	}
 
-	/*
-	 * Make sure doorbells don't leak out of SQ spinlock and reach
-	 * the HCA out of order:
-	 */
-	mmiowb();
-
 	spin_unlock_irqrestore(&qp->sq.lock, flags);
 	return err;
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c
index b8333c79e3fa..cb715107e4ad 100644
--- a/drivers/infiniband/hw/mthca/mthca_srq.c
+++ b/drivers/infiniband/hw/mthca/mthca_srq.c
@@ -565,12 +565,6 @@ int mthca_tavor_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 	}
 
-	/*
-	 * Make sure doorbells don't leak out of SRQ spinlock and
-	 * reach the HCA out of order:
-	 */
-	mmiowb();
-
 	spin_unlock_irqrestore(&srq->lock, flags);
 	return err;
 }
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 23353e0e4bd4..eeef1bb04368 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -811,9 +811,6 @@ static void doorbell_cq(struct qedr_cq *cq, u32 cons, u8 flags)
 	cq->db.data.agg_flags = flags;
 	cq->db.data.value = cpu_to_le32(cons);
 	writeq(cq->db.raw, cq->db_addr);
-
-	/* Make sure write would stick */
-	mmiowb();
 }
 
 int qedr_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
@@ -2128,8 +2125,6 @@ static int qedr_update_qp_state(struct qedr_dev *dev,
 
 			if (rdma_protocol_roce(&dev->ibdev, 1)) {
 				writel(qp->rq.db_data.raw, qp->rq.db);
-				/* Make sure write takes effect */
-				mmiowb();
 			}
 			break;
 		case QED_ROCE_QP_STATE_ERR:
@@ -3546,9 +3541,6 @@ int qedr_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 	smp_wmb();
 	writel(qp->sq.db_data.raw, qp->sq.db);
 
-	/* Make sure write sticks */
-	mmiowb();
-
 	spin_unlock_irqrestore(&qp->q_lock, flags);
 
 	return rc;
@@ -3739,12 +3731,8 @@ int qedr_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 
 		writel(qp->rq.db_data.raw, qp->rq.db);
 
-		/* Make sure write sticks */
-		mmiowb();
-
 		if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
 			writel(qp->rq.iwarp_db2_data.raw, qp->rq.iwarp_db2);
-			mmiowb();
 		}
 
 		wr = wr->next;
diff --git a/drivers/infiniband/hw/qib/qib_iba6120.c b/drivers/infiniband/hw/qib/qib_iba6120.c
index cdbf707fa267..531d8a1db2c3 100644
--- a/drivers/infiniband/hw/qib/qib_iba6120.c
+++ b/drivers/infiniband/hw/qib/qib_iba6120.c
@@ -1884,7 +1884,6 @@ static void qib_6120_put_tid(struct qib_devdata *dd, u64 __iomem *tidptr,
 	qib_write_kreg(dd, kr_scratch, 0xfeeddeaf);
 	writel(pa, tidp32);
 	qib_write_kreg(dd, kr_scratch, 0xdeadbeef);
-	mmiowb();
 	spin_unlock_irqrestore(tidlockp, flags);
 }
 
@@ -1928,7 +1927,6 @@ static void qib_6120_put_tid_2(struct qib_devdata *dd, u64 __iomem *tidptr,
 			pa |= 2 << 29;
 	}
 	writel(pa, tidp32);
-	mmiowb();
 }
 
 
@@ -2053,9 +2051,7 @@ static void qib_update_6120_usrhead(struct qib_ctxtdata *rcd, u64 hd,
 {
 	if (updegr)
 		qib_write_ureg(rcd->dd, ur_rcvegrindexhead, egrhd, rcd->ctxt);
-	mmiowb();
 	qib_write_ureg(rcd->dd, ur_rcvhdrhead, hd, rcd->ctxt);
-	mmiowb();
 }
 
 static u32 qib_6120_hdrqempty(struct qib_ctxtdata *rcd)
diff --git a/drivers/infiniband/hw/qib/qib_iba7220.c b/drivers/infiniband/hw/qib/qib_iba7220.c
index 9fde45538f6e..ea3ddb05cbad 100644
--- a/drivers/infiniband/hw/qib/qib_iba7220.c
+++ b/drivers/infiniband/hw/qib/qib_iba7220.c
@@ -2175,7 +2175,6 @@ static void qib_7220_put_tid(struct qib_devdata *dd, u64 __iomem *tidptr,
 		pa = chippa;
 	}
 	writeq(pa, tidptr);
-	mmiowb();
 }
 
 /**
@@ -2704,9 +2703,7 @@ static void qib_update_7220_usrhead(struct qib_ctxtdata *rcd, u64 hd,
 {
 	if (updegr)
 		qib_write_ureg(rcd->dd, ur_rcvegrindexhead, egrhd, rcd->ctxt);
-	mmiowb();
 	qib_write_ureg(rcd->dd, ur_rcvhdrhead, hd, rcd->ctxt);
-	mmiowb();
 }
 
 static u32 qib_7220_hdrqempty(struct qib_ctxtdata *rcd)
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 17d6b24b3473..ac6a84f11ad0 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -3793,7 +3793,6 @@ static void qib_7322_put_tid(struct qib_devdata *dd, u64 __iomem *tidptr,
 		pa = chippa;
 	}
 	writeq(pa, tidptr);
-	mmiowb();
 }
 
 /**
@@ -4440,10 +4439,8 @@ static void qib_update_7322_usrhead(struct qib_ctxtdata *rcd, u64 hd,
 		adjust_rcv_timeout(rcd, npkts);
 	if (updegr)
 		qib_write_ureg(rcd->dd, ur_rcvegrindexhead, egrhd, rcd->ctxt);
-	mmiowb();
 	qib_write_ureg(rcd->dd, ur_rcvhdrhead, hd, rcd->ctxt);
 	qib_write_ureg(rcd->dd, ur_rcvhdrhead, hd, rcd->ctxt);
-	mmiowb();
 }
 
 static u32 qib_7322_hdrqempty(struct qib_ctxtdata *rcd)
diff --git a/drivers/infiniband/hw/qib/qib_sd7220.c b/drivers/infiniband/hw/qib/qib_sd7220.c
index 12caf3db8c34..4f4a09c2dbcd 100644
--- a/drivers/infiniband/hw/qib/qib_sd7220.c
+++ b/drivers/infiniband/hw/qib/qib_sd7220.c
@@ -1068,7 +1068,6 @@ static int qib_sd_setvals(struct qib_devdata *dd)
 	for (idx = 0; idx < NUM_DDS_REGS; ++idx) {
 		data = ((dds_reg_map & 0xF) << 4) | TX_FAST_ELT;
 		writeq(data, iaddr + idx);
-		mmiowb();
 		qib_read_kreg32(dd, kr_scratch);
 		dds_reg_map >>= 4;
 		for (midx = 0; midx < DDS_ROWS; ++midx) {
@@ -1076,7 +1075,6 @@ static int qib_sd_setvals(struct qib_devdata *dd)
 
 			data = dds_init_vals[midx].reg_vals[idx];
 			writeq(data, daddr);
-			mmiowb();
 			qib_read_kreg32(dd, kr_scratch);
 		} /* End inner for (vals for this reg, each row) */
 	} /* end outer for (regs to be stored) */
@@ -1098,13 +1096,11 @@ static int qib_sd_setvals(struct qib_devdata *dd)
 		didx = idx + min_idx;
 		/* Store the next RXEQ register address */
 		writeq(rxeq_init_vals[idx].rdesc, iaddr + didx);
-		mmiowb();
 		qib_read_kreg32(dd, kr_scratch);
 		/* Iterate through RXEQ values */
 		for (vidx = 0; vidx < 4; vidx++) {
 			data = rxeq_init_vals[idx].rdata[vidx];
 			writeq(data, taddr + (vidx << 6) + idx);
-			mmiowb();
 			qib_read_kreg32(dd, kr_scratch);
 		}
 	} /* end outer for (Reg-writes for RXEQ) */
diff --git a/drivers/media/pci/dt3155/dt3155.c b/drivers/media/pci/dt3155/dt3155.c
index 17d69bd5d7f1..49677ee889e3 100644
--- a/drivers/media/pci/dt3155/dt3155.c
+++ b/drivers/media/pci/dt3155/dt3155.c
@@ -46,7 +46,6 @@ static int read_i2c_reg(void __iomem *addr, u8 index, u8 *data)
 	u32 tmp = index;
 
 	iowrite32((tmp << 17) | IIC_READ, addr + IIC_CSR2);
-	mmiowb();
 	udelay(45); /* wait at least 43 usec for NEW_CYCLE to clear */
 	if (ioread32(addr + IIC_CSR2) & NEW_CYCLE)
 		return -EIO; /* error: NEW_CYCLE not cleared */
@@ -77,7 +76,6 @@ static int write_i2c_reg(void __iomem *addr, u8 index, u8 data)
 	u32 tmp = index;
 
 	iowrite32((tmp << 17) | IIC_WRITE | data, addr + IIC_CSR2);
-	mmiowb();
 	udelay(65); /* wait at least 63 usec for NEW_CYCLE to clear */
 	if (ioread32(addr + IIC_CSR2) & NEW_CYCLE)
 		return -EIO; /* error: NEW_CYCLE not cleared */
@@ -104,7 +102,6 @@ static void write_i2c_reg_nowait(void __iomem *addr, u8 index, u8 data)
 	u32 tmp = index;
 
 	iowrite32((tmp << 17) | IIC_WRITE | data, addr + IIC_CSR2);
-	mmiowb();
 }
 
 /**
@@ -264,7 +261,6 @@ static irqreturn_t dt3155_irq_handler_even(int irq, void *dev_id)
 						FLD_DN_ODD | FLD_DN_EVEN |
 						CAP_CONT_EVEN | CAP_CONT_ODD,
 							ipd->regs + CSR1);
-		mmiowb();
 	}
 
 	spin_lock(&ipd->lock);
@@ -282,7 +278,6 @@ static irqreturn_t dt3155_irq_handler_even(int irq, void *dev_id)
 		iowrite32(dma_addr + ipd->width, ipd->regs + ODD_DMA_START);
 		iowrite32(ipd->width, ipd->regs + EVEN_DMA_STRIDE);
 		iowrite32(ipd->width, ipd->regs + ODD_DMA_STRIDE);
-		mmiowb();
 	}
 
 	/* enable interrupts, clear all irq flags */
@@ -437,12 +432,10 @@ static int dt3155_init_board(struct dt3155_priv *pd)
 	/*  resetting the adapter  */
 	iowrite32(ADDR_ERR_ODD | ADDR_ERR_EVEN | FLD_CRPT_ODD | FLD_CRPT_EVEN |
 			FLD_DN_ODD | FLD_DN_EVEN, pd->regs + CSR1);
-	mmiowb();
 	msleep(20);
 
 	/*  initializing adapter registers  */
 	iowrite32(FIFO_EN | SRST, pd->regs + CSR1);
-	mmiowb();
 	iowrite32(0xEEEEEE01, pd->regs + EVEN_PIXEL_FMT);
 	iowrite32(0xEEEEEE01, pd->regs + ODD_PIXEL_FMT);
 	iowrite32(0x00000020, pd->regs + FIFO_TRIGER);
@@ -454,7 +447,6 @@ static int dt3155_init_board(struct dt3155_priv *pd)
 	iowrite32(0, pd->regs + MASK_LENGTH);
 	iowrite32(0x0005007C, pd->regs + FIFO_FLAG_CNT);
 	iowrite32(0x01010101, pd->regs + IIC_CLK_DUR);
-	mmiowb();
 
 	/* verifying that we have a DT3155 board (not just a SAA7116 chip) */
 	read_i2c_reg(pd->regs, DT_ID, &tmp);
diff --git a/drivers/memstick/host/jmb38x_ms.c b/drivers/memstick/host/jmb38x_ms.c
index bcdca9fbef51..e3a5af65dbce 100644
--- a/drivers/memstick/host/jmb38x_ms.c
+++ b/drivers/memstick/host/jmb38x_ms.c
@@ -644,7 +644,6 @@ static int jmb38x_ms_reset(struct jmb38x_ms_host *host)
 	writel(HOST_CONTROL_RESET_REQ | HOST_CONTROL_CLOCK_EN
 	       | readl(host->addr + HOST_CONTROL),
 	       host->addr + HOST_CONTROL);
-	mmiowb();
 
 	for (cnt = 0; cnt < 20; ++cnt) {
 		if (!(HOST_CONTROL_RESET_REQ
@@ -659,7 +658,6 @@ static int jmb38x_ms_reset(struct jmb38x_ms_host *host)
 	writel(HOST_CONTROL_RESET | HOST_CONTROL_CLOCK_EN
 	       | readl(host->addr + HOST_CONTROL),
 	       host->addr + HOST_CONTROL);
-	mmiowb();
 
 	for (cnt = 0; cnt < 20; ++cnt) {
 		if (!(HOST_CONTROL_RESET
@@ -672,7 +670,6 @@ static int jmb38x_ms_reset(struct jmb38x_ms_host *host)
 	return -EIO;
 
 reset_ok:
-	mmiowb();
 	writel(INT_STATUS_ALL, host->addr + INT_SIGNAL_ENABLE);
 	writel(INT_STATUS_ALL, host->addr + INT_STATUS_ENABLE);
 	return 0;
@@ -1009,7 +1006,6 @@ static void jmb38x_ms_remove(struct pci_dev *dev)
 		tasklet_kill(&host->notify);
 		writel(0, host->addr + INT_SIGNAL_ENABLE);
 		writel(0, host->addr + INT_STATUS_ENABLE);
-		mmiowb();
 		dev_dbg(&jm->pdev->dev, "interrupts off\n");
 		spin_lock_irqsave(&host->lock, flags);
 		if (host->req) {
diff --git a/drivers/misc/ioc4.c b/drivers/misc/ioc4.c
index ec0832278170..9d0445a567db 100644
--- a/drivers/misc/ioc4.c
+++ b/drivers/misc/ioc4.c
@@ -156,7 +156,6 @@ ioc4_clock_calibrate(struct ioc4_driver_data *idd)
 
 	/* Reset to power-on state */
 	writel(0, &idd->idd_misc_regs->int_out.raw);
-	mmiowb();
 
 	/* Set up square wave */
 	int_out.raw = 0;
@@ -164,7 +163,6 @@ ioc4_clock_calibrate(struct ioc4_driver_data *idd)
 	int_out.fields.mode = IOC4_INT_OUT_MODE_TOGGLE;
 	int_out.fields.diag = 0;
 	writel(int_out.raw, &idd->idd_misc_regs->int_out.raw);
-	mmiowb();
 
 	/* Check square wave period averaged over some number of cycles */
 	start = ktime_get_ns();
diff --git a/drivers/misc/mei/hw-me.c b/drivers/misc/mei/hw-me.c
index 3fbbadfa2ae1..8a47a6fc3fc7 100644
--- a/drivers/misc/mei/hw-me.c
+++ b/drivers/misc/mei/hw-me.c
@@ -350,9 +350,6 @@ static void mei_me_hw_reset_release(struct mei_device *dev)
 	hcsr |= H_IG;
 	hcsr &= ~H_RST;
 	mei_hcsr_set(dev, hcsr);
-
-	/* complete this write before we set host ready on another CPU */
-	mmiowb();
 }
 
 /**
diff --git a/drivers/misc/tifm_7xx1.c b/drivers/misc/tifm_7xx1.c
index 9ac95b48ef92..cc729f7ab32e 100644
--- a/drivers/misc/tifm_7xx1.c
+++ b/drivers/misc/tifm_7xx1.c
@@ -403,7 +403,6 @@ static void tifm_7xx1_remove(struct pci_dev *dev)
 	fm->eject = tifm_7xx1_dummy_eject;
 	fm->has_ms_pif = tifm_7xx1_dummy_has_ms_pif;
 	writel(TIFM_IRQ_SETALL, fm->addr + FM_CLEAR_INTERRUPT_ENABLE);
-	mmiowb();
 	free_irq(dev->irq, fm);
 
 	tifm_remove_adapter(fm);
diff --git a/drivers/mmc/host/alcor.c b/drivers/mmc/host/alcor.c
index c712b7deb3a9..8af4fae06791 100644
--- a/drivers/mmc/host/alcor.c
+++ b/drivers/mmc/host/alcor.c
@@ -967,7 +967,6 @@ static void alcor_timeout_timer(struct work_struct *work)
 		alcor_request_complete(host, 0);
 	}
 
-	mmiowb();
 	mutex_unlock(&host->cmd_mutex);
 }
 
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index eba9bcc92ad3..929e004ddad0 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1802,7 +1802,6 @@ void sdhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
 			sdhci_send_command(host, mrq->cmd);
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 EXPORT_SYMBOL_GPL(sdhci_request);
@@ -2005,8 +2004,6 @@ void sdhci_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
 	 */
 	if (host->quirks & SDHCI_QUIRK_RESET_CMD_DATA_ON_IOS)
 		sdhci_do_reset(host, SDHCI_RESET_CMD | SDHCI_RESET_DATA);
-
-	mmiowb();
 }
 EXPORT_SYMBOL_GPL(sdhci_set_ios);
 
@@ -2098,7 +2095,6 @@ static void sdhci_enable_sdio_irq_nolock(struct sdhci_host *host, int enable)
 
 		sdhci_writel(host, host->ier, SDHCI_INT_ENABLE);
 		sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
-		mmiowb();
 	}
 }
 
@@ -2346,7 +2342,6 @@ void sdhci_send_tuning(struct sdhci_host *host, u32 opcode)
 
 	host->tuning_done = 0;
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	/* Wait for Buffer Read Ready interrupt */
@@ -2697,7 +2692,6 @@ static bool sdhci_request_done(struct sdhci_host *host)
 
 	host->mrqs_done[i] = NULL;
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	mmc_request_done(host->mmc, mrq);
@@ -2731,7 +2725,6 @@ static void sdhci_timeout_timer(struct timer_list *t)
 		sdhci_finish_mrq(host, host->cmd->mrq);
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 
@@ -2762,7 +2755,6 @@ static void sdhci_timeout_data_timer(struct timer_list *t)
 		}
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 
@@ -3243,7 +3235,6 @@ int sdhci_resume_host(struct sdhci_host *host)
 		mmc->ops->set_ios(mmc, &mmc->ios);
 	} else {
 		sdhci_init(host, (host->mmc->pm_flags & MMC_PM_KEEP_POWER));
-		mmiowb();
 	}
 
 	if (host->irq_wake_enabled) {
@@ -3376,7 +3367,6 @@ void sdhci_cqe_enable(struct mmc_host *mmc)
 		 mmc_hostname(mmc), host->ier,
 		 sdhci_readl(host, SDHCI_INT_STATUS));
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 EXPORT_SYMBOL_GPL(sdhci_cqe_enable);
@@ -3401,7 +3391,6 @@ void sdhci_cqe_disable(struct mmc_host *mmc, bool recovery)
 		 mmc_hostname(mmc), host->ier,
 		 sdhci_readl(host, SDHCI_INT_STATUS));
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 EXPORT_SYMBOL_GPL(sdhci_cqe_disable);
@@ -4240,8 +4229,6 @@ int __sdhci_add_host(struct sdhci_host *host)
 		goto unirq;
 	}
 
-	mmiowb();
-
 	ret = mmc_add_host(mmc);
 	if (ret)
 		goto unled;
diff --git a/drivers/mmc/host/tifm_sd.c b/drivers/mmc/host/tifm_sd.c
index b6644ce296b2..35dd34b82a4d 100644
--- a/drivers/mmc/host/tifm_sd.c
+++ b/drivers/mmc/host/tifm_sd.c
@@ -889,7 +889,6 @@ static int tifm_sd_initialize_host(struct tifm_sd *host)
 	struct tifm_dev *sock = host->dev;
 
 	writel(0, sock->addr + SOCK_MMCSD_INT_ENABLE);
-	mmiowb();
 	host->clk_div = 61;
 	host->clk_freq = 20000000;
 	writel(TIFM_MMCSD_RESET, sock->addr + SOCK_MMCSD_SYSTEM_CONTROL);
@@ -940,7 +939,6 @@ static int tifm_sd_initialize_host(struct tifm_sd *host)
 	writel(TIFM_MMCSD_CERR | TIFM_MMCSD_BRS | TIFM_MMCSD_EOC
 	       | TIFM_MMCSD_ERRMASK,
 	       sock->addr + SOCK_MMCSD_INT_ENABLE);
-	mmiowb();
 
 	return 0;
 }
@@ -1005,7 +1003,6 @@ static void tifm_sd_remove(struct tifm_dev *sock)
 	spin_lock_irqsave(&sock->lock, flags);
 	host->eject = 1;
 	writel(0, sock->addr + SOCK_MMCSD_INT_ENABLE);
-	mmiowb();
 	spin_unlock_irqrestore(&sock->lock, flags);
 
 	tasklet_kill(&host->finish_tasklet);
diff --git a/drivers/mmc/host/via-sdmmc.c b/drivers/mmc/host/via-sdmmc.c
index 32c4211506fc..412395ac2935 100644
--- a/drivers/mmc/host/via-sdmmc.c
+++ b/drivers/mmc/host/via-sdmmc.c
@@ -686,7 +686,6 @@ static void via_sdc_request(struct mmc_host *mmc, struct mmc_request *mrq)
 		via_sdc_send_command(host, mrq->cmd);
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 
@@ -711,7 +710,6 @@ static void via_sdc_set_power(struct via_crdr_mmc_host *host,
 		gatt &= ~VIA_CRDR_PCICLKGATT_PAD_PWRON;
 	writeb(gatt, host->pcictrl_mmiobase + VIA_CRDR_PCICLKGATT);
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	via_pwron_sleep(host);
@@ -770,7 +768,6 @@ static void via_sdc_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
 	if (readb(addrbase + VIA_CRDR_PCISDCCLK) != clock)
 		writeb(clock, addrbase + VIA_CRDR_PCISDCCLK);
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	if (ios->power_mode != MMC_POWER_OFF)
@@ -830,7 +827,6 @@ static void via_reset_pcictrl(struct via_crdr_mmc_host *host)
 	via_restore_pcictrlreg(host);
 	via_restore_sdcreg(host);
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 
@@ -925,7 +921,6 @@ static irqreturn_t via_sdc_isr(int irq, void *dev_id)
 
 	result = IRQ_HANDLED;
 
-	mmiowb();
 out:
 	spin_unlock(&sdhost->lock);
 
@@ -960,7 +955,6 @@ static void via_sdc_timeout(struct timer_list *t)
 		}
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&sdhost->lock, flags);
 }
 
@@ -1012,7 +1006,6 @@ static void via_sdc_card_detect(struct work_struct *work)
 			tasklet_schedule(&host->finish_tasklet);
 		}
 
-		mmiowb();
 		spin_unlock_irqrestore(&host->lock, flags);
 
 		via_reset_pcictrl(host);
@@ -1020,7 +1013,6 @@ static void via_sdc_card_detect(struct work_struct *work)
 		spin_lock_irqsave(&host->lock, flags);
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	via_print_pcictrl(host);
@@ -1188,7 +1180,6 @@ static void via_sd_remove(struct pci_dev *pcidev)
 
 	/* Disable generating further interrupts */
 	writeb(0x0, sdhost->pcictrl_mmiobase + VIA_CRDR_PCIINTCTRL);
-	mmiowb();
 
 	if (sdhost->mrq) {
 		pr_err("%s: Controller removed during "
@@ -1197,7 +1188,6 @@ static void via_sd_remove(struct pci_dev *pcidev)
 		/* make sure all DMA is stopped */
 		writel(VIA_CRDR_DMACTRL_SFTRST,
 			sdhost->ddma_mmiobase + VIA_CRDR_DMACTRL);
-		mmiowb();
 		sdhost->mrq->cmd->error = -ENOMEDIUM;
 		if (sdhost->mrq->stop)
 			sdhost->mrq->stop->error = -ENOMEDIUM;
diff --git a/drivers/mtd/nand/raw/r852.c b/drivers/mtd/nand/raw/r852.c
index c01422d953dd..a928effd352e 100644
--- a/drivers/mtd/nand/raw/r852.c
+++ b/drivers/mtd/nand/raw/r852.c
@@ -45,7 +45,6 @@ static inline void r852_write_reg(struct r852_device *dev,
 						int address, uint8_t value)
 {
 	writeb(value, dev->mmio + address);
-	mmiowb();
 }
 
 
@@ -61,7 +60,6 @@ static inline void r852_write_reg_dword(struct r852_device *dev,
 							int address, uint32_t value)
 {
 	writel(cpu_to_le32(value), dev->mmio + address);
-	mmiowb();
 }
 
 /* returns pointer to our private structure */
diff --git a/drivers/mtd/nand/raw/txx9ndfmc.c b/drivers/mtd/nand/raw/txx9ndfmc.c
index ddf0420c0997..97978227aa55 100644
--- a/drivers/mtd/nand/raw/txx9ndfmc.c
+++ b/drivers/mtd/nand/raw/txx9ndfmc.c
@@ -159,7 +159,6 @@ static void txx9ndfmc_cmd_ctrl(struct nand_chip *chip, int cmd,
 		if ((ctrl & NAND_CTRL_CHANGE) && cmd == NAND_CMD_NONE)
 			txx9ndfmc_write(dev, 0, TXX9_NDFDTR);
 	}
-	mmiowb();
 }
 
 static int txx9ndfmc_dev_ready(struct nand_chip *chip)
diff --git a/drivers/net/ethernet/aeroflex/greth.c b/drivers/net/ethernet/aeroflex/greth.c
index 47e5984f16fb..3155f7fa83eb 100644
--- a/drivers/net/ethernet/aeroflex/greth.c
+++ b/drivers/net/ethernet/aeroflex/greth.c
@@ -613,7 +613,6 @@ static irqreturn_t greth_interrupt(int irq, void *dev_id)
 		napi_schedule(&greth->napi);
 	}
 
-	mmiowb();
 	spin_unlock(&greth->devlock);
 
 	return retval;
diff --git a/drivers/net/ethernet/alacritech/slicoss.c b/drivers/net/ethernet/alacritech/slicoss.c
index 16477aa6d61f..4f7e792e50e9 100644
--- a/drivers/net/ethernet/alacritech/slicoss.c
+++ b/drivers/net/ethernet/alacritech/slicoss.c
@@ -345,8 +345,6 @@ static void slic_set_rx_mode(struct net_device *dev)
 	if (sdev->promisc != set_promisc) {
 		sdev->promisc = set_promisc;
 		slic_configure_rcv(sdev);
-		/* make sure writes to receiver cant leak out of the lock */
-		mmiowb();
 	}
 	spin_unlock_bh(&sdev->link_lock);
 }
@@ -1461,8 +1459,6 @@ static netdev_tx_t slic_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (slic_get_free_tx_descs(txq) < SLIC_MAX_REQ_TX_DESCS)
 		netif_stop_queue(dev);
-	/* make sure writes to io-memory cant leak out of tx queue lock */
-	mmiowb();
 
 	return NETDEV_TX_OK;
 drop_skb:
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index b17d435de09f..05798aa5bb73 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -2016,7 +2016,6 @@ void ena_com_aenq_intr_handler(struct ena_com_dev *dev, void *data)
 	mb();
 	writel_relaxed((u32)aenq->head,
 		       dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
-	mmiowb();
 }
 
 int ena_com_dev_reset(struct ena_com_dev *ena_dev,
diff --git a/drivers/net/ethernet/atheros/atlx/atl1.c b/drivers/net/ethernet/atheros/atlx/atl1.c
index 63edc5706c09..f0a42d5fc81a 100644
--- a/drivers/net/ethernet/atheros/atlx/atl1.c
+++ b/drivers/net/ethernet/atheros/atlx/atl1.c
@@ -2439,7 +2439,6 @@ static netdev_tx_t atl1_xmit_frame(struct sk_buff *skb,
 	atl1_tx_map(adapter, skb, ptpd);
 	atl1_tx_queue(adapter, count, ptpd);
 	atl1_update_mailbox(adapter);
-	mmiowb();
 	return NETDEV_TX_OK;
 }
 
diff --git a/drivers/net/ethernet/atheros/atlx/atl2.c b/drivers/net/ethernet/atheros/atlx/atl2.c
index 31ff1e0d1baa..2e0d935cbc62 100644
--- a/drivers/net/ethernet/atheros/atlx/atl2.c
+++ b/drivers/net/ethernet/atheros/atlx/atl2.c
@@ -908,7 +908,6 @@ static netdev_tx_t atl2_xmit_frame(struct sk_buff *skb,
 	ATL2_WRITE_REGW(&adapter->hw, REG_MB_TXD_WR_IDX,
 		(adapter->txd_write_ptr >> 2));
 
-	mmiowb();
 	dev_kfree_skb_any(skb);
 	return NETDEV_TX_OK;
 }
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index d63371d70bce..dfdd14eadd57 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -3305,8 +3305,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 
 	BNX2_WR(bp, rxr->rx_bseq_addr, rxr->rx_prod_bseq);
 
-	mmiowb();
-
 	return rx_pkt;
 
 }
@@ -6723,8 +6721,6 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	BNX2_WR16(bp, txr->tx_bidx_addr, prod);
 	BNX2_WR(bp, txr->tx_bseq_addr, txr->tx_prod_bseq);
 
-	mmiowb();
-
 	txr->tx_prod = prod;
 
 	if (unlikely(bnx2_tx_avail(bp, txr) <= MAX_SKB_FRAGS)) {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index ecb1bd7eb508..0c8f5b546c6f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4166,8 +4166,6 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
 
-	mmiowb();
-
 	txdata->tx_bd_prod += nbd;
 
 	if (unlikely(bnx2x_tx_avail(bp, txdata) < MAX_DESC_PER_TX_PKT)) {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 1ed068509337..2d57af9c061c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -527,8 +527,6 @@ static inline void bnx2x_update_rx_prod(struct bnx2x *bp,
 		REG_WR_RELAXED(bp, fp->ustorm_rx_prods_offset + i * 4,
 			       ((u32 *)&rx_prods)[i]);
 
-	mmiowb();
-
 	DP(NETIF_MSG_RX_STATUS,
 	   "queue[%d]:  wrote  bd_prod %u  cqe_prod %u  sge_prod %u\n",
 	   fp->index, bd_prod, rx_comp_prod, rx_sge_prod);
@@ -653,7 +651,6 @@ static inline void bnx2x_igu_ack_sb_gen(struct bnx2x *bp, u8 igu_sb_id,
 	REG_WR(bp, igu_addr, cmd_data.sb_id_and_flags);
 
 	/* Make sure that ACK is written */
-	mmiowb();
 	barrier();
 }
 
@@ -674,7 +671,6 @@ static inline void bnx2x_hc_ack_sb(struct bnx2x *bp, u8 sb_id,
 	REG_WR(bp, hc_addr, (*(u32 *)&igu_ack));
 
 	/* Make sure that ACK is written */
-	mmiowb();
 	barrier();
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index 749d0ef44371..0745cccd416d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -2623,7 +2623,6 @@ static int bnx2x_run_loopback(struct bnx2x *bp, int loopback_mode)
 	wmb();
 	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
 
-	mmiowb();
 	barrier();
 
 	num_pkts++;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 64bc6d6fcd65..9fdc2506ea30 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -869,9 +869,6 @@ static void bnx2x_hc_int_disable(struct bnx2x *bp)
 	   "write %x to HC %d (addr 0x%x)\n",
 	   val, port, addr);
 
-	/* flush all outstanding writes */
-	mmiowb();
-
 	REG_WR(bp, addr, val);
 	if (REG_RD(bp, addr) != val)
 		BNX2X_ERR("BUG! Proper val not read from IGU!\n");
@@ -887,9 +884,6 @@ static void bnx2x_igu_int_disable(struct bnx2x *bp)
 
 	DP(NETIF_MSG_IFDOWN, "write %x to IGU\n", val);
 
-	/* flush all outstanding writes */
-	mmiowb();
-
 	REG_WR(bp, IGU_REG_PF_CONFIGURATION, val);
 	if (REG_RD(bp, IGU_REG_PF_CONFIGURATION) != val)
 		BNX2X_ERR("BUG! Proper val not read from IGU!\n");
@@ -1595,7 +1589,6 @@ static void bnx2x_hc_int_enable(struct bnx2x *bp)
 	/*
 	 * Ensure that HC_CONFIG is written before leading/trailing edge config
 	 */
-	mmiowb();
 	barrier();
 
 	if (!CHIP_IS_E1(bp)) {
@@ -1611,9 +1604,6 @@ static void bnx2x_hc_int_enable(struct bnx2x *bp)
 		REG_WR(bp, HC_REG_TRAILING_EDGE_0 + port*8, val);
 		REG_WR(bp, HC_REG_LEADING_EDGE_0 + port*8, val);
 	}
-
-	/* Make sure that interrupts are indeed enabled from here on */
-	mmiowb();
 }
 
 static void bnx2x_igu_int_enable(struct bnx2x *bp)
@@ -1674,9 +1664,6 @@ static void bnx2x_igu_int_enable(struct bnx2x *bp)
 
 	REG_WR(bp, IGU_REG_TRAILING_EDGE_LATCH, val);
 	REG_WR(bp, IGU_REG_LEADING_EDGE_LATCH, val);
-
-	/* Make sure that interrupts are indeed enabled from here on */
-	mmiowb();
 }
 
 void bnx2x_int_enable(struct bnx2x *bp)
@@ -3833,7 +3820,6 @@ static void bnx2x_sp_prod_update(struct bnx2x *bp)
 
 	REG_WR16_RELAXED(bp, BAR_XSTRORM_INTMEM + XSTORM_SPQ_PROD_OFFSET(func),
 			 bp->spq_prod_idx);
-	mmiowb();
 }
 
 /**
@@ -5244,7 +5230,6 @@ static void bnx2x_update_eq_prod(struct bnx2x *bp, u16 prod)
 {
 	/* No memory barriers */
 	storm_memset_eq_prod(bp, prod, BP_FUNC(bp));
-	mmiowb();
 }
 
 static int  bnx2x_cnic_handle_cfc_del(struct bnx2x *bp, u32 cid,
@@ -6513,7 +6498,6 @@ void bnx2x_nic_init_cnic(struct bnx2x *bp)
 
 	/* flush all */
 	mb();
-	mmiowb();
 }
 
 void bnx2x_pre_irq_nic_init(struct bnx2x *bp)
@@ -6553,7 +6537,6 @@ void bnx2x_post_irq_nic_init(struct bnx2x *bp, u32 load_code)
 
 	/* flush all before enabling interrupts */
 	mb();
-	mmiowb();
 
 	bnx2x_int_enable(bp);
 
@@ -7775,12 +7758,10 @@ void bnx2x_igu_clear_sb_gen(struct bnx2x *bp, u8 func, u8 idu_sb_id, bool is_pf)
 	DP(NETIF_MSG_HW, "write 0x%08x to IGU(via GRC) addr 0x%x\n",
 			 data, igu_addr_data);
 	REG_WR(bp, igu_addr_data, data);
-	mmiowb();
 	barrier();
 	DP(NETIF_MSG_HW, "write 0x%08x to IGU(via GRC) addr 0x%x\n",
 			  ctl, igu_addr_ctl);
 	REG_WR(bp, igu_addr_ctl, ctl);
-	mmiowb();
 	barrier();
 
 	/* wait for clean up to finish */
@@ -9550,7 +9531,6 @@ static void bnx2x_set_234_gates(struct bnx2x *bp, bool close)
 
 	DP(NETIF_MSG_HW | NETIF_MSG_IFUP, "%s gates #2, #3 and #4\n",
 		close ? "closing" : "opening");
-	mmiowb();
 }
 
 #define SHARED_MF_CLP_MAGIC  0x80000000 /* `magic' bit */
@@ -9674,7 +9654,6 @@ static void bnx2x_pxp_prep(struct bnx2x *bp)
 	if (!CHIP_IS_E1(bp)) {
 		REG_WR(bp, PXP2_REG_RD_START_INIT, 0);
 		REG_WR(bp, PXP2_REG_RQ_RBC_DONE, 0);
-		mmiowb();
 	}
 }
 
@@ -9774,16 +9753,13 @@ static void bnx2x_process_kill_chip_reset(struct bnx2x *bp, bool global)
 	       reset_mask1 & (~not_reset_mask1));
 
 	barrier();
-	mmiowb();
 
 	REG_WR(bp, GRCBASE_MISC + MISC_REGISTERS_RESET_REG_2_SET,
 	       reset_mask2 & (~stay_reset2));
 
 	barrier();
-	mmiowb();
 
 	REG_WR(bp, GRCBASE_MISC + MISC_REGISTERS_RESET_REG_1_SET, reset_mask1);
-	mmiowb();
 }
 
 /**
@@ -9867,9 +9843,6 @@ static int bnx2x_process_kill(struct bnx2x *bp, bool global)
 	REG_WR(bp, MISC_REG_UNPREPARED, 0);
 	barrier();
 
-	/* Make sure all is written to the chip before the reset */
-	mmiowb();
-
 	/* Wait for 1ms to empty GLUE and PCI-E core queues,
 	 * PSWHST, GRC and PSWRD Tetris buffer.
 	 */
@@ -14830,7 +14803,6 @@ static int bnx2x_drv_ctl(struct net_device *dev, struct drv_ctl_info *ctl)
 		if (rc)
 			break;
 
-		mmiowb();
 		barrier();
 
 		/* Start accepting on iSCSI L2 ring */
@@ -14865,7 +14837,6 @@ static int bnx2x_drv_ctl(struct net_device *dev, struct drv_ctl_info *ctl)
 		if (!bnx2x_wait_sp_comp(bp, sp_bits))
 			BNX2X_ERR("rx_mode completion timed out!\n");
 
-		mmiowb();
 		barrier();
 
 		/* Unset iSCSI L2 MAC */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
index a9eaaf3e73a4..9f2bd1ae618c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
@@ -5039,7 +5039,6 @@ static inline int bnx2x_q_init(struct bnx2x *bp,
 	/* As no ramrod is sent, complete the command immediately  */
 	o->complete_cmd(bp, o, BNX2X_Q_CMD_INIT);
 
-	mmiowb();
 	smp_mb();
 
 	return 0;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index c835f6c7ecd0..b1db47895f16 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -100,13 +100,11 @@ static void bnx2x_vf_igu_ack_sb(struct bnx2x *bp, struct bnx2x_virtf *vf,
 	DP(NETIF_MSG_HW, "write 0x%08x to IGU(via GRC) addr 0x%x\n",
 	   cmd_data.sb_id_and_flags, igu_addr_data);
 	REG_WR(bp, igu_addr_data, cmd_data.sb_id_and_flags);
-	mmiowb();
 	barrier();
 
 	DP(NETIF_MSG_HW, "write 0x%08x to IGU(via GRC) addr 0x%x\n",
 	   ctl, igu_addr_ctl);
 	REG_WR(bp, igu_addr_ctl, ctl);
-	mmiowb();
 	barrier();
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index 8e0a317b31f7..d29a51fdd3e3 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -172,8 +172,6 @@ static int bnx2x_send_msg2pf(struct bnx2x *bp, u8 *done, dma_addr_t msg_mapping)
 	/* Trigger the PF FW */
 	writeb_relaxed(1, &zone_data->trigger.vf_pf_channel.addr_valid);
 
-	mmiowb();
-
 	/* Wait for PF to complete */
 	while ((tout >= 0) && (!*done)) {
 		msleep(interval);
@@ -1179,7 +1177,6 @@ static void bnx2x_vf_mbx_resp_send_msg(struct bnx2x *bp,
 
 	/* ack the FW */
 	storm_memset_vf_mbx_ack(bp, vf->abs_vfid);
-	mmiowb();
 
 	/* copy the response header including status-done field,
 	 * must be last dmae, must be after FW is acked
@@ -2178,7 +2175,6 @@ static void bnx2x_vf_mbx_request(struct bnx2x *bp, struct bnx2x_virtf *vf,
 		 */
 		storm_memset_vf_mbx_ack(bp, vf->abs_vfid);
 		/* Firmware ack should be written before unlocking channel */
-		mmiowb();
 		bnx2x_unlock_vf_pf_channel(bp, vf, mbx->first_tlv.tl.type);
 	}
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d95730c6e0f2..4ea28212d784 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -546,8 +546,6 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 tx_done:
 
-	mmiowb();
-
 	if (unlikely(bnxt_tx_avail(bp, txr) <= MAX_SKB_FRAGS + 1)) {
 		if (skb->xmit_more && !tx_buf->is_push)
 			bnxt_db_write(bp, &txr->tx_db, prod);
@@ -2113,7 +2111,6 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
 			       &dim_sample);
 		net_dim(&cpr->dim, dim_sample);
 	}
-	mmiowb();
 	return work_done;
 }
 
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index b1627dd5f2fd..4e0c69f6a342 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -1073,7 +1073,6 @@ static void tg3_int_reenable(struct tg3_napi *tnapi)
 	struct tg3 *tp = tnapi->tp;
 
 	tw32_mailbox(tnapi->int_mbox, tnapi->last_tag << 24);
-	mmiowb();
 
 	/* When doing tagged status, this work check is unnecessary.
 	 * The last_tag we write above tells the chip which piece of
@@ -6999,7 +6998,6 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 			tw32_rx_mbox(TG3_RX_JMB_PROD_IDX_REG,
 				     tpr->rx_jmb_prod_idx);
 		}
-		mmiowb();
 	} else if (work_mask) {
 		/* rx_std_buffers[] and rx_jmb_buffers[] entries must be
 		 * updated before the producer indices can be updated.
@@ -7210,8 +7208,6 @@ static int tg3_poll_work(struct tg3_napi *tnapi, int work_done, int budget)
 			tw32_rx_mbox(TG3_RX_JMB_PROD_IDX_REG,
 				     dpr->rx_jmb_prod_idx);
 
-		mmiowb();
-
 		if (err)
 			tw32_f(HOSTCC_MODE, tp->coal_now);
 	}
@@ -7278,7 +7274,6 @@ static int tg3_poll_msix(struct napi_struct *napi, int budget)
 						  HOSTCC_MODE_ENABLE |
 						  tnapi->coal_now);
 			}
-			mmiowb();
 			break;
 		}
 	}
@@ -8159,7 +8154,6 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (!skb->xmit_more || netif_xmit_stopped(txq)) {
 		/* Packets are ready, update Tx producer idx on card. */
 		tw32_tx_mbox(tnapi->prodmbox, entry);
-		mmiowb();
 	}
 
 	return NETDEV_TX_OK;
diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
index 2df7440f58df..39643be8c30a 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
@@ -38,9 +38,6 @@ int lio_cn6xxx_soft_reset(struct octeon_device *oct)
 	lio_pci_readq(oct, CN6XXX_CIU_SOFT_RST);
 	lio_pci_writeq(oct, 1, CN6XXX_CIU_SOFT_RST);
 
-	/* make sure that the reset is written before starting timer */
-	mmiowb();
-
 	/* Wait for 10ms as Octeon resets. */
 	mdelay(100);
 
@@ -487,9 +484,6 @@ void lio_cn6xxx_disable_interrupt(struct octeon_device *oct,
 
 	/* Disable Interrupts */
 	writeq(0, cn6xxx->intr_enb_reg64);
-
-	/* make sure interrupts are really disabled */
-	mmiowb();
 }
 
 static void lio_cn6xxx_get_pcie_qlmport(struct octeon_device *oct)
@@ -555,10 +549,6 @@ static int lio_cn6xxx_process_droq_intr_regs(struct octeon_device *oct)
 				value &= ~(1 << oq_no);
 				octeon_write_csr(oct, reg, value);
 
-				/* Ensure that the enable register is written.
-				 */
-				mmiowb();
-
 				spin_unlock(&cn6xxx->lock_for_droq_int_enb_reg);
 			}
 		}
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index ce8c3f818666..934115d18488 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -1449,7 +1449,6 @@ void lio_enable_irq(struct octeon_droq *droq, struct octeon_instr_queue *iq)
 		iq->pkt_in_done -= iq->pkts_processed;
 		iq->pkts_processed = 0;
 		/* this write needs to be flushed before we release the lock */
-		mmiowb();
 		spin_unlock_bh(&iq->lock);
 		oct = iq->oct_dev;
 	}
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
index a0c099f71524..017169023cca 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
@@ -513,8 +513,6 @@ int octeon_retry_droq_refill(struct octeon_droq *droq)
 		 */
 		wmb();
 		writel(desc_refilled, droq->pkts_credit_reg);
-		/* make sure mmio write completes */
-		mmiowb();
 
 		if (pkts_credit + desc_refilled >= CN23XX_SLI_DEF_BP)
 			reschedule = 0;
@@ -712,8 +710,6 @@ octeon_droq_fast_process_packets(struct octeon_device *oct,
 				 */
 				wmb();
 				writel(desc_refilled, droq->pkts_credit_reg);
-				/* make sure mmio write completes */
-				mmiowb();
 			}
 		}
 	}                       /* for (each packet)... */
diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c
index c6f4cbda040f..fcf20a8f92d9 100644
--- a/drivers/net/ethernet/cavium/liquidio/request_manager.c
+++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c
@@ -278,7 +278,6 @@ ring_doorbell(struct octeon_device *oct, struct octeon_instr_queue *iq)
 	if (atomic_read(&oct->status) == OCT_DEV_RUNNING) {
 		writel(iq->fill_cnt, iq->doorbell_reg);
 		/* make sure doorbell write goes through */
-		mmiowb();
 		iq->fill_cnt = 0;
 		iq->last_db_time = jiffies;
 		return;
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 8fe9af0e2ab7..466bf1ea186d 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3270,11 +3270,6 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 		if (!skb->xmit_more ||
 		    netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {
 			writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
-			/* we need this if more than one processor can write to
-			 * our tail at a time, it synchronizes IO on IA64/Altix
-			 * systems
-			 */
-			mmiowb();
 		}
 	} else {
 		dev_kfree_skb_any(skb);
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 189f231075c2..7066ace57320 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3816,7 +3816,6 @@ static void e1000_flush_tx_ring(struct e1000_adapter *adapter)
 	if (tx_ring->next_to_use == tx_ring->count)
 		tx_ring->next_to_use = 0;
 	ew32(TDT(0), tx_ring->next_to_use);
-	mmiowb();
 	usleep_range(200, 250);
 }
 
@@ -5907,12 +5906,6 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 						     tx_ring->next_to_use);
 			else
 				writel(tx_ring->next_to_use, tx_ring->tail);
-
-			/* we need this if more than one processor can write
-			 * to our tail at a time, it synchronizes IO on
-			 *IA64/Altix systems
-			 */
-			mmiowb();
 		}
 	} else {
 		dev_kfree_skb_any(skb);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 5d4f1761dc0c..8de77155f2e7 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -321,8 +321,6 @@ static void fm10k_mask_aer_comp_abort(struct pci_dev *pdev)
 	pci_read_config_dword(pdev, pos + PCI_ERR_UNCOR_MASK, &err_mask);
 	err_mask |= PCI_ERR_UNC_COMP_ABORT;
 	pci_write_config_dword(pdev, pos + PCI_ERR_UNCOR_MASK, err_mask);
-
-	mmiowb();
 }
 
 int fm10k_iov_resume(struct pci_dev *pdev)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 6fd15a734324..dde5c3a14734 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1037,11 +1037,6 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 6c97667d20ef..ffb611bbedfa 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -3471,11 +3471,6 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 9b4d7cec2e18..6bfef82e7607 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -2360,11 +2360,6 @@ static inline void iavf_tx_map(struct iavf_ring *tx_ring, struct sk_buff *skb,
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 49fc38094185..5e089c84ebd4 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1297,11 +1297,6 @@ ice_tx_map(struct ice_ring *tx_ring, struct ice_tx_buf *first,
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 7137e7f9c7f3..064cab9d29d8 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6043,11 +6043,6 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 	return 0;
 
diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index 4eab83faec62..34cd30d7162f 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -2279,10 +2279,6 @@ static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
 	tx_ring->buffer_info[first].next_to_watch = tx_desc;
 	tx_ring->next_to_use = i;
 	writel(i, adapter->hw.hw_addr + tx_ring->tail);
-	/* we need this if more than one processor can write to our tail
-	 * at a time, it synchronizes IO on IA64/Altix systems
-	 */
-	mmiowb();
 }
 
 static netdev_tx_t igbvf_xmit_frame_ring_adv(struct sk_buff *skb,
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index f20183037fb2..a7d5c985e885 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -890,11 +890,6 @@ static int igc_tx_map(struct igc_ring *tx_ring,
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cb35d8202572..8665aeed33a8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8298,11 +8298,6 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index 57727fe1501e..dd544d443750 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -1138,9 +1138,6 @@ static inline void sky2_put_idx(struct sky2_hw *hw, unsigned q, u16 idx)
 	/* Make sure write' to descriptors are complete before we tell hardware */
 	wmb();
 	sky2_write16(hw, Y2_QADDR(q, PREF_UNIT_PUT_IDX), idx);
-
-	/* Synchronize I/O on since next processor may write to tail */
-	mmiowb();
 }
 
 
@@ -1353,7 +1350,6 @@ static void sky2_rx_stop(struct sky2_port *sky2)
 
 	/* reset the Rx prefetch unit */
 	sky2_write32(hw, Y2_QADDR(rxq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
-	mmiowb();
 }
 
 /* Clean out receive buffer area, assumes receiver hardware stopped */
diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c
index c81d15bf259c..87e90b5d4d7d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/catas.c
+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c
@@ -129,10 +129,6 @@ static int mlx4_reset_slave(struct mlx4_dev *dev)
 	comm_flags = rst_req << COM_CHAN_RST_REQ_OFFSET;
 	__raw_writel((__force u32)cpu_to_be32(comm_flags),
 		     (__iomem char *)priv->mfunc.comm + MLX4_COMM_CHAN_FLAGS);
-	/* Make sure that our comm channel write doesn't
-	 * get mixed in with writes from another CPU.
-	 */
-	mmiowb();
 
 	end = msecs_to_jiffies(MLX4_COMM_TIME) + jiffies;
 	while (time_before(jiffies, end)) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index e65bc3c95630..3c7a5cc29f94 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -281,7 +281,6 @@ static int mlx4_comm_cmd_post(struct mlx4_dev *dev, u8 cmd, u16 param)
 	val = param | (cmd << 16) | (priv->cmd.comm_toggle << 31);
 	__raw_writel((__force u32) cpu_to_be32(val),
 		     &priv->mfunc.comm->slave_write);
-	mmiowb();
 	mutex_unlock(&dev->persist->device_state_mutex);
 	return 0;
 }
@@ -496,12 +495,6 @@ static int mlx4_cmd_post(struct mlx4_dev *dev, u64 in_param, u64 out_param,
 					       (op_modifier << HCR_OPMOD_SHIFT) |
 					       op), hcr + 6);
 
-	/*
-	 * Make sure that our HCR writes don't get mixed in with
-	 * writes from another CPU starting a FW command.
-	 */
-	mmiowb();
-
 	cmd->toggle = cmd->toggle ^ 1;
 
 	ret = 0;
@@ -2206,7 +2199,6 @@ static void mlx4_master_do_cmd(struct mlx4_dev *dev, int slave, u8 cmd,
 	}
 	__raw_writel((__force u32) cpu_to_be32(reply),
 		     &priv->mfunc.comm[slave].slave_read);
-	mmiowb();
 
 	return;
 
@@ -2410,7 +2402,6 @@ int mlx4_multi_func_init(struct mlx4_dev *dev)
 				     &priv->mfunc.comm[i].slave_write);
 			__raw_writel((__force u32) 0,
 				     &priv->mfunc.comm[i].slave_read);
-			mmiowb();
 			for (port = 1; port <= MLX4_MAX_PORTS; port++) {
 				struct mlx4_vport_state *admin_vport;
 				struct mlx4_vport_state *oper_vport;
@@ -2576,10 +2567,6 @@ void mlx4_report_internal_err_comm_event(struct mlx4_dev *dev)
 		slave_read |= (u32)COMM_CHAN_EVENT_INTERNAL_ERR;
 		__raw_writel((__force u32)cpu_to_be32(slave_read),
 			     &priv->mfunc.comm[slave].slave_read);
-		/* Make sure that our comm channel write doesn't
-		 * get mixed in with writes from another CPU.
-		 */
-		mmiowb();
 	}
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index e267ff93e8a8..4c6c33a514ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -915,7 +915,6 @@ static void cmd_work_handler(struct work_struct *work)
 	mlx5_core_dbg(dev, "writing 0x%x to command doorbell\n", 1 << ent->idx);
 	wmb();
 	iowrite32be(1 << ent->idx, &dev->iseg->cmd_dbell);
-	mmiowb();
 	/* if not in polling don't use ent after this point */
 	if (cmd_mode == CMD_MODE_POLLING || poll_cmd) {
 		poll_timeout(ent);
diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 19ce0e605096..4767482ea922 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -1439,7 +1439,6 @@ myri10ge_tx_done(struct myri10ge_slice_state *ss, int mcp_index)
 			tx->queue_active = 0;
 			put_be32(htonl(1), tx->send_stop);
 			mb();
-			mmiowb();
 		}
 		__netif_tx_unlock(dev_queue);
 	}
@@ -2861,7 +2860,6 @@ static netdev_tx_t myri10ge_xmit(struct sk_buff *skb,
 		tx->queue_active = 1;
 		put_be32(htonl(1), tx->send_go);
 		mb();
-		mmiowb();
 	}
 	tx->pkt_start++;
 	if ((avail - count) < MXGEFW_MAX_SEND_DESC) {
diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c
index 82be90075695..b332c53d8082 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -4153,8 +4153,6 @@ static netdev_tx_t s2io_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	writeq(val64, &tx_fifo->List_Control);
 
-	mmiowb();
-
 	put_off++;
 	if (put_off == fifo->tx_curr_put_info.fifo_len + 1)
 		put_off = 0;
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index 5ae3fa82909f..0193969bef10 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -1826,7 +1826,6 @@ static int vxge_poll_msix(struct napi_struct *napi, int budget)
 		vxge_hw_channel_msix_unmask(
 				(struct __vxge_hw_channel *)ring->handle,
 				ring->rx_vector_no);
-		mmiowb();
 	}
 
 	/* We are copying and returning the local variable, in case if after
@@ -2234,8 +2233,6 @@ static irqreturn_t vxge_tx_msix_handle(int irq, void *dev_id)
 	vxge_hw_channel_msix_unmask((struct __vxge_hw_channel *)fifo->handle,
 				    fifo->tx_vector_no);
 
-	mmiowb();
-
 	return IRQ_HANDLED;
 }
 
@@ -2272,14 +2269,12 @@ vxge_alarm_msix_handle(int irq, void *dev_id)
 		 */
 		vxge_hw_vpath_msix_mask(vdev->vpaths[i].handle, msix_id);
 		vxge_hw_vpath_msix_clear(vdev->vpaths[i].handle, msix_id);
-		mmiowb();
 
 		status = vxge_hw_vpath_alarm_process(vdev->vpaths[i].handle,
 			vdev->exec_mode);
 		if (status == VXGE_HW_OK) {
 			vxge_hw_vpath_msix_unmask(vdev->vpaths[i].handle,
 						  msix_id);
-			mmiowb();
 			continue;
 		}
 		vxge_debug_intr(VXGE_ERR,
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-traffic.c b/drivers/net/ethernet/neterion/vxge/vxge-traffic.c
index 59e77e3086bb..709d20d9938f 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-traffic.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-traffic.c
@@ -1399,11 +1399,7 @@ static void __vxge_hw_non_offload_db_post(struct __vxge_hw_fifo *fifo,
 		VXGE_HW_NODBW_GET_NO_SNOOP(no_snoop),
 		&fifo->nofl_db->control_0);
 
-	mmiowb();
-
 	writeq(txdl_ptr, &fifo->nofl_db->txdl_ptr);
-
-	mmiowb();
 }
 
 /**
diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c
index 92340919d852..21dbed2224a4 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_int.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_int.c
@@ -772,18 +772,12 @@ static inline u16 qed_attn_update_idx(struct qed_hwfn *p_hwfn,
 {
 	u16 rc = 0, index;
 
-	/* Make certain HW write took affect */
-	mmiowb();
-
 	index = le16_to_cpu(p_sb_desc->sb_attn->sb_index);
 	if (p_sb_desc->index != index) {
 		p_sb_desc->index	= index;
 		rc		      = QED_SB_ATT_IDX;
 	}
 
-	/* Make certain we got a consistent view with HW */
-	mmiowb();
-
 	return rc;
 }
 
@@ -1168,7 +1162,6 @@ static void qed_sb_ack_attn(struct qed_hwfn *p_hwfn,
 	/* Both segments (interrupts & acks) are written to same place address;
 	 * Need to guarantee all commands will be received (in-order) by HW.
 	 */
-	mmiowb();
 	barrier();
 }
 
@@ -1803,9 +1796,6 @@ static void qed_int_igu_enable_attn(struct qed_hwfn *p_hwfn,
 	qed_wr(p_hwfn, p_ptt, IGU_REG_TRAILING_EDGE_LATCH, 0xfff);
 	qed_wr(p_hwfn, p_ptt, IGU_REG_ATTENTION_ENABLE, 0xfff);
 
-	/* Flush the writes to IGU */
-	mmiowb();
-
 	/* Unmask AEU signals toward IGU */
 	qed_wr(p_hwfn, p_ptt, MISC_REG_AEU_MASK_ATTN_IGU, 0xff);
 }
@@ -1869,9 +1859,6 @@ static void qed_int_igu_cleanup_sb(struct qed_hwfn *p_hwfn,
 
 	qed_wr(p_hwfn, p_ptt, IGU_REG_COMMAND_REG_CTRL, cmd_ctrl);
 
-	/* Flush the write to IGU */
-	mmiowb();
-
 	/* calculate where to read the status bit from */
 	sb_bit = 1 << (igu_sb_id % 32);
 	sb_bit_addr = igu_sb_id / 32 * sizeof(u32);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_spq.c b/drivers/net/ethernet/qlogic/qed/qed_spq.c
index ba64ff9bedbd..8c4fc39dd453 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_spq.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c
@@ -341,9 +341,6 @@ void qed_eq_prod_update(struct qed_hwfn *p_hwfn, u16 prod)
 		   USTORM_EQE_CONS_OFFSET(p_hwfn->rel_pf_id);
 
 	REG_WR16(p_hwfn, addr, prod);
-
-	/* keep prod updates ordered */
-	mmiowb();
 }
 
 int qed_eq_completion(struct qed_hwfn *p_hwfn, void *cookie)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 16331c6c6fa7..4bf4ab96ad7e 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -1517,14 +1517,6 @@ static int qede_selftest_transmit_traffic(struct qede_dev *edev,
 	barrier();
 	writel(txq->tx_db.raw, txq->doorbell_addr);
 
-	/* mmiowb is needed to synchronize doorbell writes from more than one
-	 * processor. It guarantees that the write arrives to the device before
-	 * the queue lock is released and another start_xmit is called (possibly
-	 * on another CPU). Without this barrier, the next doorbell can bypass
-	 * this doorbell. This is applicable to IA64/Altix systems.
-	 */
-	mmiowb();
-
 	for (i = 0; i < QEDE_SELFTEST_POLL_COUNT; i++) {
 		if (qede_txq_has_work(txq))
 			break;
diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index 31b046e24565..6f7e3622c6b4 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -580,14 +580,6 @@ void qede_update_rx_prod(struct qede_dev *edev, struct qede_rx_queue *rxq)
 
 	internal_ram_wr(rxq->hw_rxq_prod_addr, sizeof(rx_prods),
 			(u32 *)&rx_prods);
-
-	/* mmiowb is needed to synchronize doorbell writes from more than one
-	 * processor. It guarantees that the write arrives to the device before
-	 * the napi lock is released and another qede_poll is called (possibly
-	 * on another CPU). Without this barrier, the next doorbell can bypass
-	 * this doorbell. This is applicable to IA64/Altix systems.
-	 */
-	mmiowb();
 }
 
 static void qede_get_rxhash(struct sk_buff *skb, u8 bitfields, __le32 rss_hash)
diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index 10b075bc5959..c9d6becb4ab6 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -1858,7 +1858,6 @@ static void ql_update_small_bufq_prod_index(struct ql3_adapter *qdev)
 		wmb();
 		writel_relaxed(qdev->small_buf_q_producer_index,
 			       &port_regs->CommonRegs.rxSmallQProducerIndex);
-		mmiowb();
 	}
 }
 
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h b/drivers/net/ethernet/qlogic/qlge/qlge.h
index 3e71b65a9546..ad7c5eb8a3b6 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge.h
+++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
@@ -2181,7 +2181,6 @@ static inline void ql_write32(const struct ql_adapter *qdev, int reg, u32 val)
 static inline void ql_write_db_reg(u32 val, void __iomem *addr)
 {
 	writel(val, addr);
-	mmiowb();
 }
 
 /*
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 059ba9429e51..6cc61138ae69 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2695,7 +2695,6 @@ static netdev_tx_t qlge_send(struct sk_buff *skb, struct net_device *ndev)
 	wmb();
 
 	ql_write_db_reg_relaxed(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
-	mmiowb();
 	netif_printk(qdev, tx_queued, KERN_DEBUG, qdev->ndev,
 		     "tx queued, slot %d, len %d\n",
 		     tx_ring->prod_idx, skb->len);
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 6e36b88ca7c9..b1e509cadb55 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1286,13 +1286,11 @@ static u16 rtl_get_events(struct rtl8169_private *tp)
 static void rtl_ack_events(struct rtl8169_private *tp, u16 bits)
 {
 	RTL_W16(tp, IntrStatus, bits);
-	mmiowb();
 }
 
 static void rtl_irq_disable(struct rtl8169_private *tp)
 {
 	RTL_W16(tp, IntrMask, 0);
-	mmiowb();
 }
 
 #define RTL_EVENT_NAPI_RX	(RxOK | RxErr)
@@ -6131,8 +6129,6 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	RTL_W8(tp, TxPoll, NPQ);
 
-	mmiowb();
-
 	if (!rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
 		/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
 		 * not miss a ring update when it notices a stopped queue.
@@ -6490,7 +6486,6 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
 		napi_complete_done(napi, work_done);
 
 		rtl_irq_enable(tp);
-		mmiowb();
 	}
 
 	return work_done;
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index d28c8f9ca55b..74b373303b06 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -728,7 +728,6 @@ static irqreturn_t ravb_emac_interrupt(int irq, void *dev_id)
 
 	spin_lock(&priv->lock);
 	ravb_emac_interrupt_unlocked(ndev);
-	mmiowb();
 	spin_unlock(&priv->lock);
 	return IRQ_HANDLED;
 }
@@ -848,7 +847,6 @@ static irqreturn_t ravb_interrupt(int irq, void *dev_id)
 		result = IRQ_HANDLED;
 	}
 
-	mmiowb();
 	spin_unlock(&priv->lock);
 	return result;
 }
@@ -881,7 +879,6 @@ static irqreturn_t ravb_multi_interrupt(int irq, void *dev_id)
 		result = IRQ_HANDLED;
 	}
 
-	mmiowb();
 	spin_unlock(&priv->lock);
 	return result;
 }
@@ -898,7 +895,6 @@ static irqreturn_t ravb_dma_interrupt(int irq, void *dev_id, int q)
 	if (ravb_queue_interrupt(ndev, q))
 		result = IRQ_HANDLED;
 
-	mmiowb();
 	spin_unlock(&priv->lock);
 	return result;
 }
@@ -943,7 +939,6 @@ static int ravb_poll(struct napi_struct *napi, int budget)
 			ravb_write(ndev, ~(mask | TIS_RESERVED), TIS);
 			ravb_tx_free(ndev, q, true);
 			netif_wake_subqueue(ndev, q);
-			mmiowb();
 			spin_unlock_irqrestore(&priv->lock, flags);
 		}
 	}
@@ -959,7 +954,6 @@ static int ravb_poll(struct napi_struct *napi, int budget)
 		ravb_write(ndev, mask, RIE0);
 		ravb_write(ndev, mask, TIE);
 	}
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 
 	/* Receive error message handling */
@@ -1008,7 +1002,6 @@ static void ravb_adjust_link(struct net_device *ndev)
 	if (priv->no_avb_link && phydev->link)
 		ravb_rcv_snd_enable(ndev);
 
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 
 	if (new_state && netif_msg_link(priv))
@@ -1601,7 +1594,6 @@ static netdev_tx_t ravb_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 		netif_stop_subqueue(ndev, q);
 
 exit:
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 	return NETDEV_TX_OK;
 
@@ -1673,7 +1665,6 @@ static void ravb_set_rx_mode(struct net_device *ndev)
 	spin_lock_irqsave(&priv->lock, flags);
 	ravb_modify(ndev, ECMR, ECMR_PRM,
 		    ndev->flags & IFF_PROMISC ? ECMR_PRM : 0);
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 }
 
diff --git a/drivers/net/ethernet/renesas/ravb_ptp.c b/drivers/net/ethernet/renesas/ravb_ptp.c
index dce2a40a31e3..9a42580693cb 100644
--- a/drivers/net/ethernet/renesas/ravb_ptp.c
+++ b/drivers/net/ethernet/renesas/ravb_ptp.c
@@ -196,7 +196,6 @@ static int ravb_ptp_extts(struct ptp_clock_info *ptp,
 		ravb_write(ndev, GIE_PTCS, GIE);
 	else
 		ravb_write(ndev, GID_PTCD, GID);
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 
 	return 0;
@@ -259,7 +258,6 @@ static int ravb_ptp_perout(struct ptp_clock_info *ptp,
 		else
 			ravb_write(ndev, GID_PTMD0, GID);
 	}
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 
 	return error;
@@ -331,7 +329,6 @@ void ravb_ptp_init(struct net_device *ndev, struct platform_device *pdev)
 	spin_lock_irqsave(&priv->lock, flags);
 	ravb_wait(ndev, GCCR, GCCR_TCR, GCCR_TCR_NOREQ);
 	ravb_modify(ndev, GCCR, GCCR_TCSS, GCCR_TCSS_ADJGPTP);
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 
 	priv->ptp.clock = ptp_clock_register(&priv->ptp.info, &pdev->dev);
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index f27a0dc8c563..ff6e48845961 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1989,7 +1989,6 @@ static void sh_eth_adjust_link(struct net_device *ndev)
 	if ((mdp->cd->no_psr || mdp->no_ether_link) && phydev->link)
 		sh_eth_rcv_snd_enable(ndev);
 
-	mmiowb();
 	spin_unlock_irqrestore(&mdp->lock, flags);
 
 	if (new_state && netif_msg_link(mdp))
diff --git a/drivers/net/ethernet/sfc/falcon/io.h b/drivers/net/ethernet/sfc/falcon/io.h
index 7085ee1d5e2b..c3577643fbda 100644
--- a/drivers/net/ethernet/sfc/falcon/io.h
+++ b/drivers/net/ethernet/sfc/falcon/io.h
@@ -108,7 +108,6 @@ static inline void ef4_writeo(struct ef4_nic *efx, const ef4_oword_t *value,
 	_ef4_writed(efx, value->u32[2], reg + 8);
 	_ef4_writed(efx, value->u32[3], reg + 12);
 #endif
-	mmiowb();
 	spin_unlock_irqrestore(&efx->biu_lock, flags);
 }
 
@@ -130,7 +129,6 @@ static inline void ef4_sram_writeq(struct ef4_nic *efx, void __iomem *membase,
 	__raw_writel((__force u32)value->u32[0], membase + addr);
 	__raw_writel((__force u32)value->u32[1], membase + addr + 4);
 #endif
-	mmiowb();
 	spin_unlock_irqrestore(&efx->biu_lock, flags);
 }
 
diff --git a/drivers/net/ethernet/sfc/io.h b/drivers/net/ethernet/sfc/io.h
index 89563170af52..2774a10f44e9 100644
--- a/drivers/net/ethernet/sfc/io.h
+++ b/drivers/net/ethernet/sfc/io.h
@@ -120,7 +120,6 @@ static inline void efx_writeo(struct efx_nic *efx, const efx_oword_t *value,
 	_efx_writed(efx, value->u32[2], reg + 8);
 	_efx_writed(efx, value->u32[3], reg + 12);
 #endif
-	mmiowb();
 	spin_unlock_irqrestore(&efx->biu_lock, flags);
 }
 
@@ -142,7 +141,6 @@ static inline void efx_sram_writeq(struct efx_nic *efx, void __iomem *membase,
 	__raw_writel((__force u32)value->u32[0], membase + addr);
 	__raw_writel((__force u32)value->u32[1], membase + addr + 4);
 #endif
-	mmiowb();
 	spin_unlock_irqrestore(&efx->biu_lock, flags);
 }
 
diff --git a/drivers/net/ethernet/silan/sc92031.c b/drivers/net/ethernet/silan/sc92031.c
index c07fd594fe71..db5dc8ce0aff 100644
--- a/drivers/net/ethernet/silan/sc92031.c
+++ b/drivers/net/ethernet/silan/sc92031.c
@@ -361,7 +361,6 @@ static void sc92031_disable_interrupts(struct net_device *dev)
 	/* stop interrupts */
 	iowrite32(0, port_base + IntrMask);
 	_sc92031_dummy_read(port_base);
-	mmiowb();
 
 	/* wait for any concurrent interrupt/tasklet to finish */
 	synchronize_irq(priv->pdev->irq);
@@ -379,7 +378,6 @@ static void sc92031_enable_interrupts(struct net_device *dev)
 	wmb();
 
 	iowrite32(IntrBits, port_base + IntrMask);
-	mmiowb();
 }
 
 static void _sc92031_disable_tx_rx(struct net_device *dev)
@@ -867,7 +865,6 @@ static void sc92031_tasklet(unsigned long data)
 	rmb();
 
 	iowrite32(intr_mask, port_base + IntrMask);
-	mmiowb();
 
 	spin_unlock(&priv->lock);
 }
@@ -901,7 +898,6 @@ static irqreturn_t sc92031_interrupt(int irq, void *dev_id)
 	rmb();
 
 	iowrite32(intr_mask, port_base + IntrMask);
-	mmiowb();
 
 	return IRQ_NONE;
 }
@@ -978,7 +974,6 @@ static netdev_tx_t sc92031_start_xmit(struct sk_buff *skb,
 	iowrite32(priv->tx_bufs_dma_addr + entry * TX_BUF_SIZE,
 			port_base + TxAddr0 + entry * 4);
 	iowrite32(tx_status, port_base + TxStatus0 + entry * 4);
-	mmiowb();
 
 	if (priv->tx_head - priv->tx_tail >= NUM_TX_DESC)
 		netif_stop_queue(dev);
@@ -1024,7 +1019,6 @@ static int sc92031_open(struct net_device *dev)
 	spin_lock_bh(&priv->lock);
 
 	_sc92031_reset(dev);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 	sc92031_enable_interrupts(dev);
@@ -1060,7 +1054,6 @@ static int sc92031_stop(struct net_device *dev)
 
 	_sc92031_disable_tx_rx(dev);
 	_sc92031_tx_clear(dev);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 
@@ -1081,7 +1074,6 @@ static void sc92031_set_multicast_list(struct net_device *dev)
 
 	_sc92031_set_mar(dev);
 	_sc92031_set_rx_config(dev);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 }
@@ -1098,7 +1090,6 @@ static void sc92031_tx_timeout(struct net_device *dev)
 	priv->tx_timeouts++;
 
 	_sc92031_reset(dev);
-	mmiowb();
 
 	spin_unlock(&priv->lock);
 
@@ -1140,7 +1131,6 @@ sc92031_ethtool_get_link_ksettings(struct net_device *dev,
 
 	output_status = _sc92031_mii_read(port_base, MII_OutputStatus);
 	_sc92031_mii_scan(port_base);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 
@@ -1311,7 +1301,6 @@ static int sc92031_ethtool_set_wol(struct net_device *dev,
 
 	priv->pm_config = pm_config;
 	iowrite32(pm_config, port_base + PMConfig);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 
@@ -1337,7 +1326,6 @@ static int sc92031_ethtool_nway_reset(struct net_device *dev)
 
 out:
 	_sc92031_mii_scan(port_base);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 
@@ -1530,7 +1518,6 @@ static int sc92031_suspend(struct pci_dev *pdev, pm_message_t state)
 
 	_sc92031_disable_tx_rx(dev);
 	_sc92031_tx_clear(dev);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 
@@ -1555,7 +1542,6 @@ static int sc92031_resume(struct pci_dev *pdev)
 	spin_lock_bh(&priv->lock);
 
 	_sc92031_reset(dev);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 	sc92031_enable_interrupts(dev);
diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
index 33949248c829..ab55416a10fa 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -571,7 +571,6 @@ static void rhine_ack_events(struct rhine_private *rp, u32 mask)
 	if (rp->quirks & rqStatusWBRace)
 		iowrite8(mask >> 16, ioaddr + IntrStatus2);
 	iowrite16(mask, ioaddr + IntrStatus);
-	mmiowb();
 }
 
 /*
@@ -863,7 +862,6 @@ static int rhine_napipoll(struct napi_struct *napi, int budget)
 	if (work_done < budget) {
 		napi_complete_done(napi, work_done);
 		iowrite16(enable_mask, ioaddr + IntrEnable);
-		mmiowb();
 	}
 	return work_done;
 }
@@ -1893,7 +1891,6 @@ static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 static void rhine_irq_disable(struct rhine_private *rp)
 {
 	iowrite16(0x0000, rp->base + IntrEnable);
-	mmiowb();
 }
 
 /* The interrupt handler does all of the Rx thread work and cleans up
diff --git a/drivers/net/ethernet/wiznet/w5100.c b/drivers/net/ethernet/wiznet/w5100.c
index d8ba512f166a..1713c2d2dccf 100644
--- a/drivers/net/ethernet/wiznet/w5100.c
+++ b/drivers/net/ethernet/wiznet/w5100.c
@@ -219,7 +219,6 @@ static inline int __w5100_write_direct(struct net_device *ndev, u32 addr,
 static inline int w5100_write_direct(struct net_device *ndev, u32 addr, u8 data)
 {
 	__w5100_write_direct(ndev, addr, data);
-	mmiowb();
 
 	return 0;
 }
@@ -236,7 +235,6 @@ static int w5100_write16_direct(struct net_device *ndev, u32 addr, u16 data)
 {
 	__w5100_write_direct(ndev, addr, data >> 8);
 	__w5100_write_direct(ndev, addr + 1, data);
-	mmiowb();
 
 	return 0;
 }
@@ -260,8 +258,6 @@ static int w5100_writebulk_direct(struct net_device *ndev, u32 addr,
 	for (i = 0; i < len; i++, addr++)
 		__w5100_write_direct(ndev, addr, *buf++);
 
-	mmiowb();
-
 	return 0;
 }
 
@@ -375,7 +371,6 @@ static int w5100_readbulk_indirect(struct net_device *ndev, u32 addr, u8 *buf,
 	for (i = 0; i < len; i++)
 		*buf++ = w5100_read_direct(ndev, W5100_IDM_DR);
 
-	mmiowb();
 	spin_unlock_irqrestore(&mmio_priv->reg_lock, flags);
 
 	return 0;
@@ -394,7 +389,6 @@ static int w5100_writebulk_indirect(struct net_device *ndev, u32 addr,
 	for (i = 0; i < len; i++)
 		__w5100_write_direct(ndev, W5100_IDM_DR, *buf++);
 
-	mmiowb();
 	spin_unlock_irqrestore(&mmio_priv->reg_lock, flags);
 
 	return 0;
diff --git a/drivers/net/ethernet/wiznet/w5300.c b/drivers/net/ethernet/wiznet/w5300.c
index f9da5d6172e3..3f03eecc0479 100644
--- a/drivers/net/ethernet/wiznet/w5300.c
+++ b/drivers/net/ethernet/wiznet/w5300.c
@@ -141,7 +141,6 @@ static u16 w5300_read_indirect(struct w5300_priv *priv, u16 addr)
 
 	spin_lock_irqsave(&priv->reg_lock, flags);
 	w5300_write_direct(priv, W5300_IDM_AR, addr);
-	mmiowb();
 	data = w5300_read_direct(priv, W5300_IDM_DR);
 	spin_unlock_irqrestore(&priv->reg_lock, flags);
 
@@ -154,9 +153,7 @@ static void w5300_write_indirect(struct w5300_priv *priv, u16 addr, u16 data)
 
 	spin_lock_irqsave(&priv->reg_lock, flags);
 	w5300_write_direct(priv, W5300_IDM_AR, addr);
-	mmiowb();
 	w5300_write_direct(priv, W5300_IDM_DR, data);
-	mmiowb();
 	spin_unlock_irqrestore(&priv->reg_lock, flags);
 }
 
@@ -192,7 +189,6 @@ static int w5300_command(struct w5300_priv *priv, u16 cmd)
 	unsigned long timeout = jiffies + msecs_to_jiffies(100);
 
 	w5300_write(priv, W5300_S0_CR, cmd);
-	mmiowb();
 
 	while (w5300_read(priv, W5300_S0_CR) != 0) {
 		if (time_after(jiffies, timeout))
@@ -241,18 +237,15 @@ static void w5300_write_macaddr(struct w5300_priv *priv)
 	w5300_write(priv, W5300_SHARH,
 		      ndev->dev_addr[4] << 8 |
 		      ndev->dev_addr[5]);
-	mmiowb();
 }
 
 static void w5300_hw_reset(struct w5300_priv *priv)
 {
 	w5300_write_direct(priv, W5300_MR, MR_RST);
-	mmiowb();
 	mdelay(5);
 	w5300_write_direct(priv, W5300_MR, priv->indirect ?
 				 MR_WDF(7) | MR_PB | MR_IND :
 				 MR_WDF(7) | MR_PB);
-	mmiowb();
 	w5300_write(priv, W5300_IMR, 0);
 	w5300_write_macaddr(priv);
 
@@ -264,24 +257,20 @@ static void w5300_hw_reset(struct w5300_priv *priv)
 	w5300_write32(priv, W5300_TMSRL, 64 << 24);
 	w5300_write32(priv, W5300_TMSRH, 0);
 	w5300_write(priv, W5300_MTYPE, 0x00ff);
-	mmiowb();
 }
 
 static void w5300_hw_start(struct w5300_priv *priv)
 {
 	w5300_write(priv, W5300_S0_MR, priv->promisc ?
 			  S0_MR_MACRAW : S0_MR_MACRAW_MF);
-	mmiowb();
 	w5300_command(priv, S0_CR_OPEN);
 	w5300_write(priv, W5300_S0_IMR, S0_IR_RECV | S0_IR_SENDOK);
 	w5300_write(priv, W5300_IMR, IR_S0);
-	mmiowb();
 }
 
 static void w5300_hw_close(struct w5300_priv *priv)
 {
 	w5300_write(priv, W5300_IMR, 0);
-	mmiowb();
 	w5300_command(priv, S0_CR_CLOSE);
 }
 
@@ -372,7 +361,6 @@ static netdev_tx_t w5300_start_tx(struct sk_buff *skb, struct net_device *ndev)
 	netif_stop_queue(ndev);
 
 	w5300_write_frame(priv, skb->data, skb->len);
-	mmiowb();
 	ndev->stats.tx_packets++;
 	ndev->stats.tx_bytes += skb->len;
 	dev_kfree_skb(skb);
@@ -419,7 +407,6 @@ static int w5300_napi_poll(struct napi_struct *napi, int budget)
 	if (rx_count < budget) {
 		napi_complete_done(napi, rx_count);
 		w5300_write(priv, W5300_IMR, IR_S0);
-		mmiowb();
 	}
 
 	return rx_count;
@@ -434,7 +421,6 @@ static irqreturn_t w5300_interrupt(int irq, void *ndev_instance)
 	if (!ir)
 		return IRQ_NONE;
 	w5300_write(priv, W5300_S0_IR, ir);
-	mmiowb();
 
 	if (ir & S0_IR_SENDOK) {
 		netif_dbg(priv, tx_done, ndev, "tx done\n");
@@ -444,7 +430,6 @@ static irqreturn_t w5300_interrupt(int irq, void *ndev_instance)
 	if (ir & S0_IR_RECV) {
 		if (napi_schedule_prep(&priv->napi)) {
 			w5300_write(priv, W5300_IMR, 0);
-			mmiowb();
 			__napi_schedule(&priv->napi);
 		}
 	}
diff --git a/drivers/net/wireless/ath/ath5k/base.c b/drivers/net/wireless/ath/ath5k/base.c
index a2351ef45ae0..65a4c142640d 100644
--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -837,7 +837,6 @@ ath5k_txbuf_setup(struct ath5k_hw *ah, struct ath5k_buf *bf,
 
 	txq->link = &ds->ds_link;
 	ath5k_hw_start_tx_dma(ah, txq->qnum);
-	mmiowb();
 	spin_unlock_bh(&txq->lock);
 
 	return 0;
@@ -2174,7 +2173,6 @@ ath5k_beacon_config(struct ath5k_hw *ah)
 	}
 
 	ath5k_hw_set_imr(ah, ah->imask);
-	mmiowb();
 	spin_unlock_bh(&ah->block);
 }
 
@@ -2779,7 +2777,6 @@ int ath5k_start(struct ieee80211_hw *hw)
 
 	ret = 0;
 done:
-	mmiowb();
 	mutex_unlock(&ah->lock);
 
 	set_bit(ATH_STAT_STARTED, ah->status);
@@ -2839,7 +2836,6 @@ void ath5k_stop(struct ieee80211_hw *hw)
 				"putting device to sleep\n");
 	}
 
-	mmiowb();
 	mutex_unlock(&ah->lock);
 
 	ath5k_stop_tasklets(ah);
diff --git a/drivers/net/wireless/ath/ath5k/mac80211-ops.c b/drivers/net/wireless/ath/ath5k/mac80211-ops.c
index 16e052d02c94..5e866a193ed0 100644
--- a/drivers/net/wireless/ath/ath5k/mac80211-ops.c
+++ b/drivers/net/wireless/ath/ath5k/mac80211-ops.c
@@ -263,7 +263,6 @@ ath5k_bss_info_changed(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
 		memcpy(common->curbssid, bss_conf->bssid, ETH_ALEN);
 		common->curaid = 0;
 		ath5k_hw_set_bssid(ah);
-		mmiowb();
 	}
 
 	if (changes & BSS_CHANGED_BEACON_INT)
@@ -528,7 +527,6 @@ ath5k_set_key(struct ieee80211_hw *hw, enum set_key_cmd cmd,
 		ret = -EINVAL;
 	}
 
-	mmiowb();
 	mutex_unlock(&ah->lock);
 	return ret;
 }
diff --git a/drivers/net/wireless/broadcom/b43/main.c b/drivers/net/wireless/broadcom/b43/main.c
index 74be3c809225..4c7980f84591 100644
--- a/drivers/net/wireless/broadcom/b43/main.c
+++ b/drivers/net/wireless/broadcom/b43/main.c
@@ -485,7 +485,6 @@ static void b43_ram_write(struct b43_wldev *dev, u16 offset, u32 val)
 		val = swab32(val);
 
 	b43_write32(dev, B43_MMIO_RAM_CONTROL, offset);
-	mmiowb();
 	b43_write32(dev, B43_MMIO_RAM_DATA, val);
 }
 
@@ -656,9 +655,7 @@ static void b43_tsf_write_locked(struct b43_wldev *dev, u64 tsf)
 	/* The hardware guarantees us an atomic write, if we
 	 * write the low register first. */
 	b43_write32(dev, B43_MMIO_REV3PLUS_TSF_LOW, low);
-	mmiowb();
 	b43_write32(dev, B43_MMIO_REV3PLUS_TSF_HIGH, high);
-	mmiowb();
 }
 
 void b43_tsf_write(struct b43_wldev *dev, u64 tsf)
@@ -1822,11 +1819,9 @@ static void b43_beacon_update_trigger_work(struct work_struct *work)
 		if (b43_bus_host_is_sdio(dev->dev)) {
 			/* wl->mutex is enough. */
 			b43_do_beacon_update_trigger_work(dev);
-			mmiowb();
 		} else {
 			spin_lock_irq(&wl->hardirq_lock);
 			b43_do_beacon_update_trigger_work(dev);
-			mmiowb();
 			spin_unlock_irq(&wl->hardirq_lock);
 		}
 	}
@@ -2078,7 +2073,6 @@ static irqreturn_t b43_interrupt_thread_handler(int irq, void *dev_id)
 
 	mutex_lock(&dev->wl->mutex);
 	b43_do_interrupt_thread(dev);
-	mmiowb();
 	mutex_unlock(&dev->wl->mutex);
 
 	return IRQ_HANDLED;
@@ -2143,7 +2137,6 @@ static irqreturn_t b43_interrupt_handler(int irq, void *dev_id)
 
 	spin_lock(&dev->wl->hardirq_lock);
 	ret = b43_do_interrupt(dev);
-	mmiowb();
 	spin_unlock(&dev->wl->hardirq_lock);
 
 	return ret;
diff --git a/drivers/net/wireless/broadcom/b43/sysfs.c b/drivers/net/wireless/broadcom/b43/sysfs.c
index 3190493bd07f..93d03b673670 100644
--- a/drivers/net/wireless/broadcom/b43/sysfs.c
+++ b/drivers/net/wireless/broadcom/b43/sysfs.c
@@ -129,7 +129,6 @@ static ssize_t b43_attr_interfmode_store(struct device *dev,
 	} else
 		err = -ENOSYS;
 
-	mmiowb();
 	mutex_unlock(&wldev->wl->mutex);
 
 	return err ? err : count;
diff --git a/drivers/net/wireless/broadcom/b43legacy/ilt.c b/drivers/net/wireless/broadcom/b43legacy/ilt.c
index ee5682e54204..6d15fb4d30c6 100644
--- a/drivers/net/wireless/broadcom/b43legacy/ilt.c
+++ b/drivers/net/wireless/broadcom/b43legacy/ilt.c
@@ -315,14 +315,12 @@ const u16 b43legacy_ilt_sigmasqr2[B43legacy_ILT_SIGMASQR_SIZE] = {
 void b43legacy_ilt_write(struct b43legacy_wldev *dev, u16 offset, u16 val)
 {
 	b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_CTRL, offset);
-	mmiowb();
 	b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_DATA1, val);
 }
 
 void b43legacy_ilt_write32(struct b43legacy_wldev *dev, u16 offset, u32 val)
 {
 	b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_CTRL, offset);
-	mmiowb();
 	b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_DATA2,
 			    (val & 0xFFFF0000) >> 16);
 	b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_DATA1,
diff --git a/drivers/net/wireless/broadcom/b43legacy/main.c b/drivers/net/wireless/broadcom/b43legacy/main.c
index 55f411925960..c777efc6dc13 100644
--- a/drivers/net/wireless/broadcom/b43legacy/main.c
+++ b/drivers/net/wireless/broadcom/b43legacy/main.c
@@ -264,7 +264,6 @@ static void b43legacy_ram_write(struct b43legacy_wldev *dev, u16 offset,
 		val = swab32(val);
 
 	b43legacy_write32(dev, B43legacy_MMIO_RAM_CONTROL, offset);
-	mmiowb();
 	b43legacy_write32(dev, B43legacy_MMIO_RAM_DATA, val);
 }
 
@@ -341,14 +340,11 @@ void b43legacy_shm_write32(struct b43legacy_wldev *dev,
 		if (offset & 0x0003) {
 			/* Unaligned access */
 			b43legacy_shm_control_word(dev, routing, offset >> 2);
-			mmiowb();
 			b43legacy_write16(dev,
 					  B43legacy_MMIO_SHM_DATA_UNALIGNED,
 					  (value >> 16) & 0xffff);
-			mmiowb();
 			b43legacy_shm_control_word(dev, routing,
 						   (offset >> 2) + 1);
-			mmiowb();
 			b43legacy_write16(dev, B43legacy_MMIO_SHM_DATA,
 					  value & 0xffff);
 			return;
@@ -356,7 +352,6 @@ void b43legacy_shm_write32(struct b43legacy_wldev *dev,
 		offset >>= 2;
 	}
 	b43legacy_shm_control_word(dev, routing, offset);
-	mmiowb();
 	b43legacy_write32(dev, B43legacy_MMIO_SHM_DATA, value);
 }
 
@@ -368,7 +363,6 @@ void b43legacy_shm_write16(struct b43legacy_wldev *dev, u16 routing, u16 offset,
 		if (offset & 0x0003) {
 			/* Unaligned access */
 			b43legacy_shm_control_word(dev, routing, offset >> 2);
-			mmiowb();
 			b43legacy_write16(dev,
 					  B43legacy_MMIO_SHM_DATA_UNALIGNED,
 					  value);
@@ -377,7 +371,6 @@ void b43legacy_shm_write16(struct b43legacy_wldev *dev, u16 routing, u16 offset,
 		offset >>= 2;
 	}
 	b43legacy_shm_control_word(dev, routing, offset);
-	mmiowb();
 	b43legacy_write16(dev, B43legacy_MMIO_SHM_DATA, value);
 }
 
@@ -471,7 +464,6 @@ static void b43legacy_time_lock(struct b43legacy_wldev *dev)
 	status = b43legacy_read32(dev, B43legacy_MMIO_MACCTL);
 	status |= B43legacy_MACCTL_TBTTHOLD;
 	b43legacy_write32(dev, B43legacy_MMIO_MACCTL, status);
-	mmiowb();
 }
 
 static void b43legacy_time_unlock(struct b43legacy_wldev *dev)
@@ -494,10 +486,8 @@ static void b43legacy_tsf_write_locked(struct b43legacy_wldev *dev, u64 tsf)
 		u32 hi = (tsf & 0xFFFFFFFF00000000ULL) >> 32;
 
 		b43legacy_write32(dev, B43legacy_MMIO_REV3PLUS_TSF_LOW, 0);
-		mmiowb();
 		b43legacy_write32(dev, B43legacy_MMIO_REV3PLUS_TSF_HIGH,
 				    hi);
-		mmiowb();
 		b43legacy_write32(dev, B43legacy_MMIO_REV3PLUS_TSF_LOW,
 				    lo);
 	} else {
@@ -507,13 +497,9 @@ static void b43legacy_tsf_write_locked(struct b43legacy_wldev *dev, u64 tsf)
 		u16 v3 = (tsf & 0xFFFF000000000000ULL) >> 48;
 
 		b43legacy_write16(dev, B43legacy_MMIO_TSF_0, 0);
-		mmiowb();
 		b43legacy_write16(dev, B43legacy_MMIO_TSF_3, v3);
-		mmiowb();
 		b43legacy_write16(dev, B43legacy_MMIO_TSF_2, v2);
-		mmiowb();
 		b43legacy_write16(dev, B43legacy_MMIO_TSF_1, v1);
-		mmiowb();
 		b43legacy_write16(dev, B43legacy_MMIO_TSF_0, v0);
 	}
 }
@@ -1250,7 +1236,6 @@ static void b43legacy_beacon_update_trigger_work(struct work_struct *work)
 		/* The handler might have updated the IRQ mask. */
 		b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK,
 				  dev->irq_mask);
-		mmiowb();
 		spin_unlock_irq(&wl->irq_lock);
 	}
 	mutex_unlock(&wl->mutex);
@@ -1346,7 +1331,6 @@ static void b43legacy_interrupt_tasklet(struct b43legacy_wldev *dev)
 			       dma_reason[2], dma_reason[3],
 			       dma_reason[4], dma_reason[5]);
 			b43legacy_controller_restart(dev, "DMA error");
-			mmiowb();
 			spin_unlock_irqrestore(&dev->wl->irq_lock, flags);
 			return;
 		}
@@ -1396,7 +1380,6 @@ static void b43legacy_interrupt_tasklet(struct b43legacy_wldev *dev)
 		handle_irq_transmit_status(dev);
 
 	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, dev->irq_mask);
-	mmiowb();
 	spin_unlock_irqrestore(&dev->wl->irq_lock, flags);
 }
 
@@ -1488,7 +1471,6 @@ static irqreturn_t b43legacy_interrupt_handler(int irq, void *dev_id)
 	dev->irq_reason = reason;
 	tasklet_schedule(&dev->isr_tasklet);
 out:
-	mmiowb();
 	spin_unlock(&dev->wl->irq_lock);
 
 	return ret;
@@ -2781,7 +2763,6 @@ static int b43legacy_op_dev_config(struct ieee80211_hw *hw,
 
 	spin_lock_irqsave(&wl->irq_lock, flags);
 	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, dev->irq_mask);
-	mmiowb();
 	spin_unlock_irqrestore(&wl->irq_lock, flags);
 out_unlock_mutex:
 	mutex_unlock(&wl->mutex);
@@ -2900,7 +2881,6 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
 	spin_lock_irqsave(&wl->irq_lock, flags);
 	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, dev->irq_mask);
 	/* XXX: why? */
-	mmiowb();
 	spin_unlock_irqrestore(&wl->irq_lock, flags);
  out_unlock_mutex:
 	mutex_unlock(&wl->mutex);
diff --git a/drivers/net/wireless/broadcom/b43legacy/phy.c b/drivers/net/wireless/broadcom/b43legacy/phy.c
index 995c7d0c212a..f949766d27ca 100644
--- a/drivers/net/wireless/broadcom/b43legacy/phy.c
+++ b/drivers/net/wireless/broadcom/b43legacy/phy.c
@@ -134,7 +134,6 @@ u16 b43legacy_phy_read(struct b43legacy_wldev *dev, u16 offset)
 void b43legacy_phy_write(struct b43legacy_wldev *dev, u16 offset, u16 val)
 {
 	b43legacy_write16(dev, B43legacy_MMIO_PHY_CONTROL, offset);
-	mmiowb();
 	b43legacy_write16(dev, B43legacy_MMIO_PHY_DATA, val);
 }
 
diff --git a/drivers/net/wireless/broadcom/b43legacy/pio.h b/drivers/net/wireless/broadcom/b43legacy/pio.h
index 1cd1b9ca5e9c..08cd02282beb 100644
--- a/drivers/net/wireless/broadcom/b43legacy/pio.h
+++ b/drivers/net/wireless/broadcom/b43legacy/pio.h
@@ -92,7 +92,6 @@ void b43legacy_pio_write(struct b43legacy_pioqueue *queue,
 		       u16 offset, u16 value)
 {
 	b43legacy_write16(queue->dev, queue->mmio_base + offset, value);
-	mmiowb();
 }
 
 
diff --git a/drivers/net/wireless/broadcom/b43legacy/radio.c b/drivers/net/wireless/broadcom/b43legacy/radio.c
index eab1c9387846..c6db444ea07e 100644
--- a/drivers/net/wireless/broadcom/b43legacy/radio.c
+++ b/drivers/net/wireless/broadcom/b43legacy/radio.c
@@ -95,7 +95,6 @@ void b43legacy_radio_lock(struct b43legacy_wldev *dev)
 	B43legacy_WARN_ON(status & B43legacy_MACCTL_RADIOLOCK);
 	status |= B43legacy_MACCTL_RADIOLOCK;
 	b43legacy_write32(dev, B43legacy_MMIO_MACCTL, status);
-	mmiowb();
 	udelay(10);
 }
 
@@ -108,7 +107,6 @@ void b43legacy_radio_unlock(struct b43legacy_wldev *dev)
 	B43legacy_WARN_ON(!(status & B43legacy_MACCTL_RADIOLOCK));
 	status &= ~B43legacy_MACCTL_RADIOLOCK;
 	b43legacy_write32(dev, B43legacy_MMIO_MACCTL, status);
-	mmiowb();
 }
 
 u16 b43legacy_radio_read16(struct b43legacy_wldev *dev, u16 offset)
@@ -141,7 +139,6 @@ u16 b43legacy_radio_read16(struct b43legacy_wldev *dev, u16 offset)
 void b43legacy_radio_write16(struct b43legacy_wldev *dev, u16 offset, u16 val)
 {
 	b43legacy_write16(dev, B43legacy_MMIO_RADIO_CONTROL, offset);
-	mmiowb();
 	b43legacy_write16(dev, B43legacy_MMIO_RADIO_DATA_LOW, val);
 }
 
@@ -333,7 +330,6 @@ u8 b43legacy_radio_aci_scan(struct b43legacy_wldev *dev)
 void b43legacy_nrssi_hw_write(struct b43legacy_wldev *dev, u16 offset, s16 val)
 {
 	b43legacy_phy_write(dev, B43legacy_PHY_NRSSILT_CTRL, offset);
-	mmiowb();
 	b43legacy_phy_write(dev, B43legacy_PHY_NRSSILT_DATA, (u16)val);
 }
 
diff --git a/drivers/net/wireless/broadcom/b43legacy/sysfs.c b/drivers/net/wireless/broadcom/b43legacy/sysfs.c
index 2a1da15c913b..2db83eec7a11 100644
--- a/drivers/net/wireless/broadcom/b43legacy/sysfs.c
+++ b/drivers/net/wireless/broadcom/b43legacy/sysfs.c
@@ -143,7 +143,6 @@ static ssize_t b43legacy_attr_interfmode_store(struct device *dev,
 	if (err)
 		b43legacyerr(wldev->wl, "Interference Mitigation not "
 		       "supported by device\n");
-	mmiowb();
 	spin_unlock_irqrestore(&wldev->wl->irq_lock, flags);
 	mutex_unlock(&wldev->wl->mutex);
 
diff --git a/drivers/net/wireless/intel/iwlegacy/common.h b/drivers/net/wireless/intel/iwlegacy/common.h
index dc6a74a05983..c8bcaa627132 100644
--- a/drivers/net/wireless/intel/iwlegacy/common.h
+++ b/drivers/net/wireless/intel/iwlegacy/common.h
@@ -2030,13 +2030,6 @@ static inline void
 _il_release_nic_access(struct il_priv *il)
 {
 	_il_clear_bit(il, CSR_GP_CNTRL, CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
-	/*
-	 * In above we are reading CSR_GP_CNTRL register, what will flush any
-	 * previous writes, but still want write, which clear MAC_ACCESS_REQ
-	 * bit, be performed on PCI bus before any other writes scheduled on
-	 * different CPUs (after we drop reg_lock).
-	 */
-	mmiowb();
 }
 
 static inline u32
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index f97aea5ffc44..4ec1e91ebe04 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2109,7 +2109,6 @@ static void iwl_trans_pcie_release_nic_access(struct iwl_trans *trans,
 	 * MAC_ACCESS_REQ bit to be performed before any other writes
 	 * scheduled on different CPUs (after we drop reg_lock).
 	 */
-	mmiowb();
 out:
 	spin_unlock_irqrestore(&trans_pcie->reg_lock, *flags);
 }
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 1dede87dd54f..dcf234680535 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -358,8 +358,6 @@ static void idt_sw_write(struct idt_ntb_dev *ndev,
 	iowrite32((u32)reg, ndev->cfgspc + (ptrdiff_t)IDT_NT_GASAADDR);
 	/* Put the new value of the register */
 	iowrite32(data, ndev->cfgspc + (ptrdiff_t)IDT_NT_GASADATA);
-	/* Make sure the PCIe transactions are executed */
-	mmiowb();
 	/* Unlock GASA registers operations */
 	spin_unlock_irqrestore(&ndev->gasa_lock, irqflags);
 }
@@ -750,7 +748,6 @@ static void idt_ntb_local_link_enable(struct idt_ntb_dev *ndev)
 	spin_lock_irqsave(&ndev->mtbl_lock, irqflags);
 	idt_nt_write(ndev, IDT_NT_NTMTBLADDR, ndev->part);
 	idt_nt_write(ndev, IDT_NT_NTMTBLDATA, mtbldata);
-	mmiowb();
 	spin_unlock_irqrestore(&ndev->mtbl_lock, irqflags);
 
 	/* Notify the peers by setting and clearing the global signal bit */
@@ -778,7 +775,6 @@ static void idt_ntb_local_link_disable(struct idt_ntb_dev *ndev)
 	spin_lock_irqsave(&ndev->mtbl_lock, irqflags);
 	idt_nt_write(ndev, IDT_NT_NTMTBLADDR, ndev->part);
 	idt_nt_write(ndev, IDT_NT_NTMTBLDATA, 0);
-	mmiowb();
 	spin_unlock_irqrestore(&ndev->mtbl_lock, irqflags);
 
 	/* Notify the peers by setting and clearing the global signal bit */
@@ -1339,7 +1335,6 @@ static int idt_ntb_peer_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
 		idt_nt_write(ndev, IDT_NT_LUTLDATA, (u32)addr);
 		idt_nt_write(ndev, IDT_NT_LUTMDATA, (u32)(addr >> 32));
 		idt_nt_write(ndev, IDT_NT_LUTUDATA, data);
-		mmiowb();
 		spin_unlock_irqrestore(&ndev->lut_lock, irqflags);
 		/* Limit address isn't specified since size is fixed for LUT */
 	}
@@ -1393,7 +1388,6 @@ static int idt_ntb_peer_mw_clear_trans(struct ntb_dev *ntb, int pidx,
 		idt_nt_write(ndev, IDT_NT_LUTLDATA, 0);
 		idt_nt_write(ndev, IDT_NT_LUTMDATA, 0);
 		idt_nt_write(ndev, IDT_NT_LUTUDATA, 0);
-		mmiowb();
 		spin_unlock_irqrestore(&ndev->lut_lock, irqflags);
 	}
 
@@ -1812,7 +1806,6 @@ static int idt_ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
 	/* Set the route and send the data */
 	idt_sw_write(ndev, partdata_tbl[ndev->part].msgctl[midx], swpmsgctl);
 	idt_nt_write(ndev, ntdata_tbl.msgs[midx].out, msg);
-	mmiowb();
 	/* Unlock the messages routing table */
 	spin_unlock_irqrestore(&ndev->msg_locks[midx], irqflags);
 
diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index 2a9d6b0d1f19..11a6cd374004 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -284,11 +284,9 @@ static int perf_spad_cmd_send(struct perf_peer *peer, enum perf_cmd cmd,
 		ntb_peer_spad_write(perf->ntb, peer->pidx,
 				    PERF_SPAD_HDATA(perf->gidx),
 				    upper_32_bits(data));
-		mmiowb();
 		ntb_peer_spad_write(perf->ntb, peer->pidx,
 				    PERF_SPAD_CMD(perf->gidx),
 				    cmd);
-		mmiowb();
 		ntb_peer_db_set(perf->ntb, PERF_SPAD_NOTIFY(peer->gidx));
 
 		dev_dbg(&perf->ntb->dev, "DB ring peer %#llx\n",
@@ -379,7 +377,6 @@ static int perf_msg_cmd_send(struct perf_peer *peer, enum perf_cmd cmd,
 
 		ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_HDATA,
 				   upper_32_bits(data));
-		mmiowb();
 
 		/* This call shall trigger peer message event */
 		ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_CMD, cmd);
diff --git a/drivers/scsi/bfa/bfa.h b/drivers/scsi/bfa/bfa.h
index 0e119d838e1b..762cb77253b9 100644
--- a/drivers/scsi/bfa/bfa.h
+++ b/drivers/scsi/bfa/bfa.h
@@ -62,8 +62,7 @@ void bfa_isr_unhandled(struct bfa_s *bfa, struct bfi_msg_s *m);
 			((__bfa)->iocfc.cfg.drvcfg.num_reqq_elems - 1); \
 		writel((__bfa)->iocfc.req_cq_pi[__reqq],		\
 			(__bfa)->iocfc.bfa_regs.cpe_q_pi[__reqq]);	\
-		mmiowb();      \
-	} while (0)
+		} while (0)
 
 #define bfa_rspq_pi(__bfa, __rspq)					\
 	(*(u32 *)((__bfa)->iocfc.rsp_cq_shadow_pi[__rspq].kva))
diff --git a/drivers/scsi/bfa/bfa_hw_cb.c b/drivers/scsi/bfa/bfa_hw_cb.c
index c4a0c0eb88a5..4a0d881b2602 100644
--- a/drivers/scsi/bfa/bfa_hw_cb.c
+++ b/drivers/scsi/bfa/bfa_hw_cb.c
@@ -61,7 +61,6 @@ bfa_hwcb_rspq_ack_msix(struct bfa_s *bfa, int rspq, u32 ci)
 
 	bfa_rspq_ci(bfa, rspq) = ci;
 	writel(ci, bfa->iocfc.bfa_regs.rme_q_ci[rspq]);
-	mmiowb();
 }
 
 void
@@ -72,7 +71,6 @@ bfa_hwcb_rspq_ack(struct bfa_s *bfa, int rspq, u32 ci)
 
 	bfa_rspq_ci(bfa, rspq) = ci;
 	writel(ci, bfa->iocfc.bfa_regs.rme_q_ci[rspq]);
-	mmiowb();
 }
 
 void
diff --git a/drivers/scsi/bfa/bfa_hw_ct.c b/drivers/scsi/bfa/bfa_hw_ct.c
index b0ff378dece2..b7be5f4f02a5 100644
--- a/drivers/scsi/bfa/bfa_hw_ct.c
+++ b/drivers/scsi/bfa/bfa_hw_ct.c
@@ -81,7 +81,6 @@ bfa_hwct_rspq_ack(struct bfa_s *bfa, int rspq, u32 ci)
 
 	bfa_rspq_ci(bfa, rspq) = ci;
 	writel(ci, bfa->iocfc.bfa_regs.rme_q_ci[rspq]);
-	mmiowb();
 }
 
 /*
@@ -94,7 +93,6 @@ bfa_hwct2_rspq_ack(struct bfa_s *bfa, int rspq, u32 ci)
 {
 	bfa_rspq_ci(bfa, rspq) = ci;
 	writel(ci, bfa->iocfc.bfa_regs.rme_q_ci[rspq]);
-	mmiowb();
 }
 
 void
diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
index 039328d9ef13..19734ec7f42e 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
@@ -991,7 +991,6 @@ void bnx2fc_arm_cq(struct bnx2fc_rport *tgt)
 			FCOE_CQE_TOGGLE_BIT_SHIFT);
 	msg = *((u32 *)rx_db);
 	writel(cpu_to_le32(msg), tgt->ctx_base);
-	mmiowb();
 
 }
 
@@ -1409,7 +1408,6 @@ void bnx2fc_ring_doorbell(struct bnx2fc_rport *tgt)
 				(tgt->sq_curr_toggle_bit << 15);
 	msg = *((u32 *)sq_db);
 	writel(cpu_to_le32(msg), tgt->ctx_base);
-	mmiowb();
 
 }
 
diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c
index d56a78f411cd..12666313b937 100644
--- a/drivers/scsi/bnx2i/bnx2i_hwi.c
+++ b/drivers/scsi/bnx2i/bnx2i_hwi.c
@@ -253,7 +253,6 @@ void bnx2i_put_rq_buf(struct bnx2i_conn *bnx2i_conn, int count)
 		writew(ep->qp.rq_prod_idx,
 		       ep->qp.ctx_base + CNIC_RECV_DOORBELL);
 	}
-	mmiowb();
 }
 
 
@@ -279,8 +278,6 @@ static void bnx2i_ring_sq_dbell(struct bnx2i_conn *bnx2i_conn, int count)
 		bnx2i_ring_577xx_doorbell(bnx2i_conn);
 	} else
 		writew(count, ep->qp.ctx_base + CNIC_SEND_DOORBELL);
-
-	mmiowb();
 }
 
 
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index fcbff83c0097..f9f246a5a101 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -815,7 +815,6 @@ megasas_fire_cmd_skinny(struct megasas_instance *instance,
 	       &(regs)->inbound_high_queue_port);
 	writel((lower_32_bits(frame_phys_addr) | (frame_count<<1))|1,
 	       &(regs)->inbound_low_queue_port);
-	mmiowb();
 	spin_unlock_irqrestore(&instance->hba_lock, flags);
 }
 
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 647f48a28f85..f241aa9b3858 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -242,7 +242,6 @@ megasas_fire_cmd_fusion(struct megasas_instance *instance,
 		&instance->reg_set->inbound_low_queue_port);
 	writel(le32_to_cpu(req_desc->u.high),
 		&instance->reg_set->inbound_high_queue_port);
-	mmiowb();
 	spin_unlock_irqrestore(&instance->hba_lock, flags);
 #endif
 }
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 0a6cb8f0680c..cd1e93e5ea15 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3327,7 +3327,6 @@ _base_mpi_ep_writeq(__u64 b, volatile void __iomem *addr,
 	spin_lock_irqsave(writeq_lock, flags);
 	__raw_writel((u32)(b), addr);
 	__raw_writel((u32)(b >> 32), (addr + 4));
-	mmiowb();
 	spin_unlock_irqrestore(writeq_lock, flags);
 }
 
diff --git a/drivers/scsi/qedf/qedf_io.c b/drivers/scsi/qedf/qedf_io.c
index 6bbc38b1b465..8a5df52c98a1 100644
--- a/drivers/scsi/qedf/qedf_io.c
+++ b/drivers/scsi/qedf/qedf_io.c
@@ -807,7 +807,6 @@ void qedf_ring_doorbell(struct qedf_rport *fcport)
 	writel(*(u32 *)&dbell, fcport->p_doorbell);
 	/* Make sure SQ index is updated so f/w prcesses requests in order */
 	wmb();
-	mmiowb();
 }
 
 static void qedf_trace_io(struct qedf_rport *fcport, struct qedf_ioreq *io_req,
diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 25d763ae5d5a..7100577587f3 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -992,7 +992,6 @@ static void qedi_ring_doorbell(struct qedi_conn *qedi_conn)
 	 * others they are two different assembly operations.
 	 */
 	wmb();
-	mmiowb();
 	QEDI_INFO(&qedi_conn->qedi->dbg_ctx, QEDI_LOG_MP_REQ,
 		  "prod_idx=0x%x, fw_prod_idx=0x%x, cid=0x%x\n",
 		  qedi_conn->ep->sq_prod_idx, qedi_conn->ep->fw_sq_prod_idx,
diff --git a/drivers/scsi/qla1280.c b/drivers/scsi/qla1280.c
index 6856dfdfa473..93acbc5094f0 100644
--- a/drivers/scsi/qla1280.c
+++ b/drivers/scsi/qla1280.c
@@ -3004,8 +3004,6 @@ qla1280_64bit_start_scsi(struct scsi_qla_host *ha, struct srb * sp)
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
-	mmiowb();
 
  out:
 	if (status)
@@ -3254,8 +3252,6 @@ qla1280_32bit_start_scsi(struct scsi_qla_host *ha, struct srb * sp)
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
-	mmiowb();
 
 out:
 	if (status)
@@ -3379,7 +3375,6 @@ qla1280_isp_cmd(struct scsi_qla_host *ha)
 	 * See Documentation/driver-api/device-io.rst for more information.
 	 */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	mmiowb();
 
 	LEAVE("qla1280_isp_cmd");
 }
diff --git a/drivers/ssb/pci.c b/drivers/ssb/pci.c
index 84807a9b4b13..da2d2ab8104d 100644
--- a/drivers/ssb/pci.c
+++ b/drivers/ssb/pci.c
@@ -305,7 +305,6 @@ static int sprom_do_write(struct ssb_bus *bus, const u16 *sprom)
 		else if (i % 2)
 			pr_cont(".");
 		writew(sprom[i], bus->mmio + bus->sprom_offset + (i * 2));
-		mmiowb();
 		msleep(20);
 	}
 	err = pci_read_config_dword(pdev, SSB_SPROMCTL, &spromctl);
diff --git a/drivers/ssb/pcmcia.c b/drivers/ssb/pcmcia.c
index 567013f8a8be..d7d730c245c5 100644
--- a/drivers/ssb/pcmcia.c
+++ b/drivers/ssb/pcmcia.c
@@ -338,7 +338,6 @@ static void ssb_pcmcia_write8(struct ssb_device *dev, u16 offset, u8 value)
 	err = select_core_and_segment(dev, &offset);
 	if (likely(!err))
 		writeb(value, bus->mmio + offset);
-	mmiowb();
 	spin_unlock_irqrestore(&bus->bar_lock, flags);
 }
 
@@ -352,7 +351,6 @@ static void ssb_pcmcia_write16(struct ssb_device *dev, u16 offset, u16 value)
 	err = select_core_and_segment(dev, &offset);
 	if (likely(!err))
 		writew(value, bus->mmio + offset);
-	mmiowb();
 	spin_unlock_irqrestore(&bus->bar_lock, flags);
 }
 
@@ -368,7 +366,6 @@ static void ssb_pcmcia_write32(struct ssb_device *dev, u16 offset, u32 value)
 		writew((value & 0x0000FFFF), bus->mmio + offset);
 		writew(((value & 0xFFFF0000) >> 16), bus->mmio + offset + 2);
 	}
-	mmiowb();
 	spin_unlock_irqrestore(&bus->bar_lock, flags);
 }
 
@@ -424,7 +421,6 @@ static void ssb_pcmcia_block_write(struct ssb_device *dev, const void *buffer,
 		WARN_ON(1);
 	}
 unlock:
-	mmiowb();
 	spin_unlock_irqrestore(&bus->bar_lock, flags);
 }
 #endif /* CONFIG_SSB_BLOCKIO */
diff --git a/drivers/staging/comedi/drivers/mite.c b/drivers/staging/comedi/drivers/mite.c
index 61e03ad84123..639ec1586976 100644
--- a/drivers/staging/comedi/drivers/mite.c
+++ b/drivers/staging/comedi/drivers/mite.c
@@ -371,7 +371,6 @@ static unsigned int mite_get_status(struct mite_channel *mite_chan)
 		writel(CHOR_CLRDONE,
 		       mite->mmio + MITE_CHOR(mite_chan->channel));
 	}
-	mmiowb();
 	spin_unlock_irqrestore(&mite->lock, flags);
 	return status;
 }
@@ -451,7 +450,6 @@ void mite_dma_arm(struct mite_channel *mite_chan)
 	mite_chan->done = 0;
 	/* arm */
 	writel(CHOR_START, mite->mmio + MITE_CHOR(mite_chan->channel));
-	mmiowb();
 	spin_unlock_irqrestore(&mite->lock, flags);
 }
 EXPORT_SYMBOL_GPL(mite_dma_arm);
@@ -638,7 +636,6 @@ void mite_release_channel(struct mite_channel *mite_chan)
 		       CHCR_CLR_LC_IE | CHCR_CLR_CONT_RB_IE,
 		       mite->mmio + MITE_CHCR(mite_chan->channel));
 		mite_chan->ring = NULL;
-		mmiowb();
 	}
 	spin_unlock_irqrestore(&mite->lock, flags);
 }
diff --git a/drivers/staging/comedi/drivers/ni_660x.c b/drivers/staging/comedi/drivers/ni_660x.c
index e70a461e723f..d176d7cb35d0 100644
--- a/drivers/staging/comedi/drivers/ni_660x.c
+++ b/drivers/staging/comedi/drivers/ni_660x.c
@@ -320,7 +320,6 @@ static inline void ni_660x_set_dma_channel(struct comedi_device *dev,
 	ni_660x_write(dev, chip, devpriv->dma_cfg[chip] |
 		      NI660X_DMA_CFG_RESET(mite_channel),
 		      NI660X_DMA_CFG);
-	mmiowb();
 }
 
 static inline void ni_660x_unset_dma_channel(struct comedi_device *dev,
@@ -333,7 +332,6 @@ static inline void ni_660x_unset_dma_channel(struct comedi_device *dev,
 	devpriv->dma_cfg[chip] &= ~NI660X_DMA_CFG_SEL_MASK(mite_channel);
 	devpriv->dma_cfg[chip] |= NI660X_DMA_CFG_SEL_NONE(mite_channel);
 	ni_660x_write(dev, chip, devpriv->dma_cfg[chip], NI660X_DMA_CFG);
-	mmiowb();
 }
 
 static int ni_660x_request_mite_channel(struct comedi_device *dev,
diff --git a/drivers/staging/comedi/drivers/ni_mio_common.c b/drivers/staging/comedi/drivers/ni_mio_common.c
index 5edf59ac6706..8e4dd41effb4 100644
--- a/drivers/staging/comedi/drivers/ni_mio_common.c
+++ b/drivers/staging/comedi/drivers/ni_mio_common.c
@@ -547,7 +547,6 @@ static inline void ni_set_bitfield(struct comedi_device *dev, int reg,
 			reg);
 		break;
 	}
-	mmiowb();
 	spin_unlock_irqrestore(&devpriv->soft_reg_copy_lock, flags);
 }
 
diff --git a/drivers/staging/comedi/drivers/ni_pcidio.c b/drivers/staging/comedi/drivers/ni_pcidio.c
index b9a0dc6eac44..662fed82f668 100644
--- a/drivers/staging/comedi/drivers/ni_pcidio.c
+++ b/drivers/staging/comedi/drivers/ni_pcidio.c
@@ -308,7 +308,6 @@ static int ni_pcidio_request_di_mite_channel(struct comedi_device *dev)
 	writeb(primary_DMAChannel_bits(devpriv->di_mite_chan->channel) |
 	       secondary_DMAChannel_bits(devpriv->di_mite_chan->channel),
 	       dev->mmio + DMA_Line_Control_Group1);
-	mmiowb();
 	spin_unlock_irqrestore(&devpriv->mite_channel_lock, flags);
 	return 0;
 }
@@ -325,7 +324,6 @@ static void ni_pcidio_release_di_mite_channel(struct comedi_device *dev)
 		writeb(primary_DMAChannel_bits(0) |
 		       secondary_DMAChannel_bits(0),
 		       dev->mmio + DMA_Line_Control_Group1);
-		mmiowb();
 	}
 	spin_unlock_irqrestore(&devpriv->mite_channel_lock, flags);
 }
diff --git a/drivers/staging/comedi/drivers/ni_tio.c b/drivers/staging/comedi/drivers/ni_tio.c
index 0eb388c0e1f0..bd21cd69a9ec 100644
--- a/drivers/staging/comedi/drivers/ni_tio.c
+++ b/drivers/staging/comedi/drivers/ni_tio.c
@@ -231,7 +231,6 @@ static void ni_tio_set_bits_transient(struct ni_gpct *counter,
 		counter_dev->regs[reg] &= ~mask;
 		counter_dev->regs[reg] |= (value & mask);
 		ni_tio_write(counter, counter_dev->regs[reg] | transient, reg);
-		mmiowb();
 		spin_unlock_irqrestore(&counter_dev->regs_lock, flags);
 	}
 }
diff --git a/drivers/staging/comedi/drivers/s626.c b/drivers/staging/comedi/drivers/s626.c
index f5af6f4069dc..39049d3c56d7 100644
--- a/drivers/staging/comedi/drivers/s626.c
+++ b/drivers/staging/comedi/drivers/s626.c
@@ -108,7 +108,6 @@ static void s626_mc_enable(struct comedi_device *dev,
 {
 	unsigned int val = (cmd << 16) | cmd;
 
-	mmiowb();
 	writel(val, dev->mmio + reg);
 }
 
@@ -116,7 +115,6 @@ static void s626_mc_disable(struct comedi_device *dev,
 			    unsigned int cmd, unsigned int reg)
 {
 	writel(cmd << 16, dev->mmio + reg);
-	mmiowb();
 }
 
 static bool s626_mc_test(struct comedi_device *dev,
diff --git a/drivers/tty/serial/men_z135_uart.c b/drivers/tty/serial/men_z135_uart.c
index ef89534dd760..e5d3ebab6dae 100644
--- a/drivers/tty/serial/men_z135_uart.c
+++ b/drivers/tty/serial/men_z135_uart.c
@@ -353,7 +353,6 @@ static void men_z135_handle_tx(struct men_z135_port *uart)
 
 	memcpy_toio(port->membase + MEN_Z135_TX_RAM, &xmit->buf[xmit->tail], n);
 	xmit->tail = (xmit->tail + n) & (UART_XMIT_SIZE - 1);
-	mmiowb();
 
 	iowrite32(n & 0x3ff, port->membase + MEN_Z135_TX_CTRL);
 
diff --git a/drivers/tty/serial/serial_txx9.c b/drivers/tty/serial/serial_txx9.c
index 1b4008d022bf..d22ccb32aa9b 100644
--- a/drivers/tty/serial/serial_txx9.c
+++ b/drivers/tty/serial/serial_txx9.c
@@ -248,7 +248,6 @@ static void serial_txx9_initialize(struct uart_port *port)
 	sio_out(up, TXX9_SIFCR, TXX9_SIFCR_SWRST);
 	/* TX4925 BUG WORKAROUND.  Accessing SIOC register
 	 * immediately after soft reset causes bus error. */
-	mmiowb();
 	udelay(1);
 	while ((sio_in(up, TXX9_SIFCR) & TXX9_SIFCR_SWRST) && --tmout)
 		udelay(1);
diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
index d2652dccc699..52a2143e6bac 100644
--- a/drivers/usb/early/xhci-dbc.c
+++ b/drivers/usb/early/xhci-dbc.c
@@ -533,8 +533,6 @@ static int xdbc_handle_external_reset(void)
 
 	xdbc_mem_init();
 
-	mmiowb();
-
 	ret = xdbc_start();
 	if (ret < 0)
 		goto reset_out;
@@ -587,8 +585,6 @@ static int __init xdbc_early_setup(void)
 
 	xdbc_mem_init();
 
-	mmiowb();
-
 	ret = xdbc_start();
 	if (ret < 0) {
 		writel(0, &xdbc.xdbc_reg->control);
diff --git a/drivers/usb/host/xhci-dbgcap.c b/drivers/usb/host/xhci-dbgcap.c
index 86cff5c28eff..0bd41fc1458a 100644
--- a/drivers/usb/host/xhci-dbgcap.c
+++ b/drivers/usb/host/xhci-dbgcap.c
@@ -421,8 +421,6 @@ static int xhci_dbc_mem_init(struct xhci_hcd *xhci, gfp_t flags)
 	string_length = xhci_dbc_populate_strings(dbc->string);
 	xhci_dbc_init_contexts(xhci, string_length);
 
-	mmiowb();
-
 	xhci_dbc_eps_init(xhci);
 	dbc->state = DS_INITIALIZED;
 
diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
index 91c536a01b56..8730084b4387 100644
--- a/include/linux/qed/qed_if.h
+++ b/include/linux/qed/qed_if.h
@@ -1318,7 +1318,6 @@ static inline u16 qed_sb_update_sb_idx(struct qed_sb_info *sb_info)
 	}
 
 	/* Let SB update */
-	mmiowb();
 	return rc;
 }
 
@@ -1354,7 +1353,6 @@ static inline void qed_sb_ack(struct qed_sb_info *sb_info,
 	/* Both segments (interrupts & acks) are written to same place address;
 	 * Need to guarantee all commands will be received (in-order) by HW.
 	 */
-	mmiowb();
 	barrier();
 }
 
diff --git a/sound/soc/txx9/txx9aclc-ac97.c b/sound/soc/txx9/txx9aclc-ac97.c
index 1cfca698ae4b..b0fa285c7ba2 100644
--- a/sound/soc/txx9/txx9aclc-ac97.c
+++ b/sound/soc/txx9/txx9aclc-ac97.c
@@ -102,7 +102,6 @@ static void txx9aclc_ac97_cold_reset(struct snd_ac97 *ac97)
 	u32 ready = ACINT_CODECRDY(ac97->num) | ACINT_REGACCRDY;
 
 	__raw_writel(ACCTL_ENLINK, base + ACCTLDIS);
-	mmiowb();
 	udelay(1);
 	__raw_writel(ACCTL_ENLINK, base + ACCTLEN);
 	/* wait for primary codec ready status */
-- 
2.11.0



* [PATCH 17/20] scsi/qla1280: Remove stale comment about mmiowb()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (15 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 16/20] drivers: Remove explicit invocations of mmiowb() Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 18/20] i40iw: Redefine i40iw_mmiowb() to do nothing Will Deacon
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

All mmiowb() invocations have been removed, so there's no need to keep
banging on about it.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/scsi/qla1280.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/drivers/scsi/qla1280.c b/drivers/scsi/qla1280.c
index 93acbc5094f0..327eff67a1ee 100644
--- a/drivers/scsi/qla1280.c
+++ b/drivers/scsi/qla1280.c
@@ -3363,16 +3363,6 @@ qla1280_isp_cmd(struct scsi_qla_host *ha)
 
 	/*
 	 * Update request index to mailbox4 (Request Queue In).
-	 * The mmiowb() ensures that this write is ordered with writes by other
-	 * CPUs.  Without the mmiowb(), it is possible for the following:
-	 *    CPUA posts write of index 5 to mailbox4
-	 *    CPUA releases host lock
-	 *    CPUB acquires host lock
-	 *    CPUB posts write of index 6 to mailbox4
-	 *    On PCI bus, order reverses and write of 6 posts, then index 5,
-	 *       causing chip to issue full queue of stale commands
-	 * The mmiowb() prevents future writes from crossing the barrier.
-	 * See Documentation/driver-api/device-io.rst for more information.
 	 */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
 
-- 
2.11.0



* [PATCH 18/20] i40iw: Redefine i40iw_mmiowb() to do nothing
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (16 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 17/20] scsi/qla1280: Remove stale comment about mmiowb() Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 19/20] net/ethernet/silan/sc92031: Remove stale comment about mmiowb() Will Deacon
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

mmiowb() is now implicit in spin_unlock(), so there's no reason to call
it from driver code. Redefine i40iw_mmiowb() to do nothing instead.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/infiniband/hw/i40iw/i40iw_osdep.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw_osdep.h b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
index f27be3e7830b..d474aad62a81 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_osdep.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
@@ -211,7 +211,7 @@ enum i40iw_status_code i40iw_hw_manage_vf_pble_bp(struct i40iw_device *iwdev,
 struct i40iw_sc_vsi;
 void i40iw_hw_stats_start_timer(struct i40iw_sc_vsi *vsi);
 void i40iw_hw_stats_stop_timer(struct i40iw_sc_vsi *vsi);
-#define i40iw_mmiowb() mmiowb()
+#define i40iw_mmiowb() do { } while (0)
 void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value);
 u32  i40iw_rd32(struct i40iw_hw *hw, u32 reg);
 #endif				/* _I40IW_OSDEP_H_ */
-- 
2.11.0
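
A side note on the macro form used in the hunk above, as a minimal user-space sketch rather than anything taken from i40iw: a no-op macro is usually spelled `do { } while (0)` so that it still behaves as exactly one statement and still requires its trailing semicolon. The names `noop_barrier()`, `ring_doorbell()` and `check_ring_doorbell()` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a barrier macro that has been redefined to do
 * nothing. It expands to a single empty statement.
 */
#define noop_barrier() do { } while (0)

/*
 * Hypothetical driver-style helper. Returns 1 when it "rang" the doorbell
 * and -1 when there was no work to do.
 */
static int ring_doorbell(int have_work)
{
	int writes = 0;

	if (have_work)
		noop_barrier();	/* still exactly one statement */
	else
		writes = -1;	/* the else pairs with the if above */

	if (have_work)
		writes++;

	return writes;
}

/* Exercise both branches; intended to be called from a test harness. */
static void check_ring_doorbell(void)
{
	assert(ring_doorbell(1) == 1);
	assert(ring_doorbell(0) == -1);
}
```

Because the macro is statement-like, existing call sites of the form `i40iw_mmiowb();` should keep compiling unchanged, including ones sitting in an unbraced if/else.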



* [PATCH 19/20] net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (17 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 18/20] i40iw: Redefine i40iw_mmiowb() to do nothing Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 14:03 ` [PATCH 20/20] arch: Remove dummy mmiowb() definitions from arch code Will Deacon
  2019-03-01 16:41 ` [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Linus Torvalds
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

mmiowb() is no more. It has ceased to be. It is an ex-barrier. So remove
all references to it from comments.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/net/ethernet/silan/sc92031.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/silan/sc92031.c b/drivers/net/ethernet/silan/sc92031.c
index db5dc8ce0aff..02b3962b0e63 100644
--- a/drivers/net/ethernet/silan/sc92031.c
+++ b/drivers/net/ethernet/silan/sc92031.c
@@ -251,7 +251,6 @@ enum PMConfigBits {
  * use of mdelay() at _sc92031_reset.
  * Functions prefixed with _sc92031_ must be called with the lock held;
  * functions prefixed with sc92031_ must be called without the lock held.
- * Use mmiowb() before unlocking if the hardware was written to.
  */
 
 /* Locking rules for the interrupt:
-- 
2.11.0



* [PATCH 20/20] arch: Remove dummy mmiowb() definitions from arch code
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (18 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 19/20] net/ethernet/silan/sc92031: Remove stale comment about mmiowb() Will Deacon
@ 2019-03-01 14:03 ` Will Deacon
  2019-03-01 16:41 ` [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Linus Torvalds
  20 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2019-03-01 14:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

Now that no driver code is using mmiowb() directly, we can remove the
dummy definitions remaining in architectures that don't make use of
asm-generic/io.h, as well as the definition in asm-generic/io.h itself.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/alpha/include/asm/io.h       | 2 --
 arch/hexagon/include/asm/io.h     | 2 --
 arch/parisc/include/asm/io.h      | 2 --
 arch/powerpc/include/asm/mmiowb.h | 2 --
 arch/sparc/include/asm/io_64.h    | 2 --
 include/asm-generic/io.h          | 4 ----
 6 files changed, 14 deletions(-)

diff --git a/arch/alpha/include/asm/io.h b/arch/alpha/include/asm/io.h
index 4c533fc94d62..ccf9d65166bb 100644
--- a/arch/alpha/include/asm/io.h
+++ b/arch/alpha/include/asm/io.h
@@ -513,8 +513,6 @@ extern inline void writeq(u64 b, volatile void __iomem *addr)
 #define writel_relaxed(b, addr)	__raw_writel(b, addr)
 #define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
 
-#define mmiowb()
-
 /*
  * String version of IO memory access ops:
  */
diff --git a/arch/hexagon/include/asm/io.h b/arch/hexagon/include/asm/io.h
index e17262ad125e..3d0ae09c2b8e 100644
--- a/arch/hexagon/include/asm/io.h
+++ b/arch/hexagon/include/asm/io.h
@@ -184,8 +184,6 @@ static inline void writel(u32 data, volatile void __iomem *addr)
 #define writew_relaxed __raw_writew
 #define writel_relaxed __raw_writel
 
-#define mmiowb()
-
 /*
  * Need an mtype somewhere in here, for cache type deals?
  * This is probably too long for an inline.
diff --git a/arch/parisc/include/asm/io.h b/arch/parisc/include/asm/io.h
index afe493b23d04..b163043d49db 100644
--- a/arch/parisc/include/asm/io.h
+++ b/arch/parisc/include/asm/io.h
@@ -229,8 +229,6 @@ static inline void writeq(unsigned long long q, volatile void __iomem *addr)
 #define writel_relaxed(l, addr)	writel(l, addr)
 #define writeq_relaxed(q, addr)	writeq(q, addr)
 
-#define mmiowb() do { } while (0)
-
 void memset_io(volatile void __iomem *addr, unsigned char val, int count);
 void memcpy_fromio(void *dst, const volatile void __iomem *src, int count);
 void memcpy_toio(volatile void __iomem *dst, const void *src, int count);
diff --git a/arch/powerpc/include/asm/mmiowb.h b/arch/powerpc/include/asm/mmiowb.h
index b10180613507..74a00127eb20 100644
--- a/arch/powerpc/include/asm/mmiowb.h
+++ b/arch/powerpc/include/asm/mmiowb.h
@@ -11,8 +11,6 @@
 #define arch_mmiowb_state()	(&local_paca->mmiowb_state)
 #define mmiowb()		mb()
 
-#else
-#define mmiowb()		do { } while (0)
 #endif /* CONFIG_MMIOWB */
 
 #include <asm-generic/mmiowb.h>
diff --git a/arch/sparc/include/asm/io_64.h b/arch/sparc/include/asm/io_64.h
index b162c23ae8c2..688911051b44 100644
--- a/arch/sparc/include/asm/io_64.h
+++ b/arch/sparc/include/asm/io_64.h
@@ -396,8 +396,6 @@ static inline void memcpy_toio(volatile void __iomem *dst, const void *src,
 	}
 }
 
-#define mmiowb()
-
 #ifdef __KERNEL__
 
 /* On sparc64 we have the whole physical IO address space accessible
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index bc490a746602..8f3bf95a36d1 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -22,10 +22,6 @@
 #include <asm/mmiowb.h>
 #include <asm-generic/pci_iomap.h>
 
-#ifndef mmiowb
-#define mmiowb() do {} while (0)
-#endif
-
 #ifndef __io_br
 #define __io_br()      barrier()
 #endif
-- 
2.11.0



* Re: [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
  2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (19 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 20/20] arch: Remove dummy mmiowb() definitions from arch code Will Deacon
@ 2019-03-01 16:41 ` Linus Torvalds
  2019-03-02 12:56   ` Michael Ellerman
  20 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2019-03-01 16:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arch, Linux List Kernel Mailing, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

Love the acronym, and the series looks good to me.

Michael - can you check (or maybe you already did?) that this works
for ppc too, and doesn't have any gotchas?

                      Linus


* Re: [PATCH 13/20] riscv/mmiowb: Hook up mmiowb() implementation to asm-generic code
  2019-03-01 14:03 ` [PATCH 13/20] riscv/mmiowb: " Will Deacon
@ 2019-03-01 21:13   ` Palmer Dabbelt
  0 siblings, 0 replies; 43+ messages in thread
From: Palmer Dabbelt @ 2019-03-01 21:13 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arch, linux-kernel, Will Deacon, paulmck, benh, mpe,
	Arnd Bergmann, peterz, andrea.parri, Daniel Lustig, dhowells,
	stern, Linus Torvalds, macro, paul.burton, mingo, ysato, dalias,
	tony.luck

On Fri, 01 Mar 2019 06:03:41 PST (-0800), Will Deacon wrote:
> In a bid to kill off explicit mmiowb() usage in driver code, hook up
> the asm-generic mmiowb() tracking code for riscv, so that an mmiowb()
> is automatically issued from spin_unlock() if an I/O write was performed
> in the critical section.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/riscv/Kconfig              |  1 +
>  arch/riscv/include/asm/Kbuild   |  1 -
>  arch/riscv/include/asm/io.h     | 15 ++-------------
>  arch/riscv/include/asm/mmiowb.h | 14 ++++++++++++++
>  4 files changed, 17 insertions(+), 14 deletions(-)
>  create mode 100644 arch/riscv/include/asm/mmiowb.h
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 515fc3cc9687..08f4415203c5 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -49,6 +49,7 @@ config RISCV
>  	select RISCV_TIMER
>  	select GENERIC_IRQ_MULTI_HANDLER
>  	select ARCH_HAS_PTE_SPECIAL
> +	select ARCH_HAS_MMIOWB
>
>  config MMU
>  	def_bool y
> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> index 221cd2ec78a4..cccd12cf27d4 100644
> --- a/arch/riscv/include/asm/Kbuild
> +++ b/arch/riscv/include/asm/Kbuild
> @@ -21,7 +21,6 @@ generic-y += kvm_para.h
>  generic-y += local.h
>  generic-y += local64.h
>  generic-y += mm-arch-hooks.h
> -generic-y += mmiowb.h
>  generic-y += mutex.h
>  generic-y += percpu.h
>  generic-y += preempt.h
> diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h
> index 1d9c1376dc64..744fd92e77bc 100644
> --- a/arch/riscv/include/asm/io.h
> +++ b/arch/riscv/include/asm/io.h
> @@ -20,6 +20,7 @@
>  #define _ASM_RISCV_IO_H
>
>  #include <linux/types.h>
> +#include <asm/mmiowb.h>
>
>  extern void __iomem *ioremap(phys_addr_t offset, unsigned long size);
>
> @@ -100,18 +101,6 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
>  #endif
>
>  /*
> - * FIXME: I'm flip-flopping on whether or not we should keep this or enforce
> - * the ordering with I/O on spinlocks like PowerPC does.  The worry is that
> - * drivers won't get this correct, but I also don't want to introduce a fence
> - * into the lock code that otherwise only uses AMOs (and is essentially defined
> - * by the ISA to be correct).   For now I'm leaving this here: "o,w" is
> - * sufficient to ensure that all writes to the device have completed before the
> - * write to the spinlock is allowed to commit.  I surmised this from reading
> - * "ACQUIRES VS I/O ACCESSES" in memory-barriers.txt.
> - */
> -#define mmiowb()	__asm__ __volatile__ ("fence o,w" : : : "memory");
> -
> -/*
>   * Unordered I/O memory access primitives.  These are even more relaxed than
>   * the relaxed versions, as they don't even order accesses between successive
>   * operations to the I/O regions.
> @@ -165,7 +154,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
>  #define __io_br()	do {} while (0)
>  #define __io_ar(v)	__asm__ __volatile__ ("fence i,r" : : : "memory");
>  #define __io_bw()	__asm__ __volatile__ ("fence w,o" : : : "memory");
> -#define __io_aw()	do {} while (0)
> +#define __io_aw()	mmiowb_set_pending()
>
>  #define readb(c)	({ u8  __v; __io_br(); __v = readb_cpu(c); __io_ar(__v); __v; })
>  #define readw(c)	({ u16 __v; __io_br(); __v = readw_cpu(c); __io_ar(__v); __v; })
> diff --git a/arch/riscv/include/asm/mmiowb.h b/arch/riscv/include/asm/mmiowb.h
> new file mode 100644
> index 000000000000..5d7e3a2b4e3b
> --- /dev/null
> +++ b/arch/riscv/include/asm/mmiowb.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _ASM_RISCV_MMIOWB_H
> +#define _ASM_RISCV_MMIOWB_H
> +
> +/*
> + * "o,w" is sufficient to ensure that all writes to the device have completed
> + * before the write to the spinlock is allowed to commit.
> + */
> +#define mmiowb()	__asm__ __volatile__ ("fence o,w" : : : "memory");
> +
> +#include <asm-generic/mmiowb.h>
> +
> +#endif	/* ASM_RISCV_MMIOWB_H */

Reviewed-by: Palmer Dabbelt <palmer@sifive.com>

Thanks for doing this, that comment was one of the more headache-inducing
FIXMEs in our port.  I think it's better to keep __io_aw next to the others: 
even if it's the same as the generic implementation, it's easier to reason 
about this way.


* Re: [PATCH 10/20] mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  2019-03-01 14:03 ` [PATCH 10/20] mips/mmiowb: " Will Deacon
@ 2019-03-01 22:16   ` Paul Burton
  0 siblings, 0 replies; 43+ messages in thread
From: Paul Burton @ 2019-03-01 22:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arch, linux-kernel, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

Hi Will,

On Fri, Mar 01, 2019 at 02:03:38PM +0000, Will Deacon wrote:
> The mmiowb() macro is horribly difficult to use and drivers will continue
> to work most of the time if they omit a call when it is required.
> 
> Rather than rely on driver authors getting this right, push mmiowb() into
> arch_spin_unlock() for mips. If this is deemed to be a performance issue,
> a subsequent optimisation could make use of ARCH_HAS_MMIOWB to elide
> the barrier in cases where no I/O writes were performed inside the
> critical section.
> 
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Cleaning up our I/O functions has been on my to-do list for a while, so
I'll aim to get to that soon & get the calls to mmiowb_set_pending() in
place as part of it so that we can look at that optimization & drop the
custom queued_spin_unlock().

Meanwhile this looks sane & I don't want to hold it up so:

    Acked-by: Paul Burton <paul.burton@mips.com>

Thanks,
    Paul


* Re: [PATCH 12/20] powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code
  2019-03-01 14:03 ` [PATCH 12/20] powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code Will Deacon
@ 2019-03-02 12:46   ` Michael Ellerman
  0 siblings, 0 replies; 43+ messages in thread
From: Michael Ellerman @ 2019-03-02 12:46 UTC (permalink / raw)
  To: Will Deacon, linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Arnd Bergmann, Peter Zijlstra,
	Andrea Parri, Palmer Dabbelt, Daniel Lustig, David Howells,
	Alan Stern, Linus Torvalds, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck

Hi Will,

Will Deacon <will.deacon@arm.com> writes:
> In a bid to kill off explicit mmiowb() usage in driver code, hook up
> the asm-generic mmiowb() tracking code but provide a definition of
> arch_mmiowb_state() so that the tracking data can remain in the paca
> as it does at present
>
> This replaces the existing (flawed) implementation.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/powerpc/Kconfig                |  1 +
>  arch/powerpc/include/asm/Kbuild     |  1 -
>  arch/powerpc/include/asm/io.h       | 33 +++------------------------------
>  arch/powerpc/include/asm/mmiowb.h   | 20 ++++++++++++++++++++
>  arch/powerpc/include/asm/paca.h     |  6 +++++-
>  arch/powerpc/include/asm/spinlock.h | 17 -----------------
>  arch/powerpc/xmon/xmon.c            |  5 ++++-
>  7 files changed, 33 insertions(+), 50 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/mmiowb.h

Thanks for fixing our bugs for us, I owe you some more beers :)

I meant to reply to your previous series saying that we could just use
more space in the paca, but you obviously worked that out yourself.

I'll run this through our builders and do some boot tests but it looks
good to me.

Acked-by: Michael Ellerman <mpe@ellerman.id.au>


cheers



> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 2890d36eb531..6979304475fd 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -134,6 +134,7 @@ config PPC
>  	select ARCH_HAS_ELF_RANDOMIZE
>  	select ARCH_HAS_FORTIFY_SOURCE
>  	select ARCH_HAS_GCOV_PROFILE_ALL
> +	select ARCH_HAS_MMIOWB			if PPC64
>  	select ARCH_HAS_PHYS_TO_DMA
>  	select ARCH_HAS_PMEM_API                if PPC64
>  	select ARCH_HAS_PTE_SPECIAL
> diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
> index 57bd1f6660f4..77ff7fb24823 100644
> --- a/arch/powerpc/include/asm/Kbuild
> +++ b/arch/powerpc/include/asm/Kbuild
> @@ -8,7 +8,6 @@ generic-y += irq_regs.h
>  generic-y += irq_work.h
>  generic-y += local64.h
>  generic-y += mcs_spinlock.h
> -generic-y += mmiowb.h
>  generic-y += preempt.h
>  generic-y += rwsem.h
>  generic-y += vtime.h
> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> index 7f19fbd3ba55..828100476ba6 100644
> --- a/arch/powerpc/include/asm/io.h
> +++ b/arch/powerpc/include/asm/io.h
> @@ -34,14 +34,11 @@ extern struct pci_dev *isa_bridge_pcidev;
>  #include <asm/byteorder.h>
>  #include <asm/synch.h>
>  #include <asm/delay.h>
> +#include <asm/mmiowb.h>
>  #include <asm/mmu.h>
>  #include <asm/ppc_asm.h>
>  #include <asm/pgtable.h>
>  
> -#ifdef CONFIG_PPC64
> -#include <asm/paca.h>
> -#endif
> -
>  #define SIO_CONFIG_RA	0x398
>  #define SIO_CONFIG_RD	0x399
>  
> @@ -107,12 +104,6 @@ extern bool isa_io_special;
>   *
>   */
>  
> -#ifdef CONFIG_PPC64
> -#define IO_SET_SYNC_FLAG()	do { local_paca->io_sync = 1; } while(0)
> -#else
> -#define IO_SET_SYNC_FLAG()
> -#endif
> -
>  #define DEF_MMIO_IN_X(name, size, insn)				\
>  static inline u##size name(const volatile u##size __iomem *addr)	\
>  {									\
> @@ -127,7 +118,7 @@ static inline void name(volatile u##size __iomem *addr, u##size val)	\
>  {									\
>  	__asm__ __volatile__("sync;"#insn" %1,%y0"			\
>  		: "=Z" (*addr) : "r" (val) : "memory");			\
> -	IO_SET_SYNC_FLAG();						\
> +	mmiowb_set_pending();						\
>  }
>  
>  #define DEF_MMIO_IN_D(name, size, insn)				\
> @@ -144,7 +135,7 @@ static inline void name(volatile u##size __iomem *addr, u##size val)	\
>  {									\
>  	__asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0"			\
>  		: "=m" (*addr) : "r" (val) : "memory");			\
> -	IO_SET_SYNC_FLAG();						\
> +	mmiowb_set_pending();						\
>  }
>  
>  DEF_MMIO_IN_D(in_8,     8, lbz);
> @@ -652,24 +643,6 @@ static inline void name at					\
>  
>  #include <asm-generic/iomap.h>
>  
> -#ifdef CONFIG_PPC32
> -#define mmiowb()
> -#else
> -/*
> - * Enforce synchronisation of stores vs. spin_unlock
> - * (this does it explicitly, though our implementation of spin_unlock
> - * does it implicitely too)
> - */
> -static inline void mmiowb(void)
> -{
> -	unsigned long tmp;
> -
> -	__asm__ __volatile__("sync; li %0,0; stb %0,%1(13)"
> -	: "=&r" (tmp) : "i" (offsetof(struct paca_struct, io_sync))
> -	: "memory");
> -}
> -#endif /* !CONFIG_PPC32 */
> -
>  static inline void iosync(void)
>  {
>          __asm__ __volatile__ ("sync" : : : "memory");
> diff --git a/arch/powerpc/include/asm/mmiowb.h b/arch/powerpc/include/asm/mmiowb.h
> new file mode 100644
> index 000000000000..b10180613507
> --- /dev/null
> +++ b/arch/powerpc/include/asm/mmiowb.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_MMIOWB_H
> +#define _ASM_POWERPC_MMIOWB_H
> +
> +#ifdef CONFIG_MMIOWB
> +
> +#include <linux/compiler.h>
> +#include <asm/barrier.h>
> +#include <asm/paca.h>
> +
> +#define arch_mmiowb_state()	(&local_paca->mmiowb_state)
> +#define mmiowb()		mb()
> +
> +#else
> +#define mmiowb()		do { } while (0)
> +#endif /* CONFIG_MMIOWB */
> +
> +#include <asm-generic/mmiowb.h>
> +
> +#endif	/* _ASM_POWERPC_MMIOWB_H */
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index e843bc5d1a0f..134e912d403f 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -34,6 +34,8 @@
>  #include <asm/cpuidle.h>
>  #include <asm/atomic.h>
>  
> +#include <asm-generic/mmiowb_types.h>
> +
>  register struct paca_struct *local_paca asm("r13");
>  
>  #if defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_SMP)
> @@ -171,7 +173,6 @@ struct paca_struct {
>  	u16 trap_save;			/* Used when bad stack is encountered */
>  	u8 irq_soft_mask;		/* mask for irq soft masking */
>  	u8 irq_happened;		/* irq happened while soft-disabled */
> -	u8 io_sync;			/* writel() needs spin_unlock sync */
>  	u8 irq_work_pending;		/* IRQ_WORK interrupt while soft-disable */
>  	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> @@ -264,6 +265,9 @@ struct paca_struct {
>  #ifdef CONFIG_STACKPROTECTOR
>  	unsigned long canary;
>  #endif
> +#ifdef CONFIG_MMIOWB
> +	struct mmiowb_state mmiowb_state;
> +#endif
>  } ____cacheline_aligned;
>  
>  extern void copy_mm_to_paca(struct mm_struct *mm);
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index 685c72310f5d..15b39c407c4e 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -39,19 +39,6 @@
>  #define LOCK_TOKEN	1
>  #endif
>  
> -#if defined(CONFIG_PPC64) && defined(CONFIG_SMP)
> -#define CLEAR_IO_SYNC	(get_paca()->io_sync = 0)
> -#define SYNC_IO		do {						\
> -				if (unlikely(get_paca()->io_sync)) {	\
> -					mb();				\
> -					get_paca()->io_sync = 0;	\
> -				}					\
> -			} while (0)
> -#else
> -#define CLEAR_IO_SYNC
> -#define SYNC_IO
> -#endif
> -
>  #ifdef CONFIG_PPC_PSERIES
>  #define vcpu_is_preempted vcpu_is_preempted
>  static inline bool vcpu_is_preempted(int cpu)
> @@ -99,7 +86,6 @@ static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock)
>  
>  static inline int arch_spin_trylock(arch_spinlock_t *lock)
>  {
> -	CLEAR_IO_SYNC;
>  	return __arch_spin_trylock(lock) == 0;
>  }
>  
> @@ -130,7 +116,6 @@ extern void __rw_yield(arch_rwlock_t *lock);
>  
>  static inline void arch_spin_lock(arch_spinlock_t *lock)
>  {
> -	CLEAR_IO_SYNC;
>  	while (1) {
>  		if (likely(__arch_spin_trylock(lock) == 0))
>  			break;
> @@ -148,7 +133,6 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
>  {
>  	unsigned long flags_dis;
>  
> -	CLEAR_IO_SYNC;
>  	while (1) {
>  		if (likely(__arch_spin_trylock(lock) == 0))
>  			break;
> @@ -167,7 +151,6 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
>  
>  static inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
> -	SYNC_IO;
>  	__asm__ __volatile__("# arch_spin_unlock\n\t"
>  				PPC_RELEASE_BARRIER: : :"memory");
>  	lock->slock = 0;
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index 757b8499aba2..de8e4693b176 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -2429,7 +2429,10 @@ static void dump_one_paca(int cpu)
>  	DUMP(p, trap_save, "%#-*x");
>  	DUMP(p, irq_soft_mask, "%#-*x");
>  	DUMP(p, irq_happened, "%#-*x");
> -	DUMP(p, io_sync, "%#-*x");
> +#ifdef CONFIG_MMIOWB
> +	DUMP(p, mmiowb_state.nesting_count, "%#-*x");
> +	DUMP(p, mmiowb_state.mmiowb_pending, "%#-*x");
> +#endif
>  	DUMP(p, irq_work_pending, "%#-*x");
>  	DUMP(p, nap_state_lost, "%#-*x");
>  	DUMP(p, sprg_vdso, "%#-*llx");
> -- 
> 2.11.0


* Re: [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
  2019-03-01 16:41 ` [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Linus Torvalds
@ 2019-03-02 12:56   ` Michael Ellerman
  0 siblings, 0 replies; 43+ messages in thread
From: Michael Ellerman @ 2019-03-02 12:56 UTC (permalink / raw)
  To: Linus Torvalds, Will Deacon
  Cc: linux-arch, Linux List Kernel Mailing, Paul E. McKenney,
	Benjamin Herrenschmidt, Arnd Bergmann, Peter Zijlstra,
	Andrea Parri, Palmer Dabbelt, Daniel Lustig, David Howells,
	Alan Stern, Maciej W. Rozycki, Paul Burton, Ingo Molnar,
	Yoshinori Sato, Rich Felker, Tony Luck

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Love the acronym, and the series looks good to me.
>
> Michael - can you check (or maybe you already did?) that this works
> for ppc too, and doesn't have any gotcha's?

Yeah it looks fine to me.

I gave it a quick boot with a patch to count how many mb()s we actually
issue in the spin_unlock() path and it's definitely hitting that path
correctly. So it works at least as well as the old code.

cheers


* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-01 14:03 ` [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking Will Deacon
@ 2019-03-03  1:43   ` Nicholas Piggin
  2019-03-03  2:18     ` Linus Torvalds
                       ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Nicholas Piggin @ 2019-03-03  1:43 UTC (permalink / raw)
  To: linux-arch, Will Deacon
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-kernel, Maciej W. Rozycki,
	Ingo Molnar, Michael Ellerman, Palmer Dabbelt, Paul Burton,
	Paul E. McKenney, Peter Zijlstra, Alan Stern, Tony Luck,
	Linus Torvalds, Yoshinori Sato

Will Deacon's on March 2, 2019 12:03 am:
> In preparation for removing all explicit mmiowb() calls from driver
> code, implement a tracking system in asm-generic based loosely on the
> PowerPC implementation. This allows architectures with a non-empty
> mmiowb() definition to have the barrier automatically inserted in
> spin_unlock() following a critical section containing an I/O write.

Is there a reason to call this "mmiowb"? We already have wmb that
orders cacheable stores vs mmio stores don't we?

Yes ia64 "sn2" is broken in that case, but that can be fixed (if
anyone really cares about the platform any more). Maybe that's
orthogonal to what you're doing here, I just don't like seeing
"mmiowb" spread.

This series works for spin locks, but you would want a driver to
be able to use wmb() to order locks vs mmio when using a bit lock
or a mutex or whatever else. Calling your wmb-if-io-is-pending
version io_mb_before_unlock() would kind of match with existing
patterns.

> +static inline void mmiowb_set_pending(void)
> +{
> +	struct mmiowb_state *ms = __mmiowb_state();
> +	ms->mmiowb_pending = ms->nesting_count;
> +}
> +
> +static inline void mmiowb_spin_lock(void)
> +{
> +	struct mmiowb_state *ms = __mmiowb_state();
> +	ms->nesting_count++;
> +}
> +
> +static inline void mmiowb_spin_unlock(void)
> +{
> +	struct mmiowb_state *ms = __mmiowb_state();
> +
> +	if (unlikely(ms->mmiowb_pending)) {
> +		ms->mmiowb_pending = 0;
> +		mmiowb();
> +	}
> +
> +	ms->nesting_count--;
> +}

Humour me for a minute and tell me what this algorithm is doing, or
what was broken about the powerpc one, which is basically:

static inline void mmiowb_set_pending(void)
{
	struct mmiowb_state *ms = __mmiowb_state();
	ms->mmiowb_pending = 1;
}

static inline void mmiowb_spin_lock(void)
{
}

static inline void mmiowb_spin_unlock(void)
{
	struct mmiowb_state *ms = __mmiowb_state();

	if (unlikely(ms->mmiowb_pending)) {
		ms->mmiowb_pending = 0;
		mmiowb();
	}
}

> diff --git a/include/asm-generic/mmiowb_types.h b/include/asm-generic/mmiowb_types.h
> new file mode 100644
> index 000000000000..8eb0095655e7
> --- /dev/null
> +++ b/include/asm-generic/mmiowb_types.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_GENERIC_MMIOWB_TYPES_H
> +#define __ASM_GENERIC_MMIOWB_TYPES_H
> +
> +#include <linux/types.h>
> +
> +struct mmiowb_state {
> +	u16	nesting_count;
> +	u16	mmiowb_pending;
> +};

Really need more than 255 nested spin locks? I had the idea that 16
bit operations were a bit more costly than 8 bit on some CPUs... may
not be true, but at least the smaller size packs a bit better on
powerpc.
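
As a rough illustration of the packing point, here is a small userspace sketch comparing the two layouts. It is not kernel code: the struct names `state_u16` and `state_u8` and both helper functions are made up for this example, and the sizes assume a mainstream ABI with 8-bit bytes and no extra padding. It also shows the trade-off of the narrower type, which is that the count wraps after 255 nested locks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as posted in the patch (field names from the patch). */
struct state_u16 {
	uint16_t nesting_count;
	uint16_t mmiowb_pending;
};

/* Hypothetical narrower layout, as suggested in the review. */
struct state_u8 {
	uint8_t nesting_count;
	uint8_t mmiowb_pending;
};

/* Bytes saved per CPU by the narrower layout on this ABI. */
static size_t bytes_saved(void)
{
	assert(sizeof(struct state_u16) >= sizeof(struct state_u8));
	return sizeof(struct state_u16) - sizeof(struct state_u8);
}

/* Value of an 8-bit nesting count after 'depth' nested lock operations. */
static unsigned int u8_count_after(unsigned int depth)
{
	struct state_u8 s = { 0, 0 };
	unsigned int i;

	for (i = 0; i < depth; i++)
		s.nesting_count++;	/* wraps modulo 256 */

	return s.nesting_count;
}
```

Whether 8-bit accesses are actually cheaper than 16-bit ones depends on the CPU; this sketch only speaks to size and wrap-around.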

Thanks,
Nick



* Re: [PATCH 03/20] mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
  2019-03-01 14:03 ` [PATCH 03/20] mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors Will Deacon
@ 2019-03-03  1:47   ` Nicholas Piggin
  0 siblings, 0 replies; 43+ messages in thread
From: Nicholas Piggin @ 2019-03-03  1:47 UTC (permalink / raw)
  To: linux-arch, Will Deacon
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-kernel, Maciej W. Rozycki,
	Ingo Molnar, Michael Ellerman, Palmer Dabbelt, Paul Burton,
	Paul E. McKenney, Peter Zijlstra, Alan Stern, Tony Luck,
	Linus Torvalds, Yoshinori Sato

Will Deacon's on March 2, 2019 12:03 am:
> @@ -177,6 +178,7 @@ do {								\
>  static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock)
>  {
>  	__acquire(lock);
> +	mmiowb_spin_lock();
>  	arch_spin_lock(&lock->raw_lock);
>  }
>  
> @@ -188,16 +190,23 @@ static inline void
>  do_raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long *flags) __acquires(lock)
>  {
>  	__acquire(lock);
> +	mmiowb_spin_lock();
>  	arch_spin_lock_flags(&lock->raw_lock, *flags);
>  }

You'd be better to put these inside the spin lock, to match your 
trylock.

Also it means the mmiowb state can be used inside a lock/unlock pair
without a compiler barrier forcing it to be reloaded, should be better
code generation for very small critical sections on archs which inline
lock and unlock.

>  
>  static inline int do_raw_spin_trylock(raw_spinlock_t *lock)
>  {
> -	return arch_spin_trylock(&(lock)->raw_lock);
> +	int ret = arch_spin_trylock(&(lock)->raw_lock);
> +
> +	if (ret)
> +		mmiowb_spin_lock();
> +
> +	return ret;
>  }
>  
>  static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
>  {
> +	mmiowb_spin_unlock();
>  	arch_spin_unlock(&lock->raw_lock);
>  	__release(lock);
>  }

Thanks,
Nick



* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-03  1:43   ` Nicholas Piggin
@ 2019-03-03  2:18     ` Linus Torvalds
  2019-03-03  3:34       ` Nicholas Piggin
  2019-03-03  9:26     ` Michael Ellerman
  2019-03-04 10:24     ` Michael Ellerman
  2 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2019-03-03  2:18 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Will Deacon, Andrea Parri, Arnd Bergmann,
	Benjamin Herrenschmidt, Rich Felker, David Howells,
	Daniel Lustig, Linux List Kernel Mailing, Maciej W. Rozycki,
	Ingo Molnar, Michael Ellerman, Palmer Dabbelt, Paul Burton,
	Paul E. McKenney, Peter Zijlstra, Alan Stern, Tony Luck,
	Yoshinori Sato

On Sat, Mar 2, 2019 at 5:43 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Is there a reason to call this "mmiowb"? We already have wmb that
> orders cacheable stores vs mmio stores don't we?

Sadly no it doesn't. Not on ia64, and people tried to make that the
new rule because of the platform breakage on what some people thought
would be a major platform.

Plain wmb() was only guaranteed to order regular memory against each
other (mostly useful for dma) on some of these platforms, because they
had such broken IO synchronization.

So mmiowb() is not a new name. It's been around for a while, and the
people who wanted it have happily become irrelevant. Will is making it
go away, but the name remains for historical reasons, even if Will's
new acronym explanation for the name is much better ;)

                Linus


* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-03  2:18     ` Linus Torvalds
@ 2019-03-03  3:34       ` Nicholas Piggin
       [not found]         ` <CAHk-=whVN58nWh29jvXx+X-Yx9dCC6BeAZOtKak+d01y_UVg=A@mail.gmail.com>
  0 siblings, 1 reply; 43+ messages in thread
From: Nicholas Piggin @ 2019-03-03  3:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-arch,
	Linux List Kernel Mailing, Maciej W. Rozycki, Ingo Molnar,
	Michael Ellerman, Palmer Dabbelt, Paul Burton, Paul E. McKenney,
	Peter Zijlstra, Alan Stern, Tony Luck, Will Deacon,
	Yoshinori Sato

Linus Torvalds's on March 3, 2019 12:18 pm:
> On Sat, Mar 2, 2019 at 5:43 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> Is there a reason to call this "mmiowb"? We already have wmb that
>> orders cacheable stores vs mmio stores don't we?
> 
> Sadly no it doesn't. Not on ia64, and people tried to make that the
> new rule because of the platform breakage on what some people thought
> would be a major platform.

Let me try this again, because I was babbling a train of thought 
continuing from my past mails on the subject.

  Kill mmiowb with fire.

It was added for a niche platform that hasn't been produced for 10
years for a CPU ISA that is no longer being developed. Let's make mb/wmb
great again (aka actually possible for normal people to understand).

If something comes along again that reorders mmios from different CPUs 
in the IO controller like the Altix did, they implement wmb the slow and 
correct way. They can add a new faster primitive for the few devices 
they care about in the couple of perf critical places that matter.

It doesn't have to be done all at once with this series, obviously this 
is a big improvement on its own. But why perpetuate the nomenclature
and concept for new code added now? 

Thanks,
Nick



* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-03  1:43   ` Nicholas Piggin
  2019-03-03  2:18     ` Linus Torvalds
@ 2019-03-03  9:26     ` Michael Ellerman
  2019-03-03 10:07       ` Nicholas Piggin
  2019-03-04 10:24     ` Michael Ellerman
  2 siblings, 1 reply; 43+ messages in thread
From: Michael Ellerman @ 2019-03-03  9:26 UTC (permalink / raw)
  To: Nicholas Piggin, linux-arch, Will Deacon
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-kernel, Maciej W. Rozycki,
	Ingo Molnar, Palmer Dabbelt, Paul Burton, Paul E. McKenney,
	Peter Zijlstra, Alan Stern, Tony Luck, Linus Torvalds,
	Yoshinori Sato

Nicholas Piggin <npiggin@gmail.com> writes:
> Will Deacon's on March 2, 2019 12:03 am:
>> In preparation for removing all explicit mmiowb() calls from driver
>> code, implement a tracking system in asm-generic based loosely on the
>> PowerPC implementation. This allows architectures with a non-empty
>> mmiowb() definition to have the barrier automatically inserted in
>> spin_unlock() following a critical section containing an I/O write.
>
> Is there a reason to call this "mmiowb"? We already have wmb that
> orders cacheable stores vs mmio stores don't we?
>
> Yes ia64 "sn2" is broken in that case, but that can be fixed (if
> anyone really cares about the platform any more). Maybe that's
> orthogonal to what you're doing here, I just don't like seeing
> "mmiowb" spread.
>
> This series works for spin locks, but you would want a driver to
> be able to use wmb() to order locks vs mmio when using a bit lock
> or a mutex or whatever else. Calling your wmb-if-io-is-pending
> version io_mb_before_unlock() would kind of match with existing
> patterns.
>
>> +static inline void mmiowb_set_pending(void)
>> +{
>> +	struct mmiowb_state *ms = __mmiowb_state();
>> +	ms->mmiowb_pending = ms->nesting_count;
>> +}
>> +
>> +static inline void mmiowb_spin_lock(void)
>> +{
>> +	struct mmiowb_state *ms = __mmiowb_state();
>> +	ms->nesting_count++;
>> +}
>> +
>> +static inline void mmiowb_spin_unlock(void)
>> +{
>> +	struct mmiowb_state *ms = __mmiowb_state();
>> +
>> +	if (unlikely(ms->mmiowb_pending)) {
>> +		ms->mmiowb_pending = 0;
>> +		mmiowb();
>> +	}
>> +
>> +	ms->nesting_count--;
>> +}
>
> Humour me for a minute and tell me what this algorithm is doing, or
> what was broken about the powerpc one, which is basically:
>
> static inline void mmiowb_set_pending(void)
> {
> 	struct mmiowb_state *ms = __mmiowb_state();
> 	ms->mmiowb_pending = 1;
> }
>
> static inline void mmiowb_spin_lock(void)
> {
> }

The current powerpc code clears io_sync in spin_lock().

ie, it would be equivalent to:

static inline void mmiowb_spin_lock(void)
{
 	ms->mmiowb_pending = 0;
}

Which means that:

	spin_lock(a);
        writel(x, y);
	spin_lock(b);
        ...
	spin_unlock(b);
	spin_unlock(a);

Does no barrier.
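
To make that concrete, here is a minimal userspace sketch (not kernel code) that replays the sequence above against two toy models: one that clears the pending flag on lock, as the current powerpc code does, and one that follows the nesting-count scheme from the patch. The `model_state` struct, the `old_*`/`new_*` helpers and the `barriers` counter are assumptions made for illustration; the counter simply stands in for a call to mmiowb().

```c
#include <assert.h>

/* Toy per-CPU state; 'barriers' counts simulated mmiowb() calls. */
struct model_state {
	unsigned short nesting_count;
	unsigned short mmiowb_pending;
	unsigned int barriers;
};

/* Old powerpc-style scheme: taking a lock clears the pending flag. */
static void old_lock(struct model_state *s)  { s->mmiowb_pending = 0; }
static void old_write(struct model_state *s) { s->mmiowb_pending = 1; }
static void old_unlock(struct model_state *s)
{
	if (s->mmiowb_pending) {
		s->mmiowb_pending = 0;
		s->barriers++;
	}
}

/* Scheme from the patch: lock only bumps the nesting count. */
static void new_lock(struct model_state *s)  { s->nesting_count++; }
static void new_write(struct model_state *s)
{
	s->mmiowb_pending = s->nesting_count;
}
static void new_unlock(struct model_state *s)
{
	assert(s->nesting_count > 0);
	if (s->mmiowb_pending) {
		s->mmiowb_pending = 0;
		s->barriers++;
	}
	s->nesting_count--;
}

/* Replay: lock(a); writel(); lock(b); unlock(b); unlock(a). */
static unsigned int replay_old(void)
{
	struct model_state s = { 0, 0, 0 };

	old_lock(&s);
	old_write(&s);
	old_lock(&s);		/* inner lock wipes the pending write */
	old_unlock(&s);
	old_unlock(&s);
	return s.barriers;
}

static unsigned int replay_new(void)
{
	struct model_state s = { 0, 0, 0 };

	new_lock(&s);
	new_write(&s);
	new_lock(&s);		/* pending state survives the inner lock */
	new_unlock(&s);		/* barrier is issued here */
	new_unlock(&s);
	return s.barriers;
}
```

In this model the old scheme issues no barrier for the nested sequence, while the nesting-count scheme issues exactly one, at the inner unlock.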

cheers


* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
       [not found]         ` <CAHk-=whVN58nWh29jvXx+X-Yx9dCC6BeAZOtKak+d01y_UVg=A@mail.gmail.com>
@ 2019-03-03 10:05           ` Nicholas Piggin
  2019-03-03 18:48             ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Nicholas Piggin @ 2019-03-03 10:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-arch,
	Linux List Kernel Mailing, Maciej W. Rozycki, Ingo Molnar,
	Michael Ellerman, Palmer Dabbelt, Paul Burton, Paul E. McKenney,
	Peter Zijlstra, Alan Stern, Tony Luck, Will Deacon,
	Yoshinori Sato

Linus Torvalds's on March 3, 2019 2:29 pm:
> On Sat, Mar 2, 2019, 19:34 Nicholas Piggin <npiggin@gmail.com> wrote:
> 
>>
>> It doesn't have to be done all at once with this series, obviously this
>> is a big improvement on its own. But why perpetuate the nomenclature
>> and concept for new code added now?
>>
> 
> What nomenclature?
> 
> Nobody will be using mmiowb(). That's the whole point of the patch series.
> 
> It's now an entirely internal name, and nobody cares.

Why even bother with it at all, "internal" or not?  Just get rid of 
mmiowb, the concept is obsolete.

> And none of this has anything to do with wmb(), since it's about IO being
> ordered across cpu's by spin locks, not by barriers.
> 
> So I'm not seeing what you're arguing about.

Pretend ia64 doesn't exist for a minute. Now the regular mb/wmb barriers 
order IO across CPUs with respect to their cacheable accesses.  
Regardless of whether that cacheable access is a spin lock, a bit lock, 
an atomic, a mutex... This is how it was before mmiowb came along.

Nothing wrong with this series to make spinlocks order mmio, but why 
call it mmiowb? Another patch could rename ia64's mmiowb and then the
name can be removed from the tree completely.

Thanks,
Nick



* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-03  9:26     ` Michael Ellerman
@ 2019-03-03 10:07       ` Nicholas Piggin
  2019-03-04  1:01         ` Michael Ellerman
  0 siblings, 1 reply; 43+ messages in thread
From: Nicholas Piggin @ 2019-03-03 10:07 UTC (permalink / raw)
  To: linux-arch, Michael Ellerman, Will Deacon
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-kernel, Maciej W. Rozycki,
	Ingo Molnar, Palmer Dabbelt, Paul Burton, Paul E. McKenney,
	Peter Zijlstra, Alan Stern, Tony Luck, Linus Torvalds,
	Yoshinori Sato

Michael Ellerman's on March 3, 2019 7:26 pm:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> Will Deacon's on March 2, 2019 12:03 am:
>>> In preparation for removing all explicit mmiowb() calls from driver
>>> code, implement a tracking system in asm-generic based loosely on the
>>> PowerPC implementation. This allows architectures with a non-empty
>>> mmiowb() definition to have the barrier automatically inserted in
>>> spin_unlock() following a critical section containing an I/O write.
>>
>> Is there a reason to call this "mmiowb"? We already have wmb that
>> orders cacheable stores vs mmio stores don't we?
>>
>> Yes ia64 "sn2" is broken in that case, but that can be fixed (if
>> anyone really cares about the platform any more). Maybe that's
>> orthogonal to what you're doing here, I just don't like seeing
>> "mmiowb" spread.
>>
>> This series works for spin locks, but you would want a driver to
>> be able to use wmb() to order locks vs mmio when using a bit lock
>> or a mutex or whatever else. Calling your wmb-if-io-is-pending
>> version io_mb_before_unlock() would kind of match with existing
>> patterns.
>>
>>> +static inline void mmiowb_set_pending(void)
>>> +{
>>> +	struct mmiowb_state *ms = __mmiowb_state();
>>> +	ms->mmiowb_pending = ms->nesting_count;
>>> +}
>>> +
>>> +static inline void mmiowb_spin_lock(void)
>>> +{
>>> +	struct mmiowb_state *ms = __mmiowb_state();
>>> +	ms->nesting_count++;
>>> +}
>>> +
>>> +static inline void mmiowb_spin_unlock(void)
>>> +{
>>> +	struct mmiowb_state *ms = __mmiowb_state();
>>> +
>>> +	if (unlikely(ms->mmiowb_pending)) {
>>> +		ms->mmiowb_pending = 0;
>>> +		mmiowb();
>>> +	}
>>> +
>>> +	ms->nesting_count--;
>>> +}
>>
>> Humour me for a minute and tell me what this algorithm is doing, or
>> what was broken about the powerpc one, which is basically:
>>
>> static inline void mmiowb_set_pending(void)
>> {
>> 	struct mmiowb_state *ms = __mmiowb_state();
>> 	ms->mmiowb_pending = 1;
>> }
>>
>> static inline void mmiowb_spin_lock(void)
>> {
>> }
> 
> The current powerpc code clears io_sync in spin_lock().
> 
> ie, it would be equivalent to:
> 
> static inline void mmiowb_spin_lock(void)
> {
>  	ms->mmiowb_pending = 0;
> }

Ah okay that's what I missed. How about we just not do that?
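
For comparison, here is a small userspace sketch (not kernel code) of what "just not doing that" could look like next to the nesting-count scheme from the patch. The `sticky_*` and `counted_*` names, the `toy_state` struct and the `barriers` counter are all hypothetical, chosen only for this example. In this toy model both variants handle the nested-lock case, and the one difference that shows up is an I/O write issued outside any critical section: the sticky flag makes the next unrelated unlock pay for a barrier, while the counted version records nothing because the nesting count is zero at that point.

```c
#include <assert.h>

/* Toy per-CPU state; 'barriers' counts simulated mmiowb() calls. */
struct toy_state {
	unsigned short nesting_count;
	unsigned short mmiowb_pending;
	unsigned int barriers;
};

/* Variant A: flag set to 1 on write, never cleared on lock. */
static void sticky_lock(struct toy_state *s)  { (void)s; }
static void sticky_write(struct toy_state *s) { s->mmiowb_pending = 1; }
static void sticky_unlock(struct toy_state *s)
{
	if (s->mmiowb_pending) {
		s->mmiowb_pending = 0;
		s->barriers++;
	}
}

/* Variant B: nesting-count scheme as posted in the patch. */
static void counted_lock(struct toy_state *s)  { s->nesting_count++; }
static void counted_write(struct toy_state *s)
{
	s->mmiowb_pending = s->nesting_count;
}
static void counted_unlock(struct toy_state *s)
{
	assert(s->nesting_count > 0);
	if (s->mmiowb_pending) {
		s->mmiowb_pending = 0;
		s->barriers++;
	}
	s->nesting_count--;
}

/* Replay: writel() with no lock held, then an empty lock/unlock pair. */
static unsigned int unlocked_write_sticky(void)
{
	struct toy_state s = { 0, 0, 0 };

	sticky_write(&s);
	sticky_lock(&s);
	sticky_unlock(&s);	/* barrier taken for an unrelated write */
	return s.barriers;
}

static unsigned int unlocked_write_counted(void)
{
	struct toy_state s = { 0, 0, 0 };

	counted_write(&s);	/* nesting_count is 0, nothing recorded */
	counted_lock(&s);
	counted_unlock(&s);
	return s.barriers;
}
```

Whether that extra barrier matters in practice is not something this sketch can answer; it only shows where the two variants diverge.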

Thanks,
Nick


* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-03 10:05           ` Nicholas Piggin
@ 2019-03-03 18:48             ` Linus Torvalds
  2019-03-05  0:21               ` Nicholas Piggin
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2019-03-03 18:48 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-arch,
	Linux List Kernel Mailing, Maciej W. Rozycki, Ingo Molnar,
	Michael Ellerman, Palmer Dabbelt, Paul Burton, Paul E. McKenney,
	Peter Zijlstra, Alan Stern, Tony Luck, Will Deacon,
	Yoshinori Sato

On Sun, Mar 3, 2019 at 2:05 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Why even bother with it at all, "internal" or not?  Just get rid of
> mmiowb, the concept is obsolete.

It *is* gone, for chrissake!  Only the name remains as an internal
detail of "this is what we need to do".

> Pretend ia64 doesn't exist for a minute. Now the regular mb/wmb barriers
> orders IO across CPUs with respect to their cacheable accesses.

Stop with the total red herring already.

THIS HAS NOTHING TO DO WITH mb()/wmb().

As long as you keep bringing those up, you're only showing that you're
talking about the wrong thing.

> Regardless of whether that cacheable access is a spin lock, a bit lock,
> an atomic, a mutex... This is how it was before mmiowb came along.

No.

Before mmiowb() came along, there was one rule: do what x86 does.

And x86 orders mmio inside spinlocks.

Seriously.

Notice how there's not a single "barrier" mentioned here anywhere in
the above. No "mb()", no "wmb()", no nothing. Only "spinlocks order
IO".

That's the fundamental rule (that we broke for ia64), and all that
matters for this patch series.

Stop talking about wmb(). It's irrelevant. A spinlock does not
*contain* a wmb().

Nobody even _cares_ about wmb(). They are entirely irrelevant wrt IO,
because IO is ordered on any particular CPU anyway (which is what
wmb() enforces).

Only when you do special things like __raw_writel() etc does wmb()
matter, but at that point this whole series is entirely irrelevant,
and once again, that's still about just ordering on a single CPU.

So as long as you talk about wmb(), all you show is that you're
talking about something entirely different FROM THIS WHOLE SERIES.

And like it or not, ia64 still exists. We support it. It doesn't
_matter_ and we don't much care any more, but it still exists. Which
is why we have that concept of mmiowb().

On other platforms, mmiowb() might be a wmb(). Or it might not. It
might be some other barrier, or it might be a no-op entirely without a
barrier at all. It doesn't matter. But mmiowb() exists, and is now
happily entirely hidden inside the rule of "spinlocks order MMIO
across CPU's".

                 Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-03 10:07       ` Nicholas Piggin
@ 2019-03-04  1:01         ` Michael Ellerman
  2019-03-05  0:21           ` Nicholas Piggin
  0 siblings, 1 reply; 43+ messages in thread
From: Michael Ellerman @ 2019-03-04  1:01 UTC (permalink / raw)
  To: Nicholas Piggin, linux-arch, Will Deacon
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-kernel, Maciej W. Rozycki,
	Ingo Molnar, Palmer Dabbelt, Paul Burton, Paul E. McKenney,
	Peter Zijlstra, Alan Stern, Tony Luck, Linus Torvalds,
	Yoshinori Sato

Nicholas Piggin <npiggin@gmail.com> writes:
> Michael Ellerman's on March 3, 2019 7:26 pm:
>> Nicholas Piggin <npiggin@gmail.com> writes:
...
>>> what was broken about the powerpc one, which is basically:
>>>
>>> static inline void mmiowb_set_pending(void)
>>> {
>>> 	struct mmiowb_state *ms = __mmiowb_state();
>>> 	ms->mmiowb_pending = 1;
>>> }
>>>
>>> static inline void mmiowb_spin_lock(void)
>>> {
>>> }
>> 
>> The current powerpc code clears io_sync in spin_lock().
>> 
>> ie, it would be equivalent to:
>> 
>> static inline void mmiowb_spin_lock(void)
>> {
>>  	ms->mmiowb_pending = 0;
>> }
>
> Ah okay that's what I missed. How about we just not do that?

Yeah I thought of that too but it's not great. We'd start semi-randomly
executing the sync in unlock depending on whether someone had done IO on
that CPU prior to the spinlock.

eg.

	writel(x, y);		// sets paca->io_sync
	...	

	<schedule>

	spin_lock(a);
	...
	// No IO in here
	...
	spin_unlock(a);		// sync() here because other task did writel().


Which wouldn't be *incorrect*, but would be kind of weird.

cheers
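
For illustration, the difference between the two policies can be modelled
in plain userspace C. This is only a sketch under stated assumptions:
struct mmiowb_model, model_writel(), the clear_on_lock switch and the
barrier counter are hypothetical names invented here. They stand in for
paca->io_sync and for the sync issued from spin_unlock(); none of this is
the powerpc or asm-generic code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; not kernel code. */
struct mmiowb_model {
	bool pending;		/* stands in for paca->io_sync / mmiowb_pending */
	unsigned int barriers;	/* heavyweight barriers issued by unlock */
};

/* Any MMIO write marks the CPU as having I/O outstanding. */
static void model_writel(struct mmiowb_model *ms)
{
	ms->pending = true;
}

/* clear_on_lock == true models "spin_lock() clears io_sync". */
static void model_spin_lock(struct mmiowb_model *ms, bool clear_on_lock)
{
	if (clear_on_lock)
		ms->pending = false;
}

/* Unlock only pays for the barrier when the flag is set. */
static void model_spin_unlock(struct mmiowb_model *ms)
{
	if (ms->pending) {
		ms->pending = false;
		ms->barriers++;	/* stands in for sync / mmiowb() */
	}
}

/* A write outside any lock, followed by a critical section without I/O. */
static unsigned int barriers_for_stray_write(bool clear_on_lock)
{
	struct mmiowb_model ms = { false, 0 };

	model_writel(&ms);			/* earlier I/O on this CPU */
	model_spin_lock(&ms, clear_on_lock);
	/* no I/O in here */
	model_spin_unlock(&ms);
	return ms.barriers;
}

/* A write inside the critical section: the case that needs the barrier. */
static unsigned int barriers_for_locked_write(bool clear_on_lock)
{
	struct mmiowb_model ms = { false, 0 };

	model_spin_lock(&ms, clear_on_lock);
	model_writel(&ms);
	model_spin_unlock(&ms);
	return ms.barriers;
}
```

In this model a write inside the critical section costs one barrier under
either policy. Only the stray-write case differs: clearing on lock avoids
the barrier, not clearing issues one that the critical section never needed.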

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-03  1:43   ` Nicholas Piggin
  2019-03-03  2:18     ` Linus Torvalds
  2019-03-03  9:26     ` Michael Ellerman
@ 2019-03-04 10:24     ` Michael Ellerman
  2019-03-05  0:19       ` Linus Torvalds
  2 siblings, 1 reply; 43+ messages in thread
From: Michael Ellerman @ 2019-03-04 10:24 UTC (permalink / raw)
  To: Nicholas Piggin, linux-arch, Will Deacon
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-kernel, Maciej W. Rozycki,
	Ingo Molnar, Palmer Dabbelt, Paul Burton, Paul E. McKenney,
	Peter Zijlstra, Alan Stern, Tony Luck, Linus Torvalds,
	Yoshinori Sato

Nicholas Piggin <npiggin@gmail.com> writes:
> Will Deacon's on March 2, 2019 12:03 am:
>> In preparation for removing all explicit mmiowb() calls from driver
>> code, implement a tracking system in asm-generic based loosely on the
>> PowerPC implementation. This allows architectures with a non-empty
>> mmiowb() definition to have the barrier automatically inserted in
>> spin_unlock() following a critical section containing an I/O write.
>
> Is there a reason to call this "mmiowb"? We already have wmb that
> orders cacheable stores vs mmio stores don't we?
>
> Yes ia64 "sn2" is broken in that case, but that can be fixed (if
> anyone really cares about the platform any more). Maybe that's
> orthogonal to what you're doing here, I just don't like seeing
> "mmiowb" spread.
>
> This series works for spin locks, but you would want a driver to
> be able to use wmb() to order locks vs mmio when using a bit lock
> or a mutex or whatever else.

Without wading into the rest of the discussion, this does raise an
interesting point, ie. what about eg. rwlock's?

They're basically equivalent to spinlocks, and so could reasonably be
expected to have the same behaviour.

But we don't check the io_sync flag in arch_read/write_unlock() etc. and
both of those use lwsync.

Seems like we just forgot they existed? Commit f007cacffc88 ("[POWERPC]
Fix MMIO ops to provide expected barrier behaviour") that added the
io_sync stuff doesn't mention them at all.

Am I missing anything? AFAICS read/write locks were never built on top
of spin locks, so seems like we're just hoping drivers using rwlock do
the right barriers?

cheers

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-04 10:24     ` Michael Ellerman
@ 2019-03-05  0:19       ` Linus Torvalds
  2019-03-07  0:47         ` Michael Ellerman
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2019-03-05  0:19 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nicholas Piggin, linux-arch, Will Deacon, Andrea Parri,
	Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, Linux List Kernel Mailing,
	Maciej W. Rozycki, Ingo Molnar, Palmer Dabbelt, Paul Burton,
	Paul E. McKenney, Peter Zijlstra, Alan Stern, Tony Luck,
	Yoshinori Sato

On Mon, Mar 4, 2019 at 2:24 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Without wading into the rest of the discussion, this does raise an
> interesting point, ie. what about eg. rwlock's?
>
> They're basically equivalent to spinlocks, and so could reasonably be
> expected to have the same behaviour.
>
> But we don't check the io_sync flag in arch_read/write_unlock() etc. and
> both of those use lwsync.

I think technically rwlocks should do the same thing, at least when
they are used for exclusion.

Because of the exclusion argument, we can presumably limit it to just
write_unlock(), although at least in theory I guess you could have
some "one reader does IO, then a writer comes in" situation..

Perhaps more importantly, what about sleeping locks? When they
actually *block*, they get the barrier thanks to the scheduler, but
you can have a nice non-contended sequence that never does that.

I guess the fact that these cases have never even shown up as an issue
means that we could just continue to ignore it.

We could even give that approach some fancy name, and claim it as a
revolutionary new programming paradigm ("ostrich programming" to go
with "agile" and "pair programming").

                    Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-03 18:48             ` Linus Torvalds
@ 2019-03-05  0:21               ` Nicholas Piggin
  2019-03-05  0:33                 ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Nicholas Piggin @ 2019-03-05  0:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-arch,
	Linux List Kernel Mailing, Maciej W. Rozycki, Ingo Molnar,
	Michael Ellerman, Palmer Dabbelt, Paul Burton, Paul E. McKenney,
	Peter Zijlstra, Alan Stern, Tony Luck, Will Deacon,
	Yoshinori Sato

Linus Torvalds's on March 4, 2019 4:48 am:
> On Sun, Mar 3, 2019 at 2:05 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> Why even bother with it at all, "internal" or not?  Just get rid of
>> mmiowb, the concept is obsolete.
> 
> It *is* gone, for chrissake!  Only the name remains as an internal
> detail of "this is what we need to do".
> 
>> Pretend ia64 doesn't exist for a minute. Now the regular mb/wmb barriers
>> order IO across CPUs with respect to their cacheable accesses.
> 
> Stop with the total red herring already.
> 
> THIS HAS NOTHING TO DO WITH mb()/wmb().
> 
> As long as you keep bringing those up, you're only showing that you're
> talking about the wrong thing.

Why? I'm talking about them because they are not taken care of by this 
part of mmiowb removal. Talking about spin locks is the wrong thing
because we're already past that and everybody agrees it's the right
approach.

>> Regardless of whether that cacheable access is a spin lock, a bit lock,
>> an atomic, a mutex... This is how it was before mmiowb came along.
> 
> No.
> 
> Before mmiowb() came along, there was one rule: do what x86 does.
> 
> And x86 orders mmio inside spinlocks.
> 
> Seriously.
>
> Notice how there's not a single "barrier" mentioned here anywhere in
> the above. No "mb()", no "wmb()", no nothing. Only "spinlocks order
> IO".
> 
> That's the fundamental rule (that we broke for ia64), and all that
> matters for this patch series.
> 
> Stop talking about wmb(). It's irrelevant. A spinlock does not
> *contain* a wmb().

Well you don't have to talk about it but why do you want me to stop?
I don't understand. It's an open topic still after this series. I
can post a new thread about it if that would upset you less, I just
thought it would kind of fit here because we're talking about mmiowb,
I'm not trying to derail this series.

> Nobody even _cares_ about wmb(). They are entirely irrelevant wrt IO,
> because IO is ordered on any particular CPU anyway (which is what
> wmb() enforces).
> 
> Only when you do special things like __raw_writel() etc does wmb()
> matter, but at that point this whole series is entirely irrelevant,
> and once again, that's still about just ordering on a single CPU.
> 
> So as long as you talk about wmb(), all you show is that you're
> talking about something entirely different FROM THIS WHOLE SERIES.
> 
> And like it or not, ia64 still exists. We support it. It doesn't
> _matter_ and we don't much care any more, but it still exists. Which
> is why we have that concept of mmiowb().
> 
> On other platforms, mmiowb() might be a wmb(). Or it might not. It
> might be some other barrier, or it might be a no-op entirely without a
> barrier at all. It doesn't matter. But mmiowb() exists, and is now
> happily entirely hidden inside the rule of "spinlocks order MMIO
> across CPU's".

The driver writer still has to know exactly as much about mmiowb
(the concept, if not the name) before this series as afterward. That
is, sequences of mmio stores to a device from different CPUs can only
be atomic if you (put mmiowb before spin unlock | protect them with
spin locks).

I just don't understand the reason to expose the driver writer to
that additional detail. Intuitively, mb() should order stores to
all kind of memory the same as smp_mb() orders stores to cacheable
(without the detail of stores being reordered at the interconnect
or controller -- driver writer doesn't care about store queues in
the CPU or whatever details, they want the device to see IOs in
some order).

Thanks,
Nick


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-04  1:01         ` Michael Ellerman
@ 2019-03-05  0:21           ` Nicholas Piggin
  0 siblings, 0 replies; 43+ messages in thread
From: Nicholas Piggin @ 2019-03-05  0:21 UTC (permalink / raw)
  To: linux-arch, Michael Ellerman, Will Deacon
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-kernel, Maciej W. Rozycki,
	Ingo Molnar, Palmer Dabbelt, Paul Burton, Paul E. McKenney,
	Peter Zijlstra, Alan Stern, Tony Luck, Linus Torvalds,
	Yoshinori Sato

Michael Ellerman's on March 4, 2019 11:01 am:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> Michael Ellerman's on March 3, 2019 7:26 pm:
>>> Nicholas Piggin <npiggin@gmail.com> writes:
> ...
>>>> what was broken about the powerpc one, which is basically:
>>>>
>>>> static inline void mmiowb_set_pending(void)
>>>> {
>>>> 	struct mmiowb_state *ms = __mmiowb_state();
>>>> 	ms->mmiowb_pending = 1;
>>>> }
>>>>
>>>> static inline void mmiowb_spin_lock(void)
>>>> {
>>>> }
>>> 
>>> The current powerpc code clears io_sync in spin_lock().
>>> 
>>> ie, it would be equivalent to:
>>> 
>>> static inline void mmiowb_spin_lock(void)
>>> {
>>>  	ms->mmiowb_pending = 0;
>>> }
>>
>> Ah okay that's what I missed. How about we just not do that?
> 
> Yeah I thought of that too but it's not great. We'd start semi-randomly
> executing the sync in unlock depending on whether someone had done IO on
> that CPU prior to the spinlock.
> 
> eg.
> 
> 	writel(x, y);		// sets paca->io_sync
> 	...	
> 
> 	<schedule>
> 
> 	spin_lock(a);
>         ...
>         // No IO in here
>         ...
>         spin_unlock(a);		// sync() here because other task did writel().
> 
> 
> Which wouldn't be *incorrect*, but would be kind of weird.

schedule is probably okay, we could clear pending there. But you
possibly could get interrupts, or some lock free mmios that set the
flag. Does it matter that much? A random cache miss could have the
same effect.

It may matter slightly less for powerpc because we don't inline
spin locks, although I have been hoping to for a while, this might
put the nail in that.

We can always tinker with it later though so I won't insist.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-05  0:21               ` Nicholas Piggin
@ 2019-03-05  0:33                 ` Linus Torvalds
  0 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2019-03-05  0:33 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, linux-arch,
	Linux List Kernel Mailing, Maciej W. Rozycki, Ingo Molnar,
	Michael Ellerman, Palmer Dabbelt, Paul Burton, Paul E. McKenney,
	Peter Zijlstra, Alan Stern, Tony Luck, Will Deacon,
	Yoshinori Sato

On Mon, Mar 4, 2019 at 4:21 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Well you don't have to talk about it but why do you want me to stop?
> I don't understand. It's an open topic still after this series. I
> can post a new thread about it if that would upset you less, I just
> thought it would kind of fit here because we're talking about mmiowb,
> I'm not trying to derail this series.

Because if anybody is doing lockless programming with IO, they deserve
whatever they get.

In other words, the whole "wmb()" issue is basically not an issue.

We already have rules like:

 - mmio is ordered wrt itself

 - mmio is ordered wrt previous memory ops (because of dma)

and while it turned out that at least alpha had broken those rules at
some point, and we had a discussion about it, that was just a bug.

So there's basically no real reason to ever use "wmb()" with any of
the normal mmio stuff.

Now, we do have __raw_writel() etc, which are explicitly not ordered,
but they also haven't been really standardized. And in fact, people
who use them often seem to want to use them together with various weak
memory remappings.

And yes, "wmb()" has been the traditional way to order those, to the
point where "wmb()" on x86 is actually a "sfence" because once you do
IO on those kinds of unordered mappings, the usual SMP rules go out
the window (a normal "smp_wmb()" is just a compiler barrier on x86).

But notice how this is *entirely* independent of the spinlock issue,
and has absolutely *nothing* to do with the whole mmiowb() thing that
was always about "normal IO vs normal RAM" (due to the ia64 breakage).

                 Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-05  0:19       ` Linus Torvalds
@ 2019-03-07  0:47         ` Michael Ellerman
  2019-03-07  1:13           ` Linus Torvalds
  2019-03-07  9:13           ` Peter Zijlstra
  0 siblings, 2 replies; 43+ messages in thread
From: Michael Ellerman @ 2019-03-07  0:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicholas Piggin, linux-arch, Will Deacon, Andrea Parri,
	Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, Linux List Kernel Mailing,
	Maciej W. Rozycki, Ingo Molnar, Palmer Dabbelt, Paul Burton,
	Paul E. McKenney, Peter Zijlstra, Alan Stern, Tony Luck,
	Yoshinori Sato

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Mon, Mar 4, 2019 at 2:24 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> Without wading into the rest of the discussion, this does raise an
>> interesting point, ie. what about eg. rwlock's?
>>
>> They're basically equivalent to spinlocks, and so could reasonably be
>> expected to have the same behaviour.
>>
>> But we don't check the io_sync flag in arch_read/write_unlock() etc. and
>> both of those use lwsync.
>
> I think technically rwlocks should do the same thing, at least when
> they are used for exclusion.

OK.

> Because of the exclusion argument, we can presumably limit it to just
> write_unlock(), although at least in theory I guess you could have
> some "one reader does IO, then a writer comes in" situation..

It's a bit hard to grep for, but I did find one case:

static void netxen_nic_io_write_128M(struct netxen_adapter *adapter,
                void __iomem *addr, u32 data)
{
        read_lock(&adapter->ahw.crb_lock);
        writel(data, addr);
        read_unlock(&adapter->ahw.crb_lock);
}

It looks like that driver is using the rwlock to exclude cases that can
just do a readl()/writel() (readers) vs another case that has to reconfigure a
window or something, before doing readl()/writel() and then configuring
the window back. So that seems like a valid use for a rwlock.

Whether we want to penalise all read_unlock() usages with a mmiowb()
check just to support that one driver is another question.

> Perhaps more importantly, what about sleeping locks? When they
> actually *block*, they get the barrier thanks to the scheduler, but
> you can have a nice non-contended sequence that never does that.

Yeah.

The mutex unlock fast path is just:

	if (atomic_long_cmpxchg_release(&lock->owner, curr, 0UL) == curr)
		return true;

And because it's the "release" variant we just use lwsync, which doesn't
order MMIO. If it was just atomic_long_cmpxchg() that would work because
we use sync for those.

__up_write() uses atomic_long_sub_return_release(), so same story.

> I guess the fact that these cases have never even shown up as an issue
> means that we could just continue to ignore it.
>
> We could even give that approach some fancy name, and claim it as a
> revolutionary new programming paradigm ("ostrich programming" to go
> with "agile" and "pair programming").

Maybe. On power we have the double whammy of weaker ordering than
other arches and infinitesimal market share, which makes me worry that
there are bugs lurking that we just haven't found; it's happened before.

cheers
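
The shape of that unlock fast path can be sketched with C11 atomics.
Assumptions: struct model_mutex, the integer owner tokens and the helper
names are made up for this example and are not the kernel's mutex code.
C11 memory orders only describe normal memory; whether a release operation
also orders MMIO is an architecture property (on powerpc, per the above,
the lwsync used for release does not).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: owner is 0 when unlocked, else an owner token. */
struct model_mutex {
	atomic_uintptr_t owner;
};

/* Uncontended acquire: swing owner from 0 to our token. */
static bool model_trylock(struct model_mutex *m, uintptr_t me)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong_explicit(&m->owner, &expected, me,
			memory_order_acquire, memory_order_relaxed);
}

/*
 * Uncontended release: swing owner from our token back to 0 with release
 * ordering. Earlier accesses to normal memory are ordered before the
 * store; nothing here says anything about device writes.
 */
static bool model_unlock_fast(struct model_mutex *m, uintptr_t me)
{
	uintptr_t expected = me;

	return atomic_compare_exchange_strong_explicit(&m->owner, &expected, 0,
			memory_order_release, memory_order_relaxed);
}

/* Single-threaded walk through the fast paths. */
static bool fast_path_roundtrip(void)
{
	struct model_mutex m;
	bool ok;

	atomic_init(&m.owner, 0);
	ok = model_trylock(&m, 1);		/* uncontended acquire */
	ok = ok && !model_trylock(&m, 2);	/* second owner must fail */
	ok = ok && !model_unlock_fast(&m, 2);	/* non-owner cannot release */
	ok = ok && model_unlock_fast(&m, 1);	/* owner releases */
	return ok && atomic_load(&m.owner) == 0;
}
```

Neither fast path calls into a scheduler or takes a spinlock, which is why
an uncontended lock/unlock pair gets no extra barrier from either.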

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-07  0:47         ` Michael Ellerman
@ 2019-03-07  1:13           ` Linus Torvalds
  2019-03-07  9:13           ` Peter Zijlstra
  1 sibling, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2019-03-07  1:13 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nicholas Piggin, linux-arch, Will Deacon, Andrea Parri,
	Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, Linux List Kernel Mailing,
	Maciej W. Rozycki, Ingo Molnar, Palmer Dabbelt, Paul Burton,
	Paul E. McKenney, Peter Zijlstra, Alan Stern, Tony Luck,
	Yoshinori Sato

On Wed, Mar 6, 2019 at 4:48 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> It's a bit hard to grep for, but I did find one case:
>
> static void netxen_nic_io_write_128M(struct netxen_adapter *adapter,
>                 void __iomem *addr, u32 data)
> {
>         read_lock(&adapter->ahw.crb_lock);
>         writel(data, addr);
>         read_unlock(&adapter->ahw.crb_lock);
> }
>
> It looks like that driver is using the rwlock to exclude cases that can
> just do a readl()/writel() (readers) vs another case that has to reconfigure a
> window or something, before doing readl()/writel() and then configuring
> the window back. So that seems like a valid use for a rwlock.

Oh, it's actually fairly sane: the IO itself is apparently windowed on
that hardware, and the *common* case is that you'd access "window 1".

So if everybody accesses window 1, they can all work in parallel - the
read case.

But if somebody needs to access any of the other special IO windows,
they need to take the write lock, then change the window pointer to
the window they want to access, do the access, and then set it back to
the default "window 1".

So yes. That driver very much relies on exclusion of the IO through an rwlock.

I'm guessing nobody uses that hardware on Power? Or maybe the "window
1 is common" is *so* common that the other cases basically never
happen and don't really end up ever causing problems?

[ Time passes, I look at it ]

Actually, the driver probably works on Power, because *setting* the
window isn't just a write to the window register, it's always
serialized by a read _from_ the window register to verify that the
write "took". Apparently the hardware itself really needs that "don't
do accesses to the window before I've settled".

And that would basically serialize the only operation that really
needs serialization, so in the case of _that_ driver, it all looks
safe. Even if it's partly by luck.

> > Perhaps more importantly, what about sleeping locks? When they
> > actually *block*, they get the barrier thanks to the scheduler, but
> > you can have a nice non-contended sequence that never does that.
>
> Yeah.
>
> The mutex unlock fast path is just:

Yup. Both lock/unlock have fast paths that should be just trivial
atomic sequences.

But the good news is that *usually* device IO is protected by a
spinlock, since you almost always have interrupt serialization needs
too whenever you have any sequence of MMIO that isn't just some "write
single word to start the engine".

So the "use mutex to serialize IO" may be fairly unusual.

                  Linus
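
The windowing scheme described above can be modelled in a few lines of
userspace C. This is a single-threaded sketch, not the netxen driver:
struct model_dev, the window count, and the model_*_lock() helpers are
hypothetical, and the "lock" only checks the locking discipline with
assertions rather than providing real exclusion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NWINDOWS	4	/* assumed number of windows */
#define DEFAULT_WINDOW	1	/* the common "window 1" */

/* Simulated device plus lock-discipline bookkeeping (not a real lock). */
struct model_dev {
	uint32_t window;		/* window-select register */
	uint32_t regs[NWINDOWS];	/* one data register per window */
	int readers;
	bool writer;
};

static void model_read_lock(struct model_dev *d)
{
	assert(!d->writer);
	d->readers++;
}

static void model_read_unlock(struct model_dev *d)
{
	assert(d->readers > 0);
	d->readers--;
}

static void model_write_lock(struct model_dev *d)
{
	assert(!d->writer && d->readers == 0);
	d->writer = true;
}

static void model_write_unlock(struct model_dev *d)
{
	assert(d->writer);
	d->writer = false;
}

/* Select a window, then read the register back to confirm it "took". */
static void set_window(struct model_dev *d, uint32_t w)
{
	assert(w < NWINDOWS);
	d->window = w;
	assert(d->window == w);	/* stands in for the serializing read-back */
}

/* Common case: everybody uses the default window, so a read lock suffices. */
static void write_default_window(struct model_dev *d, uint32_t data)
{
	model_read_lock(d);
	assert(d->window == DEFAULT_WINDOW);
	d->regs[d->window] = data;
	model_read_unlock(d);
}

/* Rare case: move the window, do the access, restore the default. */
static void write_other_window(struct model_dev *d, uint32_t w, uint32_t data)
{
	model_write_lock(d);
	set_window(d, w);
	d->regs[d->window] = data;
	set_window(d, DEFAULT_WINDOW);
	model_write_unlock(d);
}

static bool window_demo_ok(void)
{
	struct model_dev d = { .window = DEFAULT_WINDOW };

	write_default_window(&d, 0x11);
	write_other_window(&d, 3, 0x33);
	return d.regs[1] == 0x11 && d.regs[3] == 0x33 &&
	       d.window == DEFAULT_WINDOW && d.readers == 0 && !d.writer;
}
```

The read-back in set_window() is the step that, in the real hardware,
serializes the window change against the accesses that follow it.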

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  2019-03-07  0:47         ` Michael Ellerman
  2019-03-07  1:13           ` Linus Torvalds
@ 2019-03-07  9:13           ` Peter Zijlstra
  1 sibling, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2019-03-07  9:13 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Linus Torvalds, Nicholas Piggin, linux-arch, Will Deacon,
	Andrea Parri, Arnd Bergmann, Benjamin Herrenschmidt, Rich Felker,
	David Howells, Daniel Lustig, Linux List Kernel Mailing,
	Maciej W. Rozycki, Ingo Molnar, Palmer Dabbelt, Paul Burton,
	Paul E. McKenney, Alan Stern, Tony Luck, Yoshinori Sato

On Thu, Mar 07, 2019 at 11:47:53AM +1100, Michael Ellerman wrote:
> The mutex unlock fast path is just:
> 
> 	if (atomic_long_cmpxchg_release(&lock->owner, curr, 0UL) == curr)
> 		return true;
> 
> And because it's the "release" variant we just use lwsync, which doesn't
> order MMIO. If it was just atomic_long_cmpxchg() that would work because
> we use sync for those.
> 
> __up_write() uses atomic_long_sub_return_release(), so same story.

As does spin_unlock() of course, which is a great segue into...

  my RCsc desires :-)

If all your unlocks were to have SYNC, your locks would, aside from
ordering MMIO, also be RCsc, Win-Win :-)

There is, of course, that pesky little performance detail that keeps
getting in the way.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2019-03-07  9:14 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-01 14:03 [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
2019-03-01 14:03 ` [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking Will Deacon
2019-03-03  1:43   ` Nicholas Piggin
2019-03-03  2:18     ` Linus Torvalds
2019-03-03  3:34       ` Nicholas Piggin
     [not found]         ` <CAHk-=whVN58nWh29jvXx+X-Yx9dCC6BeAZOtKak+d01y_UVg=A@mail.gmail.com>
2019-03-03 10:05           ` Nicholas Piggin
2019-03-03 18:48             ` Linus Torvalds
2019-03-05  0:21               ` Nicholas Piggin
2019-03-05  0:33                 ` Linus Torvalds
2019-03-03  9:26     ` Michael Ellerman
2019-03-03 10:07       ` Nicholas Piggin
2019-03-04  1:01         ` Michael Ellerman
2019-03-05  0:21           ` Nicholas Piggin
2019-03-04 10:24     ` Michael Ellerman
2019-03-05  0:19       ` Linus Torvalds
2019-03-07  0:47         ` Michael Ellerman
2019-03-07  1:13           ` Linus Torvalds
2019-03-07  9:13           ` Peter Zijlstra
2019-03-01 14:03 ` [PATCH 02/20] arch: Use asm-generic header for asm/mmiowb.h Will Deacon
2019-03-01 14:03 ` [PATCH 03/20] mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors Will Deacon
2019-03-03  1:47   ` Nicholas Piggin
2019-03-01 14:03 ` [PATCH 04/20] ARM/io: Remove useless definition of mmiowb() Will Deacon
2019-03-01 14:03 ` [PATCH 05/20] arm64/io: " Will Deacon
2019-03-01 14:03 ` [PATCH 06/20] x86/io: " Will Deacon
2019-03-01 14:03 ` [PATCH 07/20] nds32/io: " Will Deacon
2019-03-01 14:03 ` [PATCH 08/20] m68k/io: " Will Deacon
2019-03-01 14:03 ` [PATCH 09/20] sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() Will Deacon
2019-03-01 14:03 ` [PATCH 10/20] mips/mmiowb: " Will Deacon
2019-03-01 22:16   ` Paul Burton
2019-03-01 14:03 ` [PATCH 11/20] ia64/mmiowb: " Will Deacon
2019-03-01 14:03 ` [PATCH 12/20] powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code Will Deacon
2019-03-02 12:46   ` Michael Ellerman
2019-03-01 14:03 ` [PATCH 13/20] riscv/mmiowb: " Will Deacon
2019-03-01 21:13   ` Palmer Dabbelt
2019-03-01 14:03 ` [PATCH 14/20] Documentation: Kill all references to mmiowb() Will Deacon
2019-03-01 14:03 ` [PATCH 15/20] drivers: Remove useless trailing comments from mmiowb() invocations Will Deacon
2019-03-01 14:03 ` [PATCH 16/20] drivers: Remove explicit invocations of mmiowb() Will Deacon
2019-03-01 14:03 ` [PATCH 17/20] scsi/qla1280: Remove stale comment about mmiowb() Will Deacon
2019-03-01 14:03 ` [PATCH 18/20] i40iw: Redefine i40iw_mmiowb() to do nothing Will Deacon
2019-03-01 14:03 ` [PATCH 19/20] net/ethernet/silan/sc92031: Remove stale comment about mmiowb() Will Deacon
2019-03-01 14:03 ` [PATCH 20/20] arch: Remove dummy mmiowb() definitions from arch code Will Deacon
2019-03-01 16:41 ` [PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Linus Torvalds
2019-03-02 12:56   ` Michael Ellerman
