* [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

Hi everybody,

This is version two of the patches I previously posted here:

  RFC: https://lwn.net/ml/linux-kernel/20190222185026.10973-1-will.deacon@arm.com/
  v1: https://lkml.kernel.org/r/20190301140348.25175-1-will.deacon@arm.com

I would really appreciate review comments and/or Acks on the first patch: that
change is what triggered the rest of the series and, without an ack, it is
holding up the remaining patches.

Changes since v1 include:

  * Moved the mmiowb_spin_{lock,unlock}() calls into the critical section
  * Included the memory-barriers.txt patch on which this series depends
  * Added acks
  * Based on v5.1-rc3

I've also pushed this series out here:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/mmiowb

and I would like to get it into -next once the first patch has been acked.

Cheers,

Will

Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Akira Yokosawa <akiyks@gmail.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>

--->8

Will Deacon (21):
  docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
  arch: Use asm-generic header for asm/mmiowb.h
  mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
  ARM/io: Remove useless definition of mmiowb()
  arm64/io: Remove useless definition of mmiowb()
  x86/io: Remove useless definition of mmiowb()
  nds32/io: Remove useless definition of mmiowb()
  m68k/io: Remove useless definition of mmiowb()
  sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  powerpc/mmiowb: Hook up mmiowb() implementation to asm-generic code
  riscv/mmiowb: Hook up mmiowb() implementation to asm-generic code
  Documentation: Kill all references to mmiowb()
  drivers: Remove useless trailing comments from mmiowb() invocations
  drivers: Remove explicit invocations of mmiowb()
  scsi/qla1280: Remove stale comment about mmiowb()
  i40iw: Redefine i40iw_mmiowb() to do nothing
  net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
  arch: Remove dummy mmiowb() definitions from arch code

 Documentation/driver-api/device-io.rst             |  45 -----
 Documentation/driver-api/pci/p2pdma.rst            |   4 -
 Documentation/memory-barriers.txt                  | 212 +++++++--------------
 arch/alpha/include/asm/Kbuild                      |   1 +
 arch/alpha/include/asm/io.h                        |   2 -
 arch/arc/include/asm/Kbuild                        |   1 +
 arch/arm/include/asm/Kbuild                        |   1 +
 arch/arm/include/asm/io.h                          |   2 -
 arch/arm64/include/asm/Kbuild                      |   1 +
 arch/arm64/include/asm/io.h                        |   2 -
 arch/c6x/include/asm/Kbuild                        |   1 +
 arch/csky/include/asm/Kbuild                       |   1 +
 arch/h8300/include/asm/Kbuild                      |   1 +
 arch/hexagon/include/asm/Kbuild                    |   1 +
 arch/hexagon/include/asm/io.h                      |   2 -
 arch/ia64/include/asm/io.h                         |  17 --
 arch/ia64/include/asm/mmiowb.h                     |  25 +++
 arch/ia64/include/asm/spinlock.h                   |   2 +
 arch/m68k/include/asm/Kbuild                       |   1 +
 arch/m68k/include/asm/io_mm.h                      |   2 -
 arch/microblaze/include/asm/Kbuild                 |   1 +
 arch/mips/include/asm/io.h                         |   3 -
 arch/mips/include/asm/mmiowb.h                     |  11 ++
 arch/mips/include/asm/spinlock.h                   |  15 ++
 arch/nds32/include/asm/Kbuild                      |   1 +
 arch/nds32/include/asm/io.h                        |   2 -
 arch/nios2/include/asm/Kbuild                      |   1 +
 arch/openrisc/include/asm/Kbuild                   |   1 +
 arch/parisc/include/asm/Kbuild                     |   1 +
 arch/parisc/include/asm/io.h                       |   2 -
 arch/powerpc/Kconfig                               |   1 +
 arch/powerpc/include/asm/io.h                      |  33 +---
 arch/powerpc/include/asm/mmiowb.h                  |  18 ++
 arch/powerpc/include/asm/paca.h                    |   6 +-
 arch/powerpc/include/asm/spinlock.h                |  17 --
 arch/powerpc/xmon/xmon.c                           |   5 +-
 arch/riscv/Kconfig                                 |   1 +
 arch/riscv/include/asm/io.h                        |  15 +-
 arch/riscv/include/asm/mmiowb.h                    |  14 ++
 arch/s390/include/asm/Kbuild                       |   1 +
 arch/sh/include/asm/io.h                           |   3 -
 arch/sh/include/asm/mmiowb.h                       |  12 ++
 arch/sh/include/asm/spinlock-llsc.h                |   2 +
 arch/sparc/include/asm/Kbuild                      |   1 +
 arch/sparc/include/asm/io_64.h                     |   2 -
 arch/um/include/asm/Kbuild                         |   1 +
 arch/unicore32/include/asm/Kbuild                  |   1 +
 arch/x86/include/asm/Kbuild                        |   1 +
 arch/x86/include/asm/io.h                          |   2 -
 arch/xtensa/include/asm/Kbuild                     |   1 +
 drivers/crypto/cavium/nitrox/nitrox_reqmgr.c       |   4 -
 drivers/dma/txx9dmac.c                             |   3 -
 drivers/firewire/ohci.c                            |   1 -
 drivers/gpu/drm/i915/intel_hdmi.c                  |  10 -
 drivers/ide/tx4939ide.c                            |   2 -
 drivers/infiniband/hw/hfi1/chip.c                  |   3 -
 drivers/infiniband/hw/hfi1/pio.c                   |   1 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c         |   2 -
 drivers/infiniband/hw/i40iw/i40iw_osdep.h          |   2 +-
 drivers/infiniband/hw/mlx4/qp.c                    |   6 -
 drivers/infiniband/hw/mlx5/qp.c                    |   1 -
 drivers/infiniband/hw/mthca/mthca_cmd.c            |   6 -
 drivers/infiniband/hw/mthca/mthca_cq.c             |   5 -
 drivers/infiniband/hw/mthca/mthca_qp.c             |  17 --
 drivers/infiniband/hw/mthca/mthca_srq.c            |   6 -
 drivers/infiniband/hw/qedr/verbs.c                 |  12 --
 drivers/infiniband/hw/qib/qib_iba6120.c            |   4 -
 drivers/infiniband/hw/qib/qib_iba7220.c            |   3 -
 drivers/infiniband/hw/qib/qib_iba7322.c            |   3 -
 drivers/infiniband/hw/qib/qib_sd7220.c             |   4 -
 drivers/media/pci/dt3155/dt3155.c                  |   8 -
 drivers/memstick/host/jmb38x_ms.c                  |   4 -
 drivers/misc/ioc4.c                                |   2 -
 drivers/misc/mei/hw-me.c                           |   3 -
 drivers/misc/tifm_7xx1.c                           |   1 -
 drivers/mmc/host/alcor.c                           |   1 -
 drivers/mmc/host/sdhci.c                           |  13 --
 drivers/mmc/host/tifm_sd.c                         |   3 -
 drivers/mmc/host/via-sdmmc.c                       |  10 -
 drivers/mtd/nand/raw/r852.c                        |   2 -
 drivers/mtd/nand/raw/txx9ndfmc.c                   |   1 -
 drivers/net/ethernet/aeroflex/greth.c              |   1 -
 drivers/net/ethernet/alacritech/slicoss.c          |   4 -
 drivers/net/ethernet/amazon/ena/ena_com.c          |   1 -
 drivers/net/ethernet/atheros/atlx/atl1.c           |   1 -
 drivers/net/ethernet/atheros/atlx/atl2.c           |   1 -
 drivers/net/ethernet/broadcom/bnx2.c               |   4 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c    |   2 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h    |   4 -
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c    |   1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   |  29 ---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c     |   1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c  |   2 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c   |   4 -
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |   3 -
 drivers/net/ethernet/broadcom/tg3.c                |   6 -
 .../net/ethernet/cavium/liquidio/cn66xx_device.c   |  10 -
 .../net/ethernet/cavium/liquidio/octeon_device.c   |   1 -
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c |   4 -
 .../net/ethernet/cavium/liquidio/request_manager.c |   1 -
 drivers/net/ethernet/intel/e1000/e1000_main.c      |   5 -
 drivers/net/ethernet/intel/e1000e/netdev.c         |   7 -
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c       |   2 -
 drivers/net/ethernet/intel/fm10k/fm10k_main.c      |   5 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |   5 -
 drivers/net/ethernet/intel/iavf/iavf_txrx.c        |   5 -
 drivers/net/ethernet/intel/ice/ice_txrx.c          |   5 -
 drivers/net/ethernet/intel/igb/igb_main.c          |   5 -
 drivers/net/ethernet/intel/igbvf/netdev.c          |   4 -
 drivers/net/ethernet/intel/igc/igc_main.c          |   5 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   5 -
 drivers/net/ethernet/marvell/sky2.c                |   4 -
 drivers/net/ethernet/mellanox/mlx4/catas.c         |   4 -
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |  13 --
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |   1 -
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c   |   2 -
 drivers/net/ethernet/neterion/s2io.c               |   2 -
 drivers/net/ethernet/neterion/vxge/vxge-main.c     |   5 -
 drivers/net/ethernet/neterion/vxge/vxge-traffic.c  |   4 -
 drivers/net/ethernet/qlogic/qed/qed_int.c          |  13 --
 drivers/net/ethernet/qlogic/qed/qed_spq.c          |   3 -
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c    |   8 -
 drivers/net/ethernet/qlogic/qede/qede_fp.c         |   8 -
 drivers/net/ethernet/qlogic/qla3xxx.c              |   1 -
 drivers/net/ethernet/qlogic/qlge/qlge.h            |   1 -
 drivers/net/ethernet/qlogic/qlge/qlge_main.c       |   1 -
 drivers/net/ethernet/renesas/ravb_main.c           |   9 -
 drivers/net/ethernet/renesas/ravb_ptp.c            |   3 -
 drivers/net/ethernet/renesas/sh_eth.c              |   1 -
 drivers/net/ethernet/sfc/falcon/io.h               |   2 -
 drivers/net/ethernet/sfc/io.h                      |   2 -
 drivers/net/ethernet/silan/sc92031.c               |  15 --
 drivers/net/ethernet/via/via-rhine.c               |   3 -
 drivers/net/ethernet/wiznet/w5100.c                |   6 -
 drivers/net/ethernet/wiznet/w5300.c                |  15 --
 drivers/net/wireless/ath/ath5k/base.c              |   4 -
 drivers/net/wireless/ath/ath5k/mac80211-ops.c      |   2 -
 drivers/net/wireless/broadcom/b43/main.c           |   7 -
 drivers/net/wireless/broadcom/b43/sysfs.c          |   1 -
 drivers/net/wireless/broadcom/b43legacy/ilt.c      |   2 -
 drivers/net/wireless/broadcom/b43legacy/main.c     |  20 --
 drivers/net/wireless/broadcom/b43legacy/phy.c      |   1 -
 drivers/net/wireless/broadcom/b43legacy/pio.h      |   1 -
 drivers/net/wireless/broadcom/b43legacy/radio.c    |   4 -
 drivers/net/wireless/broadcom/b43legacy/sysfs.c    |   1 -
 drivers/net/wireless/intel/iwlegacy/common.h       |   7 -
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c    |   1 -
 drivers/ntb/hw/idt/ntb_hw_idt.c                    |   7 -
 drivers/ntb/test/ntb_perf.c                        |   3 -
 drivers/scsi/bfa/bfa.h                             |   3 +-
 drivers/scsi/bfa/bfa_hw_cb.c                       |   2 -
 drivers/scsi/bfa/bfa_hw_ct.c                       |   2 -
 drivers/scsi/bnx2fc/bnx2fc_hwi.c                   |   2 -
 drivers/scsi/bnx2i/bnx2i_hwi.c                     |   3 -
 drivers/scsi/megaraid/megaraid_sas_base.c          |   1 -
 drivers/scsi/megaraid/megaraid_sas_fusion.c        |   1 -
 drivers/scsi/mpt3sas/mpt3sas_base.c                |   1 -
 drivers/scsi/qedf/qedf_io.c                        |   1 -
 drivers/scsi/qedi/qedi_fw.c                        |   1 -
 drivers/scsi/qla1280.c                             |  15 --
 drivers/ssb/pci.c                                  |   1 -
 drivers/ssb/pcmcia.c                               |   4 -
 drivers/staging/comedi/drivers/mite.c              |   3 -
 drivers/staging/comedi/drivers/ni_660x.c           |   2 -
 drivers/staging/comedi/drivers/ni_mio_common.c     |   1 -
 drivers/staging/comedi/drivers/ni_pcidio.c         |   2 -
 drivers/staging/comedi/drivers/ni_tio.c            |   1 -
 drivers/staging/comedi/drivers/s626.c              |   2 -
 drivers/tty/serial/men_z135_uart.c                 |   1 -
 drivers/tty/serial/serial_txx9.c                   |   1 -
 drivers/usb/early/xhci-dbc.c                       |   4 -
 drivers/usb/host/xhci-dbgcap.c                     |   2 -
 include/asm-generic/io.h                           |   7 +-
 include/asm-generic/mmiowb.h                       |  63 ++++++
 include/asm-generic/mmiowb_types.h                 |  12 ++
 include/linux/qed/qed_if.h                         |   2 -
 include/linux/spinlock.h                           |  11 +-
 kernel/Kconfig.locks                               |   7 +
 kernel/locking/spinlock.c                          |   7 +
 kernel/locking/spinlock_debug.c                    |   6 +-
 sound/soc/txx9/txx9aclc-ac97.c                     |   1 -
 181 files changed, 314 insertions(+), 820 deletions(-)
 create mode 100644 arch/ia64/include/asm/mmiowb.h
 create mode 100644 arch/mips/include/asm/mmiowb.h
 create mode 100644 arch/powerpc/include/asm/mmiowb.h
 create mode 100644 arch/riscv/include/asm/mmiowb.h
 create mode 100644 arch/sh/include/asm/mmiowb.h
 create mode 100644 include/asm-generic/mmiowb.h
 create mode 100644 include/asm-generic/mmiowb_types.h

-- 
2.11.0



* [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

The "KERNEL I/O BARRIER EFFECTS" section of memory-barriers.txt is vague,
x86-centric, out-of-date, incomplete and demonstrably incorrect in places.
This is largely because I/O ordering is a horrible can of worms, but also
because the document has stagnated as our understanding has evolved.

Attempt to address some of that by rewriting the section based on
recent(-ish) discussions with Arnd, BenH and others. Maybe one day we'll
find a way to formalise this stuff, but for now let's at least try to
make the English easier to understand.
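
To make the rewritten guarantees concrete, the portable pattern they target
looks roughly like the sketch below. The "foo" device, its register offsets
and the surrounding driver structure are hypothetical and are not taken from
any driver in this series; this is purely illustrative:

    /* Outbound DMA (bullet 2): CPU writes to the coherent buffer are
     * visible to the device before the MMIO doorbell write below. */
    memcpy(buf, data, len);                  /* buf from dma_alloc_coherent() */
    writel(lower_32_bits(dma_handle), foo->regs + FOO_DMA_ADDR);

    /* Inbound DMA (bullet 3): the MMIO status read completes before any
     * subsequent CPU read from the coherent buffer. */
    if (readl(foo->regs + FOO_DMA_STATUS) & FOO_DMA_DONE)
            consume(buf);                    /* will not observe stale data */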

Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 Documentation/memory-barriers.txt | 115 +++++++++++++++++++++++---------------
 1 file changed, 70 insertions(+), 45 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1c22b21ae922..5eb6f4c6a133 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2599,72 +2599,97 @@ likely, then interrupt-disabling locks should be used to guarantee ordering.
 KERNEL I/O BARRIER EFFECTS
 ==========================
 
-When accessing I/O memory, drivers should use the appropriate accessor
-functions:
+Interfacing with peripherals via I/O accesses is deeply architecture and device
+specific. Therefore, drivers which are inherently non-portable may rely on
+specific behaviours of their target systems in order to achieve synchronization
+in the most lightweight manner possible. For drivers intending to be portable
+between multiple architectures and bus implementations, the kernel offers a
+series of accessor functions that provide various degrees of ordering
+guarantees:
 
- (*) inX(), outX():
+ (*) readX(), writeX():
 
-     These are intended to talk to I/O space rather than memory space, but
-     that's primarily a CPU-specific concept.  The i386 and x86_64 processors
-     do indeed have special I/O space access cycles and instructions, but many
-     CPUs don't have such a concept.
+     The readX() and writeX() MMIO accessors take a pointer to the peripheral
+     being accessed as an __iomem * parameter. For pointers mapped with the
+     default I/O attributes (e.g. those returned by ioremap()), then the
+     ordering guarantees are as follows:
 
-     The PCI bus, amongst others, defines an I/O space concept which - on such
-     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
-     space.  However, it may also be mapped as a virtual I/O space in the CPU's
-     memory map, particularly on those CPUs that don't support alternate I/O
-     spaces.
+     1. All readX() and writeX() accesses to the same peripheral are ordered
+        with respect to each other. For example, this ensures that MMIO register
+	writes by the CPU to a particular device will arrive in program order.
 
-     Accesses to this space may be fully synchronous (as on i386), but
-     intermediary bridges (such as the PCI host bridge) may not fully honour
-     that.
+     2. A writeX() by the CPU to the peripheral will first wait for the
+        completion of all prior CPU writes to memory. For example, this ensures
+        that writes by the CPU to an outbound DMA buffer allocated by
+        dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
+        to its MMIO control register to trigger the transfer.
 
-     They are guaranteed to be fully ordered with respect to each other.
+     3. A readX() by the CPU from the peripheral will complete before any
+	subsequent CPU reads from memory can begin. For example, this ensures
+	that reads by the CPU from an incoming DMA buffer allocated by
+	dma_alloc_coherent() will not see stale data after reading from the DMA
+	engine's MMIO status register to establish that the DMA transfer has
+	completed.
 
-     They are not guaranteed to be fully ordered with respect to other types of
-     memory and I/O operation.
+     4. A readX() by the CPU from the peripheral will complete before any
+	subsequent delay() loop can begin execution. For example, this ensures
+	that two MMIO register writes by the CPU to a peripheral will arrive at
+	least 1us apart if the first write is immediately read back with readX()
+	and udelay(1) is called prior to the second writeX().
 
- (*) readX(), writeX():
+     __iomem pointers obtained with non-default attributes (e.g. those returned
+     by ioremap_wc()) are unlikely to provide many of these guarantees.
 
-     Whether these are guaranteed to be fully ordered and uncombined with
-     respect to each other on the issuing CPU depends on the characteristics
-     defined for the memory window through which they're accessing.  On later
-     i386 architecture machines, for example, this is controlled by way of the
-     MTRR registers.
+ (*) readX_relaxed(), writeX_relaxed():
 
-     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
-     provided they're not accessing a prefetchable device.
+     These are similar to readX() and writeX(), but provide weaker memory
+     ordering guarantees. Specifically, they do not guarantee ordering with
+     respect to normal memory accesses or delay() loops (i.e. bullets 2-4 above)
+     but they are still guaranteed to be ordered with respect to other accesses
+     to the same peripheral when operating on __iomem pointers mapped with the
+     default I/O attributes.
 
-     However, intermediary hardware (such as a PCI bridge) may indulge in
-     deferral if it so wishes; to flush a store, a load from the same location
-     is preferred[*], but a load from the same device or from configuration
-     space should suffice for PCI.
+ (*) readsX(), writesX():
 
-     [*] NOTE! attempting to load from the same location as was written to may
-	 cause a malfunction - consider the 16550 Rx/Tx serial registers for
-	 example.
+     The readsX() and writesX() MMIO accessors are designed for accessing
+     register-based, memory-mapped FIFOs residing on peripherals that are not
+     capable of performing DMA. Consequently, they provide only the ordering
+     guarantees of readX_relaxed() and writeX_relaxed(), as documented above.
 
-     Used with prefetchable I/O memory, an mmiowb() barrier may be required to
-     force stores to be ordered.
+ (*) inX(), outX():
 
-     Please refer to the PCI specification for more information on interactions
-     between PCI transactions.
+     The inX() and outX() accessors are intended to access legacy port-mapped
+     I/O peripherals, which may require special instructions on some
+     architectures (notably x86). The port number of the peripheral being
+     accessed is passed as an argument.
 
- (*) readX_relaxed(), writeX_relaxed()
+     Since many CPU architectures ultimately access these peripherals via an
+     internal virtual memory mapping, the portable ordering guarantees provided
+     by inX() and outX() are the same as those provided by readX() and writeX()
+     respectively when accessing a mapping with the default I/O attributes.
 
-     These are similar to readX() and writeX(), but provide weaker memory
-     ordering guarantees.  Specifically, they do not guarantee ordering with
-     respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
-     ordering with respect to LOCK or UNLOCK operations.  If the latter is
-     required, an mmiowb() barrier can be used.  Note that relaxed accesses to
-     the same peripheral are guaranteed to be ordered with respect to each
-     other.
+     Device drivers may expect outX() to emit a non-posted write transaction
+     that waits for a completion response from the I/O peripheral before
+     returning. This is not guaranteed by all architectures and is therefore
+     not part of the portable ordering semantics.
+
+ (*) insX(), outsX():
+
+     As above, the insX() and outsX() accessors provide the same ordering
+     guarantees as readsX() and writesX() respectively when accessing a mapping
+     with the default I/O attributes.
 
  (*) ioreadX(), iowriteX()
 
      These will perform appropriately for the type of access they're actually
      doing, be it inX()/outX() or readX()/writeX().
 
+All of these accessors assume that the underlying peripheral is little-endian,
+and will therefore perform byte-swapping operations on big-endian architectures.
+
+Composing I/O ordering barriers with SMP ordering barriers and LOCK/UNLOCK
+operations is a dangerous sport which may require the use of mmiowb(). See the
+subsection "Acquires vs I/O accesses" for more information.
 
 ========================================
 ASSUMED MINIMUM EXECUTION ORDERING MODEL
-- 
2.11.0



* [PATCH v2 02/21] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

In preparation for removing all explicit mmiowb() calls from driver
code, implement a tracking system in asm-generic based loosely on the
PowerPC implementation. This allows architectures with a non-empty
mmiowb() definition to have the barrier automatically inserted in
spin_unlock() following a critical section containing an I/O write.
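
For an architecture that needs this, the opt-in ends up looking roughly like
the sketch below, plus a "select ARCH_HAS_MMIOWB" in the architecture's
Kconfig. The "xxx" architecture is a placeholder: the powerpc and riscv
patches later in the series do the real hook-up, while sh, mips and ia64
instead add an unconditional mmiowb() to arch_spin_unlock(). Step 2 of the
header's "five easy steps" is handled automatically for architectures using
the asm-generic I/O accessors, via the __io_aw() hook wired up by a later
patch in this series:

    /* arch/xxx/include/asm/mmiowb.h */
    #ifndef __ASM_XXX_MMIOWB_H
    #define __ASM_XXX_MMIOWB_H

    #include <asm/barrier.h>

    /* Whatever instruction orders prior MMIO writes against the lock handover. */
    #define mmiowb()        wmb()

    #include <asm-generic/mmiowb.h>

    #endif  /* __ASM_XXX_MMIOWB_H */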

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/mmiowb.h       | 63 ++++++++++++++++++++++++++++++++++++++
 include/asm-generic/mmiowb_types.h | 12 ++++++++
 kernel/Kconfig.locks               |  7 +++++
 kernel/locking/spinlock.c          |  7 +++++
 4 files changed, 89 insertions(+)
 create mode 100644 include/asm-generic/mmiowb.h
 create mode 100644 include/asm-generic/mmiowb_types.h

diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
new file mode 100644
index 000000000000..9439ff037b2d
--- /dev/null
+++ b/include/asm-generic/mmiowb.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_GENERIC_MMIOWB_H
+#define __ASM_GENERIC_MMIOWB_H
+
+/*
+ * Generic implementation of mmiowb() tracking for spinlocks.
+ *
+ * If your architecture doesn't ensure that writes to an I/O peripheral
+ * within two spinlocked sections on two different CPUs are seen by the
+ * peripheral in the order corresponding to the lock handover, then you
+ * need to follow these FIVE easy steps:
+ *
+ * 	1. Implement mmiowb() (and arch_mmiowb_state() if you're fancy)
+ *	   in asm/mmiowb.h, then #include this file
+ *	2. Ensure your I/O write accessors call mmiowb_set_pending()
+ *	3. Select ARCH_HAS_MMIOWB
+ *	4. Untangle the resulting mess of header files
+ *	5. Complain to your architects
+ */
+#ifdef CONFIG_MMIOWB
+
+#include <linux/compiler.h>
+#include <asm-generic/mmiowb_types.h>
+
+#ifndef arch_mmiowb_state
+#include <asm/percpu.h>
+#include <asm/smp.h>
+
+DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
+#define __mmiowb_state()	this_cpu_ptr(&__mmiowb_state)
+#else
+#define __mmiowb_state()	arch_mmiowb_state()
+#endif	/* arch_mmiowb_state */
+
+static inline void mmiowb_set_pending(void)
+{
+	struct mmiowb_state *ms = __mmiowb_state();
+	ms->mmiowb_pending = ms->nesting_count;
+}
+
+static inline void mmiowb_spin_lock(void)
+{
+	struct mmiowb_state *ms = __mmiowb_state();
+	ms->nesting_count++;
+}
+
+static inline void mmiowb_spin_unlock(void)
+{
+	struct mmiowb_state *ms = __mmiowb_state();
+
+	if (unlikely(ms->mmiowb_pending)) {
+		ms->mmiowb_pending = 0;
+		mmiowb();
+	}
+
+	ms->nesting_count--;
+}
+#else
+#define mmiowb_set_pending()		do { } while (0)
+#define mmiowb_spin_lock()		do { } while (0)
+#define mmiowb_spin_unlock()		do { } while (0)
+#endif	/* CONFIG_MMIOWB */
+#endif	/* __ASM_GENERIC_MMIOWB_H */
diff --git a/include/asm-generic/mmiowb_types.h b/include/asm-generic/mmiowb_types.h
new file mode 100644
index 000000000000..8eb0095655e7
--- /dev/null
+++ b/include/asm-generic/mmiowb_types.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_GENERIC_MMIOWB_TYPES_H
+#define __ASM_GENERIC_MMIOWB_TYPES_H
+
+#include <linux/types.h>
+
+struct mmiowb_state {
+	u16	nesting_count;
+	u16	mmiowb_pending;
+};
+
+#endif	/* __ASM_GENERIC_MMIOWB_TYPES_H */
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index fbba478ae522..6ba2570eddad 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -251,3 +251,10 @@ config ARCH_USE_QUEUED_RWLOCKS
 config QUEUED_RWLOCKS
 	def_bool y if ARCH_USE_QUEUED_RWLOCKS
 	depends on SMP
+
+config ARCH_HAS_MMIOWB
+	bool
+
+config MMIOWB
+	def_bool y if ARCH_HAS_MMIOWB
+	depends on SMP
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 936f3d14dd6b..0ff08380f531 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -22,6 +22,13 @@
 #include <linux/debug_locks.h>
 #include <linux/export.h>
 
+#ifdef CONFIG_MMIOWB
+#ifndef arch_mmiowb_state
+DEFINE_PER_CPU(struct mmiowb_state, __mmiowb_state);
+EXPORT_PER_CPU_SYMBOL(__mmiowb_state);
+#endif
+#endif
+
 /*
  * If lockdep is enabled then we use the non-preemption spin-ops
  * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
-- 
2.11.0



* [PATCH v2 03/21] arch: Use asm-generic header for asm/mmiowb.h
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin, Masahiro Yamada

Hook up asm-generic/mmiowb.h to Kbuild for all architectures so that we
can subsequently include asm/mmiowb.h from core code.

Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/alpha/include/asm/Kbuild      | 1 +
 arch/arc/include/asm/Kbuild        | 1 +
 arch/arm/include/asm/Kbuild        | 1 +
 arch/arm64/include/asm/Kbuild      | 1 +
 arch/c6x/include/asm/Kbuild        | 1 +
 arch/csky/include/asm/Kbuild       | 1 +
 arch/h8300/include/asm/Kbuild      | 1 +
 arch/hexagon/include/asm/Kbuild    | 1 +
 arch/ia64/include/asm/Kbuild       | 1 +
 arch/m68k/include/asm/Kbuild       | 1 +
 arch/microblaze/include/asm/Kbuild | 1 +
 arch/mips/include/asm/Kbuild       | 1 +
 arch/nds32/include/asm/Kbuild      | 1 +
 arch/nios2/include/asm/Kbuild      | 1 +
 arch/openrisc/include/asm/Kbuild   | 1 +
 arch/parisc/include/asm/Kbuild     | 1 +
 arch/powerpc/include/asm/Kbuild    | 1 +
 arch/riscv/include/asm/Kbuild      | 1 +
 arch/s390/include/asm/Kbuild       | 1 +
 arch/sh/include/asm/Kbuild         | 1 +
 arch/sparc/include/asm/Kbuild      | 1 +
 arch/um/include/asm/Kbuild         | 1 +
 arch/unicore32/include/asm/Kbuild  | 1 +
 arch/x86/include/asm/Kbuild        | 1 +
 arch/xtensa/include/asm/Kbuild     | 1 +
 25 files changed, 25 insertions(+)

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 70b783333965..89e87bbc987f 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -9,6 +9,7 @@ generic-y += irq_work.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += sections.h
 generic-y += trace_clock.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index decc306a3b52..393d4f5e1450 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += percpu.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index a8a4eb7f6dae..a3fc0a230a68 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -9,6 +9,7 @@ generic-y += kdebug.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += preempt.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 1e17ea5c372b..3dae4fd028cf 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -13,6 +13,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild
index 249c9f6f26dc..6b168d32fbff 100644
--- a/arch/c6x/include/asm/Kbuild
+++ b/arch/c6x/include/asm/Kbuild
@@ -23,6 +23,7 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += mmu.h
 generic-y += mmu_context.h
 generic-y += pci.h
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 2a0abe8f2a35..95f4e550db8a 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -28,6 +28,7 @@ generic-y += linkage.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += module.h
 generic-y += mutex.h
 generic-y += pci.h
diff --git a/arch/h8300/include/asm/Kbuild b/arch/h8300/include/asm/Kbuild
index e3dead402e5f..123d8f54be4a 100644
--- a/arch/h8300/include/asm/Kbuild
+++ b/arch/h8300/include/asm/Kbuild
@@ -29,6 +29,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += mmu.h
 generic-y += mmu_context.h
 generic-y += module.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index d046e8ccdf78..d53704d561e6 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -24,6 +24,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 11f191689c9e..cabfe0280c33 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += irq_work.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += trace_clock.h
 generic-y += vtime.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index 2c359d9e80f6..0ddae4a74adb 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -18,6 +18,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += sections.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 1a8285c3f693..17a8d0a62038 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -23,6 +23,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 87b86cdf126a..bf39c2253ec8 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -12,6 +12,7 @@ generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += percpu.h
diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild
index 64ceff7ab99b..688b6ed26227 100644
--- a/arch/nds32/include/asm/Kbuild
+++ b/arch/nds32/include/asm/Kbuild
@@ -31,6 +31,7 @@ generic-y += limits.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += parport.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 88a667d12aaa..d7ef3512504a 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -27,6 +27,7 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += module.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index 22aa97136c01..1919cc5e0f11 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -24,6 +24,7 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += module.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 9bcd0c903dbb..b8c7db777144 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += seccomp.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index a0c132bedfae..74b6605ca55f 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -7,6 +7,7 @@ generic-y += export.h
 generic-y += irq_regs.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
+generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += vtime.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index cccd12cf27d4..221cd2ec78a4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -21,6 +21,7 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += mutex.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 12d77cb11fe5..bdc4f06a04c5 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -20,6 +20,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += rwsem.h
 generic-y += trace_clock.h
 generic-y += unaligned.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 7bf2cb680d32..162c9054561f 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -14,6 +14,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index a22cfd5c0ee8..468440db6657 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -15,6 +15,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += module.h
 generic-y += msi.h
 generic-y += preempt.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 00bcbe2326d9..b506ad06aefc 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += irq_work.h
 generic-y += kdebug.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += param.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/arch/unicore32/include/asm/Kbuild b/arch/unicore32/include/asm/Kbuild
index d77d953c04c1..b301a0b3c0b2 100644
--- a/arch/unicore32/include/asm/Kbuild
+++ b/arch/unicore32/include/asm/Kbuild
@@ -22,6 +22,7 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += module.h
 generic-y += parport.h
 generic-y += percpu.h
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index a0ab9ab61c75..eebd05942e6c 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -11,3 +11,4 @@ generic-y += early_ioremap.h
 generic-y += export.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index 3843198e03d4..794e461785e1 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -20,6 +20,7 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += mmiowb.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += preempt.h
-- 
2.11.0



* [PATCH v2 04/21] mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

Removing explicit calls to mmiowb() from driver code means that we must
now call into the generic mmiowb_spin_{lock,unlock}() functions from the
core spinlock code. In order to elide barriers following critical
sections without any I/O writes, we also hook into the asm-generic I/O
routines.
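
The driver-visible effect is that the barrier is issued automatically on the
unlock path, and only when an MMIO write actually happened inside the critical
section. A rough sketch, with hypothetical names:

    spin_lock(&foo->lock);
    writel(val, foo->regs + FOO_CTL);   /* __io_aw() -> mmiowb_set_pending() */
    spin_unlock(&foo->lock);            /* mmiowb_spin_unlock() runs mmiowb()
                                         * only if a write is pending */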

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/io.h        |  3 ++-
 include/linux/spinlock.h        | 11 ++++++++++-
 kernel/locking/spinlock_debug.c |  6 +++++-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 303871651f8a..bc490a746602 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -19,6 +19,7 @@
 #include <asm-generic/iomap.h>
 #endif
 
+#include <asm/mmiowb.h>
 #include <asm-generic/pci_iomap.h>
 
 #ifndef mmiowb
@@ -49,7 +50,7 @@
 
 /* serialize device access against a spin_unlock, usually handled there. */
 #ifndef __io_aw
-#define __io_aw()      barrier()
+#define __io_aw()      mmiowb_set_pending()
 #endif
 
 #ifndef __io_pbw
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index e089157dcf97..ed7c4d6b8235 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -57,6 +57,7 @@
 #include <linux/stringify.h>
 #include <linux/bottom_half.h>
 #include <asm/barrier.h>
+#include <asm/mmiowb.h>
 
 
 /*
@@ -178,6 +179,7 @@ static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock)
 {
 	__acquire(lock);
 	arch_spin_lock(&lock->raw_lock);
+	mmiowb_spin_lock();
 }
 
 #ifndef arch_spin_lock_flags
@@ -189,15 +191,22 @@ do_raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long *flags) __acquires(lo
 {
 	__acquire(lock);
 	arch_spin_lock_flags(&lock->raw_lock, *flags);
+	mmiowb_spin_lock();
 }
 
 static inline int do_raw_spin_trylock(raw_spinlock_t *lock)
 {
-	return arch_spin_trylock(&(lock)->raw_lock);
+	int ret = arch_spin_trylock(&(lock)->raw_lock);
+
+	if (ret)
+		mmiowb_spin_lock();
+
+	return ret;
 }
 
 static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
 {
+	mmiowb_spin_unlock();
 	arch_spin_unlock(&lock->raw_lock);
 	__release(lock);
 }
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 9aa0fccd5d43..399669f7eba8 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -111,6 +111,7 @@ void do_raw_spin_lock(raw_spinlock_t *lock)
 {
 	debug_spin_lock_before(lock);
 	arch_spin_lock(&lock->raw_lock);
+	mmiowb_spin_lock();
 	debug_spin_lock_after(lock);
 }
 
@@ -118,8 +119,10 @@ int do_raw_spin_trylock(raw_spinlock_t *lock)
 {
 	int ret = arch_spin_trylock(&lock->raw_lock);
 
-	if (ret)
+	if (ret) {
+		mmiowb_spin_lock();
 		debug_spin_lock_after(lock);
+	}
 #ifndef CONFIG_SMP
 	/*
 	 * Must not happen on UP:
@@ -131,6 +134,7 @@ int do_raw_spin_trylock(raw_spinlock_t *lock)
 
 void do_raw_spin_unlock(raw_spinlock_t *lock)
 {
+	mmiowb_spin_unlock();
 	debug_spin_unlock(lock);
 	arch_spin_unlock(&lock->raw_lock);
 }
-- 
2.11.0



* [PATCH v2 05/21] ARM/io: Remove useless definition of mmiowb()
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

ARM includes asm-generic/io.h, which provides a dummy definition of
mmiowb() if one isn't already provided by the architecture.

Remove the useless definition.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 6b51826ab3d1..7e22c81398c4 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -281,8 +281,6 @@ extern void _memcpy_fromio(void *, const volatile void __iomem *, size_t);
 extern void _memcpy_toio(volatile void __iomem *, const void *, size_t);
 extern void _memset_io(volatile void __iomem *, int, size_t);
 
-#define mmiowb()
-
 /*
  *  Memory access primitives
  *  ------------------------
-- 
2.11.0



* [PATCH v2 06/21] arm64/io: Remove useless definition of mmiowb()
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

arm64 includes asm-generic/io.h, which provides a dummy definition of
mmiowb() if one isn't already provided by the architecture.

Remove the useless definition.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 8bb7210ac286..b807cb9b517d 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -124,8 +124,6 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
 #define __io_par(v)		__iormb(v)
 #define __iowmb()		wmb()
 
-#define mmiowb()		do { } while (0)
-
 /*
  * Relaxed I/O memory access primitives. These follow the Device memory
  * ordering rules but do not guarantee any ordering relative to Normal memory
-- 
2.11.0



* [PATCH v2 07/21] x86/io: Remove useless definition of mmiowb()
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

x86 maps mmiowb() to barrier(), but this is superfluous because a
compiler barrier is already implied by spin_unlock(). Since x86 also
includes asm-generic/io.h in its asm/io.h file, we can remove the
definition entirely and pick up the dummy definition from core code.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/x86/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 686247db3106..a06a9f8294ea 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -90,8 +90,6 @@ build_mmio_write(__writel, "l", unsigned int, "r", )
 #define __raw_writew __writew
 #define __raw_writel __writel
 
-#define mmiowb() barrier()
-
 #ifdef CONFIG_X86_64
 
 build_mmio_read(readq, "q", u64, "=r", :"memory")
-- 
2.11.0



* [PATCH v2 08/21] nds32/io: Remove useless definition of mmiowb()
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

mmiowb() only makes sense on SMP platforms and nds32 does not support SMP,
so we can remove the definition entirely.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/nds32/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/nds32/include/asm/io.h b/arch/nds32/include/asm/io.h
index 71cd226d6863..5ef8ae5ba833 100644
--- a/arch/nds32/include/asm/io.h
+++ b/arch/nds32/include/asm/io.h
@@ -55,8 +55,6 @@ static inline u32 __raw_readl(const volatile void __iomem *addr)
 #define __iormb()               rmb()
 #define __iowmb()               wmb()
 
-#define mmiowb()        __asm__ __volatile__ ("msync all" : : : "memory");
-
 /*
  * {read,write}{b,w,l,q}_relaxed() are like the regular version, but
  * are not guaranteed to provide ordering against spinlocks or memory
-- 
2.11.0



* [PATCH v2 09/21] m68k/io: Remove useless definition of mmiowb()
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

m68k includes asm-generic/io.h, which provides a dummy definition of
mmiowb() if one isn't already provided by the architecture.

Remove the useless definition.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/m68k/include/asm/io_mm.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/m68k/include/asm/io_mm.h b/arch/m68k/include/asm/io_mm.h
index 782b78f8a048..6c03ca5bc436 100644
--- a/arch/m68k/include/asm/io_mm.h
+++ b/arch/m68k/include/asm/io_mm.h
@@ -377,8 +377,6 @@ static inline void isa_delay(void)
 #define writesw(port, buf, nr)    raw_outsw((port), (u16 *)(buf), (nr))
 #define writesl(port, buf, nr)    raw_outsl((port), (u32 *)(buf), (nr))
 
-#define mmiowb()
-
 #ifndef CONFIG_SUN3
 #define IO_SPACE_LIMIT 0xffff
 #else
-- 
2.11.0



* [PATCH v2 10/21] sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

The mmiowb() macro is horribly difficult to use and drivers will continue
to work most of the time if they omit a call when it is required.

Rather than rely on driver authors getting this right, push mmiowb() into
arch_spin_unlock() for sh. If this is deemed to be a performance issue,
a subsequent optimisation could make use of ARCH_HAS_MMIOWB to elide
the barrier in cases where no I/O writes were performed inside the
critical section.
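
For reference, the ARCH_HAS_MMIOWB tracking added earlier in this series
works roughly as follows (a simplified sketch of the asm-generic helpers,
not a verbatim copy; __mmiowb_state() returns the current CPU's state):

	struct mmiowb_state {
		u16	nesting_count;	/* spinlock nesting depth */
		u16	mmiowb_pending;	/* I/O write seen under the lock? */
	};

	/* Called from the I/O write accessors */
	static inline void mmiowb_set_pending(void)
	{
		struct mmiowb_state *ms = __mmiowb_state();

		ms->mmiowb_pending = ms->nesting_count;
	}

	/* Called from spin_unlock(); the barrier is elided if nothing is pending */
	static inline void mmiowb_spin_unlock(void)
	{
		struct mmiowb_state *ms = __mmiowb_state();

		if (unlikely(ms->mmiowb_pending)) {
			ms->mmiowb_pending = 0;
			mmiowb();
		}

		ms->nesting_count--;
	}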

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/sh/include/asm/Kbuild          |  1 -
 arch/sh/include/asm/io.h            |  3 ---
 arch/sh/include/asm/mmiowb.h        | 12 ++++++++++++
 arch/sh/include/asm/spinlock-llsc.h |  2 ++
 4 files changed, 14 insertions(+), 4 deletions(-)
 create mode 100644 arch/sh/include/asm/mmiowb.h

diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 162c9054561f..7bf2cb680d32 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -14,7 +14,6 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/sh/include/asm/io.h b/arch/sh/include/asm/io.h
index 4f7f235f15f8..c28e37a344ad 100644
--- a/arch/sh/include/asm/io.h
+++ b/arch/sh/include/asm/io.h
@@ -229,9 +229,6 @@ __BUILD_IOPORT_STRING(q, u64)
 
 #define IO_SPACE_LIMIT 0xffffffff
 
-/* synco on SH-4A, otherwise a nop */
-#define mmiowb()		wmb()
-
 /* We really want to try and get these to memcpy etc */
 void memcpy_fromio(void *, const volatile void __iomem *, unsigned long);
 void memcpy_toio(volatile void __iomem *, const void *, unsigned long);
diff --git a/arch/sh/include/asm/mmiowb.h b/arch/sh/include/asm/mmiowb.h
new file mode 100644
index 000000000000..535d59735f1d
--- /dev/null
+++ b/arch/sh/include/asm/mmiowb.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_SH_MMIOWB_H
+#define __ASM_SH_MMIOWB_H
+
+#include <asm/barrier.h>
+
+/* synco on SH-4A, otherwise a nop */
+#define mmiowb()			wmb()
+
+#include <asm-generic/mmiowb.h>
+
+#endif	/* __ASM_SH_MMIOWB_H */
diff --git a/arch/sh/include/asm/spinlock-llsc.h b/arch/sh/include/asm/spinlock-llsc.h
index 786ee0fde3b0..7fd929cd2e7a 100644
--- a/arch/sh/include/asm/spinlock-llsc.h
+++ b/arch/sh/include/asm/spinlock-llsc.h
@@ -47,6 +47,8 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	unsigned long tmp;
 
+	/* This could be optimised with ARCH_HAS_MMIOWB */
+	mmiowb();
 	__asm__ __volatile__ (
 		"mov		#1, %0 ! arch_spin_unlock	\n\t"
 		"mov.l		%0, @%1				\n\t"
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 11/21] mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (9 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 10/21] sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 13:59 ` [PATCH v2 12/21] ia64/mmiowb: " Will Deacon
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

The mmiowb() macro is horribly difficult to use and drivers will continue
to work most of the time if they omit a call when it is required.

Rather than rely on driver authors getting this right, push mmiowb() into
arch_spin_unlock() for mips. If this is deemed to be a performance issue,
a subsequent optimisation could make use of ARCH_HAS_MMIOWB to elide
the barrier in cases where no I/O writes were performed inside the
critical section.

Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/mips/include/asm/Kbuild     |  1 -
 arch/mips/include/asm/io.h       |  3 ---
 arch/mips/include/asm/mmiowb.h   | 11 +++++++++++
 arch/mips/include/asm/spinlock.h | 15 +++++++++++++++
 4 files changed, 26 insertions(+), 4 deletions(-)
 create mode 100644 arch/mips/include/asm/mmiowb.h

diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index bf39c2253ec8..87b86cdf126a 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -12,7 +12,6 @@ generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += percpu.h
diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index 845fbbc7a2e3..29997e42480e 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -102,9 +102,6 @@ static inline void set_io_port_base(unsigned long base)
 #define iobarrier_w() wmb()
 #define iobarrier_sync() iob()
 
-/* Some callers use this older API instead.  */
-#define mmiowb() iobarrier_w()
-
 /*
  *     virt_to_phys    -       map virtual addresses to physical
  *     @address: address to remap
diff --git a/arch/mips/include/asm/mmiowb.h b/arch/mips/include/asm/mmiowb.h
new file mode 100644
index 000000000000..a40824e3ef8e
--- /dev/null
+++ b/arch/mips/include/asm/mmiowb.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_MMIOWB_H
+#define _ASM_MMIOWB_H
+
+#include <asm/io.h>
+
+#define mmiowb()	iobarrier_w()
+
+#include <asm-generic/mmiowb.h>
+
+#endif	/* _ASM_MMIOWB_H */
diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index ee81297d9117..8a88eb265516 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -11,6 +11,21 @@
 
 #include <asm/processor.h>
 #include <asm/qrwlock.h>
+
+#include <asm-generic/qspinlock_types.h>
+
+#define	queued_spin_unlock queued_spin_unlock
+/**
+ * queued_spin_unlock - release a queued spinlock
+ * @lock : Pointer to queued spinlock structure
+ */
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+	/* This could be optimised with ARCH_HAS_MMIOWB */
+	mmiowb();
+	smp_store_release(&lock->locked, 0);
+}
+
 #include <asm/qspinlock.h>
 
 #endif /* _ASM_SPINLOCK_H */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 12/21] ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (10 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 11/21] mips/mmiowb: " Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 13:59 ` [PATCH v2 13/21] powerpc/mmiowb: Hook up mmiowb() implementation to asm-generic code Will Deacon
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

The mmiowb() macro is horribly difficult to use and drivers will continue
to work most of the time if they omit a call when it is required.

Rather than rely on driver authors getting this right, push mmiowb() into
arch_spin_unlock() for ia64. If this is deemed to be a performance issue,
a subsequent optimisation could make use of ARCH_HAS_MMIOWB to elide
the barrier in cases where no I/O writes were performed inside the
critical section.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/ia64/include/asm/Kbuild     |  1 -
 arch/ia64/include/asm/io.h       | 17 -----------------
 arch/ia64/include/asm/mmiowb.h   | 25 +++++++++++++++++++++++++
 arch/ia64/include/asm/spinlock.h |  2 ++
 4 files changed, 27 insertions(+), 18 deletions(-)
 create mode 100644 arch/ia64/include/asm/mmiowb.h

diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index cabfe0280c33..11f191689c9e 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -5,7 +5,6 @@ generic-y += irq_work.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += trace_clock.h
 generic-y += vtime.h
diff --git a/arch/ia64/include/asm/io.h b/arch/ia64/include/asm/io.h
index 1e6fef69bb01..a511d62d447a 100644
--- a/arch/ia64/include/asm/io.h
+++ b/arch/ia64/include/asm/io.h
@@ -113,20 +113,6 @@ extern int valid_mmap_phys_addr_range (unsigned long pfn, size_t count);
  */
 #define __ia64_mf_a()	ia64_mfa()
 
-/**
- * ___ia64_mmiowb - I/O write barrier
- *
- * Ensure ordering of I/O space writes.  This will make sure that writes
- * following the barrier will arrive after all previous writes.  For most
- * ia64 platforms, this is a simple 'mf.a' instruction.
- *
- * See Documentation/driver-api/device-io.rst for more information.
- */
-static inline void ___ia64_mmiowb(void)
-{
-	ia64_mfa();
-}
-
 static inline void*
 __ia64_mk_io_addr (unsigned long port)
 {
@@ -161,7 +147,6 @@ __ia64_mk_io_addr (unsigned long port)
 #define __ia64_writew	___ia64_writew
 #define __ia64_writel	___ia64_writel
 #define __ia64_writeq	___ia64_writeq
-#define __ia64_mmiowb	___ia64_mmiowb
 
 /*
  * For the in/out routines, we need to do "mf.a" _after_ doing the I/O access to ensure
@@ -296,7 +281,6 @@ __outsl (unsigned long port, const void *src, unsigned long count)
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
-#define __mmiowb	platform_mmiowb
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -310,7 +294,6 @@ __outsl (unsigned long port, const void *src, unsigned long count)
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
-#define mmiowb()	__mmiowb()
 
 /*
  * The address passed to these functions are ioremap()ped already.
diff --git a/arch/ia64/include/asm/mmiowb.h b/arch/ia64/include/asm/mmiowb.h
new file mode 100644
index 000000000000..297b85ac84a0
--- /dev/null
+++ b/arch/ia64/include/asm/mmiowb.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_IA64_MMIOWB_H
+#define _ASM_IA64_MMIOWB_H
+
+#include <asm/machvec.h>
+
+/**
+ * ___ia64_mmiowb - I/O write barrier
+ *
+ * Ensure ordering of I/O space writes.  This will make sure that writes
+ * following the barrier will arrive after all previous writes.  For most
+ * ia64 platforms, this is a simple 'mf.a' instruction.
+ */
+static inline void ___ia64_mmiowb(void)
+{
+	ia64_mfa();
+}
+
+#define __ia64_mmiowb	___ia64_mmiowb
+#define mmiowb()	platform_mmiowb()
+
+#include <asm-generic/mmiowb.h>
+
+#endif	/* _ASM_IA64_MMIOWB_H */
diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index afd0b3121b4c..5f620e66384e 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -73,6 +73,8 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
 	unsigned short	*p = (unsigned short *)&lock->lock + 1, tmp;
 
+	/* This could be optimised with ARCH_HAS_MMIOWB */
+	mmiowb();
 	asm volatile ("ld2.bias %0=[%1]" : "=r"(tmp) : "r"(p));
 	WRITE_ONCE(*p, (tmp + 2) & ~1);
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 13/21] powerpc/mmiowb: Hook up mmiowb() implementation to asm-generic code
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (11 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 12/21] ia64/mmiowb: " Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 13:59 ` [PATCH v2 14/21] riscv/mmiowb: " Will Deacon
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

In a bid to kill off explicit mmiowb() usage in driver code, hook up
the asm-generic mmiowb() tracking code, but provide a definition of
arch_mmiowb_state() so that the tracking data can remain in the paca,
as it does at present.

This replaces the existing (flawed) implementation.
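
The generic code selects where the tracking state lives roughly like this
(simplified sketch; only the arch_mmiowb_state() hook is taken from this
patch, the per-CPU fallback is illustrative):

	/* asm-generic/mmiowb.h (sketch) */
	#ifndef arch_mmiowb_state
	DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
	#define __mmiowb_state()	this_cpu_ptr(&__mmiowb_state)
	#else
	#define __mmiowb_state()	arch_mmiowb_state()	/* powerpc: &local_paca->mmiowb_state */
	#endif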

Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/powerpc/Kconfig                |  1 +
 arch/powerpc/include/asm/Kbuild     |  1 -
 arch/powerpc/include/asm/io.h       | 33 +++------------------------------
 arch/powerpc/include/asm/mmiowb.h   | 20 ++++++++++++++++++++
 arch/powerpc/include/asm/paca.h     |  6 +++++-
 arch/powerpc/include/asm/spinlock.h | 17 -----------------
 arch/powerpc/xmon/xmon.c            |  5 ++++-
 7 files changed, 33 insertions(+), 50 deletions(-)
 create mode 100644 arch/powerpc/include/asm/mmiowb.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2d0be82c3061..5e3d0853c31d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -132,6 +132,7 @@ config PPC
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_MMIOWB			if PPC64
 	select ARCH_HAS_PHYS_TO_DMA
 	select ARCH_HAS_PMEM_API                if PPC64
 	select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 74b6605ca55f..a0c132bedfae 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -7,7 +7,6 @@ generic-y += export.h
 generic-y += irq_regs.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += vtime.h
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 4b73847e9b95..1fad67b46409 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -34,14 +34,11 @@ extern struct pci_dev *isa_bridge_pcidev;
 #include <asm/byteorder.h>
 #include <asm/synch.h>
 #include <asm/delay.h>
+#include <asm/mmiowb.h>
 #include <asm/mmu.h>
 #include <asm/ppc_asm.h>
 #include <asm/pgtable.h>
 
-#ifdef CONFIG_PPC64
-#include <asm/paca.h>
-#endif
-
 #define SIO_CONFIG_RA	0x398
 #define SIO_CONFIG_RD	0x399
 
@@ -107,12 +104,6 @@ extern bool isa_io_special;
  *
  */
 
-#ifdef CONFIG_PPC64
-#define IO_SET_SYNC_FLAG()	do { local_paca->io_sync = 1; } while(0)
-#else
-#define IO_SET_SYNC_FLAG()
-#endif
-
 #define DEF_MMIO_IN_X(name, size, insn)				\
 static inline u##size name(const volatile u##size __iomem *addr)	\
 {									\
@@ -127,7 +118,7 @@ static inline void name(volatile u##size __iomem *addr, u##size val)	\
 {									\
 	__asm__ __volatile__("sync;"#insn" %1,%y0"			\
 		: "=Z" (*addr) : "r" (val) : "memory");			\
-	IO_SET_SYNC_FLAG();						\
+	mmiowb_set_pending();						\
 }
 
 #define DEF_MMIO_IN_D(name, size, insn)				\
@@ -144,7 +135,7 @@ static inline void name(volatile u##size __iomem *addr, u##size val)	\
 {									\
 	__asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0"			\
 		: "=m" (*addr) : "r" (val) : "memory");			\
-	IO_SET_SYNC_FLAG();						\
+	mmiowb_set_pending();						\
 }
 
 DEF_MMIO_IN_D(in_8,     8, lbz);
@@ -652,24 +643,6 @@ static inline void name at					\
 
 #include <asm-generic/iomap.h>
 
-#ifdef CONFIG_PPC32
-#define mmiowb()
-#else
-/*
- * Enforce synchronisation of stores vs. spin_unlock
- * (this does it explicitly, though our implementation of spin_unlock
- * does it implicitely too)
- */
-static inline void mmiowb(void)
-{
-	unsigned long tmp;
-
-	__asm__ __volatile__("sync; li %0,0; stb %0,%1(13)"
-	: "=&r" (tmp) : "i" (offsetof(struct paca_struct, io_sync))
-	: "memory");
-}
-#endif /* !CONFIG_PPC32 */
-
 static inline void iosync(void)
 {
         __asm__ __volatile__ ("sync" : : : "memory");
diff --git a/arch/powerpc/include/asm/mmiowb.h b/arch/powerpc/include/asm/mmiowb.h
new file mode 100644
index 000000000000..b10180613507
--- /dev/null
+++ b/arch/powerpc/include/asm/mmiowb.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_MMIOWB_H
+#define _ASM_POWERPC_MMIOWB_H
+
+#ifdef CONFIG_MMIOWB
+
+#include <linux/compiler.h>
+#include <asm/barrier.h>
+#include <asm/paca.h>
+
+#define arch_mmiowb_state()	(&local_paca->mmiowb_state)
+#define mmiowb()		mb()
+
+#else
+#define mmiowb()		do { } while (0)
+#endif /* CONFIG_MMIOWB */
+
+#include <asm-generic/mmiowb.h>
+
+#endif	/* _ASM_POWERPC_MMIOWB_H */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index e843bc5d1a0f..134e912d403f 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -34,6 +34,8 @@
 #include <asm/cpuidle.h>
 #include <asm/atomic.h>
 
+#include <asm-generic/mmiowb_types.h>
+
 register struct paca_struct *local_paca asm("r13");
 
 #if defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_SMP)
@@ -171,7 +173,6 @@ struct paca_struct {
 	u16 trap_save;			/* Used when bad stack is encountered */
 	u8 irq_soft_mask;		/* mask for irq soft masking */
 	u8 irq_happened;		/* irq happened while soft-disabled */
-	u8 io_sync;			/* writel() needs spin_unlock sync */
 	u8 irq_work_pending;		/* IRQ_WORK interrupt while soft-disable */
 	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -264,6 +265,9 @@ struct paca_struct {
 #ifdef CONFIG_STACKPROTECTOR
 	unsigned long canary;
 #endif
+#ifdef CONFIG_MMIOWB
+	struct mmiowb_state mmiowb_state;
+#endif
 } ____cacheline_aligned;
 
 extern void copy_mm_to_paca(struct mm_struct *mm);
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 685c72310f5d..15b39c407c4e 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -39,19 +39,6 @@
 #define LOCK_TOKEN	1
 #endif
 
-#if defined(CONFIG_PPC64) && defined(CONFIG_SMP)
-#define CLEAR_IO_SYNC	(get_paca()->io_sync = 0)
-#define SYNC_IO		do {						\
-				if (unlikely(get_paca()->io_sync)) {	\
-					mb();				\
-					get_paca()->io_sync = 0;	\
-				}					\
-			} while (0)
-#else
-#define CLEAR_IO_SYNC
-#define SYNC_IO
-#endif
-
 #ifdef CONFIG_PPC_PSERIES
 #define vcpu_is_preempted vcpu_is_preempted
 static inline bool vcpu_is_preempted(int cpu)
@@ -99,7 +86,6 @@ static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock)
 
 static inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
-	CLEAR_IO_SYNC;
 	return __arch_spin_trylock(lock) == 0;
 }
 
@@ -130,7 +116,6 @@ extern void __rw_yield(arch_rwlock_t *lock);
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
-	CLEAR_IO_SYNC;
 	while (1) {
 		if (likely(__arch_spin_trylock(lock) == 0))
 			break;
@@ -148,7 +133,6 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
 {
 	unsigned long flags_dis;
 
-	CLEAR_IO_SYNC;
 	while (1) {
 		if (likely(__arch_spin_trylock(lock) == 0))
 			break;
@@ -167,7 +151,6 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
 
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-	SYNC_IO;
 	__asm__ __volatile__("# arch_spin_unlock\n\t"
 				PPC_RELEASE_BARRIER: : :"memory");
 	lock->slock = 0;
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a0f44f992360..13c6a47e6150 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2429,7 +2429,10 @@ static void dump_one_paca(int cpu)
 	DUMP(p, trap_save, "%#-*x");
 	DUMP(p, irq_soft_mask, "%#-*x");
 	DUMP(p, irq_happened, "%#-*x");
-	DUMP(p, io_sync, "%#-*x");
+#ifdef CONFIG_MMIOWB
+	DUMP(p, mmiowb_state.nesting_count, "%#-*x");
+	DUMP(p, mmiowb_state.mmiowb_pending, "%#-*x");
+#endif
 	DUMP(p, irq_work_pending, "%#-*x");
 	DUMP(p, nap_state_lost, "%#-*x");
 	DUMP(p, sprg_vdso, "%#-*llx");
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 14/21] riscv/mmiowb: Hook up mmiowb() implementation to asm-generic code
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (12 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 13/21] powerpc/mmiowb: Hook up mmiowb() implementation to asm-generic code Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 13:59 ` [PATCH v2 15/21] Documentation: Kill all references to mmiowb() Will Deacon
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

In a bid to kill off explicit mmiowb() usage in driver code, hook up
the asm-generic mmiowb() tracking code for riscv, so that an mmiowb()
is automatically issued from spin_unlock() if an I/O write was performed
in the critical section.
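
From a driver's point of view, the end result is roughly the following
(hypothetical snippet; dev_lock, ring_ptr and val are illustrative names,
not code from this series):

	spin_lock_irqsave(&dev_lock, flags);
	writel(val, ring_ptr);		/* __io_aw() marks an I/O write as pending */
	/* no explicit mmiowb() needed any more */
	spin_unlock_irqrestore(&dev_lock, flags);	/* emits "fence o,w" only if a write is pending */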

Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/riscv/Kconfig              |  1 +
 arch/riscv/include/asm/Kbuild   |  1 -
 arch/riscv/include/asm/io.h     | 15 ++-------------
 arch/riscv/include/asm/mmiowb.h | 14 ++++++++++++++
 4 files changed, 17 insertions(+), 14 deletions(-)
 create mode 100644 arch/riscv/include/asm/mmiowb.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index eb56c82d8aa1..6e30e8126799 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -48,6 +48,7 @@ config RISCV
 	select RISCV_TIMER
 	select GENERIC_IRQ_MULTI_HANDLER
 	select ARCH_HAS_PTE_SPECIAL
+	select ARCH_HAS_MMIOWB
 	select HAVE_EBPF_JIT if 64BIT
 
 config MMU
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 221cd2ec78a4..cccd12cf27d4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -21,7 +21,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += mutex.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h
index 1d9c1376dc64..744fd92e77bc 100644
--- a/arch/riscv/include/asm/io.h
+++ b/arch/riscv/include/asm/io.h
@@ -20,6 +20,7 @@
 #define _ASM_RISCV_IO_H
 
 #include <linux/types.h>
+#include <asm/mmiowb.h>
 
 extern void __iomem *ioremap(phys_addr_t offset, unsigned long size);
 
@@ -100,18 +101,6 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
 #endif
 
 /*
- * FIXME: I'm flip-flopping on whether or not we should keep this or enforce
- * the ordering with I/O on spinlocks like PowerPC does.  The worry is that
- * drivers won't get this correct, but I also don't want to introduce a fence
- * into the lock code that otherwise only uses AMOs (and is essentially defined
- * by the ISA to be correct).   For now I'm leaving this here: "o,w" is
- * sufficient to ensure that all writes to the device have completed before the
- * write to the spinlock is allowed to commit.  I surmised this from reading
- * "ACQUIRES VS I/O ACCESSES" in memory-barriers.txt.
- */
-#define mmiowb()	__asm__ __volatile__ ("fence o,w" : : : "memory");
-
-/*
  * Unordered I/O memory access primitives.  These are even more relaxed than
  * the relaxed versions, as they don't even order accesses between successive
  * operations to the I/O regions.
@@ -165,7 +154,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
 #define __io_br()	do {} while (0)
 #define __io_ar(v)	__asm__ __volatile__ ("fence i,r" : : : "memory");
 #define __io_bw()	__asm__ __volatile__ ("fence w,o" : : : "memory");
-#define __io_aw()	do {} while (0)
+#define __io_aw()	mmiowb_set_pending()
 
 #define readb(c)	({ u8  __v; __io_br(); __v = readb_cpu(c); __io_ar(__v); __v; })
 #define readw(c)	({ u16 __v; __io_br(); __v = readw_cpu(c); __io_ar(__v); __v; })
diff --git a/arch/riscv/include/asm/mmiowb.h b/arch/riscv/include/asm/mmiowb.h
new file mode 100644
index 000000000000..5d7e3a2b4e3b
--- /dev/null
+++ b/arch/riscv/include/asm/mmiowb.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_RISCV_MMIOWB_H
+#define _ASM_RISCV_MMIOWB_H
+
+/*
+ * "o,w" is sufficient to ensure that all writes to the device have completed
+ * before the write to the spinlock is allowed to commit.
+ */
+#define mmiowb()	__asm__ __volatile__ ("fence o,w" : : : "memory");
+
+#include <asm-generic/mmiowb.h>
+
+#endif	/* ASM_RISCV_MMIOWB_H */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 15/21] Documentation: Kill all references to mmiowb()
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (13 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 14/21] riscv/mmiowb: " Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 13:59 ` [PATCH v2 16/21] drivers: Remove useless trailing comments from mmiowb() invocations Will Deacon
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

The guarantees provided by mmiowb() are now provided implicitly by
spin_unlock(), so we can remove all references to this most confusing
of barriers from our Documentation.

Good riddance.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 Documentation/driver-api/device-io.rst  |  45 --------------
 Documentation/driver-api/pci/p2pdma.rst |   4 --
 Documentation/memory-barriers.txt       | 103 ++------------------------------
 3 files changed, 4 insertions(+), 148 deletions(-)

diff --git a/Documentation/driver-api/device-io.rst b/Documentation/driver-api/device-io.rst
index b00b23903078..0e389378f71d 100644
--- a/Documentation/driver-api/device-io.rst
+++ b/Documentation/driver-api/device-io.rst
@@ -103,51 +103,6 @@ continuing execution::
         ha->flags.ints_enabled = 0;
     }
 
-In addition to write posting, on some large multiprocessing systems
-(e.g. SGI Challenge, Origin and Altix machines) posted writes won't be
-strongly ordered coming from different CPUs. Thus it's important to
-properly protect parts of your driver that do memory-mapped writes with
-locks and use the :c:func:`mmiowb()` to make sure they arrive in the
-order intended. Issuing a regular readX() will also ensure write ordering,
-but should only be used when the 
-driver has to be sure that the write has actually arrived at the device
-(not that it's simply ordered with respect to other writes), since a
-full readX() is a relatively expensive operation.
-
-Generally, one should use :c:func:`mmiowb()` prior to releasing a spinlock
-that protects regions using :c:func:`writeb()` or similar functions that
-aren't surrounded by readb() calls, which will ensure ordering
-and flushing. The following pseudocode illustrates what might occur if
-write ordering isn't guaranteed via :c:func:`mmiowb()` or one of the
-readX() functions::
-
-    CPU A:  spin_lock_irqsave(&dev_lock, flags)
-    CPU A:  ...
-    CPU A:  writel(newval, ring_ptr);
-    CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-            ...
-    CPU B:  spin_lock_irqsave(&dev_lock, flags)
-    CPU B:  writel(newval2, ring_ptr);
-    CPU B:  ...
-    CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-
-In the case above, newval2 could be written to ring_ptr before newval.
-Fixing it is easy though::
-
-    CPU A:  spin_lock_irqsave(&dev_lock, flags)
-    CPU A:  ...
-    CPU A:  writel(newval, ring_ptr);
-    CPU A:  mmiowb(); /* ensure no other writes beat us to the device */
-    CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-            ...
-    CPU B:  spin_lock_irqsave(&dev_lock, flags)
-    CPU B:  writel(newval2, ring_ptr);
-    CPU B:  ...
-    CPU B:  mmiowb();
-    CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-
-See tg3.c for a real world example of how to use :c:func:`mmiowb()`
-
 PCI ordering rules also guarantee that PIO read responses arrive after any
 outstanding DMA writes from that bus, since for some devices the result of
 a readb() call may signal to the driver that a DMA transaction is
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
index 6d85b5a2598d..44deb52beeb4 100644
--- a/Documentation/driver-api/pci/p2pdma.rst
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -132,10 +132,6 @@ precludes passing these pages to userspace.
 P2P memory is also technically IO memory but should never have any side
 effects behind it. Thus, the order of loads and stores should not be important
 and ioreadX(), iowriteX() and friends should not be necessary.
-However, as the memory is not cache coherent, if access ever needs to
-be protected by a spinlock then :c:func:`mmiowb()` must be used before
-unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
-Documentation/memory-barriers.txt)
 
 
 P2P DMA Support Library
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 5eb6f4c6a133..3522f0cc772f 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1937,21 +1937,6 @@ There are some more advanced barrier functions:
      information on consistent memory.
 
 
-MMIO WRITE BARRIER
-------------------
-
-The Linux kernel also has a special barrier for use with memory-mapped I/O
-writes:
-
-	mmiowb();
-
-This is a variation on the mandatory write barrier that causes writes to weakly
-ordered I/O regions to be partially ordered.  Its effects may go beyond the
-CPU->Hardware interface and actually affect the hardware at some level.
-
-See the subsection "Acquires vs I/O accesses" for more information.
-
-
 ===============================
 IMPLICIT KERNEL MEMORY BARRIERS
 ===============================
@@ -2317,75 +2302,6 @@ But it won't see any of:
 	*E, *F or *G following RELEASE Q
 
 
-
-ACQUIRES VS I/O ACCESSES
-------------------------
-
-Under certain circumstances (especially involving NUMA), I/O accesses within
-two spinlocked sections on two different CPUs may be seen as interleaved by the
-PCI bridge, because the PCI bridge does not necessarily participate in the
-cache-coherence protocol, and is therefore incapable of issuing the required
-read memory barriers.
-
-For example:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	spin_lock(Q)
-	writel(0, ADDR)
-	writel(1, DATA);
-	spin_unlock(Q);
-					spin_lock(Q);
-					writel(4, ADDR);
-					writel(5, DATA);
-					spin_unlock(Q);
-
-may be seen by the PCI bridge as follows:
-
-	STORE *ADDR = 0, STORE *ADDR = 4, STORE *DATA = 1, STORE *DATA = 5
-
-which would probably cause the hardware to malfunction.
-
-
-What is necessary here is to intervene with an mmiowb() before dropping the
-spinlock, for example:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	spin_lock(Q)
-	writel(0, ADDR)
-	writel(1, DATA);
-	mmiowb();
-	spin_unlock(Q);
-					spin_lock(Q);
-					writel(4, ADDR);
-					writel(5, DATA);
-					mmiowb();
-					spin_unlock(Q);
-
-this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
-before either of the stores issued on CPU 2.
-
-
-Furthermore, following a store by a load from the same device obviates the need
-for the mmiowb(), because the load forces the store to complete before the load
-is performed:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	spin_lock(Q)
-	writel(0, ADDR)
-	a = readl(DATA);
-	spin_unlock(Q);
-					spin_lock(Q);
-					writel(4, ADDR);
-					b = readl(DATA);
-					spin_unlock(Q);
-
-
-See Documentation/driver-api/device-io.rst for more information.
-
-
 =================================
 WHERE ARE MEMORY BARRIERS NEEDED?
 =================================
@@ -2532,16 +2448,9 @@ the device to malfunction.
 Inside of the Linux kernel, I/O should be done through the appropriate accessor
 routines - such as inb() or writel() - which know how to make such accesses
 appropriately sequential.  While this, for the most part, renders the explicit
-use of memory barriers unnecessary, there are a couple of situations where they
-might be needed:
-
- (1) On some systems, I/O stores are not strongly ordered across all CPUs, and
-     so for _all_ general drivers locks should be used and mmiowb() must be
-     issued prior to unlocking the critical section.
-
- (2) If the accessor functions are used to refer to an I/O memory window with
-     relaxed memory access properties, then _mandatory_ memory barriers are
-     required to enforce ordering.
+use of memory barriers unnecessary, if the accessor functions are used to refer
+to an I/O memory window with relaxed memory access properties, then _mandatory_
+memory barriers are required to enforce ordering.
 
 See Documentation/driver-api/device-io.rst for more information.
 
@@ -2586,8 +2495,7 @@ explicit barriers are used.
 
 Normally this won't be a problem because the I/O accesses done inside such
 sections will include synchronous load operations on strictly ordered I/O
-registers that form implicit I/O barriers.  If this isn't sufficient then an
-mmiowb() may need to be used explicitly.
+registers that form implicit I/O barriers.
 
 
 A similar situation may occur between an interrupt routine and two routines
@@ -2687,9 +2595,6 @@ guarantees:
 All of these accessors assume that the underlying peripheral is little-endian,
 and will therefore perform byte-swapping operations on big-endian architectures.
 
-Composing I/O ordering barriers with SMP ordering barriers and LOCK/UNLOCK
-operations is a dangerous sport which may require the use of mmiowb(). See the
-subsection "Acquires vs I/O accesses" for more information.
 
 ========================================
 ASSUMED MINIMUM EXECUTION ORDERING MODEL
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 16/21] drivers: Remove useless trailing comments from mmiowb() invocations
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (14 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 15/21] Documentation: Kill all references to mmiowb() Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 13:59 ` [PATCH v2 17/21] drivers: Remove explicit invocations of mmiowb() Will Deacon
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

In preparation for using coccinelle to remove all mmiowb() instances
from drivers, remove the trailing comments from those invocations, since
spatch won't pick them up and they would otherwise be left behind in the
code.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/infiniband/hw/hfi1/chip.c                | 2 +-
 drivers/infiniband/hw/qedr/verbs.c               | 2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  | 2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
 drivers/scsi/bnx2i/bnx2i_hwi.c                   | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 612f04190ed8..12e67a91e578 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -8365,7 +8365,7 @@ static inline void clear_recv_intr(struct hfi1_ctxtdata *rcd)
 	struct hfi1_devdata *dd = rcd->dd;
 	u32 addr = CCE_INT_CLEAR + (8 * rcd->ireg);
 
-	mmiowb();	/* make sure everything before is written */
+	mmiowb();
 	write_csr(dd, addr, rcd->imask);
 	/* force the above write on the chip and get a value back */
 	(void)read_csr(dd, addr);
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 59ad4202422c..4dab2b5ffb0e 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -3700,7 +3700,7 @@ int qedr_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 
 		if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
 			writel(qp->rq.iwarp_db2_data.raw, qp->rq.iwarp_db2);
-			mmiowb();	/* for second doorbell */
+			mmiowb();
 		}
 
 		wr = wr->next;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 2462e7aa0c5d..1ed068509337 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -527,7 +527,7 @@ static inline void bnx2x_update_rx_prod(struct bnx2x *bp,
 		REG_WR_RELAXED(bp, fp->ustorm_rx_prods_offset + i * 4,
 			       ((u32 *)&rx_prods)[i]);
 
-	mmiowb(); /* keep prod updates ordered */
+	mmiowb();
 
 	DP(NETIF_MSG_RX_STATUS,
 	   "queue[%d]:  wrote  bd_prod %u  cqe_prod %u  sge_prod %u\n",
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 626b491f7674..e46786a56b0c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -5244,7 +5244,7 @@ static void bnx2x_update_eq_prod(struct bnx2x *bp, u16 prod)
 {
 	/* No memory barriers */
 	storm_memset_eq_prod(bp, prod, BP_FUNC(bp));
-	mmiowb(); /* keep prod updates ordered */
+	mmiowb();
 }
 
 static int  bnx2x_cnic_handle_cfc_del(struct bnx2x *bp, u32 cid,
diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c
index fae6f71e677d..d56a78f411cd 100644
--- a/drivers/scsi/bnx2i/bnx2i_hwi.c
+++ b/drivers/scsi/bnx2i/bnx2i_hwi.c
@@ -280,7 +280,7 @@ static void bnx2i_ring_sq_dbell(struct bnx2i_conn *bnx2i_conn, int count)
 	} else
 		writew(count, ep->qp.ctx_base + CNIC_SEND_DOORBELL);
 
-	mmiowb(); /* flush posted PCI writes */
+	mmiowb();
 }
 
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 17/21] drivers: Remove explicit invocations of mmiowb()
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (15 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 16/21] drivers: Remove useless trailing comments from mmiowb() invocations Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 15:50   ` Linus Torvalds
  2019-04-05 13:59 ` [PATCH v2 18/21] scsi/qla1280: Remove stale comment about mmiowb() Will Deacon
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

mmiowb() is now implied by spin_unlock() on architectures that require
it, so there is no reason to call it from driver code. This patch was
generated using coccinelle:

	@mmiowb@
	@@
	- mmiowb();

and invoked as:

$ for d in drivers include/linux/qed sound; do \
spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/crypto/cavium/nitrox/nitrox_reqmgr.c       |  4 ---
 drivers/dma/txx9dmac.c                             |  3 ---
 drivers/firewire/ohci.c                            |  1 -
 drivers/gpu/drm/i915/intel_hdmi.c                  | 10 --------
 drivers/ide/tx4939ide.c                            |  2 --
 drivers/infiniband/hw/hfi1/chip.c                  |  3 ---
 drivers/infiniband/hw/hfi1/pio.c                   |  1 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c         |  2 --
 drivers/infiniband/hw/mlx4/qp.c                    |  6 -----
 drivers/infiniband/hw/mlx5/qp.c                    |  1 -
 drivers/infiniband/hw/mthca/mthca_cmd.c            |  6 -----
 drivers/infiniband/hw/mthca/mthca_cq.c             |  5 ----
 drivers/infiniband/hw/mthca/mthca_qp.c             | 17 -------------
 drivers/infiniband/hw/mthca/mthca_srq.c            |  6 -----
 drivers/infiniband/hw/qedr/verbs.c                 | 12 ---------
 drivers/infiniband/hw/qib/qib_iba6120.c            |  4 ---
 drivers/infiniband/hw/qib/qib_iba7220.c            |  3 ---
 drivers/infiniband/hw/qib/qib_iba7322.c            |  3 ---
 drivers/infiniband/hw/qib/qib_sd7220.c             |  4 ---
 drivers/media/pci/dt3155/dt3155.c                  |  8 ------
 drivers/memstick/host/jmb38x_ms.c                  |  4 ---
 drivers/misc/ioc4.c                                |  2 --
 drivers/misc/mei/hw-me.c                           |  3 ---
 drivers/misc/tifm_7xx1.c                           |  1 -
 drivers/mmc/host/alcor.c                           |  1 -
 drivers/mmc/host/sdhci.c                           | 13 ----------
 drivers/mmc/host/tifm_sd.c                         |  3 ---
 drivers/mmc/host/via-sdmmc.c                       | 10 --------
 drivers/mtd/nand/raw/r852.c                        |  2 --
 drivers/mtd/nand/raw/txx9ndfmc.c                   |  1 -
 drivers/net/ethernet/aeroflex/greth.c              |  1 -
 drivers/net/ethernet/alacritech/slicoss.c          |  4 ---
 drivers/net/ethernet/amazon/ena/ena_com.c          |  1 -
 drivers/net/ethernet/atheros/atlx/atl1.c           |  1 -
 drivers/net/ethernet/atheros/atlx/atl2.c           |  1 -
 drivers/net/ethernet/broadcom/bnx2.c               |  4 ---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c    |  2 --
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h    |  4 ---
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c    |  1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   | 29 ----------------------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c     |  1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c  |  2 --
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c   |  4 ---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  3 ---
 drivers/net/ethernet/broadcom/tg3.c                |  6 -----
 .../net/ethernet/cavium/liquidio/cn66xx_device.c   | 10 --------
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  1 -
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c |  4 ---
 .../net/ethernet/cavium/liquidio/request_manager.c |  1 -
 drivers/net/ethernet/intel/e1000/e1000_main.c      |  5 ----
 drivers/net/ethernet/intel/e1000e/netdev.c         |  7 ------
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c       |  2 --
 drivers/net/ethernet/intel/fm10k/fm10k_main.c      |  5 ----
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |  5 ----
 drivers/net/ethernet/intel/iavf/iavf_txrx.c        |  5 ----
 drivers/net/ethernet/intel/ice/ice_txrx.c          |  5 ----
 drivers/net/ethernet/intel/igb/igb_main.c          |  5 ----
 drivers/net/ethernet/intel/igbvf/netdev.c          |  4 ---
 drivers/net/ethernet/intel/igc/igc_main.c          |  5 ----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |  5 ----
 drivers/net/ethernet/marvell/sky2.c                |  4 ---
 drivers/net/ethernet/mellanox/mlx4/catas.c         |  4 ---
 drivers/net/ethernet/mellanox/mlx4/cmd.c           | 13 ----------
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |  1 -
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c   |  2 --
 drivers/net/ethernet/neterion/s2io.c               |  2 --
 drivers/net/ethernet/neterion/vxge/vxge-main.c     |  5 ----
 drivers/net/ethernet/neterion/vxge/vxge-traffic.c  |  4 ---
 drivers/net/ethernet/qlogic/qed/qed_int.c          | 13 ----------
 drivers/net/ethernet/qlogic/qed/qed_spq.c          |  3 ---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c    |  8 ------
 drivers/net/ethernet/qlogic/qede/qede_fp.c         |  8 ------
 drivers/net/ethernet/qlogic/qla3xxx.c              |  1 -
 drivers/net/ethernet/qlogic/qlge/qlge.h            |  1 -
 drivers/net/ethernet/qlogic/qlge/qlge_main.c       |  1 -
 drivers/net/ethernet/renesas/ravb_main.c           |  9 -------
 drivers/net/ethernet/renesas/ravb_ptp.c            |  3 ---
 drivers/net/ethernet/renesas/sh_eth.c              |  1 -
 drivers/net/ethernet/sfc/falcon/io.h               |  2 --
 drivers/net/ethernet/sfc/io.h                      |  2 --
 drivers/net/ethernet/silan/sc92031.c               | 14 -----------
 drivers/net/ethernet/via/via-rhine.c               |  3 ---
 drivers/net/ethernet/wiznet/w5100.c                |  6 -----
 drivers/net/ethernet/wiznet/w5300.c                | 15 -----------
 drivers/net/wireless/ath/ath5k/base.c              |  4 ---
 drivers/net/wireless/ath/ath5k/mac80211-ops.c      |  2 --
 drivers/net/wireless/broadcom/b43/main.c           |  7 ------
 drivers/net/wireless/broadcom/b43/sysfs.c          |  1 -
 drivers/net/wireless/broadcom/b43legacy/ilt.c      |  2 --
 drivers/net/wireless/broadcom/b43legacy/main.c     | 20 ---------------
 drivers/net/wireless/broadcom/b43legacy/phy.c      |  1 -
 drivers/net/wireless/broadcom/b43legacy/pio.h      |  1 -
 drivers/net/wireless/broadcom/b43legacy/radio.c    |  4 ---
 drivers/net/wireless/broadcom/b43legacy/sysfs.c    |  1 -
 drivers/net/wireless/intel/iwlegacy/common.h       |  7 ------
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c    |  1 -
 drivers/ntb/hw/idt/ntb_hw_idt.c                    |  7 ------
 drivers/ntb/test/ntb_perf.c                        |  3 ---
 drivers/scsi/bfa/bfa.h                             |  3 +--
 drivers/scsi/bfa/bfa_hw_cb.c                       |  2 --
 drivers/scsi/bfa/bfa_hw_ct.c                       |  2 --
 drivers/scsi/bnx2fc/bnx2fc_hwi.c                   |  2 --
 drivers/scsi/bnx2i/bnx2i_hwi.c                     |  3 ---
 drivers/scsi/megaraid/megaraid_sas_base.c          |  1 -
 drivers/scsi/megaraid/megaraid_sas_fusion.c        |  1 -
 drivers/scsi/mpt3sas/mpt3sas_base.c                |  1 -
 drivers/scsi/qedf/qedf_io.c                        |  1 -
 drivers/scsi/qedi/qedi_fw.c                        |  1 -
 drivers/scsi/qla1280.c                             |  5 ----
 drivers/ssb/pci.c                                  |  1 -
 drivers/ssb/pcmcia.c                               |  4 ---
 drivers/staging/comedi/drivers/mite.c              |  3 ---
 drivers/staging/comedi/drivers/ni_660x.c           |  2 --
 drivers/staging/comedi/drivers/ni_mio_common.c     |  1 -
 drivers/staging/comedi/drivers/ni_pcidio.c         |  2 --
 drivers/staging/comedi/drivers/ni_tio.c            |  1 -
 drivers/staging/comedi/drivers/s626.c              |  2 --
 drivers/tty/serial/men_z135_uart.c                 |  1 -
 drivers/tty/serial/serial_txx9.c                   |  1 -
 drivers/usb/early/xhci-dbc.c                       |  4 ---
 drivers/usb/host/xhci-dbgcap.c                     |  2 --
 include/linux/qed/qed_if.h                         |  2 --
 sound/soc/txx9/txx9aclc-ac97.c                     |  1 -
 123 files changed, 1 insertion(+), 508 deletions(-)

diff --git a/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c b/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c
index 4c97478d44bd..5826c2c98a50 100644
--- a/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c
+++ b/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c
@@ -303,8 +303,6 @@ static void post_se_instr(struct nitrox_softreq *sr,
 
 	/* Ring doorbell with count 1 */
 	writeq(1, cmdq->dbell_csr_addr);
-	/* orders the doorbell rings */
-	mmiowb();
 
 	cmdq->write_idx = incr_index(idx, 1, ndev->qlen);
 
@@ -599,8 +597,6 @@ void pkt_slc_resp_tasklet(unsigned long data)
 	 * MSI-X interrupt generates if Completion count > Threshold
 	 */
 	writeq(slc_cnts.value, cmdq->compl_cnt_csr_addr);
-	/* order the writes */
-	mmiowb();
 
 	if (atomic_read(&cmdq->backlog_count))
 		schedule_work(&cmdq->backlog_qflush);
diff --git a/drivers/dma/txx9dmac.c b/drivers/dma/txx9dmac.c
index eb45af71d3a3..e8d0881b64d8 100644
--- a/drivers/dma/txx9dmac.c
+++ b/drivers/dma/txx9dmac.c
@@ -327,7 +327,6 @@ static void txx9dmac_reset_chan(struct txx9dmac_chan *dc)
 	channel_writel(dc, SAIR, 0);
 	channel_writel(dc, DAIR, 0);
 	channel_writel(dc, CCR, 0);
-	mmiowb();
 }
 
 /* Called with dc->lock held and bh disabled */
@@ -954,7 +953,6 @@ static void txx9dmac_chain_dynamic(struct txx9dmac_chan *dc,
 	dma_sync_single_for_device(chan2parent(&dc->chan),
 				   prev->txd.phys, ddev->descsize,
 				   DMA_TO_DEVICE);
-	mmiowb();
 	if (!(channel_readl(dc, CSR) & TXX9_DMA_CSR_CHNEN) &&
 	    channel_read_CHAR(dc) == prev->txd.phys)
 		/* Restart chain DMA */
@@ -1080,7 +1078,6 @@ static void txx9dmac_free_chan_resources(struct dma_chan *chan)
 static void txx9dmac_off(struct txx9dmac_dev *ddev)
 {
 	dma_writel(ddev, MCR, 0);
-	mmiowb();
 }
 
 static int __init txx9dmac_chan_probe(struct platform_device *pdev)
diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 45c048751f3b..7183ab34269e 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -2939,7 +2939,6 @@ static void set_multichannel_mask(struct fw_ohci *ohci, u64 channels)
 	reg_write(ohci, OHCI1394_IRMultiChanMaskLoClear, ~lo);
 	reg_write(ohci, OHCI1394_IRMultiChanMaskHiSet, hi);
 	reg_write(ohci, OHCI1394_IRMultiChanMaskLoSet, lo);
-	mmiowb();
 	ohci->mc_channels = channels;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_hdmi.c b/drivers/gpu/drm/i915/intel_hdmi.c
index f125a62eba8c..a46bffe2b288 100644
--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -182,7 +182,6 @@ static void g4x_write_infoframe(struct intel_encoder *encoder,
 
 	I915_WRITE(VIDEO_DIP_CTL, val);
 
-	mmiowb();
 	for (i = 0; i < len; i += 4) {
 		I915_WRITE(VIDEO_DIP_DATA, *data);
 		data++;
@@ -190,7 +189,6 @@ static void g4x_write_infoframe(struct intel_encoder *encoder,
 	/* Write every possible data byte to force correct ECC calculation. */
 	for (; i < VIDEO_DIP_DATA_SIZE; i += 4)
 		I915_WRITE(VIDEO_DIP_DATA, 0);
-	mmiowb();
 
 	val |= g4x_infoframe_enable(type);
 	val &= ~VIDEO_DIP_FREQ_MASK;
@@ -237,7 +235,6 @@ static void ibx_write_infoframe(struct intel_encoder *encoder,
 
 	I915_WRITE(reg, val);
 
-	mmiowb();
 	for (i = 0; i < len; i += 4) {
 		I915_WRITE(TVIDEO_DIP_DATA(intel_crtc->pipe), *data);
 		data++;
@@ -245,7 +242,6 @@ static void ibx_write_infoframe(struct intel_encoder *encoder,
 	/* Write every possible data byte to force correct ECC calculation. */
 	for (; i < VIDEO_DIP_DATA_SIZE; i += 4)
 		I915_WRITE(TVIDEO_DIP_DATA(intel_crtc->pipe), 0);
-	mmiowb();
 
 	val |= g4x_infoframe_enable(type);
 	val &= ~VIDEO_DIP_FREQ_MASK;
@@ -298,7 +294,6 @@ static void cpt_write_infoframe(struct intel_encoder *encoder,
 
 	I915_WRITE(reg, val);
 
-	mmiowb();
 	for (i = 0; i < len; i += 4) {
 		I915_WRITE(TVIDEO_DIP_DATA(intel_crtc->pipe), *data);
 		data++;
@@ -306,7 +301,6 @@ static void cpt_write_infoframe(struct intel_encoder *encoder,
 	/* Write every possible data byte to force correct ECC calculation. */
 	for (; i < VIDEO_DIP_DATA_SIZE; i += 4)
 		I915_WRITE(TVIDEO_DIP_DATA(intel_crtc->pipe), 0);
-	mmiowb();
 
 	val |= g4x_infoframe_enable(type);
 	val &= ~VIDEO_DIP_FREQ_MASK;
@@ -352,7 +346,6 @@ static void vlv_write_infoframe(struct intel_encoder *encoder,
 
 	I915_WRITE(reg, val);
 
-	mmiowb();
 	for (i = 0; i < len; i += 4) {
 		I915_WRITE(VLV_TVIDEO_DIP_DATA(intel_crtc->pipe), *data);
 		data++;
@@ -360,7 +353,6 @@ static void vlv_write_infoframe(struct intel_encoder *encoder,
 	/* Write every possible data byte to force correct ECC calculation. */
 	for (; i < VIDEO_DIP_DATA_SIZE; i += 4)
 		I915_WRITE(VLV_TVIDEO_DIP_DATA(intel_crtc->pipe), 0);
-	mmiowb();
 
 	val |= g4x_infoframe_enable(type);
 	val &= ~VIDEO_DIP_FREQ_MASK;
@@ -406,7 +398,6 @@ static void hsw_write_infoframe(struct intel_encoder *encoder,
 	val &= ~hsw_infoframe_enable(type);
 	I915_WRITE(ctl_reg, val);
 
-	mmiowb();
 	for (i = 0; i < len; i += 4) {
 		I915_WRITE(hsw_dip_data_reg(dev_priv, cpu_transcoder,
 					    type, i >> 2), *data);
@@ -416,7 +407,6 @@ static void hsw_write_infoframe(struct intel_encoder *encoder,
 	for (; i < data_size; i += 4)
 		I915_WRITE(hsw_dip_data_reg(dev_priv, cpu_transcoder,
 					    type, i >> 2), 0);
-	mmiowb();
 
 	val |= hsw_infoframe_enable(type);
 	I915_WRITE(ctl_reg, val);
diff --git a/drivers/ide/tx4939ide.c b/drivers/ide/tx4939ide.c
index 67d4a7d4acc8..88d132edc4e3 100644
--- a/drivers/ide/tx4939ide.c
+++ b/drivers/ide/tx4939ide.c
@@ -156,7 +156,6 @@ static u16 tx4939ide_check_error_ints(ide_hwif_t *hwif)
 		u16 sysctl = tx4939ide_readw(base, TX4939IDE_Sys_Ctl);
 
 		tx4939ide_writew(sysctl | 0x4000, base, TX4939IDE_Sys_Ctl);
-		mmiowb();
 		/* wait 12GBUSCLK (typ. 60ns @ GBUS200MHz, max 270ns) */
 		ndelay(270);
 		tx4939ide_writew(sysctl, base, TX4939IDE_Sys_Ctl);
@@ -396,7 +395,6 @@ static void tx4939ide_init_hwif(ide_hwif_t *hwif)
 
 	/* Soft Reset */
 	tx4939ide_writew(0x8000, base, TX4939IDE_Sys_Ctl);
-	mmiowb();
 	/* at least 20 GBUSCLK (typ. 100ns @ GBUS200MHz, max 450ns) */
 	ndelay(450);
 	tx4939ide_writew(0x0000, base, TX4939IDE_Sys_Ctl);
diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 12e67a91e578..8f270459b63e 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -8365,7 +8365,6 @@ static inline void clear_recv_intr(struct hfi1_ctxtdata *rcd)
 	struct hfi1_devdata *dd = rcd->dd;
 	u32 addr = CCE_INT_CLEAR + (8 * rcd->ireg);
 
-	mmiowb();
 	write_csr(dd, addr, rcd->imask);
 	/* force the above write on the chip and get a value back */
 	(void)read_csr(dd, addr);
@@ -11803,12 +11802,10 @@ void update_usrhead(struct hfi1_ctxtdata *rcd, u32 hd, u32 updegr, u32 egrhd,
 			<< RCV_EGR_INDEX_HEAD_HEAD_SHIFT;
 		write_uctxt_csr(dd, ctxt, RCV_EGR_INDEX_HEAD, reg);
 	}
-	mmiowb();
 	reg = ((u64)rcv_intr_count << RCV_HDR_HEAD_COUNTER_SHIFT) |
 		(((u64)hd & RCV_HDR_HEAD_HEAD_MASK)
 			<< RCV_HDR_HEAD_HEAD_SHIFT);
 	write_uctxt_csr(dd, ctxt, RCV_HDR_HEAD, reg);
-	mmiowb();
 }
 
 u32 hdrqempty(struct hfi1_ctxtdata *rcd)
diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c
index a1de566fe95e..16ba9d52e1b9 100644
--- a/drivers/infiniband/hw/hfi1/pio.c
+++ b/drivers/infiniband/hw/hfi1/pio.c
@@ -1578,7 +1578,6 @@ void hfi1_sc_wantpiobuf_intr(struct send_context *sc, u32 needint)
 		sc_del_credit_return_intr(sc);
 	trace_hfi1_wantpiointr(sc, needint, sc->credit_ctrl);
 	if (needint) {
-		mmiowb();
 		sc_return_credits(sc);
 	}
 }
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 97515c340134..c8555f7704d8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -1750,8 +1750,6 @@ static int hns_roce_v1_post_mbox(struct hns_roce_dev *hr_dev, u64 in_param,
 
 	writel(val, hcr + 5);
 
-	mmiowb();
-
 	return 0;
 }
 
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 429a59c5801c..9426936460f8 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -3744,12 +3744,6 @@ static int _mlx4_ib_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 		writel_relaxed(qp->doorbell_qpn,
 			to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
 
-		/*
-		 * Make sure doorbells don't leak out of SQ spinlock
-		 * and reach the HCA out of order.
-		 */
-		mmiowb();
-
 		stamp_send_wqe(qp, ind + qp->sq_spare_wqes - 1);
 
 		qp->sq_next_wqe = ind;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 7cd006da1dae..b680be1f3f47 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -5123,7 +5123,6 @@ static int _mlx5_ib_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 		/* Make sure doorbells don't leak out of SQ spinlock
 		 * and reach the HCA out of order.
 		 */
-		mmiowb();
 		bf->offset ^= bf->buf_size;
 	}
 
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 83aa47eb81a9..bdf5ed38de22 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -292,12 +292,6 @@ static int mthca_cmd_post(struct mthca_dev *dev,
 		err = mthca_cmd_post_hcr(dev, in_param, out_param, in_modifier,
 					 op_modifier, op, token, event);
 
-	/*
-	 * Make sure that our HCR writes don't get mixed in with
-	 * writes from another CPU starting a FW command.
-	 */
-	mmiowb();
-
 	mutex_unlock(&dev->cmd.hcr_mutex);
 	return err;
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index a6531ffe29a6..877a6daffa98 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -211,11 +211,6 @@ static inline void update_cons_index(struct mthca_dev *dev, struct mthca_cq *cq,
 		mthca_write64(MTHCA_TAVOR_CQ_DB_INC_CI | cq->cqn, incr - 1,
 			      dev->kar + MTHCA_CQ_DOORBELL,
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
-		/*
-		 * Make sure doorbells don't leak out of CQ spinlock
-		 * and reach the HCA out of order:
-		 */
-		mmiowb();
 	}
 }
 
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 7a5b25d13faa..d65b189f20ea 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1809,11 +1809,6 @@ int mthca_tavor_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 			      (qp->qpn << 8) | size0,
 			      dev->kar + MTHCA_SEND_DOORBELL,
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
-		/*
-		 * Make sure doorbells don't leak out of SQ spinlock
-		 * and reach the HCA out of order:
-		 */
-		mmiowb();
 	}
 
 	qp->sq.next_ind = ind;
@@ -1924,12 +1919,6 @@ int mthca_tavor_post_receive(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 	qp->rq.next_ind = ind;
 	qp->rq.head    += nreq;
 
-	/*
-	 * Make sure doorbells don't leak out of RQ spinlock and reach
-	 * the HCA out of order:
-	 */
-	mmiowb();
-
 	spin_unlock_irqrestore(&qp->rq.lock, flags);
 	return err;
 }
@@ -2164,12 +2153,6 @@ int mthca_arbel_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 	}
 
-	/*
-	 * Make sure doorbells don't leak out of SQ spinlock and reach
-	 * the HCA out of order:
-	 */
-	mmiowb();
-
 	spin_unlock_irqrestore(&qp->sq.lock, flags);
 	return err;
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c
index 06b920385512..a85935ccce88 100644
--- a/drivers/infiniband/hw/mthca/mthca_srq.c
+++ b/drivers/infiniband/hw/mthca/mthca_srq.c
@@ -570,12 +570,6 @@ int mthca_tavor_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 	}
 
-	/*
-	 * Make sure doorbells don't leak out of SRQ spinlock and
-	 * reach the HCA out of order:
-	 */
-	mmiowb();
-
 	spin_unlock_irqrestore(&srq->lock, flags);
 	return err;
 }
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 4dab2b5ffb0e..8686a98e113d 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -773,9 +773,6 @@ static void doorbell_cq(struct qedr_cq *cq, u32 cons, u8 flags)
 	cq->db.data.agg_flags = flags;
 	cq->db.data.value = cpu_to_le32(cons);
 	writeq(cq->db.raw, cq->db_addr);
-
-	/* Make sure write would stick */
-	mmiowb();
 }
 
 int qedr_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
@@ -2084,8 +2081,6 @@ static int qedr_update_qp_state(struct qedr_dev *dev,
 
 			if (rdma_protocol_roce(&dev->ibdev, 1)) {
 				writel(qp->rq.db_data.raw, qp->rq.db);
-				/* Make sure write takes effect */
-				mmiowb();
 			}
 			break;
 		case QED_ROCE_QP_STATE_ERR:
@@ -3502,9 +3497,6 @@ int qedr_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 	smp_wmb();
 	writel(qp->sq.db_data.raw, qp->sq.db);
 
-	/* Make sure write sticks */
-	mmiowb();
-
 	spin_unlock_irqrestore(&qp->q_lock, flags);
 
 	return rc;
@@ -3695,12 +3687,8 @@ int qedr_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 
 		writel(qp->rq.db_data.raw, qp->rq.db);
 
-		/* Make sure write sticks */
-		mmiowb();
-
 		if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
 			writel(qp->rq.iwarp_db2_data.raw, qp->rq.iwarp_db2);
-			mmiowb();
 		}
 
 		wr = wr->next;
diff --git a/drivers/infiniband/hw/qib/qib_iba6120.c b/drivers/infiniband/hw/qib/qib_iba6120.c
index cdbf707fa267..531d8a1db2c3 100644
--- a/drivers/infiniband/hw/qib/qib_iba6120.c
+++ b/drivers/infiniband/hw/qib/qib_iba6120.c
@@ -1884,7 +1884,6 @@ static void qib_6120_put_tid(struct qib_devdata *dd, u64 __iomem *tidptr,
 	qib_write_kreg(dd, kr_scratch, 0xfeeddeaf);
 	writel(pa, tidp32);
 	qib_write_kreg(dd, kr_scratch, 0xdeadbeef);
-	mmiowb();
 	spin_unlock_irqrestore(tidlockp, flags);
 }
 
@@ -1928,7 +1927,6 @@ static void qib_6120_put_tid_2(struct qib_devdata *dd, u64 __iomem *tidptr,
 			pa |= 2 << 29;
 	}
 	writel(pa, tidp32);
-	mmiowb();
 }
 
 
@@ -2053,9 +2051,7 @@ static void qib_update_6120_usrhead(struct qib_ctxtdata *rcd, u64 hd,
 {
 	if (updegr)
 		qib_write_ureg(rcd->dd, ur_rcvegrindexhead, egrhd, rcd->ctxt);
-	mmiowb();
 	qib_write_ureg(rcd->dd, ur_rcvhdrhead, hd, rcd->ctxt);
-	mmiowb();
 }
 
 static u32 qib_6120_hdrqempty(struct qib_ctxtdata *rcd)
diff --git a/drivers/infiniband/hw/qib/qib_iba7220.c b/drivers/infiniband/hw/qib/qib_iba7220.c
index 9fde45538f6e..ea3ddb05cbad 100644
--- a/drivers/infiniband/hw/qib/qib_iba7220.c
+++ b/drivers/infiniband/hw/qib/qib_iba7220.c
@@ -2175,7 +2175,6 @@ static void qib_7220_put_tid(struct qib_devdata *dd, u64 __iomem *tidptr,
 		pa = chippa;
 	}
 	writeq(pa, tidptr);
-	mmiowb();
 }
 
 /**
@@ -2704,9 +2703,7 @@ static void qib_update_7220_usrhead(struct qib_ctxtdata *rcd, u64 hd,
 {
 	if (updegr)
 		qib_write_ureg(rcd->dd, ur_rcvegrindexhead, egrhd, rcd->ctxt);
-	mmiowb();
 	qib_write_ureg(rcd->dd, ur_rcvhdrhead, hd, rcd->ctxt);
-	mmiowb();
 }
 
 static u32 qib_7220_hdrqempty(struct qib_ctxtdata *rcd)
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 17d6b24b3473..ac6a84f11ad0 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -3793,7 +3793,6 @@ static void qib_7322_put_tid(struct qib_devdata *dd, u64 __iomem *tidptr,
 		pa = chippa;
 	}
 	writeq(pa, tidptr);
-	mmiowb();
 }
 
 /**
@@ -4440,10 +4439,8 @@ static void qib_update_7322_usrhead(struct qib_ctxtdata *rcd, u64 hd,
 		adjust_rcv_timeout(rcd, npkts);
 	if (updegr)
 		qib_write_ureg(rcd->dd, ur_rcvegrindexhead, egrhd, rcd->ctxt);
-	mmiowb();
 	qib_write_ureg(rcd->dd, ur_rcvhdrhead, hd, rcd->ctxt);
 	qib_write_ureg(rcd->dd, ur_rcvhdrhead, hd, rcd->ctxt);
-	mmiowb();
 }
 
 static u32 qib_7322_hdrqempty(struct qib_ctxtdata *rcd)
diff --git a/drivers/infiniband/hw/qib/qib_sd7220.c b/drivers/infiniband/hw/qib/qib_sd7220.c
index 12caf3db8c34..4f4a09c2dbcd 100644
--- a/drivers/infiniband/hw/qib/qib_sd7220.c
+++ b/drivers/infiniband/hw/qib/qib_sd7220.c
@@ -1068,7 +1068,6 @@ static int qib_sd_setvals(struct qib_devdata *dd)
 	for (idx = 0; idx < NUM_DDS_REGS; ++idx) {
 		data = ((dds_reg_map & 0xF) << 4) | TX_FAST_ELT;
 		writeq(data, iaddr + idx);
-		mmiowb();
 		qib_read_kreg32(dd, kr_scratch);
 		dds_reg_map >>= 4;
 		for (midx = 0; midx < DDS_ROWS; ++midx) {
@@ -1076,7 +1075,6 @@ static int qib_sd_setvals(struct qib_devdata *dd)
 
 			data = dds_init_vals[midx].reg_vals[idx];
 			writeq(data, daddr);
-			mmiowb();
 			qib_read_kreg32(dd, kr_scratch);
 		} /* End inner for (vals for this reg, each row) */
 	} /* end outer for (regs to be stored) */
@@ -1098,13 +1096,11 @@ static int qib_sd_setvals(struct qib_devdata *dd)
 		didx = idx + min_idx;
 		/* Store the next RXEQ register address */
 		writeq(rxeq_init_vals[idx].rdesc, iaddr + didx);
-		mmiowb();
 		qib_read_kreg32(dd, kr_scratch);
 		/* Iterate through RXEQ values */
 		for (vidx = 0; vidx < 4; vidx++) {
 			data = rxeq_init_vals[idx].rdata[vidx];
 			writeq(data, taddr + (vidx << 6) + idx);
-			mmiowb();
 			qib_read_kreg32(dd, kr_scratch);
 		}
 	} /* end outer for (Reg-writes for RXEQ) */
diff --git a/drivers/media/pci/dt3155/dt3155.c b/drivers/media/pci/dt3155/dt3155.c
index 17d69bd5d7f1..49677ee889e3 100644
--- a/drivers/media/pci/dt3155/dt3155.c
+++ b/drivers/media/pci/dt3155/dt3155.c
@@ -46,7 +46,6 @@ static int read_i2c_reg(void __iomem *addr, u8 index, u8 *data)
 	u32 tmp = index;
 
 	iowrite32((tmp << 17) | IIC_READ, addr + IIC_CSR2);
-	mmiowb();
 	udelay(45); /* wait at least 43 usec for NEW_CYCLE to clear */
 	if (ioread32(addr + IIC_CSR2) & NEW_CYCLE)
 		return -EIO; /* error: NEW_CYCLE not cleared */
@@ -77,7 +76,6 @@ static int write_i2c_reg(void __iomem *addr, u8 index, u8 data)
 	u32 tmp = index;
 
 	iowrite32((tmp << 17) | IIC_WRITE | data, addr + IIC_CSR2);
-	mmiowb();
 	udelay(65); /* wait at least 63 usec for NEW_CYCLE to clear */
 	if (ioread32(addr + IIC_CSR2) & NEW_CYCLE)
 		return -EIO; /* error: NEW_CYCLE not cleared */
@@ -104,7 +102,6 @@ static void write_i2c_reg_nowait(void __iomem *addr, u8 index, u8 data)
 	u32 tmp = index;
 
 	iowrite32((tmp << 17) | IIC_WRITE | data, addr + IIC_CSR2);
-	mmiowb();
 }
 
 /**
@@ -264,7 +261,6 @@ static irqreturn_t dt3155_irq_handler_even(int irq, void *dev_id)
 						FLD_DN_ODD | FLD_DN_EVEN |
 						CAP_CONT_EVEN | CAP_CONT_ODD,
 							ipd->regs + CSR1);
-		mmiowb();
 	}
 
 	spin_lock(&ipd->lock);
@@ -282,7 +278,6 @@ static irqreturn_t dt3155_irq_handler_even(int irq, void *dev_id)
 		iowrite32(dma_addr + ipd->width, ipd->regs + ODD_DMA_START);
 		iowrite32(ipd->width, ipd->regs + EVEN_DMA_STRIDE);
 		iowrite32(ipd->width, ipd->regs + ODD_DMA_STRIDE);
-		mmiowb();
 	}
 
 	/* enable interrupts, clear all irq flags */
@@ -437,12 +432,10 @@ static int dt3155_init_board(struct dt3155_priv *pd)
 	/*  resetting the adapter  */
 	iowrite32(ADDR_ERR_ODD | ADDR_ERR_EVEN | FLD_CRPT_ODD | FLD_CRPT_EVEN |
 			FLD_DN_ODD | FLD_DN_EVEN, pd->regs + CSR1);
-	mmiowb();
 	msleep(20);
 
 	/*  initializing adapter registers  */
 	iowrite32(FIFO_EN | SRST, pd->regs + CSR1);
-	mmiowb();
 	iowrite32(0xEEEEEE01, pd->regs + EVEN_PIXEL_FMT);
 	iowrite32(0xEEEEEE01, pd->regs + ODD_PIXEL_FMT);
 	iowrite32(0x00000020, pd->regs + FIFO_TRIGER);
@@ -454,7 +447,6 @@ static int dt3155_init_board(struct dt3155_priv *pd)
 	iowrite32(0, pd->regs + MASK_LENGTH);
 	iowrite32(0x0005007C, pd->regs + FIFO_FLAG_CNT);
 	iowrite32(0x01010101, pd->regs + IIC_CLK_DUR);
-	mmiowb();
 
 	/* verifying that we have a DT3155 board (not just a SAA7116 chip) */
 	read_i2c_reg(pd->regs, DT_ID, &tmp);
diff --git a/drivers/memstick/host/jmb38x_ms.c b/drivers/memstick/host/jmb38x_ms.c
index bcdca9fbef51..e3a5af65dbce 100644
--- a/drivers/memstick/host/jmb38x_ms.c
+++ b/drivers/memstick/host/jmb38x_ms.c
@@ -644,7 +644,6 @@ static int jmb38x_ms_reset(struct jmb38x_ms_host *host)
 	writel(HOST_CONTROL_RESET_REQ | HOST_CONTROL_CLOCK_EN
 	       | readl(host->addr + HOST_CONTROL),
 	       host->addr + HOST_CONTROL);
-	mmiowb();
 
 	for (cnt = 0; cnt < 20; ++cnt) {
 		if (!(HOST_CONTROL_RESET_REQ
@@ -659,7 +658,6 @@ static int jmb38x_ms_reset(struct jmb38x_ms_host *host)
 	writel(HOST_CONTROL_RESET | HOST_CONTROL_CLOCK_EN
 	       | readl(host->addr + HOST_CONTROL),
 	       host->addr + HOST_CONTROL);
-	mmiowb();
 
 	for (cnt = 0; cnt < 20; ++cnt) {
 		if (!(HOST_CONTROL_RESET
@@ -672,7 +670,6 @@ static int jmb38x_ms_reset(struct jmb38x_ms_host *host)
 	return -EIO;
 
 reset_ok:
-	mmiowb();
 	writel(INT_STATUS_ALL, host->addr + INT_SIGNAL_ENABLE);
 	writel(INT_STATUS_ALL, host->addr + INT_STATUS_ENABLE);
 	return 0;
@@ -1009,7 +1006,6 @@ static void jmb38x_ms_remove(struct pci_dev *dev)
 		tasklet_kill(&host->notify);
 		writel(0, host->addr + INT_SIGNAL_ENABLE);
 		writel(0, host->addr + INT_STATUS_ENABLE);
-		mmiowb();
 		dev_dbg(&jm->pdev->dev, "interrupts off\n");
 		spin_lock_irqsave(&host->lock, flags);
 		if (host->req) {
diff --git a/drivers/misc/ioc4.c b/drivers/misc/ioc4.c
index ec0832278170..9d0445a567db 100644
--- a/drivers/misc/ioc4.c
+++ b/drivers/misc/ioc4.c
@@ -156,7 +156,6 @@ ioc4_clock_calibrate(struct ioc4_driver_data *idd)
 
 	/* Reset to power-on state */
 	writel(0, &idd->idd_misc_regs->int_out.raw);
-	mmiowb();
 
 	/* Set up square wave */
 	int_out.raw = 0;
@@ -164,7 +163,6 @@ ioc4_clock_calibrate(struct ioc4_driver_data *idd)
 	int_out.fields.mode = IOC4_INT_OUT_MODE_TOGGLE;
 	int_out.fields.diag = 0;
 	writel(int_out.raw, &idd->idd_misc_regs->int_out.raw);
-	mmiowb();
 
 	/* Check square wave period averaged over some number of cycles */
 	start = ktime_get_ns();
diff --git a/drivers/misc/mei/hw-me.c b/drivers/misc/mei/hw-me.c
index 3fbbadfa2ae1..8a47a6fc3fc7 100644
--- a/drivers/misc/mei/hw-me.c
+++ b/drivers/misc/mei/hw-me.c
@@ -350,9 +350,6 @@ static void mei_me_hw_reset_release(struct mei_device *dev)
 	hcsr |= H_IG;
 	hcsr &= ~H_RST;
 	mei_hcsr_set(dev, hcsr);
-
-	/* complete this write before we set host ready on another CPU */
-	mmiowb();
 }
 
 /**
diff --git a/drivers/misc/tifm_7xx1.c b/drivers/misc/tifm_7xx1.c
index 9ac95b48ef92..cc729f7ab32e 100644
--- a/drivers/misc/tifm_7xx1.c
+++ b/drivers/misc/tifm_7xx1.c
@@ -403,7 +403,6 @@ static void tifm_7xx1_remove(struct pci_dev *dev)
 	fm->eject = tifm_7xx1_dummy_eject;
 	fm->has_ms_pif = tifm_7xx1_dummy_has_ms_pif;
 	writel(TIFM_IRQ_SETALL, fm->addr + FM_CLEAR_INTERRUPT_ENABLE);
-	mmiowb();
 	free_irq(dev->irq, fm);
 
 	tifm_remove_adapter(fm);
diff --git a/drivers/mmc/host/alcor.c b/drivers/mmc/host/alcor.c
index 82a97866e0cf..546b1fc30e7d 100644
--- a/drivers/mmc/host/alcor.c
+++ b/drivers/mmc/host/alcor.c
@@ -967,7 +967,6 @@ static void alcor_timeout_timer(struct work_struct *work)
 		alcor_request_complete(host, 0);
 	}
 
-	mmiowb();
 	mutex_unlock(&host->cmd_mutex);
 }
 
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index a8141ff9be03..42e1bad024f4 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1807,7 +1807,6 @@ void sdhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
 			sdhci_send_command(host, mrq->cmd);
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 EXPORT_SYMBOL_GPL(sdhci_request);
@@ -2010,8 +2009,6 @@ void sdhci_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
 	 */
 	if (host->quirks & SDHCI_QUIRK_RESET_CMD_DATA_ON_IOS)
 		sdhci_do_reset(host, SDHCI_RESET_CMD | SDHCI_RESET_DATA);
-
-	mmiowb();
 }
 EXPORT_SYMBOL_GPL(sdhci_set_ios);
 
@@ -2105,7 +2102,6 @@ static void sdhci_enable_sdio_irq_nolock(struct sdhci_host *host, int enable)
 
 		sdhci_writel(host, host->ier, SDHCI_INT_ENABLE);
 		sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
-		mmiowb();
 	}
 }
 
@@ -2353,7 +2349,6 @@ void sdhci_send_tuning(struct sdhci_host *host, u32 opcode)
 
 	host->tuning_done = 0;
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	/* Wait for Buffer Read Ready interrupt */
@@ -2705,7 +2700,6 @@ static bool sdhci_request_done(struct sdhci_host *host)
 
 	host->mrqs_done[i] = NULL;
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	mmc_request_done(host->mmc, mrq);
@@ -2739,7 +2733,6 @@ static void sdhci_timeout_timer(struct timer_list *t)
 		sdhci_finish_mrq(host, host->cmd->mrq);
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 
@@ -2770,7 +2763,6 @@ static void sdhci_timeout_data_timer(struct timer_list *t)
 		}
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 
@@ -3251,7 +3243,6 @@ int sdhci_resume_host(struct sdhci_host *host)
 		mmc->ops->set_ios(mmc, &mmc->ios);
 	} else {
 		sdhci_init(host, (host->mmc->pm_flags & MMC_PM_KEEP_POWER));
-		mmiowb();
 	}
 
 	if (host->irq_wake_enabled) {
@@ -3391,7 +3382,6 @@ void sdhci_cqe_enable(struct mmc_host *mmc)
 		 mmc_hostname(mmc), host->ier,
 		 sdhci_readl(host, SDHCI_INT_STATUS));
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 EXPORT_SYMBOL_GPL(sdhci_cqe_enable);
@@ -3416,7 +3406,6 @@ void sdhci_cqe_disable(struct mmc_host *mmc, bool recovery)
 		 mmc_hostname(mmc), host->ier,
 		 sdhci_readl(host, SDHCI_INT_STATUS));
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 EXPORT_SYMBOL_GPL(sdhci_cqe_disable);
@@ -4255,8 +4244,6 @@ int __sdhci_add_host(struct sdhci_host *host)
 		goto unirq;
 	}
 
-	mmiowb();
-
 	ret = mmc_add_host(mmc);
 	if (ret)
 		goto unled;
diff --git a/drivers/mmc/host/tifm_sd.c b/drivers/mmc/host/tifm_sd.c
index b6644ce296b2..35dd34b82a4d 100644
--- a/drivers/mmc/host/tifm_sd.c
+++ b/drivers/mmc/host/tifm_sd.c
@@ -889,7 +889,6 @@ static int tifm_sd_initialize_host(struct tifm_sd *host)
 	struct tifm_dev *sock = host->dev;
 
 	writel(0, sock->addr + SOCK_MMCSD_INT_ENABLE);
-	mmiowb();
 	host->clk_div = 61;
 	host->clk_freq = 20000000;
 	writel(TIFM_MMCSD_RESET, sock->addr + SOCK_MMCSD_SYSTEM_CONTROL);
@@ -940,7 +939,6 @@ static int tifm_sd_initialize_host(struct tifm_sd *host)
 	writel(TIFM_MMCSD_CERR | TIFM_MMCSD_BRS | TIFM_MMCSD_EOC
 	       | TIFM_MMCSD_ERRMASK,
 	       sock->addr + SOCK_MMCSD_INT_ENABLE);
-	mmiowb();
 
 	return 0;
 }
@@ -1005,7 +1003,6 @@ static void tifm_sd_remove(struct tifm_dev *sock)
 	spin_lock_irqsave(&sock->lock, flags);
 	host->eject = 1;
 	writel(0, sock->addr + SOCK_MMCSD_INT_ENABLE);
-	mmiowb();
 	spin_unlock_irqrestore(&sock->lock, flags);
 
 	tasklet_kill(&host->finish_tasklet);
diff --git a/drivers/mmc/host/via-sdmmc.c b/drivers/mmc/host/via-sdmmc.c
index 32c4211506fc..412395ac2935 100644
--- a/drivers/mmc/host/via-sdmmc.c
+++ b/drivers/mmc/host/via-sdmmc.c
@@ -686,7 +686,6 @@ static void via_sdc_request(struct mmc_host *mmc, struct mmc_request *mrq)
 		via_sdc_send_command(host, mrq->cmd);
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 
@@ -711,7 +710,6 @@ static void via_sdc_set_power(struct via_crdr_mmc_host *host,
 		gatt &= ~VIA_CRDR_PCICLKGATT_PAD_PWRON;
 	writeb(gatt, host->pcictrl_mmiobase + VIA_CRDR_PCICLKGATT);
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	via_pwron_sleep(host);
@@ -770,7 +768,6 @@ static void via_sdc_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
 	if (readb(addrbase + VIA_CRDR_PCISDCCLK) != clock)
 		writeb(clock, addrbase + VIA_CRDR_PCISDCCLK);
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	if (ios->power_mode != MMC_POWER_OFF)
@@ -830,7 +827,6 @@ static void via_reset_pcictrl(struct via_crdr_mmc_host *host)
 	via_restore_pcictrlreg(host);
 	via_restore_sdcreg(host);
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 }
 
@@ -925,7 +921,6 @@ static irqreturn_t via_sdc_isr(int irq, void *dev_id)
 
 	result = IRQ_HANDLED;
 
-	mmiowb();
 out:
 	spin_unlock(&sdhost->lock);
 
@@ -960,7 +955,6 @@ static void via_sdc_timeout(struct timer_list *t)
 		}
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&sdhost->lock, flags);
 }
 
@@ -1012,7 +1006,6 @@ static void via_sdc_card_detect(struct work_struct *work)
 			tasklet_schedule(&host->finish_tasklet);
 		}
 
-		mmiowb();
 		spin_unlock_irqrestore(&host->lock, flags);
 
 		via_reset_pcictrl(host);
@@ -1020,7 +1013,6 @@ static void via_sdc_card_detect(struct work_struct *work)
 		spin_lock_irqsave(&host->lock, flags);
 	}
 
-	mmiowb();
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	via_print_pcictrl(host);
@@ -1188,7 +1180,6 @@ static void via_sd_remove(struct pci_dev *pcidev)
 
 	/* Disable generating further interrupts */
 	writeb(0x0, sdhost->pcictrl_mmiobase + VIA_CRDR_PCIINTCTRL);
-	mmiowb();
 
 	if (sdhost->mrq) {
 		pr_err("%s: Controller removed during "
@@ -1197,7 +1188,6 @@ static void via_sd_remove(struct pci_dev *pcidev)
 		/* make sure all DMA is stopped */
 		writel(VIA_CRDR_DMACTRL_SFTRST,
 			sdhost->ddma_mmiobase + VIA_CRDR_DMACTRL);
-		mmiowb();
 		sdhost->mrq->cmd->error = -ENOMEDIUM;
 		if (sdhost->mrq->stop)
 			sdhost->mrq->stop->error = -ENOMEDIUM;
diff --git a/drivers/mtd/nand/raw/r852.c b/drivers/mtd/nand/raw/r852.c
index 86456216fb93..7b99831aa046 100644
--- a/drivers/mtd/nand/raw/r852.c
+++ b/drivers/mtd/nand/raw/r852.c
@@ -45,7 +45,6 @@ static inline void r852_write_reg(struct r852_device *dev,
 						int address, uint8_t value)
 {
 	writeb(value, dev->mmio + address);
-	mmiowb();
 }
 
 
@@ -61,7 +60,6 @@ static inline void r852_write_reg_dword(struct r852_device *dev,
 							int address, uint32_t value)
 {
 	writel(cpu_to_le32(value), dev->mmio + address);
-	mmiowb();
 }
 
 /* returns pointer to our private structure */
diff --git a/drivers/mtd/nand/raw/txx9ndfmc.c b/drivers/mtd/nand/raw/txx9ndfmc.c
index ddf0420c0997..97978227aa55 100644
--- a/drivers/mtd/nand/raw/txx9ndfmc.c
+++ b/drivers/mtd/nand/raw/txx9ndfmc.c
@@ -159,7 +159,6 @@ static void txx9ndfmc_cmd_ctrl(struct nand_chip *chip, int cmd,
 		if ((ctrl & NAND_CTRL_CHANGE) && cmd == NAND_CMD_NONE)
 			txx9ndfmc_write(dev, 0, TXX9_NDFDTR);
 	}
-	mmiowb();
 }
 
 static int txx9ndfmc_dev_ready(struct nand_chip *chip)
diff --git a/drivers/net/ethernet/aeroflex/greth.c b/drivers/net/ethernet/aeroflex/greth.c
index 47e5984f16fb..3155f7fa83eb 100644
--- a/drivers/net/ethernet/aeroflex/greth.c
+++ b/drivers/net/ethernet/aeroflex/greth.c
@@ -613,7 +613,6 @@ static irqreturn_t greth_interrupt(int irq, void *dev_id)
 		napi_schedule(&greth->napi);
 	}
 
-	mmiowb();
 	spin_unlock(&greth->devlock);
 
 	return retval;
diff --git a/drivers/net/ethernet/alacritech/slicoss.c b/drivers/net/ethernet/alacritech/slicoss.c
index 16477aa6d61f..4f7e792e50e9 100644
--- a/drivers/net/ethernet/alacritech/slicoss.c
+++ b/drivers/net/ethernet/alacritech/slicoss.c
@@ -345,8 +345,6 @@ static void slic_set_rx_mode(struct net_device *dev)
 	if (sdev->promisc != set_promisc) {
 		sdev->promisc = set_promisc;
 		slic_configure_rcv(sdev);
-		/* make sure writes to receiver cant leak out of the lock */
-		mmiowb();
 	}
 	spin_unlock_bh(&sdev->link_lock);
 }
@@ -1461,8 +1459,6 @@ static netdev_tx_t slic_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (slic_get_free_tx_descs(txq) < SLIC_MAX_REQ_TX_DESCS)
 		netif_stop_queue(dev);
-	/* make sure writes to io-memory cant leak out of tx queue lock */
-	mmiowb();
 
 	return NETDEV_TX_OK;
 drop_skb:
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index b17d435de09f..05798aa5bb73 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -2016,7 +2016,6 @@ void ena_com_aenq_intr_handler(struct ena_com_dev *dev, void *data)
 	mb();
 	writel_relaxed((u32)aenq->head,
 		       dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
-	mmiowb();
 }
 
 int ena_com_dev_reset(struct ena_com_dev *ena_dev,
diff --git a/drivers/net/ethernet/atheros/atlx/atl1.c b/drivers/net/ethernet/atheros/atlx/atl1.c
index 9e07b469066a..f7583c5d9509 100644
--- a/drivers/net/ethernet/atheros/atlx/atl1.c
+++ b/drivers/net/ethernet/atheros/atlx/atl1.c
@@ -2439,7 +2439,6 @@ static netdev_tx_t atl1_xmit_frame(struct sk_buff *skb,
 	atl1_tx_map(adapter, skb, ptpd);
 	atl1_tx_queue(adapter, count, ptpd);
 	atl1_update_mailbox(adapter);
-	mmiowb();
 	return NETDEV_TX_OK;
 }
 
diff --git a/drivers/net/ethernet/atheros/atlx/atl2.c b/drivers/net/ethernet/atheros/atlx/atl2.c
index d99317b3d891..1474cac7e892 100644
--- a/drivers/net/ethernet/atheros/atlx/atl2.c
+++ b/drivers/net/ethernet/atheros/atlx/atl2.c
@@ -908,7 +908,6 @@ static netdev_tx_t atl2_xmit_frame(struct sk_buff *skb,
 	ATL2_WRITE_REGW(&adapter->hw, REG_MB_TXD_WR_IDX,
 		(adapter->txd_write_ptr >> 2));
 
-	mmiowb();
 	dev_consume_skb_any(skb);
 	return NETDEV_TX_OK;
 }
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index d63371d70bce..dfdd14eadd57 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -3305,8 +3305,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 
 	BNX2_WR(bp, rxr->rx_bseq_addr, rxr->rx_prod_bseq);
 
-	mmiowb();
-
 	return rx_pkt;
 
 }
@@ -6723,8 +6721,6 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	BNX2_WR16(bp, txr->tx_bidx_addr, prod);
 	BNX2_WR(bp, txr->tx_bseq_addr, txr->tx_prod_bseq);
 
-	mmiowb();
-
 	txr->tx_prod = prod;
 
 	if (unlikely(bnx2_tx_avail(bp, txr) <= MAX_SKB_FRAGS)) {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index ecb1bd7eb508..0c8f5b546c6f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4166,8 +4166,6 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
 
-	mmiowb();
-
 	txdata->tx_bd_prod += nbd;
 
 	if (unlikely(bnx2x_tx_avail(bp, txdata) < MAX_DESC_PER_TX_PKT)) {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 1ed068509337..2d57af9c061c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -527,8 +527,6 @@ static inline void bnx2x_update_rx_prod(struct bnx2x *bp,
 		REG_WR_RELAXED(bp, fp->ustorm_rx_prods_offset + i * 4,
 			       ((u32 *)&rx_prods)[i]);
 
-	mmiowb();
-
 	DP(NETIF_MSG_RX_STATUS,
 	   "queue[%d]:  wrote  bd_prod %u  cqe_prod %u  sge_prod %u\n",
 	   fp->index, bd_prod, rx_comp_prod, rx_sge_prod);
@@ -653,7 +651,6 @@ static inline void bnx2x_igu_ack_sb_gen(struct bnx2x *bp, u8 igu_sb_id,
 	REG_WR(bp, igu_addr, cmd_data.sb_id_and_flags);
 
 	/* Make sure that ACK is written */
-	mmiowb();
 	barrier();
 }
 
@@ -674,7 +671,6 @@ static inline void bnx2x_hc_ack_sb(struct bnx2x *bp, u8 sb_id,
 	REG_WR(bp, hc_addr, (*(u32 *)&igu_ack));
 
 	/* Make sure that ACK is written */
-	mmiowb();
 	barrier();
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index 749d0ef44371..0745cccd416d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -2623,7 +2623,6 @@ static int bnx2x_run_loopback(struct bnx2x *bp, int loopback_mode)
 	wmb();
 	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
 
-	mmiowb();
 	barrier();
 
 	num_pkts++;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index e46786a56b0c..3716c828ff5d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -869,9 +869,6 @@ static void bnx2x_hc_int_disable(struct bnx2x *bp)
 	   "write %x to HC %d (addr 0x%x)\n",
 	   val, port, addr);
 
-	/* flush all outstanding writes */
-	mmiowb();
-
 	REG_WR(bp, addr, val);
 	if (REG_RD(bp, addr) != val)
 		BNX2X_ERR("BUG! Proper val not read from IGU!\n");
@@ -887,9 +884,6 @@ static void bnx2x_igu_int_disable(struct bnx2x *bp)
 
 	DP(NETIF_MSG_IFDOWN, "write %x to IGU\n", val);
 
-	/* flush all outstanding writes */
-	mmiowb();
-
 	REG_WR(bp, IGU_REG_PF_CONFIGURATION, val);
 	if (REG_RD(bp, IGU_REG_PF_CONFIGURATION) != val)
 		BNX2X_ERR("BUG! Proper val not read from IGU!\n");
@@ -1595,7 +1589,6 @@ static void bnx2x_hc_int_enable(struct bnx2x *bp)
 	/*
 	 * Ensure that HC_CONFIG is written before leading/trailing edge config
 	 */
-	mmiowb();
 	barrier();
 
 	if (!CHIP_IS_E1(bp)) {
@@ -1611,9 +1604,6 @@ static void bnx2x_hc_int_enable(struct bnx2x *bp)
 		REG_WR(bp, HC_REG_TRAILING_EDGE_0 + port*8, val);
 		REG_WR(bp, HC_REG_LEADING_EDGE_0 + port*8, val);
 	}
-
-	/* Make sure that interrupts are indeed enabled from here on */
-	mmiowb();
 }
 
 static void bnx2x_igu_int_enable(struct bnx2x *bp)
@@ -1674,9 +1664,6 @@ static void bnx2x_igu_int_enable(struct bnx2x *bp)
 
 	REG_WR(bp, IGU_REG_TRAILING_EDGE_LATCH, val);
 	REG_WR(bp, IGU_REG_LEADING_EDGE_LATCH, val);
-
-	/* Make sure that interrupts are indeed enabled from here on */
-	mmiowb();
 }
 
 void bnx2x_int_enable(struct bnx2x *bp)
@@ -3833,7 +3820,6 @@ static void bnx2x_sp_prod_update(struct bnx2x *bp)
 
 	REG_WR16_RELAXED(bp, BAR_XSTRORM_INTMEM + XSTORM_SPQ_PROD_OFFSET(func),
 			 bp->spq_prod_idx);
-	mmiowb();
 }
 
 /**
@@ -5244,7 +5230,6 @@ static void bnx2x_update_eq_prod(struct bnx2x *bp, u16 prod)
 {
 	/* No memory barriers */
 	storm_memset_eq_prod(bp, prod, BP_FUNC(bp));
-	mmiowb();
 }
 
 static int  bnx2x_cnic_handle_cfc_del(struct bnx2x *bp, u32 cid,
@@ -6513,7 +6498,6 @@ void bnx2x_nic_init_cnic(struct bnx2x *bp)
 
 	/* flush all */
 	mb();
-	mmiowb();
 }
 
 void bnx2x_pre_irq_nic_init(struct bnx2x *bp)
@@ -6553,7 +6537,6 @@ void bnx2x_post_irq_nic_init(struct bnx2x *bp, u32 load_code)
 
 	/* flush all before enabling interrupts */
 	mb();
-	mmiowb();
 
 	bnx2x_int_enable(bp);
 
@@ -7775,12 +7758,10 @@ void bnx2x_igu_clear_sb_gen(struct bnx2x *bp, u8 func, u8 idu_sb_id, bool is_pf)
 	DP(NETIF_MSG_HW, "write 0x%08x to IGU(via GRC) addr 0x%x\n",
 			 data, igu_addr_data);
 	REG_WR(bp, igu_addr_data, data);
-	mmiowb();
 	barrier();
 	DP(NETIF_MSG_HW, "write 0x%08x to IGU(via GRC) addr 0x%x\n",
 			  ctl, igu_addr_ctl);
 	REG_WR(bp, igu_addr_ctl, ctl);
-	mmiowb();
 	barrier();
 
 	/* wait for clean up to finish */
@@ -9550,7 +9531,6 @@ static void bnx2x_set_234_gates(struct bnx2x *bp, bool close)
 
 	DP(NETIF_MSG_HW | NETIF_MSG_IFUP, "%s gates #2, #3 and #4\n",
 		close ? "closing" : "opening");
-	mmiowb();
 }
 
 #define SHARED_MF_CLP_MAGIC  0x80000000 /* `magic' bit */
@@ -9674,7 +9654,6 @@ static void bnx2x_pxp_prep(struct bnx2x *bp)
 	if (!CHIP_IS_E1(bp)) {
 		REG_WR(bp, PXP2_REG_RD_START_INIT, 0);
 		REG_WR(bp, PXP2_REG_RQ_RBC_DONE, 0);
-		mmiowb();
 	}
 }
 
@@ -9774,16 +9753,13 @@ static void bnx2x_process_kill_chip_reset(struct bnx2x *bp, bool global)
 	       reset_mask1 & (~not_reset_mask1));
 
 	barrier();
-	mmiowb();
 
 	REG_WR(bp, GRCBASE_MISC + MISC_REGISTERS_RESET_REG_2_SET,
 	       reset_mask2 & (~stay_reset2));
 
 	barrier();
-	mmiowb();
 
 	REG_WR(bp, GRCBASE_MISC + MISC_REGISTERS_RESET_REG_1_SET, reset_mask1);
-	mmiowb();
 }
 
 /**
@@ -9867,9 +9843,6 @@ static int bnx2x_process_kill(struct bnx2x *bp, bool global)
 	REG_WR(bp, MISC_REG_UNPREPARED, 0);
 	barrier();
 
-	/* Make sure all is written to the chip before the reset */
-	mmiowb();
-
 	/* Wait for 1ms to empty GLUE and PCI-E core queues,
 	 * PSWHST, GRC and PSWRD Tetris buffer.
 	 */
@@ -14828,7 +14801,6 @@ static int bnx2x_drv_ctl(struct net_device *dev, struct drv_ctl_info *ctl)
 		if (rc)
 			break;
 
-		mmiowb();
 		barrier();
 
 		/* Start accepting on iSCSI L2 ring */
@@ -14863,7 +14835,6 @@ static int bnx2x_drv_ctl(struct net_device *dev, struct drv_ctl_info *ctl)
 		if (!bnx2x_wait_sp_comp(bp, sp_bits))
 			BNX2X_ERR("rx_mode completion timed out!\n");
 
-		mmiowb();
 		barrier();
 
 		/* Unset iSCSI L2 MAC */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
index 7b22a6d8514c..80d250a6d048 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
@@ -5039,7 +5039,6 @@ static inline int bnx2x_q_init(struct bnx2x *bp,
 	/* As no ramrod is sent, complete the command immediately  */
 	o->complete_cmd(bp, o, BNX2X_Q_CMD_INIT);
 
-	mmiowb();
 	smp_mb();
 
 	return 0;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index c97b642e6537..0edbb0a76847 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -100,13 +100,11 @@ static void bnx2x_vf_igu_ack_sb(struct bnx2x *bp, struct bnx2x_virtf *vf,
 	DP(NETIF_MSG_HW, "write 0x%08x to IGU(via GRC) addr 0x%x\n",
 	   cmd_data.sb_id_and_flags, igu_addr_data);
 	REG_WR(bp, igu_addr_data, cmd_data.sb_id_and_flags);
-	mmiowb();
 	barrier();
 
 	DP(NETIF_MSG_HW, "write 0x%08x to IGU(via GRC) addr 0x%x\n",
 	   ctl, igu_addr_ctl);
 	REG_WR(bp, igu_addr_ctl, ctl);
-	mmiowb();
 	barrier();
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index a9bdc21873d3..672b57f0b84d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -172,8 +172,6 @@ static int bnx2x_send_msg2pf(struct bnx2x *bp, u8 *done, dma_addr_t msg_mapping)
 	/* Trigger the PF FW */
 	writeb_relaxed(1, &zone_data->trigger.vf_pf_channel.addr_valid);
 
-	mmiowb();
-
 	/* Wait for PF to complete */
 	while ((tout >= 0) && (!*done)) {
 		msleep(interval);
@@ -1179,7 +1177,6 @@ static void bnx2x_vf_mbx_resp_send_msg(struct bnx2x *bp,
 
 	/* ack the FW */
 	storm_memset_vf_mbx_ack(bp, vf->abs_vfid);
-	mmiowb();
 
 	/* copy the response header including status-done field,
 	 * must be last dmae, must be after FW is acked
@@ -2174,7 +2171,6 @@ static void bnx2x_vf_mbx_request(struct bnx2x *bp, struct bnx2x_virtf *vf,
 		 */
 		storm_memset_vf_mbx_ack(bp, vf->abs_vfid);
 		/* Firmware ack should be written before unlocking channel */
-		mmiowb();
 		bnx2x_unlock_vf_pf_channel(bp, vf, mbx->first_tlv.tl.type);
 	}
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 0bb9d7b3a2b6..b8b68d408ad0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -556,8 +556,6 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 tx_done:
 
-	mmiowb();
-
 	if (unlikely(bnxt_tx_avail(bp, txr) <= MAX_SKB_FRAGS + 1)) {
 		if (skb->xmit_more && !tx_buf->is_push)
 			bnxt_db_write(bp, &txr->tx_db, prod);
@@ -2123,7 +2121,6 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
 			       &dim_sample);
 		net_dim(&cpr->dim, dim_sample);
 	}
-	mmiowb();
 	return work_done;
 }
 
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 328373e0578f..821bccc0915c 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -1073,7 +1073,6 @@ static void tg3_int_reenable(struct tg3_napi *tnapi)
 	struct tg3 *tp = tnapi->tp;
 
 	tw32_mailbox(tnapi->int_mbox, tnapi->last_tag << 24);
-	mmiowb();
 
 	/* When doing tagged status, this work check is unnecessary.
 	 * The last_tag we write above tells the chip which piece of
@@ -6999,7 +6998,6 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 			tw32_rx_mbox(TG3_RX_JMB_PROD_IDX_REG,
 				     tpr->rx_jmb_prod_idx);
 		}
-		mmiowb();
 	} else if (work_mask) {
 		/* rx_std_buffers[] and rx_jmb_buffers[] entries must be
 		 * updated before the producer indices can be updated.
@@ -7210,8 +7208,6 @@ static int tg3_poll_work(struct tg3_napi *tnapi, int work_done, int budget)
 			tw32_rx_mbox(TG3_RX_JMB_PROD_IDX_REG,
 				     dpr->rx_jmb_prod_idx);
 
-		mmiowb();
-
 		if (err)
 			tw32_f(HOSTCC_MODE, tp->coal_now);
 	}
@@ -7278,7 +7274,6 @@ static int tg3_poll_msix(struct napi_struct *napi, int budget)
 						  HOSTCC_MODE_ENABLE |
 						  tnapi->coal_now);
 			}
-			mmiowb();
 			break;
 		}
 	}
@@ -8159,7 +8154,6 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (!skb->xmit_more || netif_xmit_stopped(txq)) {
 		/* Packets are ready, update Tx producer idx on card. */
 		tw32_tx_mbox(tnapi->prodmbox, entry);
-		mmiowb();
 	}
 
 	return NETDEV_TX_OK;
diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
index 2df7440f58df..39643be8c30a 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
@@ -38,9 +38,6 @@ int lio_cn6xxx_soft_reset(struct octeon_device *oct)
 	lio_pci_readq(oct, CN6XXX_CIU_SOFT_RST);
 	lio_pci_writeq(oct, 1, CN6XXX_CIU_SOFT_RST);
 
-	/* make sure that the reset is written before starting timer */
-	mmiowb();
-
 	/* Wait for 10ms as Octeon resets. */
 	mdelay(100);
 
@@ -487,9 +484,6 @@ void lio_cn6xxx_disable_interrupt(struct octeon_device *oct,
 
 	/* Disable Interrupts */
 	writeq(0, cn6xxx->intr_enb_reg64);
-
-	/* make sure interrupts are really disabled */
-	mmiowb();
 }
 
 static void lio_cn6xxx_get_pcie_qlmport(struct octeon_device *oct)
@@ -555,10 +549,6 @@ static int lio_cn6xxx_process_droq_intr_regs(struct octeon_device *oct)
 				value &= ~(1 << oq_no);
 				octeon_write_csr(oct, reg, value);
 
-				/* Ensure that the enable register is written.
-				 */
-				mmiowb();
-
 				spin_unlock(&cn6xxx->lock_for_droq_int_enb_reg);
 			}
 		}
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index ce8c3f818666..934115d18488 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -1449,7 +1449,6 @@ void lio_enable_irq(struct octeon_droq *droq, struct octeon_instr_queue *iq)
 		iq->pkt_in_done -= iq->pkts_processed;
 		iq->pkts_processed = 0;
 		/* this write needs to be flushed before we release the lock */
-		mmiowb();
 		spin_unlock_bh(&iq->lock);
 		oct = iq->oct_dev;
 	}
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
index a0c099f71524..017169023cca 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
@@ -513,8 +513,6 @@ int octeon_retry_droq_refill(struct octeon_droq *droq)
 		 */
 		wmb();
 		writel(desc_refilled, droq->pkts_credit_reg);
-		/* make sure mmio write completes */
-		mmiowb();
 
 		if (pkts_credit + desc_refilled >= CN23XX_SLI_DEF_BP)
 			reschedule = 0;
@@ -712,8 +710,6 @@ octeon_droq_fast_process_packets(struct octeon_device *oct,
 				 */
 				wmb();
 				writel(desc_refilled, droq->pkts_credit_reg);
-				/* make sure mmio write completes */
-				mmiowb();
 			}
 		}
 	}                       /* for (each packet)... */
diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c
index c6f4cbda040f..fcf20a8f92d9 100644
--- a/drivers/net/ethernet/cavium/liquidio/request_manager.c
+++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c
@@ -278,7 +278,6 @@ ring_doorbell(struct octeon_device *oct, struct octeon_instr_queue *iq)
 	if (atomic_read(&oct->status) == OCT_DEV_RUNNING) {
 		writel(iq->fill_cnt, iq->doorbell_reg);
 		/* make sure doorbell write goes through */
-		mmiowb();
 		iq->fill_cnt = 0;
 		iq->last_db_time = jiffies;
 		return;
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 8fe9af0e2ab7..466bf1ea186d 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3270,11 +3270,6 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 		if (!skb->xmit_more ||
 		    netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {
 			writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
-			/* we need this if more than one processor can write to
-			 * our tail at a time, it synchronizes IO on IA64/Altix
-			 * systems
-			 */
-			mmiowb();
 		}
 	} else {
 		dev_kfree_skb_any(skb);
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7acc61e4f645..022c3ac0e40f 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3816,7 +3816,6 @@ static void e1000_flush_tx_ring(struct e1000_adapter *adapter)
 	if (tx_ring->next_to_use == tx_ring->count)
 		tx_ring->next_to_use = 0;
 	ew32(TDT(0), tx_ring->next_to_use);
-	mmiowb();
 	usleep_range(200, 250);
 }
 
@@ -5904,12 +5903,6 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 						     tx_ring->next_to_use);
 			else
 				writel(tx_ring->next_to_use, tx_ring->tail);
-
-			/* we need this if more than one processor can write
-			 * to our tail at a time, it synchronizes IO on
-			 *IA64/Altix systems
-			 */
-			mmiowb();
 		}
 	} else {
 		dev_kfree_skb_any(skb);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 5d4f1761dc0c..8de77155f2e7 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -321,8 +321,6 @@ static void fm10k_mask_aer_comp_abort(struct pci_dev *pdev)
 	pci_read_config_dword(pdev, pos + PCI_ERR_UNCOR_MASK, &err_mask);
 	err_mask |= PCI_ERR_UNC_COMP_ABORT;
 	pci_write_config_dword(pdev, pos + PCI_ERR_UNCOR_MASK, err_mask);
-
-	mmiowb();
 }
 
 int fm10k_iov_resume(struct pci_dev *pdev)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 5a0419421511..1f48298f01e6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1037,11 +1037,6 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 6c97667d20ef..ffb611bbedfa 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -3471,11 +3471,6 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 9b4d7cec2e18..6bfef82e7607 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -2360,11 +2360,6 @@ static inline void iavf_tx_map(struct iavf_ring *tx_ring, struct sk_buff *skb,
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index c289d97f477d..1af21bbe180e 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1356,11 +1356,6 @@ ice_tx_map(struct ice_ring *tx_ring, struct ice_tx_buf *first,
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 69b230c53fed..09ba94496742 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6028,11 +6028,6 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 	return 0;
 
diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index 4eab83faec62..34cd30d7162f 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -2279,10 +2279,6 @@ static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
 	tx_ring->buffer_info[first].next_to_watch = tx_desc;
 	tx_ring->next_to_use = i;
 	writel(i, adapter->hw.hw_addr + tx_ring->tail);
-	/* we need this if more than one processor can write to our tail
-	 * at a time, it synchronizes IO on IA64/Altix systems
-	 */
-	mmiowb();
 }
 
 static netdev_tx_t igbvf_xmit_frame_ring_adv(struct sk_buff *skb,
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 87a11879bf2d..f8d692f6aa4f 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -892,11 +892,6 @@ static int igc_tx_map(struct igc_ring *tx_ring,
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e100054a3765..99e23cf6a73a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8299,11 +8299,6 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);
-
-		/* we need this if more than one processor can write to our tail
-		 * at a time, it synchronizes IO on IA64/Altix systems
-		 */
-		mmiowb();
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index 8b3495ee2b6e..49486c10ef81 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -1139,9 +1139,6 @@ static inline void sky2_put_idx(struct sky2_hw *hw, unsigned q, u16 idx)
 	/* Make sure write' to descriptors are complete before we tell hardware */
 	wmb();
 	sky2_write16(hw, Y2_QADDR(q, PREF_UNIT_PUT_IDX), idx);
-
-	/* Synchronize I/O on since next processor may write to tail */
-	mmiowb();
 }
 
 
@@ -1354,7 +1351,6 @@ static void sky2_rx_stop(struct sky2_port *sky2)
 
 	/* reset the Rx prefetch unit */
 	sky2_write32(hw, Y2_QADDR(rxq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
-	mmiowb();
 }
 
 /* Clean out receive buffer area, assumes receiver hardware stopped */
diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c
index c81d15bf259c..87e90b5d4d7d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/catas.c
+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c
@@ -129,10 +129,6 @@ static int mlx4_reset_slave(struct mlx4_dev *dev)
 	comm_flags = rst_req << COM_CHAN_RST_REQ_OFFSET;
 	__raw_writel((__force u32)cpu_to_be32(comm_flags),
 		     (__iomem char *)priv->mfunc.comm + MLX4_COMM_CHAN_FLAGS);
-	/* Make sure that our comm channel write doesn't
-	 * get mixed in with writes from another CPU.
-	 */
-	mmiowb();
 
 	end = msecs_to_jiffies(MLX4_COMM_TIME) + jiffies;
 	while (time_before(jiffies, end)) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index a5d5d6fc1da0..c678344d22a2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -281,7 +281,6 @@ static int mlx4_comm_cmd_post(struct mlx4_dev *dev, u8 cmd, u16 param)
 	val = param | (cmd << 16) | (priv->cmd.comm_toggle << 31);
 	__raw_writel((__force u32) cpu_to_be32(val),
 		     &priv->mfunc.comm->slave_write);
-	mmiowb();
 	mutex_unlock(&dev->persist->device_state_mutex);
 	return 0;
 }
@@ -496,12 +495,6 @@ static int mlx4_cmd_post(struct mlx4_dev *dev, u64 in_param, u64 out_param,
 					       (op_modifier << HCR_OPMOD_SHIFT) |
 					       op), hcr + 6);
 
-	/*
-	 * Make sure that our HCR writes don't get mixed in with
-	 * writes from another CPU starting a FW command.
-	 */
-	mmiowb();
-
 	cmd->toggle = cmd->toggle ^ 1;
 
 	ret = 0;
@@ -2206,7 +2199,6 @@ static void mlx4_master_do_cmd(struct mlx4_dev *dev, int slave, u8 cmd,
 	}
 	__raw_writel((__force u32) cpu_to_be32(reply),
 		     &priv->mfunc.comm[slave].slave_read);
-	mmiowb();
 
 	return;
 
@@ -2410,7 +2402,6 @@ int mlx4_multi_func_init(struct mlx4_dev *dev)
 				     &priv->mfunc.comm[i].slave_write);
 			__raw_writel((__force u32) 0,
 				     &priv->mfunc.comm[i].slave_read);
-			mmiowb();
 			for (port = 1; port <= MLX4_MAX_PORTS; port++) {
 				struct mlx4_vport_state *admin_vport;
 				struct mlx4_vport_state *oper_vport;
@@ -2576,10 +2567,6 @@ void mlx4_report_internal_err_comm_event(struct mlx4_dev *dev)
 		slave_read |= (u32)COMM_CHAN_EVENT_INTERNAL_ERR;
 		__raw_writel((__force u32)cpu_to_be32(slave_read),
 			     &priv->mfunc.comm[slave].slave_read);
-		/* Make sure that our comm channel write doesn't
-		 * get mixed in with writes from another CPU.
-		 */
-		mmiowb();
 	}
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index be48c6440251..c087d1014b09 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -917,7 +917,6 @@ static void cmd_work_handler(struct work_struct *work)
 	mlx5_core_dbg(dev, "writing 0x%x to command doorbell\n", 1 << ent->idx);
 	wmb();
 	iowrite32be(1 << ent->idx, &dev->iseg->cmd_dbell);
-	mmiowb();
 	/* if not in polling don't use ent after this point */
 	if (cmd_mode == CMD_MODE_POLLING || poll_cmd) {
 		poll_timeout(ent);
diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index e0340f778d8f..d8b7fba96d58 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -1439,7 +1439,6 @@ myri10ge_tx_done(struct myri10ge_slice_state *ss, int mcp_index)
 			tx->queue_active = 0;
 			put_be32(htonl(1), tx->send_stop);
 			mb();
-			mmiowb();
 		}
 		__netif_tx_unlock(dev_queue);
 	}
@@ -2861,7 +2860,6 @@ static netdev_tx_t myri10ge_xmit(struct sk_buff *skb,
 		tx->queue_active = 1;
 		put_be32(htonl(1), tx->send_go);
 		mb();
-		mmiowb();
 	}
 	tx->pkt_start++;
 	if ((avail - count) < MXGEFW_MAX_SEND_DESC) {
diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c
index feda9644289d..3b2ae1a21678 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -4153,8 +4153,6 @@ static netdev_tx_t s2io_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	writeq(val64, &tx_fifo->List_Control);
 
-	mmiowb();
-
 	put_off++;
 	if (put_off == fifo->tx_curr_put_info.fifo_len + 1)
 		put_off = 0;
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index b877acec5cde..1d334f2e0a56 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -1826,7 +1826,6 @@ static int vxge_poll_msix(struct napi_struct *napi, int budget)
 		vxge_hw_channel_msix_unmask(
 				(struct __vxge_hw_channel *)ring->handle,
 				ring->rx_vector_no);
-		mmiowb();
 	}
 
 	/* We are copying and returning the local variable, in case if after
@@ -2234,8 +2233,6 @@ static irqreturn_t vxge_tx_msix_handle(int irq, void *dev_id)
 	vxge_hw_channel_msix_unmask((struct __vxge_hw_channel *)fifo->handle,
 				    fifo->tx_vector_no);
 
-	mmiowb();
-
 	return IRQ_HANDLED;
 }
 
@@ -2272,14 +2269,12 @@ vxge_alarm_msix_handle(int irq, void *dev_id)
 		 */
 		vxge_hw_vpath_msix_mask(vdev->vpaths[i].handle, msix_id);
 		vxge_hw_vpath_msix_clear(vdev->vpaths[i].handle, msix_id);
-		mmiowb();
 
 		status = vxge_hw_vpath_alarm_process(vdev->vpaths[i].handle,
 			vdev->exec_mode);
 		if (status == VXGE_HW_OK) {
 			vxge_hw_vpath_msix_unmask(vdev->vpaths[i].handle,
 						  msix_id);
-			mmiowb();
 			continue;
 		}
 		vxge_debug_intr(VXGE_ERR,
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-traffic.c b/drivers/net/ethernet/neterion/vxge/vxge-traffic.c
index 59e77e3086bb..709d20d9938f 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-traffic.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-traffic.c
@@ -1399,11 +1399,7 @@ static void __vxge_hw_non_offload_db_post(struct __vxge_hw_fifo *fifo,
 		VXGE_HW_NODBW_GET_NO_SNOOP(no_snoop),
 		&fifo->nofl_db->control_0);
 
-	mmiowb();
-
 	writeq(txdl_ptr, &fifo->nofl_db->txdl_ptr);
-
-	mmiowb();
 }
 
 /**
diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c
index e23980e301b6..69e6a90edf2f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_int.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_int.c
@@ -774,18 +774,12 @@ static inline u16 qed_attn_update_idx(struct qed_hwfn *p_hwfn,
 {
 	u16 rc = 0, index;
 
-	/* Make certain HW write took affect */
-	mmiowb();
-
 	index = le16_to_cpu(p_sb_desc->sb_attn->sb_index);
 	if (p_sb_desc->index != index) {
 		p_sb_desc->index	= index;
 		rc		      = QED_SB_ATT_IDX;
 	}
 
-	/* Make certain we got a consistent view with HW */
-	mmiowb();
-
 	return rc;
 }
 
@@ -1170,7 +1164,6 @@ static void qed_sb_ack_attn(struct qed_hwfn *p_hwfn,
 	/* Both segments (interrupts & acks) are written to same place address;
 	 * Need to guarantee all commands will be received (in-order) by HW.
 	 */
-	mmiowb();
 	barrier();
 }
 
@@ -1805,9 +1798,6 @@ static void qed_int_igu_enable_attn(struct qed_hwfn *p_hwfn,
 	qed_wr(p_hwfn, p_ptt, IGU_REG_TRAILING_EDGE_LATCH, 0xfff);
 	qed_wr(p_hwfn, p_ptt, IGU_REG_ATTENTION_ENABLE, 0xfff);
 
-	/* Flush the writes to IGU */
-	mmiowb();
-
 	/* Unmask AEU signals toward IGU */
 	qed_wr(p_hwfn, p_ptt, MISC_REG_AEU_MASK_ATTN_IGU, 0xff);
 }
@@ -1871,9 +1861,6 @@ static void qed_int_igu_cleanup_sb(struct qed_hwfn *p_hwfn,
 
 	qed_wr(p_hwfn, p_ptt, IGU_REG_COMMAND_REG_CTRL, cmd_ctrl);
 
-	/* Flush the write to IGU */
-	mmiowb();
-
 	/* calculate where to read the status bit from */
 	sb_bit = 1 << (igu_sb_id % 32);
 	sb_bit_addr = igu_sb_id / 32 * sizeof(u32);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_spq.c b/drivers/net/ethernet/qlogic/qed/qed_spq.c
index 79b311b86f66..f5f3c03b9dd2 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_spq.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c
@@ -341,9 +341,6 @@ void qed_eq_prod_update(struct qed_hwfn *p_hwfn, u16 prod)
 		   USTORM_EQE_CONS_OFFSET(p_hwfn->rel_pf_id);
 
 	REG_WR16(p_hwfn, addr, prod);
-
-	/* keep prod updates ordered */
-	mmiowb();
 }
 
 int qed_eq_completion(struct qed_hwfn *p_hwfn, void *cookie)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index b4c8949933f1..4555c0b161ef 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -1526,14 +1526,6 @@ static int qede_selftest_transmit_traffic(struct qede_dev *edev,
 	barrier();
 	writel(txq->tx_db.raw, txq->doorbell_addr);
 
-	/* mmiowb is needed to synchronize doorbell writes from more than one
-	 * processor. It guarantees that the write arrives to the device before
-	 * the queue lock is released and another start_xmit is called (possibly
-	 * on another CPU). Without this barrier, the next doorbell can bypass
-	 * this doorbell. This is applicable to IA64/Altix systems.
-	 */
-	mmiowb();
-
 	for (i = 0; i < QEDE_SELFTEST_POLL_COUNT; i++) {
 		if (qede_txq_has_work(txq))
 			break;
diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index 31b046e24565..6f7e3622c6b4 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -580,14 +580,6 @@ void qede_update_rx_prod(struct qede_dev *edev, struct qede_rx_queue *rxq)
 
 	internal_ram_wr(rxq->hw_rxq_prod_addr, sizeof(rx_prods),
 			(u32 *)&rx_prods);
-
-	/* mmiowb is needed to synchronize doorbell writes from more than one
-	 * processor. It guarantees that the write arrives to the device before
-	 * the napi lock is released and another qede_poll is called (possibly
-	 * on another CPU). Without this barrier, the next doorbell can bypass
-	 * this doorbell. This is applicable to IA64/Altix systems.
-	 */
-	mmiowb();
 }
 
 static void qede_get_rxhash(struct sk_buff *skb, u8 bitfields, __le32 rss_hash)
diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index b61b88cbc0c7..457444894d80 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -1858,7 +1858,6 @@ static void ql_update_small_bufq_prod_index(struct ql3_adapter *qdev)
 		wmb();
 		writel_relaxed(qdev->small_buf_q_producer_index,
 			       &port_regs->CommonRegs.rxSmallQProducerIndex);
-		mmiowb();
 	}
 }
 
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h b/drivers/net/ethernet/qlogic/qlge/qlge.h
index 3e71b65a9546..ad7c5eb8a3b6 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge.h
+++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
@@ -2181,7 +2181,6 @@ static inline void ql_write32(const struct ql_adapter *qdev, int reg, u32 val)
 static inline void ql_write_db_reg(u32 val, void __iomem *addr)
 {
 	writel(val, addr);
-	mmiowb();
 }
 
 /*
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 07e1c623048e..6cae33072496 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2695,7 +2695,6 @@ static netdev_tx_t qlge_send(struct sk_buff *skb, struct net_device *ndev)
 	wmb();
 
 	ql_write_db_reg_relaxed(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
-	mmiowb();
 	netif_printk(qdev, tx_queued, KERN_DEBUG, qdev->ndev,
 		     "tx queued, slot %d, len %d\n",
 		     tx_ring->prod_idx, skb->len);
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 8154b38c08f7..316b47741d3f 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -728,7 +728,6 @@ static irqreturn_t ravb_emac_interrupt(int irq, void *dev_id)
 
 	spin_lock(&priv->lock);
 	ravb_emac_interrupt_unlocked(ndev);
-	mmiowb();
 	spin_unlock(&priv->lock);
 	return IRQ_HANDLED;
 }
@@ -848,7 +847,6 @@ static irqreturn_t ravb_interrupt(int irq, void *dev_id)
 		result = IRQ_HANDLED;
 	}
 
-	mmiowb();
 	spin_unlock(&priv->lock);
 	return result;
 }
@@ -881,7 +879,6 @@ static irqreturn_t ravb_multi_interrupt(int irq, void *dev_id)
 		result = IRQ_HANDLED;
 	}
 
-	mmiowb();
 	spin_unlock(&priv->lock);
 	return result;
 }
@@ -898,7 +895,6 @@ static irqreturn_t ravb_dma_interrupt(int irq, void *dev_id, int q)
 	if (ravb_queue_interrupt(ndev, q))
 		result = IRQ_HANDLED;
 
-	mmiowb();
 	spin_unlock(&priv->lock);
 	return result;
 }
@@ -943,7 +939,6 @@ static int ravb_poll(struct napi_struct *napi, int budget)
 			ravb_write(ndev, ~(mask | TIS_RESERVED), TIS);
 			ravb_tx_free(ndev, q, true);
 			netif_wake_subqueue(ndev, q);
-			mmiowb();
 			spin_unlock_irqrestore(&priv->lock, flags);
 		}
 	}
@@ -959,7 +954,6 @@ static int ravb_poll(struct napi_struct *napi, int budget)
 		ravb_write(ndev, mask, RIE0);
 		ravb_write(ndev, mask, TIE);
 	}
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 
 	/* Receive error message handling */
@@ -1008,7 +1002,6 @@ static void ravb_adjust_link(struct net_device *ndev)
 	if (priv->no_avb_link && phydev->link)
 		ravb_rcv_snd_enable(ndev);
 
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 
 	if (new_state && netif_msg_link(priv))
@@ -1601,7 +1594,6 @@ static netdev_tx_t ravb_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 		netif_stop_subqueue(ndev, q);
 
 exit:
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 	return NETDEV_TX_OK;
 
@@ -1673,7 +1665,6 @@ static void ravb_set_rx_mode(struct net_device *ndev)
 	spin_lock_irqsave(&priv->lock, flags);
 	ravb_modify(ndev, ECMR, ECMR_PRM,
 		    ndev->flags & IFF_PROMISC ? ECMR_PRM : 0);
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 }
 
diff --git a/drivers/net/ethernet/renesas/ravb_ptp.c b/drivers/net/ethernet/renesas/ravb_ptp.c
index dce2a40a31e3..9a42580693cb 100644
--- a/drivers/net/ethernet/renesas/ravb_ptp.c
+++ b/drivers/net/ethernet/renesas/ravb_ptp.c
@@ -196,7 +196,6 @@ static int ravb_ptp_extts(struct ptp_clock_info *ptp,
 		ravb_write(ndev, GIE_PTCS, GIE);
 	else
 		ravb_write(ndev, GID_PTCD, GID);
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 
 	return 0;
@@ -259,7 +258,6 @@ static int ravb_ptp_perout(struct ptp_clock_info *ptp,
 		else
 			ravb_write(ndev, GID_PTMD0, GID);
 	}
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 
 	return error;
@@ -331,7 +329,6 @@ void ravb_ptp_init(struct net_device *ndev, struct platform_device *pdev)
 	spin_lock_irqsave(&priv->lock, flags);
 	ravb_wait(ndev, GCCR, GCCR_TCR, GCCR_TCR_NOREQ);
 	ravb_modify(ndev, GCCR, GCCR_TCSS, GCCR_TCSS_ADJGPTP);
-	mmiowb();
 	spin_unlock_irqrestore(&priv->lock, flags);
 
 	priv->ptp.clock = ptp_clock_register(&priv->ptp.info, &pdev->dev);
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index e33af371b169..ed30aebdb941 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2010,7 +2010,6 @@ static void sh_eth_adjust_link(struct net_device *ndev)
 	if ((mdp->cd->no_psr || mdp->no_ether_link) && phydev->link)
 		sh_eth_rcv_snd_enable(ndev);
 
-	mmiowb();
 	spin_unlock_irqrestore(&mdp->lock, flags);
 
 	if (new_state && netif_msg_link(mdp))
diff --git a/drivers/net/ethernet/sfc/falcon/io.h b/drivers/net/ethernet/sfc/falcon/io.h
index 7085ee1d5e2b..c3577643fbda 100644
--- a/drivers/net/ethernet/sfc/falcon/io.h
+++ b/drivers/net/ethernet/sfc/falcon/io.h
@@ -108,7 +108,6 @@ static inline void ef4_writeo(struct ef4_nic *efx, const ef4_oword_t *value,
 	_ef4_writed(efx, value->u32[2], reg + 8);
 	_ef4_writed(efx, value->u32[3], reg + 12);
 #endif
-	mmiowb();
 	spin_unlock_irqrestore(&efx->biu_lock, flags);
 }
 
@@ -130,7 +129,6 @@ static inline void ef4_sram_writeq(struct ef4_nic *efx, void __iomem *membase,
 	__raw_writel((__force u32)value->u32[0], membase + addr);
 	__raw_writel((__force u32)value->u32[1], membase + addr + 4);
 #endif
-	mmiowb();
 	spin_unlock_irqrestore(&efx->biu_lock, flags);
 }
 
diff --git a/drivers/net/ethernet/sfc/io.h b/drivers/net/ethernet/sfc/io.h
index 89563170af52..2774a10f44e9 100644
--- a/drivers/net/ethernet/sfc/io.h
+++ b/drivers/net/ethernet/sfc/io.h
@@ -120,7 +120,6 @@ static inline void efx_writeo(struct efx_nic *efx, const efx_oword_t *value,
 	_efx_writed(efx, value->u32[2], reg + 8);
 	_efx_writed(efx, value->u32[3], reg + 12);
 #endif
-	mmiowb();
 	spin_unlock_irqrestore(&efx->biu_lock, flags);
 }
 
@@ -142,7 +141,6 @@ static inline void efx_sram_writeq(struct efx_nic *efx, void __iomem *membase,
 	__raw_writel((__force u32)value->u32[0], membase + addr);
 	__raw_writel((__force u32)value->u32[1], membase + addr + 4);
 #endif
-	mmiowb();
 	spin_unlock_irqrestore(&efx->biu_lock, flags);
 }
 
diff --git a/drivers/net/ethernet/silan/sc92031.c b/drivers/net/ethernet/silan/sc92031.c
index c07fd594fe71..db5dc8ce0aff 100644
--- a/drivers/net/ethernet/silan/sc92031.c
+++ b/drivers/net/ethernet/silan/sc92031.c
@@ -361,7 +361,6 @@ static void sc92031_disable_interrupts(struct net_device *dev)
 	/* stop interrupts */
 	iowrite32(0, port_base + IntrMask);
 	_sc92031_dummy_read(port_base);
-	mmiowb();
 
 	/* wait for any concurrent interrupt/tasklet to finish */
 	synchronize_irq(priv->pdev->irq);
@@ -379,7 +378,6 @@ static void sc92031_enable_interrupts(struct net_device *dev)
 	wmb();
 
 	iowrite32(IntrBits, port_base + IntrMask);
-	mmiowb();
 }
 
 static void _sc92031_disable_tx_rx(struct net_device *dev)
@@ -867,7 +865,6 @@ static void sc92031_tasklet(unsigned long data)
 	rmb();
 
 	iowrite32(intr_mask, port_base + IntrMask);
-	mmiowb();
 
 	spin_unlock(&priv->lock);
 }
@@ -901,7 +898,6 @@ static irqreturn_t sc92031_interrupt(int irq, void *dev_id)
 	rmb();
 
 	iowrite32(intr_mask, port_base + IntrMask);
-	mmiowb();
 
 	return IRQ_NONE;
 }
@@ -978,7 +974,6 @@ static netdev_tx_t sc92031_start_xmit(struct sk_buff *skb,
 	iowrite32(priv->tx_bufs_dma_addr + entry * TX_BUF_SIZE,
 			port_base + TxAddr0 + entry * 4);
 	iowrite32(tx_status, port_base + TxStatus0 + entry * 4);
-	mmiowb();
 
 	if (priv->tx_head - priv->tx_tail >= NUM_TX_DESC)
 		netif_stop_queue(dev);
@@ -1024,7 +1019,6 @@ static int sc92031_open(struct net_device *dev)
 	spin_lock_bh(&priv->lock);
 
 	_sc92031_reset(dev);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 	sc92031_enable_interrupts(dev);
@@ -1060,7 +1054,6 @@ static int sc92031_stop(struct net_device *dev)
 
 	_sc92031_disable_tx_rx(dev);
 	_sc92031_tx_clear(dev);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 
@@ -1081,7 +1074,6 @@ static void sc92031_set_multicast_list(struct net_device *dev)
 
 	_sc92031_set_mar(dev);
 	_sc92031_set_rx_config(dev);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 }
@@ -1098,7 +1090,6 @@ static void sc92031_tx_timeout(struct net_device *dev)
 	priv->tx_timeouts++;
 
 	_sc92031_reset(dev);
-	mmiowb();
 
 	spin_unlock(&priv->lock);
 
@@ -1140,7 +1131,6 @@ sc92031_ethtool_get_link_ksettings(struct net_device *dev,
 
 	output_status = _sc92031_mii_read(port_base, MII_OutputStatus);
 	_sc92031_mii_scan(port_base);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 
@@ -1311,7 +1301,6 @@ static int sc92031_ethtool_set_wol(struct net_device *dev,
 
 	priv->pm_config = pm_config;
 	iowrite32(pm_config, port_base + PMConfig);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 
@@ -1337,7 +1326,6 @@ static int sc92031_ethtool_nway_reset(struct net_device *dev)
 
 out:
 	_sc92031_mii_scan(port_base);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 
@@ -1530,7 +1518,6 @@ static int sc92031_suspend(struct pci_dev *pdev, pm_message_t state)
 
 	_sc92031_disable_tx_rx(dev);
 	_sc92031_tx_clear(dev);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 
@@ -1555,7 +1542,6 @@ static int sc92031_resume(struct pci_dev *pdev)
 	spin_lock_bh(&priv->lock);
 
 	_sc92031_reset(dev);
-	mmiowb();
 
 	spin_unlock_bh(&priv->lock);
 	sc92031_enable_interrupts(dev);
diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
index 33949248c829..ab55416a10fa 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -571,7 +571,6 @@ static void rhine_ack_events(struct rhine_private *rp, u32 mask)
 	if (rp->quirks & rqStatusWBRace)
 		iowrite8(mask >> 16, ioaddr + IntrStatus2);
 	iowrite16(mask, ioaddr + IntrStatus);
-	mmiowb();
 }
 
 /*
@@ -863,7 +862,6 @@ static int rhine_napipoll(struct napi_struct *napi, int budget)
 	if (work_done < budget) {
 		napi_complete_done(napi, work_done);
 		iowrite16(enable_mask, ioaddr + IntrEnable);
-		mmiowb();
 	}
 	return work_done;
 }
@@ -1893,7 +1891,6 @@ static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 static void rhine_irq_disable(struct rhine_private *rp)
 {
 	iowrite16(0x0000, rp->base + IntrEnable);
-	mmiowb();
 }
 
 /* The interrupt handler does all of the Rx thread work and cleans up
diff --git a/drivers/net/ethernet/wiznet/w5100.c b/drivers/net/ethernet/wiznet/w5100.c
index d8ba512f166a..1713c2d2dccf 100644
--- a/drivers/net/ethernet/wiznet/w5100.c
+++ b/drivers/net/ethernet/wiznet/w5100.c
@@ -219,7 +219,6 @@ static inline int __w5100_write_direct(struct net_device *ndev, u32 addr,
 static inline int w5100_write_direct(struct net_device *ndev, u32 addr, u8 data)
 {
 	__w5100_write_direct(ndev, addr, data);
-	mmiowb();
 
 	return 0;
 }
@@ -236,7 +235,6 @@ static int w5100_write16_direct(struct net_device *ndev, u32 addr, u16 data)
 {
 	__w5100_write_direct(ndev, addr, data >> 8);
 	__w5100_write_direct(ndev, addr + 1, data);
-	mmiowb();
 
 	return 0;
 }
@@ -260,8 +258,6 @@ static int w5100_writebulk_direct(struct net_device *ndev, u32 addr,
 	for (i = 0; i < len; i++, addr++)
 		__w5100_write_direct(ndev, addr, *buf++);
 
-	mmiowb();
-
 	return 0;
 }
 
@@ -375,7 +371,6 @@ static int w5100_readbulk_indirect(struct net_device *ndev, u32 addr, u8 *buf,
 	for (i = 0; i < len; i++)
 		*buf++ = w5100_read_direct(ndev, W5100_IDM_DR);
 
-	mmiowb();
 	spin_unlock_irqrestore(&mmio_priv->reg_lock, flags);
 
 	return 0;
@@ -394,7 +389,6 @@ static int w5100_writebulk_indirect(struct net_device *ndev, u32 addr,
 	for (i = 0; i < len; i++)
 		__w5100_write_direct(ndev, W5100_IDM_DR, *buf++);
 
-	mmiowb();
 	spin_unlock_irqrestore(&mmio_priv->reg_lock, flags);
 
 	return 0;
diff --git a/drivers/net/ethernet/wiznet/w5300.c b/drivers/net/ethernet/wiznet/w5300.c
index f9da5d6172e3..3f03eecc0479 100644
--- a/drivers/net/ethernet/wiznet/w5300.c
+++ b/drivers/net/ethernet/wiznet/w5300.c
@@ -141,7 +141,6 @@ static u16 w5300_read_indirect(struct w5300_priv *priv, u16 addr)
 
 	spin_lock_irqsave(&priv->reg_lock, flags);
 	w5300_write_direct(priv, W5300_IDM_AR, addr);
-	mmiowb();
 	data = w5300_read_direct(priv, W5300_IDM_DR);
 	spin_unlock_irqrestore(&priv->reg_lock, flags);
 
@@ -154,9 +153,7 @@ static void w5300_write_indirect(struct w5300_priv *priv, u16 addr, u16 data)
 
 	spin_lock_irqsave(&priv->reg_lock, flags);
 	w5300_write_direct(priv, W5300_IDM_AR, addr);
-	mmiowb();
 	w5300_write_direct(priv, W5300_IDM_DR, data);
-	mmiowb();
 	spin_unlock_irqrestore(&priv->reg_lock, flags);
 }
 
@@ -192,7 +189,6 @@ static int w5300_command(struct w5300_priv *priv, u16 cmd)
 	unsigned long timeout = jiffies + msecs_to_jiffies(100);
 
 	w5300_write(priv, W5300_S0_CR, cmd);
-	mmiowb();
 
 	while (w5300_read(priv, W5300_S0_CR) != 0) {
 		if (time_after(jiffies, timeout))
@@ -241,18 +237,15 @@ static void w5300_write_macaddr(struct w5300_priv *priv)
 	w5300_write(priv, W5300_SHARH,
 		      ndev->dev_addr[4] << 8 |
 		      ndev->dev_addr[5]);
-	mmiowb();
 }
 
 static void w5300_hw_reset(struct w5300_priv *priv)
 {
 	w5300_write_direct(priv, W5300_MR, MR_RST);
-	mmiowb();
 	mdelay(5);
 	w5300_write_direct(priv, W5300_MR, priv->indirect ?
 				 MR_WDF(7) | MR_PB | MR_IND :
 				 MR_WDF(7) | MR_PB);
-	mmiowb();
 	w5300_write(priv, W5300_IMR, 0);
 	w5300_write_macaddr(priv);
 
@@ -264,24 +257,20 @@ static void w5300_hw_reset(struct w5300_priv *priv)
 	w5300_write32(priv, W5300_TMSRL, 64 << 24);
 	w5300_write32(priv, W5300_TMSRH, 0);
 	w5300_write(priv, W5300_MTYPE, 0x00ff);
-	mmiowb();
 }
 
 static void w5300_hw_start(struct w5300_priv *priv)
 {
 	w5300_write(priv, W5300_S0_MR, priv->promisc ?
 			  S0_MR_MACRAW : S0_MR_MACRAW_MF);
-	mmiowb();
 	w5300_command(priv, S0_CR_OPEN);
 	w5300_write(priv, W5300_S0_IMR, S0_IR_RECV | S0_IR_SENDOK);
 	w5300_write(priv, W5300_IMR, IR_S0);
-	mmiowb();
 }
 
 static void w5300_hw_close(struct w5300_priv *priv)
 {
 	w5300_write(priv, W5300_IMR, 0);
-	mmiowb();
 	w5300_command(priv, S0_CR_CLOSE);
 }
 
@@ -372,7 +361,6 @@ static netdev_tx_t w5300_start_tx(struct sk_buff *skb, struct net_device *ndev)
 	netif_stop_queue(ndev);
 
 	w5300_write_frame(priv, skb->data, skb->len);
-	mmiowb();
 	ndev->stats.tx_packets++;
 	ndev->stats.tx_bytes += skb->len;
 	dev_kfree_skb(skb);
@@ -419,7 +407,6 @@ static int w5300_napi_poll(struct napi_struct *napi, int budget)
 	if (rx_count < budget) {
 		napi_complete_done(napi, rx_count);
 		w5300_write(priv, W5300_IMR, IR_S0);
-		mmiowb();
 	}
 
 	return rx_count;
@@ -434,7 +421,6 @@ static irqreturn_t w5300_interrupt(int irq, void *ndev_instance)
 	if (!ir)
 		return IRQ_NONE;
 	w5300_write(priv, W5300_S0_IR, ir);
-	mmiowb();
 
 	if (ir & S0_IR_SENDOK) {
 		netif_dbg(priv, tx_done, ndev, "tx done\n");
@@ -444,7 +430,6 @@ static irqreturn_t w5300_interrupt(int irq, void *ndev_instance)
 	if (ir & S0_IR_RECV) {
 		if (napi_schedule_prep(&priv->napi)) {
 			w5300_write(priv, W5300_IMR, 0);
-			mmiowb();
 			__napi_schedule(&priv->napi);
 		}
 	}
diff --git a/drivers/net/wireless/ath/ath5k/base.c b/drivers/net/wireless/ath/ath5k/base.c
index a2351ef45ae0..65a4c142640d 100644
--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -837,7 +837,6 @@ ath5k_txbuf_setup(struct ath5k_hw *ah, struct ath5k_buf *bf,
 
 	txq->link = &ds->ds_link;
 	ath5k_hw_start_tx_dma(ah, txq->qnum);
-	mmiowb();
 	spin_unlock_bh(&txq->lock);
 
 	return 0;
@@ -2174,7 +2173,6 @@ ath5k_beacon_config(struct ath5k_hw *ah)
 	}
 
 	ath5k_hw_set_imr(ah, ah->imask);
-	mmiowb();
 	spin_unlock_bh(&ah->block);
 }
 
@@ -2779,7 +2777,6 @@ int ath5k_start(struct ieee80211_hw *hw)
 
 	ret = 0;
 done:
-	mmiowb();
 	mutex_unlock(&ah->lock);
 
 	set_bit(ATH_STAT_STARTED, ah->status);
@@ -2839,7 +2836,6 @@ void ath5k_stop(struct ieee80211_hw *hw)
 				"putting device to sleep\n");
 	}
 
-	mmiowb();
 	mutex_unlock(&ah->lock);
 
 	ath5k_stop_tasklets(ah);
diff --git a/drivers/net/wireless/ath/ath5k/mac80211-ops.c b/drivers/net/wireless/ath/ath5k/mac80211-ops.c
index 16e052d02c94..5e866a193ed0 100644
--- a/drivers/net/wireless/ath/ath5k/mac80211-ops.c
+++ b/drivers/net/wireless/ath/ath5k/mac80211-ops.c
@@ -263,7 +263,6 @@ ath5k_bss_info_changed(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
 		memcpy(common->curbssid, bss_conf->bssid, ETH_ALEN);
 		common->curaid = 0;
 		ath5k_hw_set_bssid(ah);
-		mmiowb();
 	}
 
 	if (changes & BSS_CHANGED_BEACON_INT)
@@ -528,7 +527,6 @@ ath5k_set_key(struct ieee80211_hw *hw, enum set_key_cmd cmd,
 		ret = -EINVAL;
 	}
 
-	mmiowb();
 	mutex_unlock(&ah->lock);
 	return ret;
 }
diff --git a/drivers/net/wireless/broadcom/b43/main.c b/drivers/net/wireless/broadcom/b43/main.c
index 74be3c809225..4c7980f84591 100644
--- a/drivers/net/wireless/broadcom/b43/main.c
+++ b/drivers/net/wireless/broadcom/b43/main.c
@@ -485,7 +485,6 @@ static void b43_ram_write(struct b43_wldev *dev, u16 offset, u32 val)
 		val = swab32(val);
 
 	b43_write32(dev, B43_MMIO_RAM_CONTROL, offset);
-	mmiowb();
 	b43_write32(dev, B43_MMIO_RAM_DATA, val);
 }
 
@@ -656,9 +655,7 @@ static void b43_tsf_write_locked(struct b43_wldev *dev, u64 tsf)
 	/* The hardware guarantees us an atomic write, if we
 	 * write the low register first. */
 	b43_write32(dev, B43_MMIO_REV3PLUS_TSF_LOW, low);
-	mmiowb();
 	b43_write32(dev, B43_MMIO_REV3PLUS_TSF_HIGH, high);
-	mmiowb();
 }
 
 void b43_tsf_write(struct b43_wldev *dev, u64 tsf)
@@ -1822,11 +1819,9 @@ static void b43_beacon_update_trigger_work(struct work_struct *work)
 		if (b43_bus_host_is_sdio(dev->dev)) {
 			/* wl->mutex is enough. */
 			b43_do_beacon_update_trigger_work(dev);
-			mmiowb();
 		} else {
 			spin_lock_irq(&wl->hardirq_lock);
 			b43_do_beacon_update_trigger_work(dev);
-			mmiowb();
 			spin_unlock_irq(&wl->hardirq_lock);
 		}
 	}
@@ -2078,7 +2073,6 @@ static irqreturn_t b43_interrupt_thread_handler(int irq, void *dev_id)
 
 	mutex_lock(&dev->wl->mutex);
 	b43_do_interrupt_thread(dev);
-	mmiowb();
 	mutex_unlock(&dev->wl->mutex);
 
 	return IRQ_HANDLED;
@@ -2143,7 +2137,6 @@ static irqreturn_t b43_interrupt_handler(int irq, void *dev_id)
 
 	spin_lock(&dev->wl->hardirq_lock);
 	ret = b43_do_interrupt(dev);
-	mmiowb();
 	spin_unlock(&dev->wl->hardirq_lock);
 
 	return ret;
diff --git a/drivers/net/wireless/broadcom/b43/sysfs.c b/drivers/net/wireless/broadcom/b43/sysfs.c
index 3190493bd07f..93d03b673670 100644
--- a/drivers/net/wireless/broadcom/b43/sysfs.c
+++ b/drivers/net/wireless/broadcom/b43/sysfs.c
@@ -129,7 +129,6 @@ static ssize_t b43_attr_interfmode_store(struct device *dev,
 	} else
 		err = -ENOSYS;
 
-	mmiowb();
 	mutex_unlock(&wldev->wl->mutex);
 
 	return err ? err : count;
diff --git a/drivers/net/wireless/broadcom/b43legacy/ilt.c b/drivers/net/wireless/broadcom/b43legacy/ilt.c
index ee5682e54204..6d15fb4d30c6 100644
--- a/drivers/net/wireless/broadcom/b43legacy/ilt.c
+++ b/drivers/net/wireless/broadcom/b43legacy/ilt.c
@@ -315,14 +315,12 @@ const u16 b43legacy_ilt_sigmasqr2[B43legacy_ILT_SIGMASQR_SIZE] = {
 void b43legacy_ilt_write(struct b43legacy_wldev *dev, u16 offset, u16 val)
 {
 	b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_CTRL, offset);
-	mmiowb();
 	b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_DATA1, val);
 }
 
 void b43legacy_ilt_write32(struct b43legacy_wldev *dev, u16 offset, u32 val)
 {
 	b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_CTRL, offset);
-	mmiowb();
 	b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_DATA2,
 			    (val & 0xFFFF0000) >> 16);
 	b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_DATA1,
diff --git a/drivers/net/wireless/broadcom/b43legacy/main.c b/drivers/net/wireless/broadcom/b43legacy/main.c
index 55f411925960..c777efc6dc13 100644
--- a/drivers/net/wireless/broadcom/b43legacy/main.c
+++ b/drivers/net/wireless/broadcom/b43legacy/main.c
@@ -264,7 +264,6 @@ static void b43legacy_ram_write(struct b43legacy_wldev *dev, u16 offset,
 		val = swab32(val);
 
 	b43legacy_write32(dev, B43legacy_MMIO_RAM_CONTROL, offset);
-	mmiowb();
 	b43legacy_write32(dev, B43legacy_MMIO_RAM_DATA, val);
 }
 
@@ -341,14 +340,11 @@ void b43legacy_shm_write32(struct b43legacy_wldev *dev,
 		if (offset & 0x0003) {
 			/* Unaligned access */
 			b43legacy_shm_control_word(dev, routing, offset >> 2);
-			mmiowb();
 			b43legacy_write16(dev,
 					  B43legacy_MMIO_SHM_DATA_UNALIGNED,
 					  (value >> 16) & 0xffff);
-			mmiowb();
 			b43legacy_shm_control_word(dev, routing,
 						   (offset >> 2) + 1);
-			mmiowb();
 			b43legacy_write16(dev, B43legacy_MMIO_SHM_DATA,
 					  value & 0xffff);
 			return;
@@ -356,7 +352,6 @@ void b43legacy_shm_write32(struct b43legacy_wldev *dev,
 		offset >>= 2;
 	}
 	b43legacy_shm_control_word(dev, routing, offset);
-	mmiowb();
 	b43legacy_write32(dev, B43legacy_MMIO_SHM_DATA, value);
 }
 
@@ -368,7 +363,6 @@ void b43legacy_shm_write16(struct b43legacy_wldev *dev, u16 routing, u16 offset,
 		if (offset & 0x0003) {
 			/* Unaligned access */
 			b43legacy_shm_control_word(dev, routing, offset >> 2);
-			mmiowb();
 			b43legacy_write16(dev,
 					  B43legacy_MMIO_SHM_DATA_UNALIGNED,
 					  value);
@@ -377,7 +371,6 @@ void b43legacy_shm_write16(struct b43legacy_wldev *dev, u16 routing, u16 offset,
 		offset >>= 2;
 	}
 	b43legacy_shm_control_word(dev, routing, offset);
-	mmiowb();
 	b43legacy_write16(dev, B43legacy_MMIO_SHM_DATA, value);
 }
 
@@ -471,7 +464,6 @@ static void b43legacy_time_lock(struct b43legacy_wldev *dev)
 	status = b43legacy_read32(dev, B43legacy_MMIO_MACCTL);
 	status |= B43legacy_MACCTL_TBTTHOLD;
 	b43legacy_write32(dev, B43legacy_MMIO_MACCTL, status);
-	mmiowb();
 }
 
 static void b43legacy_time_unlock(struct b43legacy_wldev *dev)
@@ -494,10 +486,8 @@ static void b43legacy_tsf_write_locked(struct b43legacy_wldev *dev, u64 tsf)
 		u32 hi = (tsf & 0xFFFFFFFF00000000ULL) >> 32;
 
 		b43legacy_write32(dev, B43legacy_MMIO_REV3PLUS_TSF_LOW, 0);
-		mmiowb();
 		b43legacy_write32(dev, B43legacy_MMIO_REV3PLUS_TSF_HIGH,
 				    hi);
-		mmiowb();
 		b43legacy_write32(dev, B43legacy_MMIO_REV3PLUS_TSF_LOW,
 				    lo);
 	} else {
@@ -507,13 +497,9 @@ static void b43legacy_tsf_write_locked(struct b43legacy_wldev *dev, u64 tsf)
 		u16 v3 = (tsf & 0xFFFF000000000000ULL) >> 48;
 
 		b43legacy_write16(dev, B43legacy_MMIO_TSF_0, 0);
-		mmiowb();
 		b43legacy_write16(dev, B43legacy_MMIO_TSF_3, v3);
-		mmiowb();
 		b43legacy_write16(dev, B43legacy_MMIO_TSF_2, v2);
-		mmiowb();
 		b43legacy_write16(dev, B43legacy_MMIO_TSF_1, v1);
-		mmiowb();
 		b43legacy_write16(dev, B43legacy_MMIO_TSF_0, v0);
 	}
 }
@@ -1250,7 +1236,6 @@ static void b43legacy_beacon_update_trigger_work(struct work_struct *work)
 		/* The handler might have updated the IRQ mask. */
 		b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK,
 				  dev->irq_mask);
-		mmiowb();
 		spin_unlock_irq(&wl->irq_lock);
 	}
 	mutex_unlock(&wl->mutex);
@@ -1346,7 +1331,6 @@ static void b43legacy_interrupt_tasklet(struct b43legacy_wldev *dev)
 			       dma_reason[2], dma_reason[3],
 			       dma_reason[4], dma_reason[5]);
 			b43legacy_controller_restart(dev, "DMA error");
-			mmiowb();
 			spin_unlock_irqrestore(&dev->wl->irq_lock, flags);
 			return;
 		}
@@ -1396,7 +1380,6 @@ static void b43legacy_interrupt_tasklet(struct b43legacy_wldev *dev)
 		handle_irq_transmit_status(dev);
 
 	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, dev->irq_mask);
-	mmiowb();
 	spin_unlock_irqrestore(&dev->wl->irq_lock, flags);
 }
 
@@ -1488,7 +1471,6 @@ static irqreturn_t b43legacy_interrupt_handler(int irq, void *dev_id)
 	dev->irq_reason = reason;
 	tasklet_schedule(&dev->isr_tasklet);
 out:
-	mmiowb();
 	spin_unlock(&dev->wl->irq_lock);
 
 	return ret;
@@ -2781,7 +2763,6 @@ static int b43legacy_op_dev_config(struct ieee80211_hw *hw,
 
 	spin_lock_irqsave(&wl->irq_lock, flags);
 	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, dev->irq_mask);
-	mmiowb();
 	spin_unlock_irqrestore(&wl->irq_lock, flags);
 out_unlock_mutex:
 	mutex_unlock(&wl->mutex);
@@ -2900,7 +2881,6 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
 	spin_lock_irqsave(&wl->irq_lock, flags);
 	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, dev->irq_mask);
 	/* XXX: why? */
-	mmiowb();
 	spin_unlock_irqrestore(&wl->irq_lock, flags);
  out_unlock_mutex:
 	mutex_unlock(&wl->mutex);
diff --git a/drivers/net/wireless/broadcom/b43legacy/phy.c b/drivers/net/wireless/broadcom/b43legacy/phy.c
index 995c7d0c212a..f949766d27ca 100644
--- a/drivers/net/wireless/broadcom/b43legacy/phy.c
+++ b/drivers/net/wireless/broadcom/b43legacy/phy.c
@@ -134,7 +134,6 @@ u16 b43legacy_phy_read(struct b43legacy_wldev *dev, u16 offset)
 void b43legacy_phy_write(struct b43legacy_wldev *dev, u16 offset, u16 val)
 {
 	b43legacy_write16(dev, B43legacy_MMIO_PHY_CONTROL, offset);
-	mmiowb();
 	b43legacy_write16(dev, B43legacy_MMIO_PHY_DATA, val);
 }
 
diff --git a/drivers/net/wireless/broadcom/b43legacy/pio.h b/drivers/net/wireless/broadcom/b43legacy/pio.h
index 1cd1b9ca5e9c..08cd02282beb 100644
--- a/drivers/net/wireless/broadcom/b43legacy/pio.h
+++ b/drivers/net/wireless/broadcom/b43legacy/pio.h
@@ -92,7 +92,6 @@ void b43legacy_pio_write(struct b43legacy_pioqueue *queue,
 		       u16 offset, u16 value)
 {
 	b43legacy_write16(queue->dev, queue->mmio_base + offset, value);
-	mmiowb();
 }
 
 
diff --git a/drivers/net/wireless/broadcom/b43legacy/radio.c b/drivers/net/wireless/broadcom/b43legacy/radio.c
index eab1c9387846..c6db444ea07e 100644
--- a/drivers/net/wireless/broadcom/b43legacy/radio.c
+++ b/drivers/net/wireless/broadcom/b43legacy/radio.c
@@ -95,7 +95,6 @@ void b43legacy_radio_lock(struct b43legacy_wldev *dev)
 	B43legacy_WARN_ON(status & B43legacy_MACCTL_RADIOLOCK);
 	status |= B43legacy_MACCTL_RADIOLOCK;
 	b43legacy_write32(dev, B43legacy_MMIO_MACCTL, status);
-	mmiowb();
 	udelay(10);
 }
 
@@ -108,7 +107,6 @@ void b43legacy_radio_unlock(struct b43legacy_wldev *dev)
 	B43legacy_WARN_ON(!(status & B43legacy_MACCTL_RADIOLOCK));
 	status &= ~B43legacy_MACCTL_RADIOLOCK;
 	b43legacy_write32(dev, B43legacy_MMIO_MACCTL, status);
-	mmiowb();
 }
 
 u16 b43legacy_radio_read16(struct b43legacy_wldev *dev, u16 offset)
@@ -141,7 +139,6 @@ u16 b43legacy_radio_read16(struct b43legacy_wldev *dev, u16 offset)
 void b43legacy_radio_write16(struct b43legacy_wldev *dev, u16 offset, u16 val)
 {
 	b43legacy_write16(dev, B43legacy_MMIO_RADIO_CONTROL, offset);
-	mmiowb();
 	b43legacy_write16(dev, B43legacy_MMIO_RADIO_DATA_LOW, val);
 }
 
@@ -333,7 +330,6 @@ u8 b43legacy_radio_aci_scan(struct b43legacy_wldev *dev)
 void b43legacy_nrssi_hw_write(struct b43legacy_wldev *dev, u16 offset, s16 val)
 {
 	b43legacy_phy_write(dev, B43legacy_PHY_NRSSILT_CTRL, offset);
-	mmiowb();
 	b43legacy_phy_write(dev, B43legacy_PHY_NRSSILT_DATA, (u16)val);
 }
 
diff --git a/drivers/net/wireless/broadcom/b43legacy/sysfs.c b/drivers/net/wireless/broadcom/b43legacy/sysfs.c
index 2a1da15c913b..2db83eec7a11 100644
--- a/drivers/net/wireless/broadcom/b43legacy/sysfs.c
+++ b/drivers/net/wireless/broadcom/b43legacy/sysfs.c
@@ -143,7 +143,6 @@ static ssize_t b43legacy_attr_interfmode_store(struct device *dev,
 	if (err)
 		b43legacyerr(wldev->wl, "Interference Mitigation not "
 		       "supported by device\n");
-	mmiowb();
 	spin_unlock_irqrestore(&wldev->wl->irq_lock, flags);
 	mutex_unlock(&wldev->wl->mutex);
 
diff --git a/drivers/net/wireless/intel/iwlegacy/common.h b/drivers/net/wireless/intel/iwlegacy/common.h
index b079c64ca014..986646af8dfd 100644
--- a/drivers/net/wireless/intel/iwlegacy/common.h
+++ b/drivers/net/wireless/intel/iwlegacy/common.h
@@ -2030,13 +2030,6 @@ static inline void
 _il_release_nic_access(struct il_priv *il)
 {
 	_il_clear_bit(il, CSR_GP_CNTRL, CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
-	/*
-	 * In above we are reading CSR_GP_CNTRL register, what will flush any
-	 * previous writes, but still want write, which clear MAC_ACCESS_REQ
-	 * bit, be performed on PCI bus before any other writes scheduled on
-	 * different CPUs (after we drop reg_lock).
-	 */
-	mmiowb();
 }
 
 static inline u32
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index fe8269d023de..abbfc9cc80fc 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2067,7 +2067,6 @@ static void iwl_trans_pcie_release_nic_access(struct iwl_trans *trans,
 	 * MAC_ACCESS_REQ bit to be performed before any other writes
 	 * scheduled on different CPUs (after we drop reg_lock).
 	 */
-	mmiowb();
 out:
 	spin_unlock_irqrestore(&trans_pcie->reg_lock, *flags);
 }
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 1dede87dd54f..dcf234680535 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -358,8 +358,6 @@ static void idt_sw_write(struct idt_ntb_dev *ndev,
 	iowrite32((u32)reg, ndev->cfgspc + (ptrdiff_t)IDT_NT_GASAADDR);
 	/* Put the new value of the register */
 	iowrite32(data, ndev->cfgspc + (ptrdiff_t)IDT_NT_GASADATA);
-	/* Make sure the PCIe transactions are executed */
-	mmiowb();
 	/* Unlock GASA registers operations */
 	spin_unlock_irqrestore(&ndev->gasa_lock, irqflags);
 }
@@ -750,7 +748,6 @@ static void idt_ntb_local_link_enable(struct idt_ntb_dev *ndev)
 	spin_lock_irqsave(&ndev->mtbl_lock, irqflags);
 	idt_nt_write(ndev, IDT_NT_NTMTBLADDR, ndev->part);
 	idt_nt_write(ndev, IDT_NT_NTMTBLDATA, mtbldata);
-	mmiowb();
 	spin_unlock_irqrestore(&ndev->mtbl_lock, irqflags);
 
 	/* Notify the peers by setting and clearing the global signal bit */
@@ -778,7 +775,6 @@ static void idt_ntb_local_link_disable(struct idt_ntb_dev *ndev)
 	spin_lock_irqsave(&ndev->mtbl_lock, irqflags);
 	idt_nt_write(ndev, IDT_NT_NTMTBLADDR, ndev->part);
 	idt_nt_write(ndev, IDT_NT_NTMTBLDATA, 0);
-	mmiowb();
 	spin_unlock_irqrestore(&ndev->mtbl_lock, irqflags);
 
 	/* Notify the peers by setting and clearing the global signal bit */
@@ -1339,7 +1335,6 @@ static int idt_ntb_peer_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
 		idt_nt_write(ndev, IDT_NT_LUTLDATA, (u32)addr);
 		idt_nt_write(ndev, IDT_NT_LUTMDATA, (u32)(addr >> 32));
 		idt_nt_write(ndev, IDT_NT_LUTUDATA, data);
-		mmiowb();
 		spin_unlock_irqrestore(&ndev->lut_lock, irqflags);
 		/* Limit address isn't specified since size is fixed for LUT */
 	}
@@ -1393,7 +1388,6 @@ static int idt_ntb_peer_mw_clear_trans(struct ntb_dev *ntb, int pidx,
 		idt_nt_write(ndev, IDT_NT_LUTLDATA, 0);
 		idt_nt_write(ndev, IDT_NT_LUTMDATA, 0);
 		idt_nt_write(ndev, IDT_NT_LUTUDATA, 0);
-		mmiowb();
 		spin_unlock_irqrestore(&ndev->lut_lock, irqflags);
 	}
 
@@ -1812,7 +1806,6 @@ static int idt_ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
 	/* Set the route and send the data */
 	idt_sw_write(ndev, partdata_tbl[ndev->part].msgctl[midx], swpmsgctl);
 	idt_nt_write(ndev, ntdata_tbl.msgs[midx].out, msg);
-	mmiowb();
 	/* Unlock the messages routing table */
 	spin_unlock_irqrestore(&ndev->msg_locks[midx], irqflags);
 
diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index 2a9d6b0d1f19..11a6cd374004 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -284,11 +284,9 @@ static int perf_spad_cmd_send(struct perf_peer *peer, enum perf_cmd cmd,
 		ntb_peer_spad_write(perf->ntb, peer->pidx,
 				    PERF_SPAD_HDATA(perf->gidx),
 				    upper_32_bits(data));
-		mmiowb();
 		ntb_peer_spad_write(perf->ntb, peer->pidx,
 				    PERF_SPAD_CMD(perf->gidx),
 				    cmd);
-		mmiowb();
 		ntb_peer_db_set(perf->ntb, PERF_SPAD_NOTIFY(peer->gidx));
 
 		dev_dbg(&perf->ntb->dev, "DB ring peer %#llx\n",
@@ -379,7 +377,6 @@ static int perf_msg_cmd_send(struct perf_peer *peer, enum perf_cmd cmd,
 
 		ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_HDATA,
 				   upper_32_bits(data));
-		mmiowb();
 
 		/* This call shall trigger peer message event */
 		ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_CMD, cmd);
diff --git a/drivers/scsi/bfa/bfa.h b/drivers/scsi/bfa/bfa.h
index 0e119d838e1b..762cb77253b9 100644
--- a/drivers/scsi/bfa/bfa.h
+++ b/drivers/scsi/bfa/bfa.h
@@ -62,8 +62,7 @@ void bfa_isr_unhandled(struct bfa_s *bfa, struct bfi_msg_s *m);
 			((__bfa)->iocfc.cfg.drvcfg.num_reqq_elems - 1); \
 		writel((__bfa)->iocfc.req_cq_pi[__reqq],		\
 			(__bfa)->iocfc.bfa_regs.cpe_q_pi[__reqq]);	\
-		mmiowb();      \
-	} while (0)
+		} while (0)
 
 #define bfa_rspq_pi(__bfa, __rspq)					\
 	(*(u32 *)((__bfa)->iocfc.rsp_cq_shadow_pi[__rspq].kva))
diff --git a/drivers/scsi/bfa/bfa_hw_cb.c b/drivers/scsi/bfa/bfa_hw_cb.c
index c4a0c0eb88a5..4a0d881b2602 100644
--- a/drivers/scsi/bfa/bfa_hw_cb.c
+++ b/drivers/scsi/bfa/bfa_hw_cb.c
@@ -61,7 +61,6 @@ bfa_hwcb_rspq_ack_msix(struct bfa_s *bfa, int rspq, u32 ci)
 
 	bfa_rspq_ci(bfa, rspq) = ci;
 	writel(ci, bfa->iocfc.bfa_regs.rme_q_ci[rspq]);
-	mmiowb();
 }
 
 void
@@ -72,7 +71,6 @@ bfa_hwcb_rspq_ack(struct bfa_s *bfa, int rspq, u32 ci)
 
 	bfa_rspq_ci(bfa, rspq) = ci;
 	writel(ci, bfa->iocfc.bfa_regs.rme_q_ci[rspq]);
-	mmiowb();
 }
 
 void
diff --git a/drivers/scsi/bfa/bfa_hw_ct.c b/drivers/scsi/bfa/bfa_hw_ct.c
index b0ff378dece2..b7be5f4f02a5 100644
--- a/drivers/scsi/bfa/bfa_hw_ct.c
+++ b/drivers/scsi/bfa/bfa_hw_ct.c
@@ -81,7 +81,6 @@ bfa_hwct_rspq_ack(struct bfa_s *bfa, int rspq, u32 ci)
 
 	bfa_rspq_ci(bfa, rspq) = ci;
 	writel(ci, bfa->iocfc.bfa_regs.rme_q_ci[rspq]);
-	mmiowb();
 }
 
 /*
@@ -94,7 +93,6 @@ bfa_hwct2_rspq_ack(struct bfa_s *bfa, int rspq, u32 ci)
 {
 	bfa_rspq_ci(bfa, rspq) = ci;
 	writel(ci, bfa->iocfc.bfa_regs.rme_q_ci[rspq]);
-	mmiowb();
 }
 
 void
diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
index 039328d9ef13..19734ec7f42e 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
@@ -991,7 +991,6 @@ void bnx2fc_arm_cq(struct bnx2fc_rport *tgt)
 			FCOE_CQE_TOGGLE_BIT_SHIFT);
 	msg = *((u32 *)rx_db);
 	writel(cpu_to_le32(msg), tgt->ctx_base);
-	mmiowb();
 
 }
 
@@ -1409,7 +1408,6 @@ void bnx2fc_ring_doorbell(struct bnx2fc_rport *tgt)
 				(tgt->sq_curr_toggle_bit << 15);
 	msg = *((u32 *)sq_db);
 	writel(cpu_to_le32(msg), tgt->ctx_base);
-	mmiowb();
 
 }
 
diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c
index d56a78f411cd..12666313b937 100644
--- a/drivers/scsi/bnx2i/bnx2i_hwi.c
+++ b/drivers/scsi/bnx2i/bnx2i_hwi.c
@@ -253,7 +253,6 @@ void bnx2i_put_rq_buf(struct bnx2i_conn *bnx2i_conn, int count)
 		writew(ep->qp.rq_prod_idx,
 		       ep->qp.ctx_base + CNIC_RECV_DOORBELL);
 	}
-	mmiowb();
 }
 
 
@@ -279,8 +278,6 @@ static void bnx2i_ring_sq_dbell(struct bnx2i_conn *bnx2i_conn, int count)
 		bnx2i_ring_577xx_doorbell(bnx2i_conn);
 	} else
 		writew(count, ep->qp.ctx_base + CNIC_SEND_DOORBELL);
-
-	mmiowb();
 }
 
 
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 293f5cf524d7..59a6546fd602 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -815,7 +815,6 @@ megasas_fire_cmd_skinny(struct megasas_instance *instance,
 	       &(regs)->inbound_high_queue_port);
 	writel((lower_32_bits(frame_phys_addr) | (frame_count<<1))|1,
 	       &(regs)->inbound_low_queue_port);
-	mmiowb();
 	spin_unlock_irqrestore(&instance->hba_lock, flags);
 }
 
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 1d17128030cd..e35c2b64c145 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -242,7 +242,6 @@ megasas_fire_cmd_fusion(struct megasas_instance *instance,
 		&instance->reg_set->inbound_low_queue_port);
 	writel(le32_to_cpu(req_desc->u.high),
 		&instance->reg_set->inbound_high_queue_port);
-	mmiowb();
 	spin_unlock_irqrestore(&instance->hba_lock, flags);
 #endif
 }
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 1d8c584ec1e9..f60b9e0a6ca6 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3333,7 +3333,6 @@ _base_mpi_ep_writeq(__u64 b, volatile void __iomem *addr,
 	spin_lock_irqsave(writeq_lock, flags);
 	__raw_writel((u32)(b), addr);
 	__raw_writel((u32)(b >> 32), (addr + 4));
-	mmiowb();
 	spin_unlock_irqrestore(writeq_lock, flags);
 }
 
diff --git a/drivers/scsi/qedf/qedf_io.c b/drivers/scsi/qedf/qedf_io.c
index 6ca583bdde23..53e8221f6816 100644
--- a/drivers/scsi/qedf/qedf_io.c
+++ b/drivers/scsi/qedf/qedf_io.c
@@ -807,7 +807,6 @@ void qedf_ring_doorbell(struct qedf_rport *fcport)
 	writel(*(u32 *)&dbell, fcport->p_doorbell);
 	/* Make sure SQ index is updated so f/w prcesses requests in order */
 	wmb();
-	mmiowb();
 }
 
 static void qedf_trace_io(struct qedf_rport *fcport, struct qedf_ioreq *io_req,
diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index e2a995a6e8e7..f8f86774f77f 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -985,7 +985,6 @@ static void qedi_ring_doorbell(struct qedi_conn *qedi_conn)
 	 * others they are two different assembly operations.
 	 */
 	wmb();
-	mmiowb();
 	QEDI_INFO(&qedi_conn->qedi->dbg_ctx, QEDI_LOG_MP_REQ,
 		  "prod_idx=0x%x, fw_prod_idx=0x%x, cid=0x%x\n",
 		  qedi_conn->ep->sq_prod_idx, qedi_conn->ep->fw_sq_prod_idx,
diff --git a/drivers/scsi/qla1280.c b/drivers/scsi/qla1280.c
index 6856dfdfa473..93acbc5094f0 100644
--- a/drivers/scsi/qla1280.c
+++ b/drivers/scsi/qla1280.c
@@ -3004,8 +3004,6 @@ qla1280_64bit_start_scsi(struct scsi_qla_host *ha, struct srb * sp)
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
-	mmiowb();
 
  out:
 	if (status)
@@ -3254,8 +3252,6 @@ qla1280_32bit_start_scsi(struct scsi_qla_host *ha, struct srb * sp)
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
-	mmiowb();
 
 out:
 	if (status)
@@ -3379,7 +3375,6 @@ qla1280_isp_cmd(struct scsi_qla_host *ha)
 	 * See Documentation/driver-api/device-io.rst for more information.
 	 */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	mmiowb();
 
 	LEAVE("qla1280_isp_cmd");
 }
diff --git a/drivers/ssb/pci.c b/drivers/ssb/pci.c
index 84807a9b4b13..da2d2ab8104d 100644
--- a/drivers/ssb/pci.c
+++ b/drivers/ssb/pci.c
@@ -305,7 +305,6 @@ static int sprom_do_write(struct ssb_bus *bus, const u16 *sprom)
 		else if (i % 2)
 			pr_cont(".");
 		writew(sprom[i], bus->mmio + bus->sprom_offset + (i * 2));
-		mmiowb();
 		msleep(20);
 	}
 	err = pci_read_config_dword(pdev, SSB_SPROMCTL, &spromctl);
diff --git a/drivers/ssb/pcmcia.c b/drivers/ssb/pcmcia.c
index 567013f8a8be..d7d730c245c5 100644
--- a/drivers/ssb/pcmcia.c
+++ b/drivers/ssb/pcmcia.c
@@ -338,7 +338,6 @@ static void ssb_pcmcia_write8(struct ssb_device *dev, u16 offset, u8 value)
 	err = select_core_and_segment(dev, &offset);
 	if (likely(!err))
 		writeb(value, bus->mmio + offset);
-	mmiowb();
 	spin_unlock_irqrestore(&bus->bar_lock, flags);
 }
 
@@ -352,7 +351,6 @@ static void ssb_pcmcia_write16(struct ssb_device *dev, u16 offset, u16 value)
 	err = select_core_and_segment(dev, &offset);
 	if (likely(!err))
 		writew(value, bus->mmio + offset);
-	mmiowb();
 	spin_unlock_irqrestore(&bus->bar_lock, flags);
 }
 
@@ -368,7 +366,6 @@ static void ssb_pcmcia_write32(struct ssb_device *dev, u16 offset, u32 value)
 		writew((value & 0x0000FFFF), bus->mmio + offset);
 		writew(((value & 0xFFFF0000) >> 16), bus->mmio + offset + 2);
 	}
-	mmiowb();
 	spin_unlock_irqrestore(&bus->bar_lock, flags);
 }
 
@@ -424,7 +421,6 @@ static void ssb_pcmcia_block_write(struct ssb_device *dev, const void *buffer,
 		WARN_ON(1);
 	}
 unlock:
-	mmiowb();
 	spin_unlock_irqrestore(&bus->bar_lock, flags);
 }
 #endif /* CONFIG_SSB_BLOCKIO */
diff --git a/drivers/staging/comedi/drivers/mite.c b/drivers/staging/comedi/drivers/mite.c
index 61e03ad84123..639ec1586976 100644
--- a/drivers/staging/comedi/drivers/mite.c
+++ b/drivers/staging/comedi/drivers/mite.c
@@ -371,7 +371,6 @@ static unsigned int mite_get_status(struct mite_channel *mite_chan)
 		writel(CHOR_CLRDONE,
 		       mite->mmio + MITE_CHOR(mite_chan->channel));
 	}
-	mmiowb();
 	spin_unlock_irqrestore(&mite->lock, flags);
 	return status;
 }
@@ -451,7 +450,6 @@ void mite_dma_arm(struct mite_channel *mite_chan)
 	mite_chan->done = 0;
 	/* arm */
 	writel(CHOR_START, mite->mmio + MITE_CHOR(mite_chan->channel));
-	mmiowb();
 	spin_unlock_irqrestore(&mite->lock, flags);
 }
 EXPORT_SYMBOL_GPL(mite_dma_arm);
@@ -638,7 +636,6 @@ void mite_release_channel(struct mite_channel *mite_chan)
 		       CHCR_CLR_LC_IE | CHCR_CLR_CONT_RB_IE,
 		       mite->mmio + MITE_CHCR(mite_chan->channel));
 		mite_chan->ring = NULL;
-		mmiowb();
 	}
 	spin_unlock_irqrestore(&mite->lock, flags);
 }
diff --git a/drivers/staging/comedi/drivers/ni_660x.c b/drivers/staging/comedi/drivers/ni_660x.c
index 405573e927cf..4ee9b260eab0 100644
--- a/drivers/staging/comedi/drivers/ni_660x.c
+++ b/drivers/staging/comedi/drivers/ni_660x.c
@@ -320,7 +320,6 @@ static inline void ni_660x_set_dma_channel(struct comedi_device *dev,
 	ni_660x_write(dev, chip, devpriv->dma_cfg[chip] |
 		      NI660X_DMA_CFG_RESET(mite_channel),
 		      NI660X_DMA_CFG);
-	mmiowb();
 }
 
 static inline void ni_660x_unset_dma_channel(struct comedi_device *dev,
@@ -333,7 +332,6 @@ static inline void ni_660x_unset_dma_channel(struct comedi_device *dev,
 	devpriv->dma_cfg[chip] &= ~NI660X_DMA_CFG_SEL_MASK(mite_channel);
 	devpriv->dma_cfg[chip] |= NI660X_DMA_CFG_SEL_NONE(mite_channel);
 	ni_660x_write(dev, chip, devpriv->dma_cfg[chip], NI660X_DMA_CFG);
-	mmiowb();
 }
 
 static int ni_660x_request_mite_channel(struct comedi_device *dev,
diff --git a/drivers/staging/comedi/drivers/ni_mio_common.c b/drivers/staging/comedi/drivers/ni_mio_common.c
index b04dad8c7092..668f2aa16baa 100644
--- a/drivers/staging/comedi/drivers/ni_mio_common.c
+++ b/drivers/staging/comedi/drivers/ni_mio_common.c
@@ -547,7 +547,6 @@ static inline void ni_set_bitfield(struct comedi_device *dev, int reg,
 			reg);
 		break;
 	}
-	mmiowb();
 	spin_unlock_irqrestore(&devpriv->soft_reg_copy_lock, flags);
 }
 
diff --git a/drivers/staging/comedi/drivers/ni_pcidio.c b/drivers/staging/comedi/drivers/ni_pcidio.c
index 4bdef87d5dd7..8f3864799c19 100644
--- a/drivers/staging/comedi/drivers/ni_pcidio.c
+++ b/drivers/staging/comedi/drivers/ni_pcidio.c
@@ -310,7 +310,6 @@ static int ni_pcidio_request_di_mite_channel(struct comedi_device *dev)
 	writeb(primary_DMAChannel_bits(devpriv->di_mite_chan->channel) |
 	       secondary_DMAChannel_bits(devpriv->di_mite_chan->channel),
 	       dev->mmio + DMA_LINE_CONTROL_GROUP1);
-	mmiowb();
 	spin_unlock_irqrestore(&devpriv->mite_channel_lock, flags);
 	return 0;
 }
@@ -327,7 +326,6 @@ static void ni_pcidio_release_di_mite_channel(struct comedi_device *dev)
 		writeb(primary_DMAChannel_bits(0) |
 		       secondary_DMAChannel_bits(0),
 		       dev->mmio + DMA_LINE_CONTROL_GROUP1);
-		mmiowb();
 	}
 	spin_unlock_irqrestore(&devpriv->mite_channel_lock, flags);
 }
diff --git a/drivers/staging/comedi/drivers/ni_tio.c b/drivers/staging/comedi/drivers/ni_tio.c
index 048cb35723ad..c1131a1622c0 100644
--- a/drivers/staging/comedi/drivers/ni_tio.c
+++ b/drivers/staging/comedi/drivers/ni_tio.c
@@ -234,7 +234,6 @@ static void ni_tio_set_bits_transient(struct ni_gpct *counter,
 		regs[reg] &= ~mask;
 		regs[reg] |= (value & mask);
 		ni_tio_write(counter, regs[reg] | transient, reg);
-		mmiowb();
 		spin_unlock_irqrestore(&counter_dev->regs_lock, flags);
 	}
 }
diff --git a/drivers/staging/comedi/drivers/s626.c b/drivers/staging/comedi/drivers/s626.c
index f5af6f4069dc..39049d3c56d7 100644
--- a/drivers/staging/comedi/drivers/s626.c
+++ b/drivers/staging/comedi/drivers/s626.c
@@ -108,7 +108,6 @@ static void s626_mc_enable(struct comedi_device *dev,
 {
 	unsigned int val = (cmd << 16) | cmd;
 
-	mmiowb();
 	writel(val, dev->mmio + reg);
 }
 
@@ -116,7 +115,6 @@ static void s626_mc_disable(struct comedi_device *dev,
 			    unsigned int cmd, unsigned int reg)
 {
 	writel(cmd << 16, dev->mmio + reg);
-	mmiowb();
 }
 
 static bool s626_mc_test(struct comedi_device *dev,
diff --git a/drivers/tty/serial/men_z135_uart.c b/drivers/tty/serial/men_z135_uart.c
index ef89534dd760..e5d3ebab6dae 100644
--- a/drivers/tty/serial/men_z135_uart.c
+++ b/drivers/tty/serial/men_z135_uart.c
@@ -353,7 +353,6 @@ static void men_z135_handle_tx(struct men_z135_port *uart)
 
 	memcpy_toio(port->membase + MEN_Z135_TX_RAM, &xmit->buf[xmit->tail], n);
 	xmit->tail = (xmit->tail + n) & (UART_XMIT_SIZE - 1);
-	mmiowb();
 
 	iowrite32(n & 0x3ff, port->membase + MEN_Z135_TX_CTRL);
 
diff --git a/drivers/tty/serial/serial_txx9.c b/drivers/tty/serial/serial_txx9.c
index 1b4008d022bf..d22ccb32aa9b 100644
--- a/drivers/tty/serial/serial_txx9.c
+++ b/drivers/tty/serial/serial_txx9.c
@@ -248,7 +248,6 @@ static void serial_txx9_initialize(struct uart_port *port)
 	sio_out(up, TXX9_SIFCR, TXX9_SIFCR_SWRST);
 	/* TX4925 BUG WORKAROUND.  Accessing SIOC register
 	 * immediately after soft reset causes bus error. */
-	mmiowb();
 	udelay(1);
 	while ((sio_in(up, TXX9_SIFCR) & TXX9_SIFCR_SWRST) && --tmout)
 		udelay(1);
diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
index c9cfb100ecdc..cac991173ac0 100644
--- a/drivers/usb/early/xhci-dbc.c
+++ b/drivers/usb/early/xhci-dbc.c
@@ -533,8 +533,6 @@ static int xdbc_handle_external_reset(void)
 
 	xdbc_mem_init();
 
-	mmiowb();
-
 	ret = xdbc_start();
 	if (ret < 0)
 		goto reset_out;
@@ -587,8 +585,6 @@ static int __init xdbc_early_setup(void)
 
 	xdbc_mem_init();
 
-	mmiowb();
-
 	ret = xdbc_start();
 	if (ret < 0) {
 		writel(0, &xdbc.xdbc_reg->control);
diff --git a/drivers/usb/host/xhci-dbgcap.c b/drivers/usb/host/xhci-dbgcap.c
index d932cc31711e..52e32644a4b2 100644
--- a/drivers/usb/host/xhci-dbgcap.c
+++ b/drivers/usb/host/xhci-dbgcap.c
@@ -421,8 +421,6 @@ static int xhci_dbc_mem_init(struct xhci_hcd *xhci, gfp_t flags)
 	string_length = xhci_dbc_populate_strings(dbc->string);
 	xhci_dbc_init_contexts(xhci, string_length);
 
-	mmiowb();
-
 	xhci_dbc_eps_init(xhci);
 	dbc->state = DS_INITIALIZED;
 
diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
index f6165d304b4d..48841e5dab90 100644
--- a/include/linux/qed/qed_if.h
+++ b/include/linux/qed/qed_if.h
@@ -1338,7 +1338,6 @@ static inline u16 qed_sb_update_sb_idx(struct qed_sb_info *sb_info)
 	}
 
 	/* Let SB update */
-	mmiowb();
 	return rc;
 }
 
@@ -1374,7 +1373,6 @@ static inline void qed_sb_ack(struct qed_sb_info *sb_info,
 	/* Both segments (interrupts & acks) are written to same place address;
 	 * Need to guarantee all commands will be received (in-order) by HW.
 	 */
-	mmiowb();
 	barrier();
 }
 
diff --git a/sound/soc/txx9/txx9aclc-ac97.c b/sound/soc/txx9/txx9aclc-ac97.c
index 1cfca698ae4b..b0fa285c7ba2 100644
--- a/sound/soc/txx9/txx9aclc-ac97.c
+++ b/sound/soc/txx9/txx9aclc-ac97.c
@@ -102,7 +102,6 @@ static void txx9aclc_ac97_cold_reset(struct snd_ac97 *ac97)
 	u32 ready = ACINT_CODECRDY(ac97->num) | ACINT_REGACCRDY;
 
 	__raw_writel(ACCTL_ENLINK, base + ACCTLDIS);
-	mmiowb();
 	udelay(1);
 	__raw_writel(ACCTL_ENLINK, base + ACCTLEN);
 	/* wait for primary codec ready status */
-- 
2.11.0



* [PATCH v2 18/21] scsi/qla1280: Remove stale comment about mmiowb()
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (16 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 17/21] drivers: Remove explicit invocations of mmiowb() Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 13:59 ` [PATCH v2 19/21] i40iw: Redefine i40iw_mmiowb() to do nothing Will Deacon
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

All mmiowb() invocations have been removed, so there's no need to keep
banging on about it.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/scsi/qla1280.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/drivers/scsi/qla1280.c b/drivers/scsi/qla1280.c
index 93acbc5094f0..327eff67a1ee 100644
--- a/drivers/scsi/qla1280.c
+++ b/drivers/scsi/qla1280.c
@@ -3363,16 +3363,6 @@ qla1280_isp_cmd(struct scsi_qla_host *ha)
 
 	/*
 	 * Update request index to mailbox4 (Request Queue In).
-	 * The mmiowb() ensures that this write is ordered with writes by other
-	 * CPUs.  Without the mmiowb(), it is possible for the following:
-	 *    CPUA posts write of index 5 to mailbox4
-	 *    CPUA releases host lock
-	 *    CPUB acquires host lock
-	 *    CPUB posts write of index 6 to mailbox4
-	 *    On PCI bus, order reverses and write of 6 posts, then index 5,
-	 *       causing chip to issue full queue of stale commands
-	 * The mmiowb() prevents future writes from crossing the barrier.
-	 * See Documentation/driver-api/device-io.rst for more information.
 	 */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
 
-- 
2.11.0



* [PATCH v2 19/21] i40iw: Redefine i40iw_mmiowb() to do nothing
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (17 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 18/21] scsi/qla1280: Remove stale comment about mmiowb() Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 13:59 ` [PATCH v2 20/21] net/ethernet/silan/sc92031: Remove stale comment about mmiowb() Will Deacon
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

mmiowb() is now implicit in spin_unlock(), so there's no reason to call
it from driver code. Redefine i40iw_mmiowb() to do nothing instead.
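
As a purely illustrative sketch (not part of this patch), the driver-side
pattern the series removes looks roughly like the following; struct foo_dev,
its lock and FOO_DOORBELL are made-up names standing in for any driver that
used to pair an MMIO doorbell write with mmiowb():

#include <linux/io.h>
#include <linux/spinlock.h>

struct foo_dev {
	spinlock_t	lock;
	void __iomem	*regs;
};
#define FOO_DOORBELL	0x10	/* hypothetical register offset */

/* Before this series: order the MMIO write against the unlock by hand. */
static void foo_ring_doorbell_old(struct foo_dev *fd, u32 val)
{
	spin_lock(&fd->lock);
	writel(val, fd->regs + FOO_DOORBELL);
	mmiowb();		/* explicit barrier, no longer needed */
	spin_unlock(&fd->lock);
}

/* After: spin_unlock() provides the ordering on architectures that
 * need it, and only when an I/O write is actually pending inside the
 * critical section (per the asm-generic tracking added earlier in the
 * series).
 */
static void foo_ring_doorbell(struct foo_dev *fd, u32 val)
{
	spin_lock(&fd->lock);
	writel(val, fd->regs + FOO_DOORBELL);
	spin_unlock(&fd->lock);
}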

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/infiniband/hw/i40iw/i40iw_osdep.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw_osdep.h b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
index f27be3e7830b..d474aad62a81 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_osdep.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
@@ -211,7 +211,7 @@ enum i40iw_status_code i40iw_hw_manage_vf_pble_bp(struct i40iw_device *iwdev,
 struct i40iw_sc_vsi;
 void i40iw_hw_stats_start_timer(struct i40iw_sc_vsi *vsi);
 void i40iw_hw_stats_stop_timer(struct i40iw_sc_vsi *vsi);
-#define i40iw_mmiowb() mmiowb()
+#define i40iw_mmiowb() do { } while (0)
 void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value);
 u32  i40iw_rd32(struct i40iw_hw *hw, u32 reg);
 #endif				/* _I40IW_OSDEP_H_ */
-- 
2.11.0



* [PATCH v2 20/21] net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (18 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 19/21] i40iw: Redefine i40iw_mmiowb() to do nothing Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 13:59 ` [PATCH v2 21/21] arch: Remove dummy mmiowb() definitions from arch code Will Deacon
  2019-04-05 15:55 ` [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Linus Torvalds
  21 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

mmiowb() is no more. It has ceased to be. It is an ex-barrier. So remove
all references to it from comments.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/net/ethernet/silan/sc92031.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/silan/sc92031.c b/drivers/net/ethernet/silan/sc92031.c
index db5dc8ce0aff..02b3962b0e63 100644
--- a/drivers/net/ethernet/silan/sc92031.c
+++ b/drivers/net/ethernet/silan/sc92031.c
@@ -251,7 +251,6 @@ enum PMConfigBits {
  * use of mdelay() at _sc92031_reset.
  * Functions prefixed with _sc92031_ must be called with the lock held;
  * functions prefixed with sc92031_ must be called without the lock held.
- * Use mmiowb() before unlocking if the hardware was written to.
  */
 
 /* Locking rules for the interrupt:
-- 
2.11.0



* [PATCH v2 21/21] arch: Remove dummy mmiowb() definitions from arch code
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (19 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 20/21] net/ethernet/silan/sc92031: Remove stale comment about mmiowb() Will Deacon
@ 2019-04-05 13:59 ` Will Deacon
  2019-04-05 15:55 ` [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Linus Torvalds
  21 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 13:59 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-kernel, Will Deacon, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

Now that no driver code is using mmiowb() directly, we can remove the
dummy definitions remaining in architectures that don't make use of
asm-generic/io.h, as well as the definition in asm-generic/io.h itself.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/alpha/include/asm/io.h       | 2 --
 arch/hexagon/include/asm/io.h     | 2 --
 arch/parisc/include/asm/io.h      | 2 --
 arch/powerpc/include/asm/mmiowb.h | 2 --
 arch/sparc/include/asm/io_64.h    | 2 --
 include/asm-generic/io.h          | 4 ----
 6 files changed, 14 deletions(-)

diff --git a/arch/alpha/include/asm/io.h b/arch/alpha/include/asm/io.h
index 4c533fc94d62..ccf9d65166bb 100644
--- a/arch/alpha/include/asm/io.h
+++ b/arch/alpha/include/asm/io.h
@@ -513,8 +513,6 @@ extern inline void writeq(u64 b, volatile void __iomem *addr)
 #define writel_relaxed(b, addr)	__raw_writel(b, addr)
 #define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
 
-#define mmiowb()
-
 /*
  * String version of IO memory access ops:
  */
diff --git a/arch/hexagon/include/asm/io.h b/arch/hexagon/include/asm/io.h
index e17262ad125e..3d0ae09c2b8e 100644
--- a/arch/hexagon/include/asm/io.h
+++ b/arch/hexagon/include/asm/io.h
@@ -184,8 +184,6 @@ static inline void writel(u32 data, volatile void __iomem *addr)
 #define writew_relaxed __raw_writew
 #define writel_relaxed __raw_writel
 
-#define mmiowb()
-
 /*
  * Need an mtype somewhere in here, for cache type deals?
  * This is probably too long for an inline.
diff --git a/arch/parisc/include/asm/io.h b/arch/parisc/include/asm/io.h
index 30a8315d5c07..93d37010b375 100644
--- a/arch/parisc/include/asm/io.h
+++ b/arch/parisc/include/asm/io.h
@@ -229,8 +229,6 @@ static inline void writeq(unsigned long long q, volatile void __iomem *addr)
 #define writel_relaxed(l, addr)	writel(l, addr)
 #define writeq_relaxed(q, addr)	writeq(q, addr)
 
-#define mmiowb() do { } while (0)
-
 void memset_io(volatile void __iomem *addr, unsigned char val, int count);
 void memcpy_fromio(void *dst, const volatile void __iomem *src, int count);
 void memcpy_toio(volatile void __iomem *dst, const void *src, int count);
diff --git a/arch/powerpc/include/asm/mmiowb.h b/arch/powerpc/include/asm/mmiowb.h
index b10180613507..74a00127eb20 100644
--- a/arch/powerpc/include/asm/mmiowb.h
+++ b/arch/powerpc/include/asm/mmiowb.h
@@ -11,8 +11,6 @@
 #define arch_mmiowb_state()	(&local_paca->mmiowb_state)
 #define mmiowb()		mb()
 
-#else
-#define mmiowb()		do { } while (0)
 #endif /* CONFIG_MMIOWB */
 
 #include <asm-generic/mmiowb.h>
diff --git a/arch/sparc/include/asm/io_64.h b/arch/sparc/include/asm/io_64.h
index b162c23ae8c2..688911051b44 100644
--- a/arch/sparc/include/asm/io_64.h
+++ b/arch/sparc/include/asm/io_64.h
@@ -396,8 +396,6 @@ static inline void memcpy_toio(volatile void __iomem *dst, const void *src,
 	}
 }
 
-#define mmiowb()
-
 #ifdef __KERNEL__
 
 /* On sparc64 we have the whole physical IO address space accessible
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index bc490a746602..8f3bf95a36d1 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -22,10 +22,6 @@
 #include <asm/mmiowb.h>
 #include <asm-generic/pci_iomap.h>
 
-#ifndef mmiowb
-#define mmiowb() do {} while (0)
-#endif
-
 #ifndef __io_br
 #define __io_br()      barrier()
 #endif
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 07/21] x86/io: Remove useless definition of mmiowb()
  2019-04-05 13:59 ` [PATCH v2 07/21] x86/io: " Will Deacon
@ 2019-04-05 14:14   ` Thomas Gleixner
  0 siblings, 0 replies; 40+ messages in thread
From: Thomas Gleixner @ 2019-04-05 14:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arch, linux-kernel, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Fri, 5 Apr 2019, Will Deacon wrote:

> x86 maps mmiowb() to barrier(), but this is superfluous because a
> compiler barrier is already implied by spin_unlock(). Since x86 also
> includes asm-generic/io.h in its asm/io.h file, we can remove the

s/we can//

> definition entirely and pick up the dummy definition from core code.
> 
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 17/21] drivers: Remove explicit invocations of mmiowb()
  2019-04-05 13:59 ` [PATCH v2 17/21] drivers: Remove explicit invocations of mmiowb() Will Deacon
@ 2019-04-05 15:50   ` Linus Torvalds
  2019-04-09  9:00     ` Nicholas Piggin
  0 siblings, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2019-04-05 15:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arch, Linux List Kernel Mailing, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Fri, Apr 5, 2019 at 4:01 AM Will Deacon <will.deacon@arm.com> wrote:
>
> mmiowb() is now implied by spin_unlock() on architectures that require
> it, so there is no reason to call it from driver code. This patch was
> generated using coccinelle:
>
>         @mmiowb@
>         @@
>         - mmiowb();

So I love the patch series, and think we should just do it, but I do
wonder if some of the drivers involved end up relying on memory
ordering things (store_release -> load_acquire) and IO ordering rather
than using locking...

Wouldn't such use now be broken on ia64 SN platforms? Do we care?

So it might be worth noting that a lot of the mmiowb()s here weren't
paired with spin_unlock?
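
Something like the following, where only the first removal is obviously
covered by the new spin_unlock() behaviour (sketch only, driver names
illustrative):

	spin_lock_irqsave(&priv->lock, flags);
	writel(val, priv->regs + REG_CTRL);
	/* mmiowb() removed here: now implied by the unlock below */
	spin_unlock_irqrestore(&priv->lock, flags);

	...

	writel(val, priv->regs + REG_CTRL);
	/* mmiowb() removed here, with no spin_unlock() in sight */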

                    Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
  2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
                   ` (20 preceding siblings ...)
  2019-04-05 13:59 ` [PATCH v2 21/21] arch: Remove dummy mmiowb() definitions from arch code Will Deacon
@ 2019-04-05 15:55 ` Linus Torvalds
  2019-04-05 16:09   ` Will Deacon
  21 siblings, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2019-04-05 15:55 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arch, Linux List Kernel Mailing, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Fri, Apr 5, 2019 at 3:59 AM Will Deacon <will.deacon@arm.com> wrote:
>
> I've also pushed this series out here:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/mmiowb
>
> and I would like to get it into -next once the first patch has been acked.

Ack on it all.

With the afore-mentioned slight worry about non-spinlocked IO
ordering, but I _think_ it's purely limited to ia64 and wmb() and
friends should work elsewhere?

Or did I miss something? I think the ia64 mb/rmb/wmb stuff only
works on normal memory on ia64.

      Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
  2019-04-05 15:55 ` [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Linus Torvalds
@ 2019-04-05 16:09   ` Will Deacon
  2019-04-05 16:15     ` Linus Torvalds
  0 siblings, 1 reply; 40+ messages in thread
From: Will Deacon @ 2019-04-05 16:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch, Linux List Kernel Mailing, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Fri, Apr 05, 2019 at 05:55:37AM -1000, Linus Torvalds wrote:
> On Fri, Apr 5, 2019 at 3:59 AM Will Deacon <will.deacon@arm.com> wrote:
> >
> > I've also pushed this series out here:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/mmiowb
> >
> > and I would like to get it into -next once the first patch has been acked.
> 
> Ack on it all.

Thanks.

> With the afore-mentioned slight worry about non-spinlocked IO
> ordering, but I _think_ it's purely limited to ia64 and wmb() and
> friends should work elsewhere?
> 
> Or did I miss something? I think the ia64 mb/rmb/wmb stuff only
> works on normal memory on ia64.

I was worried about RISC-V, but actually their wmb() is "fence ow,ow"
which I think is stronger than their mmiowb() "fence o,w" implementation.

Everybody else should be fine with wmb() afaict, so if a driver writer
is smart enough to want this ordering outside of spinlocks, they can
do that for everybody apart from ia64.

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
  2019-04-05 16:09   ` Will Deacon
@ 2019-04-05 16:15     ` Linus Torvalds
  2019-04-05 16:30       ` Will Deacon
  0 siblings, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2019-04-05 16:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arch, Linux List Kernel Mailing, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Fri, Apr 5, 2019 at 6:09 AM Will Deacon <will.deacon@arm.com> wrote:
> >
> > Or did I miss something? I think the ia64 mb/rmb/wmb stuff only
> > works on normal memory on ia64.
>
> I was worried about RISC-V, but actually their wmb() is "fence ow,ow"
> which I think is stronger than their mmiowb() "fence o,w" implementation.

Also with smp_store_release -> smp_load_acquire kind of ordering?

Again, this is not at all a NAK - I think we should do this - just
perhaps a request to add a note to the commit and make people aware of
the issue.

I suspect very few drivers use non-locking serialization to begin with.

                 Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
  2019-04-05 16:15     ` Linus Torvalds
@ 2019-04-05 16:30       ` Will Deacon
  0 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-05 16:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch, Linux List Kernel Mailing, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Fri, Apr 05, 2019 at 06:15:12AM -1000, Linus Torvalds wrote:
> On Fri, Apr 5, 2019 at 6:09 AM Will Deacon <will.deacon@arm.com> wrote:
> > >
> > > Or did I miss something? I think the ia64 mb/rmb/wmb stuff only
> > > works on normal memory on ia64.
> >
> > I was worried about RISC-V, but actually their wmb() is "fence ow,ow"
> > which I think is stronger than their mmiowb() "fence o,w" implementation.
> 
> Also with smp_store_release -> smp_load_acquire kind of ordering?

Hmm, to be honest, I'm not convinced that smp_load_acquire() is ordered
wrt subsequent I/O on RISC-V anyway, so in the pattern of:

CPU 0:
writel(1, dev);
wmb();
smp_store_release(&x, 1);

CPU 1:
if (smp_load_acquire(&x) == 1)
	writel(2, dev)

then I think it's actually the control dependency in CPU 1 that provides
the expected ordering. That's probably quite fragile.

> Again, this is not at all a NAK - I think we should do this - just
> perhaps a request to add a note to the commit and make people aware of
> the issue.

Right, I'll do that.

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 17/21] drivers: Remove explicit invocations of mmiowb()
  2019-04-05 15:50   ` Linus Torvalds
@ 2019-04-09  9:00     ` Nicholas Piggin
  2019-04-09 13:46       ` Will Deacon
  0 siblings, 1 reply; 40+ messages in thread
From: Nicholas Piggin @ 2019-04-09  9:00 UTC (permalink / raw)
  To: Linus Torvalds, Will Deacon
  Cc: Akira Yokosawa, Andrea Parri, Arnd Bergmann,
	Benjamin Herrenschmidt, Rich Felker, David Howells,
	Daniel Lustig, linux-arch, Linux List Kernel Mailing,
	Maciej W. Rozycki, Luis Chamberlain, Ingo Molnar,
	Mikulas Patocka, Michael Ellerman, Palmer Dabbelt, Paul Burton,
	Paul E. McKenney, Peter Zijlstra, Alan Stern, Tony Luck,
	Yoshinori Sato

Linus Torvalds's on April 6, 2019 1:50 am:
> On Fri, Apr 5, 2019 at 4:01 AM Will Deacon <will.deacon@arm.com> wrote:
>>
>> mmiowb() is now implied by spin_unlock() on architectures that require
>> it, so there is no reason to call it from driver code. This patch was
>> generated using coccinelle:
>>
>>         @mmiowb@
>>         @@
>>         - mmiowb();
> 
> So I love the patch series, and think we should just do it, but I do
> wonder if some of the drivers involved end up relying on memory
> ordering things (store_release -> load_acquire) and IO ordering rather
> than using locking...

Hopefully the convention that smp_ prefix does not work for MMIO
ordering helps there. Drivers relying on that would be broken today
on powerpc, at least.
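
That is, a pattern like this is already unsafe today (sketch only, names
illustrative), because the smp_* release only orders normal memory and,
per the above, is not guaranteed to cover the earlier MMIO write:

	writel(DESC_VALID, ring->ioaddr + REG_DOORBELL);
	smp_store_release(&ring->head, head);	/* doesn't order the writel() */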

> Wouldn't such use now be broken on ia64 SN platforms? Do we care?

Hopefully not too much, what changed since last thread? :)

> So it might be worth noting that a lot of the mmiowb()s here weren't
> paired with spin_unlock?

I repeat myself, but the correct change is for ia64 to #define wmb to
mmiowb, then nothing is silently broken, nothing has to be noted, and 
nobody has to care. The ia64/sn2 platform will run a little slower 
that's all.

But deliberately breaking sn2 I guess is implicitly acknowledging the 
same end result that I wanted, so fine.

I think it might be an idea to remove all the mmiowb() that obviously
come before spin_unlock in one big patch, but then submit the rest 
individually to driver maintainers. I could do that rather than ask
more work from Will, if he and you agree.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 17/21] drivers: Remove explicit invocations of mmiowb()
  2019-04-09  9:00     ` Nicholas Piggin
@ 2019-04-09 13:46       ` Will Deacon
  2019-04-10  0:25         ` Nicholas Piggin
  0 siblings, 1 reply; 40+ messages in thread
From: Will Deacon @ 2019-04-09 13:46 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Linus Torvalds, Akira Yokosawa, Andrea Parri, Arnd Bergmann,
	Benjamin Herrenschmidt, Rich Felker, David Howells,
	Daniel Lustig, linux-arch, Linux List Kernel Mailing,
	Maciej W. Rozycki, Luis Chamberlain, Ingo Molnar,
	Mikulas Patocka, Michael Ellerman, Palmer Dabbelt, Paul Burton,
	Paul E. McKenney, Peter Zijlstra, Alan Stern, Tony Luck,
	Yoshinori Sato

Hi Nick,

On Tue, Apr 09, 2019 at 07:00:52PM +1000, Nicholas Piggin wrote:
> Linus Torvalds's on April 6, 2019 1:50 am:
> > On Fri, Apr 5, 2019 at 4:01 AM Will Deacon <will.deacon@arm.com> wrote:
> >>
> >> mmiowb() is now implied by spin_unlock() on architectures that require
> >> it, so there is no reason to call it from driver code. This patch was
> >> generated using coccinelle:
> >>
> >>         @mmiowb@
> >>         @@
> >>         - mmiowb();
> > 
> > So I love the patch series, and think we should just do it, but I do
> > wonder if some of the drivers involved end up relying on memory
> > ordering things (store_release -> load_acquire) and IO ordering rather
> > than using locking...
> 
> Hopefully the convention that smp_ prefix does not work for MMIO
> ordering helps there. Drivers relying on that would be broken today
> on powerpc, at least.
> 
> > Wouldn't such use now be broken on ia64 SN platforms? Do we care?
> 
> Hopefully not too much, what changed since last thread? :)
> 
> > So it might be worth noting that a lot of the mmiowb()s here weren't
> > paired with spin_unlock?
> 
> I repeat myself, but the correct change is for ia64 to #define wmb to
> mmiowb, then nothing is silently broken, nothing has to be noted, and 
> nobody has to care. The ia64/sn2 platform will run a little slower 
> that's all.

That's certainly something for the ia64 maintainers to consider, if they
care about this behaviour. I still have hope that we'll drop ia64 in the
near future :)

> But deliberately breaking sn2 I guess is implicitly acknowledging the 
> same end result that I wanted, so fine.
> 
> I think it might be an idea to remove all the mmiowb() that obviously
> come before spin_unlock in one big patch, but then submit the rest 
> individually to driver maintainers. I could do that rather than ask
> more work from Will, if he and you agree.

That's an option, I suppose, but I'd much rather just kill off mmiowb() in
one fell swoop and be done with it. I've added the following message to
the commit of the coccinelle patch so any breakage should be easily
rectified:

 | NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
 | spin_unlock(). However, pairing each mmiowb() removal in this patch
 | with the corresponding call to spin_unlock() is not at all trivial,
 | so there is a small chance that this change may regress any drivers
 | incorrectly relying on mmiowb() to order MMIO writes between CPUs using
 | lock-free synchronisation. If you've ended up bisecting to this commit,
 | you can reintroduce the mmiowb() calls using wmb() instead, which should
 | restore the old behaviour on all architectures other than some esoteric
 | ia64 systems.

That way we don't have to worry about the long tail of commits removing
undocumented, dangling barriers.

It's not like we're losing the information about where the mmiowb()s used to
be, so it should be easy to address any fallout (but I'm not really expecting
anything significant, to be honest with you).
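
For anyone who does end up bisecting to it, the local fix is mechanical
(sketch only, driver names illustrative):

	writel(val, priv->regs + REG_DB);
	wmb();		/* was: mmiowb(); restores the old ordering except on esoteric ia64 */
	smp_store_release(&priv->seq, seq);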

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 17/21] drivers: Remove explicit invocations of mmiowb()
  2019-04-09 13:46       ` Will Deacon
@ 2019-04-10  0:25         ` Nicholas Piggin
  0 siblings, 0 replies; 40+ messages in thread
From: Nicholas Piggin @ 2019-04-10  0:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: Akira Yokosawa, Andrea Parri, Arnd Bergmann,
	Benjamin Herrenschmidt, Rich Felker, David Howells,
	Daniel Lustig, linux-arch, Linux List Kernel Mailing,
	Maciej W. Rozycki, Luis Chamberlain, Ingo Molnar,
	Mikulas Patocka, Michael Ellerman, Palmer Dabbelt, Paul Burton,
	Paul E. McKenney, Peter Zijlstra, Alan Stern, Tony Luck,
	Linus Torvalds, Yoshinori Sato

Will Deacon's on April 9, 2019 11:46 pm:
> Hi Nick,
> 
> On Tue, Apr 09, 2019 at 07:00:52PM +1000, Nicholas Piggin wrote:
>> Linus Torvalds's on April 6, 2019 1:50 am:
>> > On Fri, Apr 5, 2019 at 4:01 AM Will Deacon <will.deacon@arm.com> wrote:
>> >>
>> >> mmiowb() is now implied by spin_unlock() on architectures that require
>> >> it, so there is no reason to call it from driver code. This patch was
>> >> generated using coccinelle:
>> >>
>> >>         @mmiowb@
>> >>         @@
>> >>         - mmiowb();
>> > 
>> > So I love the patch series, and think we should just do it, but I do
>> > wonder if some of the drivers involved end up relying on memory
>> > ordering things (store_release -> load_acquire) and IO ordering rather
>> > than using locking...
>> 
>> Hopefully the convention that smp_ prefix does not work for MMIO
>> ordering helps there. Drivers relying on that would be broken today
>> on powerpc, at least.
>> 
>> > Wouldn't such use now be broken on ia64 SN platforms? Do we care?
>> 
>> Hopefully not too much, what changed since last thread? :)
>> 
>> > So it might be worth noting that a lot of the mmiowb()s here weren't
>> > paired with spin_unlock?
>> 
>> I repeat myself, but the correct change is for ia64 to #define wmb to
>> mmiowb, then nothing is silently broken, nothing has to be noted, and 
>> nobody has to care. The ia64/sn2 platform will run a little slower 
>> that's all.
> 
> That's certainly something for the ia64 maintainers to consider, if they
> care about this behaviour. I still have hope that we'll drop ia64 in the
> near future :)

Well we don't need to for this reason, at least. Wouldn't cost
architecture independent code anything.

I don't have much opinion about it, but Itaniums of course are still
being sold and the latest chip was released in 2017. The last Itanium
Altix seems more than 10 years old though so it might be reasonable 
to remove sn2 (if it's causing a big headache).

> 
>> But deliberately breaking sn2 I guess is implicitly acknowledging the 
>> same end result that I wanted, so fine.
>> 
>> I think it might be an idea to remove all the mmiowb() that obviously
>> come before spin_unlock in one big patch, but then submit the rest 
>> individually to driver maintainers. I could do that rather than ask
>> more work from Will, if he and you agree.
> 
> That's an option, I suppose, but I'd much rather just kill off mmiowb() in
> one fell swoop and be done with it. I've added the following message to
> the commit of the coccinelle patch so any breakage should be easily
> rectified:
> 
>  | NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
>  | spin_unlock(). However, pairing each mmiowb() removal in this patch
>  | with the corresponding call to spin_unlock() is not at all trivial,
>  | so there is a small chance that this change may regress any drivers
>  | incorrectly relying on mmiowb() to order MMIO writes between CPUs using
>  | lock-free synchronisation. If you've ended up bisecting to this commit,
>  | you can reintroduce the mmiowb() calls using wmb() instead, which should
>  | restore the old behaviour on all architectures other than some esoteric
>  | ia64 systems.
> 
> That way we don't have to worry about the long tail of commits removing
> undocumented, dangling barriers.
> 
> It's not like we're losing the information about where the mmiowb()s used to
> be, so it should be easy to address any fallout (but I'm not really expecting
> anything significant, to be honest with you).

Well if you feel strongly about it I don't object.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  2019-04-05 13:59 ` [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section Will Deacon
@ 2019-04-10 10:58   ` Ingo Molnar
  2019-04-10 12:28     ` Will Deacon
  2019-04-11 22:12   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 40+ messages in thread
From: Ingo Molnar @ 2019-04-10 10:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arch, linux-kernel, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin


Mostly minor grammar fixes:

* Will Deacon <will.deacon@arm.com> wrote:

> + (*) readX(), writeX():
>  
> +     The readX() and writeX() MMIO accessors take a pointer to the peripheral
> +     being accessed as an __iomem * parameter. For pointers mapped with the
> +     default I/O attributes (e.g. those returned by ioremap()), then the
> +     ordering guarantees are as follows:

s/then the
 /the

> +     1. All readX() and writeX() accesses to the same peripheral are ordered
> +        with respect to each other. For example, this ensures that MMIO register
> +	writes by the CPU to a particular device will arrive in program order.

Vertical alignment whitespace damage: some indentations are done via 
spaces, one via tabs. Please standardize to tabs.

I'd also suggest:

s/For example, this ensures
 /For example this ensures


for the rest of the text too. The comma after the 'For example,' 
introductory phrase is grammatically correct but stylistically confusing, 
because in reality there's a *second* introductory phrase via "this 
ensures".

>  
> +     2. A writeX() by the CPU to the peripheral will first wait for the
> +        completion of all prior CPU writes to memory. For example, this ensures
> +        that writes by the CPU to an outbound DMA buffer allocated by
> +        dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
> +        to its MMIO control register to trigger the transfer.
>  
> +     3. A readX() by the CPU from the peripheral will complete before any
> +	subsequent CPU reads from memory can begin. For example, this ensures
> +	that reads by the CPU from an incoming DMA buffer allocated by
> +	dma_alloc_coherent() will not see stale data after reading from the DMA
> +	engine's MMIO status register to establish that the DMA transfer has
> +	completed.
>  
> +     4. A readX() by the CPU from the peripheral will complete before any
> +	subsequent delay() loop can begin execution. For example, this ensures
> +	that two MMIO register writes by the CPU to a peripheral will arrive at
> +	least 1us apart if the first write is immediately read back with readX()
> +	and udelay(1) is called prior to the second writeX().

This might be more readable via some short code sequence instead?

>  
> +     __iomem pointers obtained with non-default attributes (e.g. those returned
> +     by ioremap_wc()) are unlikely to provide many of these guarantees.

This part is a bit confusing I think, because it's so cryptic. "Unlikely" 
as in probabilistic? ;-) So I think we should at least give some scope of 
the exceptions and expected trouble, or at least direct people to those 
APIs to see what the semantics are?

>  
> + (*) readX_relaxed(), writeX_relaxed():
>  
> +     These are similar to readX() and writeX(), but provide weaker memory
> +     ordering guarantees. Specifically, they do not guarantee ordering with
> +     respect to normal memory accesses or delay() loops (i.e bullets 2-4 above)
> +     but they are still guaranteed to be ordered with respect to other accesses
> +     to the same peripheral when operating on __iomem pointers mapped with the
> +     default I/O attributes.
>  
> + (*) readsX(), writesX():
>  
> +     The readsX() and writesX() MMIO accessors are designed for accessing
> +     register-based, memory-mapped FIFOs residing on peripherals that are not
> +     capable of performing DMA. Consequently, they provide only the ordering
> +     guarantees of readX_relaxed() and writeX_relaxed(), as documented above.

So is there any difference between 'X_relaxed' and 'sX' variants? What is 
the 's' about?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  2019-04-10 10:58   ` Ingo Molnar
@ 2019-04-10 12:28     ` Will Deacon
  2019-04-11 11:00       ` Ingo Molnar
  0 siblings, 1 reply; 40+ messages in thread
From: Will Deacon @ 2019-04-10 12:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-arch, linux-kernel, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

Hi Ingo,

Thanks for taking a look (diff at the end).

On Wed, Apr 10, 2019 at 12:58:33PM +0200, Ingo Molnar wrote:
> * Will Deacon <will.deacon@arm.com> wrote:
> 
> > + (*) readX(), writeX():
> >  
> > +     The readX() and writeX() MMIO accessors take a pointer to the peripheral
> > +     being accessed as an __iomem * parameter. For pointers mapped with the
> > +     default I/O attributes (e.g. those returned by ioremap()), then the
> > +     ordering guarantees are as follows:
> 
> s/then the
>  /the

Fixed.

> > +     1. All readX() and writeX() accesses to the same peripheral are ordered
> > +        with respect to each other. For example, this ensures that MMIO register
> > +	writes by the CPU to a particular device will arrive in program order.
> 
> Vertical alignment whitespace damage: some indentations are done via 
> spaces, one via tabs. Please standardize to tabs.

Fixed.

> I'd also suggest:
> 
> s/For example, this ensures
>  /For example this ensures

I'll just drop the "For example, " prefix altogether.

> > +     4. A readX() by the CPU from the peripheral will complete before any
> > +	subsequent delay() loop can begin execution. For example, this ensures
> > +	that two MMIO register writes by the CPU to a peripheral will arrive at
> > +	least 1us apart if the first write is immediately read back with readX()
> > +	and udelay(1) is called prior to the second writeX().
> 
> This might be more readable via some short code sequence instead?

Fixed.

> > +     __iomem pointers obtained with non-default attributes (e.g. those returned
> > +     by ioremap_wc()) are unlikely to provide many of these guarantees.
> 
> This part is a bit confusing I think, because it's so cryptic. "Unlikely" 
> as in probabilistic? ;-) So I think we should at least give some scope of 
> the exceptions and expected trouble, or at least direct people to those 
> APIs to see what the semantics are?

Right, so I'm trying to tackle the common case of ioremap() in this patch.
There isn't an agreed portable semantics for ioremap_wc() yet, so I'm taking
a punt for now but it's something I'd like to get back to in the future.
I've tried to clean up the wording in the diff below to spell out that
this is currently arch-specific.

> >  
> > + (*) readX_relaxed(), writeX_relaxed():
> >  
> > +     These are similar to readX() and writeX(), but provide weaker memory
> > +     ordering guarantees. Specifically, they do not guarantee ordering with
> > +     respect to normal memory accesses or delay() loops (i.e bullets 2-4 above)
> > +     but they are still guaranteed to be ordered with respect to other accesses
> > +     to the same peripheral when operating on __iomem pointers mapped with the
> > +     default I/O attributes.
> >  
> > + (*) readsX(), writesX():
> >  
> > +     The readsX() and writesX() MMIO accessors are designed for accessing
> > +     register-based, memory-mapped FIFOs residing on peripherals that are not
> > +     capable of performing DMA. Consequently, they provide only the ordering
> > +     guarantees of readX_relaxed() and writeX_relaxed(), as documented above.
> 
> So is there any difference between 'X_relaxed' and 'sX' variants? What is 
> the 's' about?

From the ordering perspective, there isn't a difference, but they are very
different operations in the I/O access API so I think it's worth calling
them out separately. The 's' stands for "string", and these accessors
repeatedly access the same address, hence the comment about "memory-mapped
FIFOs".

Anyway, please take a look at the diff below and let me know if you're
happy with it. The original commit is currently at the bottom of my mmiowb
tree, so I'll probably put this on top as an extra commit with your
Reported-by.

Cheers,

Will

--->8

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 3522f0cc772f..d0e332161a40 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2517,80 +2517,88 @@ guarantees:
 
  (*) readX(), writeX():
 
-     The readX() and writeX() MMIO accessors take a pointer to the peripheral
-     being accessed as an __iomem * parameter. For pointers mapped with the
-     default I/O attributes (e.g. those returned by ioremap()), then the
-     ordering guarantees are as follows:
-
-     1. All readX() and writeX() accesses to the same peripheral are ordered
-        with respect to each other. For example, this ensures that MMIO register
-	writes by the CPU to a particular device will arrive in program order.
-
-     2. A writeX() by the CPU to the peripheral will first wait for the
-        completion of all prior CPU writes to memory. For example, this ensures
-        that writes by the CPU to an outbound DMA buffer allocated by
-        dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
-        to its MMIO control register to trigger the transfer.
-
-     3. A readX() by the CPU from the peripheral will complete before any
-	subsequent CPU reads from memory can begin. For example, this ensures
-	that reads by the CPU from an incoming DMA buffer allocated by
-	dma_alloc_coherent() will not see stale data after reading from the DMA
-	engine's MMIO status register to establish that the DMA transfer has
-	completed.
-
-     4. A readX() by the CPU from the peripheral will complete before any
-	subsequent delay() loop can begin execution. For example, this ensures
-	that two MMIO register writes by the CPU to a peripheral will arrive at
-	least 1us apart if the first write is immediately read back with readX()
-	and udelay(1) is called prior to the second writeX().
-
-     __iomem pointers obtained with non-default attributes (e.g. those returned
-     by ioremap_wc()) are unlikely to provide many of these guarantees.
+	The readX() and writeX() MMIO accessors take a pointer to the
+	peripheral being accessed as an __iomem * parameter. For pointers
+	mapped with the default I/O attributes (e.g. those returned by
+	ioremap()), the ordering guarantees are as follows:
+
+	1. All readX() and writeX() accesses to the same peripheral are ordered
+	   with respect to each other. This ensures that MMIO register writes by
+	   the CPU to a particular device will arrive in program order.
+
+	2. A writeX() by the CPU to the peripheral will first wait for the
+	   completion of all prior CPU writes to memory. This ensures that
+	   writes by the CPU to an outbound DMA buffer allocated by
+	   dma_alloc_coherent() will be visible to a DMA engine when the CPU
+	   writes to its MMIO control register to trigger the transfer.
+
+	3. A readX() by the CPU from the peripheral will complete before any
+	   subsequent CPU reads from memory can begin. This ensures that reads
+	   by the CPU from an incoming DMA buffer allocated by
+	   dma_alloc_coherent() will not see stale data after reading from the
+	   DMA engine's MMIO status register to establish that the DMA transfer
+	   has completed.
+
+	4. A readX() by the CPU from the peripheral will complete before any
+	   subsequent delay() loop can begin execution. This ensures that two
+	   MMIO register writes by the CPU to a peripheral will arrive at least
+	   1us apart if the first write is immediately read back with readX()
+	   and udelay(1) is called prior to the second writeX():
+
+		writel(42, DEVICE_REGISTER_0); // Arrives at the device...
+		readl(DEVICE_REGISTER_0);
+		udelay(1);
+		writel(42, DEVICE_REGISTER_1); // ... at least 1us before this.
+
+	The ordering properties of __iomem pointers obtained with non-default
+	attributes (e.g. those returned by ioremap_wc()) are specific to the
+	underlying architecture and therefore the guarantees listed above cannot
+	generally be relied upon for these types of mappings.
 
  (*) readX_relaxed(), writeX_relaxed():
 
-     These are similar to readX() and writeX(), but provide weaker memory
-     ordering guarantees. Specifically, they do not guarantee ordering with
-     respect to normal memory accesses or delay() loops (i.e bullets 2-4 above)
-     but they are still guaranteed to be ordered with respect to other accesses
-     to the same peripheral when operating on __iomem pointers mapped with the
-     default I/O attributes.
+	These are similar to readX() and writeX(), but provide weaker memory
+	ordering guarantees. Specifically, they do not guarantee ordering with
+	respect to normal memory accesses or delay() loops (i.e. bullets 2-4
+	above) but they are still guaranteed to be ordered with respect to other
+	accesses to the same peripheral when operating on __iomem pointers
+	mapped with the default I/O attributes.
 
  (*) readsX(), writesX():
 
-     The readsX() and writesX() MMIO accessors are designed for accessing
-     register-based, memory-mapped FIFOs residing on peripherals that are not
-     capable of performing DMA. Consequently, they provide only the ordering
-     guarantees of readX_relaxed() and writeX_relaxed(), as documented above.
+	The readsX() and writesX() MMIO accessors are designed for accessing
+	register-based, memory-mapped FIFOs residing on peripherals that are not
+	capable of performing DMA. Consequently, they provide only the ordering
+	guarantees of readX_relaxed() and writeX_relaxed(), as documented above.
 
  (*) inX(), outX():
 
-     The inX() and outX() accessors are intended to access legacy port-mapped
-     I/O peripherals, which may require special instructions on some
-     architectures (notably x86). The port number of the peripheral being
-     accessed is passed as an argument.
+	The inX() and outX() accessors are intended to access legacy port-mapped
+	I/O peripherals, which may require special instructions on some
+	architectures (notably x86). The port number of the peripheral being
+	accessed is passed as an argument.
 
-     Since many CPU architectures ultimately access these peripherals via an
-     internal virtual memory mapping, the portable ordering guarantees provided
-     by inX() and outX() are the same as those provided by readX() and writeX()
-     respectively when accessing a mapping with the default I/O attributes.
+	Since many CPU architectures ultimately access these peripherals via an
+	internal virtual memory mapping, the portable ordering guarantees
+	provided by inX() and outX() are the same as those provided by readX()
+	and writeX() respectively when accessing a mapping with the default I/O
+	attributes.
 
-     Device drivers may expect outX() to emit a non-posted write transaction
-     that waits for a completion response from the I/O peripheral before
-     returning. This is not guaranteed by all architectures and is therefore
-     not part of the portable ordering semantics.
+	Device drivers may expect outX() to emit a non-posted write transaction
+	that waits for a completion response from the I/O peripheral before
+	returning. This is not guaranteed by all architectures and is therefore
+	not part of the portable ordering semantics.
 
  (*) insX(), outsX():
 
-     As above, the insX() and outsX() accessors provide the same ordering
-     guarantees as readsX() and writesX() respectively when accessing a mapping
-     with the default I/O attributes.
+	As above, the insX() and outsX() accessors provide the same ordering
+	guarantees as readsX() and writesX() respectively when accessing a
+	mapping with the default I/O attributes.
 
- (*) ioreadX(), iowriteX()
+ (*) ioreadX(), iowriteX():
 
-     These will perform appropriately for the type of access they're actually
-     doing, be it inX()/outX() or readX()/writeX().
+	These will perform appropriately for the type of access they're actually
+	doing, be it inX()/outX() or readX()/writeX().
 
 All of these accessors assume that the underlying peripheral is little-endian,
 and will therefore perform byte-swapping operations on big-endian architectures.

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  2019-04-10 12:28     ` Will Deacon
@ 2019-04-11 11:00       ` Ingo Molnar
  0 siblings, 0 replies; 40+ messages in thread
From: Ingo Molnar @ 2019-04-11 11:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arch, linux-kernel, Paul E. McKenney,
	Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin


* Will Deacon <will.deacon@arm.com> wrote:

> Anyway, please take a look at the diff below and let me know if you're
> happy with it. The original commit is currently at the bottom of my mmiowb
> tree, so I'll probably put this on top as an extra commit with your
> Reported-by.

Sure - and the changes look good to me:

  Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  2019-04-05 13:59 ` [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section Will Deacon
  2019-04-10 10:58   ` Ingo Molnar
@ 2019-04-11 22:12   ` Benjamin Herrenschmidt
  2019-04-11 22:34     ` Linus Torvalds
  1 sibling, 1 reply; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2019-04-11 22:12 UTC (permalink / raw)
  To: Will Deacon, linux-arch
  Cc: linux-kernel, Paul E. McKenney, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Linus Torvalds, Maciej W. Rozycki,
	Paul Burton, Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Fri, 2019-04-05 at 14:59 +0100, Will Deacon wrote:
> +     1. All readX() and writeX() accesses to the same peripheral are ordered
> +        with respect to each other. For example, this ensures that MMIO register
> +       writes by the CPU to a particular device will arrive in program order.

Minor nit... I would have said "All readX() and writeX() accesses _from
the same CPU_ to the same peripheral... and then s/the CPU/this CPU.

> -     Accesses to this space may be fully synchronous (as on i386), but
> -     intermediary bridges (such as the PCI host bridge) may not fully honour
> -     that.
> +     2. A writeX() by the CPU to the peripheral will first wait for the
> +        completion of all prior CPU writes to memory. For example, this ensures
> +        that writes by the CPU to an outbound DMA buffer allocated by
> +        dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
> +        to its MMIO control register to trigger the transfer.

Similarly "the CPU" -> "a CPU"
>  
> -     They are guaranteed to be fully ordered with respect to each other.
> +     3. A readX() by the CPU from the peripheral will complete before any
> +       subsequent CPU reads from memory can begin. For example, this ensures
> +       that reads by the CPU from an incoming DMA buffer allocated by
> +       dma_alloc_coherent() will not see stale data after reading from the DMA
> +       engine's MMIO status register to establish that the DMA transfer has
> +       completed.
>  
> -     They are not guaranteed to be fully ordered with respect to other types of
> -     memory and I/O operation.
> +     4. A readX() by the CPU from the peripheral will complete before any
> +       subsequent delay() loop can begin execution. For example, this ensures
> +       that two MMIO register writes by the CPU to a peripheral will arrive at
> +       least 1us apart if the first write is immediately read back with readX()
> +       and udelay(1) is called prior to the second writeX().
>  
> - (*) readX(), writeX():
> +     __iomem pointers obtained with non-default attributes (e.g. those returned
> +     by ioremap_wc()) are unlikely to provide many of these guarantees.

So we give up on defining _wc semantics ? :-) Fair enough, it's a
mess...

 .../...

> +All of these accessors assume that the underlying peripheral is little-endian,
> +and will therefore perform byte-swapping operations on big-endian architectures.

This is not true of readsX/writesX, those will perform native accesses and are
intrinsically endian neutral.
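
e.g. (sketch only, register names illustrative):

	u32 id, raw;

	id = readl(base + REG_ID);		/* byte-swapped on big-endian */
	readsl(base + REG_FIFO, &raw, 1);	/* native (__raw) access, no swap */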

> +Composing I/O ordering barriers with SMP ordering barriers and LOCK/UNLOCK
> +operations is a dangerous sport which may require the use of mmiowb(). See the
> +subsection "Acquires vs I/O accesses" for more information.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  2019-04-11 22:12   ` Benjamin Herrenschmidt
@ 2019-04-11 22:34     ` Linus Torvalds
  2019-04-12  2:07       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2019-04-11 22:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Will Deacon, linux-arch, Linux List Kernel Mailing,
	Paul E. McKenney, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Thu, Apr 11, 2019 at 3:13 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> Minor nit... I would have said "All readX() and writeX() accesses _from
> the same CPU_ to the same peripheral... and then s/the CPU/this CPU.

Maybe talk about "same thread" rather than "same cpu", with the
understanding that scheduling/preemption has to include the
appropriate cross-CPU IO barrier?

                   Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  2019-04-11 22:34     ` Linus Torvalds
@ 2019-04-12  2:07       ` Benjamin Herrenschmidt
  2019-04-12 13:17         ` Will Deacon
  0 siblings, 1 reply; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2019-04-12  2:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Will Deacon, linux-arch, Linux List Kernel Mailing,
	Paul E. McKenney, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Thu, 2019-04-11 at 15:34 -0700, Linus Torvalds wrote:
> On Thu, Apr 11, 2019 at 3:13 PM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > 
> > Minor nit... I would have said "All readX() and writeX() accesses
> > _from
> > the same CPU_ to the same peripheral... and then s/the CPU/this
> > CPU.
> 
> Maybe talk about "same thread" rather than "same cpu", with the
> understanding that scheduling/preemption has to include the
> appropriate cross-CPU IO barrier?

Works for me, but why not spell all this out in the document ? We know,
but others might not.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  2019-04-12  2:07       ` Benjamin Herrenschmidt
@ 2019-04-12 13:17         ` Will Deacon
  2019-04-15  4:05           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 40+ messages in thread
From: Will Deacon @ 2019-04-12 13:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, linux-arch, Linux List Kernel Mailing,
	Paul E. McKenney, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Fri, Apr 12, 2019 at 12:07:09PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2019-04-11 at 15:34 -0700, Linus Torvalds wrote:
> > On Thu, Apr 11, 2019 at 3:13 PM Benjamin Herrenschmidt
> > <benh@kernel.crashing.org> wrote:
> > > 
> > > Minor nit... I would have said "All readX() and writeX() accesses
> > > _from
> > > the same CPU_ to the same peripheral... and then s/the CPU/this
> > > CPU.
> > 
> > Maybe talk about "same thread" rather than "same cpu", with the
> > understanding that scheduling/preemption has to include the
> > appropriate cross-CPU IO barrier?
> 
> Works for me, but why not spell all this out in the document ? We know,
> but others might not.

Ok, how about the diff below on top of:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/mmiowb

?

I do plan to investigate ioremap_wc() and friends in the future, but it's
been painful enough just dealing with the common case! I'll almost certainly
need your help with that too.

Will

--->8

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1660dde75e14..8ce298e09d54 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2524,26 +2524,30 @@ guarantees:
 
 	1. All readX() and writeX() accesses to the same peripheral are ordered
 	   with respect to each other. This ensures that MMIO register writes by
-	   the CPU to a particular device will arrive in program order.
-
-	2. A writeX() by the CPU to the peripheral will first wait for the
-	   completion of all prior CPU writes to memory. This ensures that
-	   writes by the CPU to an outbound DMA buffer allocated by
-	   dma_alloc_coherent() will be visible to a DMA engine when the CPU
-	   writes to its MMIO control register to trigger the transfer.
-
-	3. A readX() by the CPU from the peripheral will complete before any
-	   subsequent CPU reads from memory can begin. This ensures that reads
-	   by the CPU from an incoming DMA buffer allocated by
-	   dma_alloc_coherent() will not see stale data after reading from the
-	   DMA engine's MMIO status register to establish that the DMA transfer
-	   has completed.
-
-	4. A readX() by the CPU from the peripheral will complete before any
-	   subsequent delay() loop can begin execution. This ensures that two
-	   MMIO register writes by the CPU to a peripheral will arrive at least
-	   1us apart if the first write is immediately read back with readX()
-	   and udelay(1) is called prior to the second writeX():
+	   the same CPU thread to a particular device will arrive in program
+	   order.
+
+	2. A writeX() by a CPU thread to the peripheral will first wait for the
+	   completion of all prior writes to memory either issued by the thread
+	   or issued while holding a spinlock that was subsequently taken by the
+	   thread. This ensures that writes by the CPU to an outbound DMA
+	   buffer allocated by dma_alloc_coherent() will be visible to a DMA
+	   engine when the CPU writes to its MMIO control register to trigger
+	   the transfer.
+
+	3. A readX() by a CPU thread from the peripheral will complete before
+	   any subsequent reads from memory by the same thread can begin. This
+	   ensures that reads by the CPU from an incoming DMA buffer allocated
+	   by dma_alloc_coherent() will not see stale data after reading from
+	   the DMA engine's MMIO status register to establish that the DMA
+	   transfer has completed.
+
+	4. A readX() by a CPU thread from the peripheral will complete before
+	   any subsequent delay() loop can begin execution on the same thread.
+	   This ensures that two MMIO register writes by the CPU to a peripheral
+	   will arrive at least 1us apart if the first write is immediately read
+	   back with readX() and udelay(1) is called prior to the second
+	   writeX():
 
 		writel(42, DEVICE_REGISTER_0); // Arrives at the device...
 		readl(DEVICE_REGISTER_0);
@@ -2600,8 +2604,10 @@ guarantees:
 	These will perform appropriately for the type of access they're actually
 	doing, be it inX()/outX() or readX()/writeX().
 
-All of these accessors assume that the underlying peripheral is little-endian,
-and will therefore perform byte-swapping operations on big-endian architectures.
+With the exception of the string accessors (insX(), outsX(), readsX() and
+writesX()), all of the above assume that the underlying peripheral is
+little-endian and will therefore perform byte-swapping operations on big-endian
+architectures.
 
 
 ========================================

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  2019-04-12 13:17         ` Will Deacon
@ 2019-04-15  4:05           ` Benjamin Herrenschmidt
  2019-04-16  9:13             ` Will Deacon
  0 siblings, 1 reply; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2019-04-15  4:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linus Torvalds, linux-arch, Linux List Kernel Mailing,
	Paul E. McKenney, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

On Fri, 2019-04-12 at 14:17 +0100, Will Deacon wrote:
> 
> +	   the same CPU thread to a particular device will arrive in program
> +	   order.
> +
> +	2. A writeX() by a CPU thread to the peripheral will first wait for the
> +	   completion of all prior writes to memory either issued by the thread
> +	   or issued while holding a spinlock that was subsequently taken by the
> +	   thread. This ensures that writes by the CPU to an outbound DMA
> +	   buffer allocated by dma_alloc_coherent() will be visible to a DMA
> +	   engine when the CPU writes to its MMIO control register to trigger
> +	   the transfer.

Not particularly trying to be annoying here but I find the above
rather hard to parse :) I know what you're getting at but I'm not sure
somebody who doesn't will understand.

One way would be to instead prefix the whole thing with a blurb along
the lines of:

	readX() and writeX() provide some ordering guarantees versus
        each other and other memory accesses that are described below. 
	Those guarantees apply to accesses performed either by the same
        logical thread of execution, or by different threads but while 
        holding the same lock (spinlock or mutex).

Then have a simpler description of each case. No ?

> +	3. A readX() by a CPU thread from the peripheral will complete before
> +	   any subsequent reads from memory by the same thread can begin. This
> +	   ensures that reads by the CPU from an incoming DMA buffer allocated
> +	   by dma_alloc_coherent() will not see stale data after reading from
> +	   the DMA engine's MMIO status register to establish that the DMA
> +	   transfer has completed.
> +
> +	4. A readX() by a CPU thread from the peripheral will complete before
> +	   any subsequent delay() loop can begin execution on the same thread.
> +	   This ensures that two MMIO register writes by the CPU to a peripheral
> +	   will arrive at least 1us apart if the first write is immediately read
> +	   back with readX() and udelay(1) is called prior to the second
> +	   writeX():
>  
>  		writel(42, DEVICE_REGISTER_0); // Arrives at the device...
>  		readl(DEVICE_REGISTER_0);
> @@ -2600,8 +2604,10 @@ guarantees:
>  	These will perform appropriately for the type of access they're actually
>  	doing, be it inX()/outX() or readX()/writeX().
>  
> -All of these accessors assume that the underlying peripheral is little-endian,
> -and will therefore perform byte-swapping operations on big-endian architectures.
> +With the exception of the string accessors (insX(), outsX(), readsX() and
> +writesX()), all of the above assume that the underlying peripheral is
> +little-endian and will therefore perform byte-swapping operations on big-endian
> +architectures.
>  
>  
>  ========================================


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  2019-04-15  4:05           ` Benjamin Herrenschmidt
@ 2019-04-16  9:13             ` Will Deacon
  0 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2019-04-16  9:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, linux-arch, Linux List Kernel Mailing,
	Paul E. McKenney, Michael Ellerman, Arnd Bergmann,
	Peter Zijlstra, Andrea Parri, Palmer Dabbelt, Daniel Lustig,
	David Howells, Alan Stern, Maciej W. Rozycki, Paul Burton,
	Ingo Molnar, Yoshinori Sato, Rich Felker, Tony Luck,
	Mikulas Patocka, Akira Yokosawa, Luis Chamberlain,
	Nicholas Piggin

Hi Ben,

On Mon, Apr 15, 2019 at 02:05:30PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2019-04-12 at 14:17 +0100, Will Deacon wrote:
> > 
> > +	   the same CPU thread to a particular device will arrive in program
> > +	   order.
> > +
> > +	2. A writeX() by a CPU thread to the peripheral will first wait for the
> > +	   completion of all prior writes to memory either issued by the thread
> > +	   or issued while holding a spinlock that was subsequently taken by the
> > +	   thread. This ensures that writes by the CPU to an outbound DMA
> > +	   buffer allocated by dma_alloc_coherent() will be visible to a DMA
> > +	   engine when the CPU writes to its MMIO control register to trigger
> > +	   the transfer.
> 
> Not particularly trying to be annoying here but I find the above
> rather hard to parse :) I know what you're getting at, but I'm not sure
> somebody who doesn't will understand.
> 
> One way would be to instead prefix the whole thing with a blurb along
> the lines of:
> 
> 	readX() and writeX() provide some ordering guarantees versus
>         each other and other memory accesses that are described below. 
> 	Those guarantees apply to accesses performed either by the same
>         logical thread of execution, or by different threads but while 
>         holding the same lock (spinlock or mutex).
> 
> Then have a simpler description of each case. No?

Argh, I think we've ended up confusing two different things in our edits:

  1. Ordering of readX()/writeX() between threads
  2. Ordering of memory accesses in one thread vs readX()/writeX() in another

and these are very different beasts.

For (1), with my mmiowb() patches we can provide some guarantees for
writeX() in conjunction with spinlocks. I'm not convinced we can provide
these same guarantees for combinations involving readX(). For example:

	CPU 1:
	val1 = readl(dev_base + REG1);
	flag = 1;
	spin_unlock(&dev_lock);

	CPU 2:
	spin_lock(&dev_lock);
	if (flag == 1)
		val2 = readl(dev_base + REG2);

In the case that CPU 2 sees the updated flag, do we require CPU 1's readl()
to have read from the device first? I'm not sure that RISC-V's implementation
ensures that readl() is ordered with a subsequent spin_unlock().

For (2), we would need to make this part of LKMM if we wanted to capture
the precise semantics here (e.g. by using the 'prop' relation to figure out
which writes are ordered by a writel). This is a pretty significant piece of
work, so perhaps just referring informally to propagation would be better for
the English text.
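
For contrast, a rough sketch of the writeX() plus spinlock case that the
series does aim to guarantee (dev_lock, dev_base and the register names
below are purely illustrative):

	CPU 1:
	spin_lock(&dev_lock);
	writel(START, dev_base + CTRL);
	spin_unlock(&dev_lock);

	CPU 2:
	spin_lock(&dev_lock);
	writel(STOP, dev_base + CTRL);
	spin_unlock(&dev_lock);

If CPU 2 acquires dev_lock after CPU 1 has released it, the two writel()s
are guaranteed to reach the device in that order (bullet 2 in the updated
text below).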

Updated diff below.

Will

--->8

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1660dde75e14..bc4c6a76c53a 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2524,26 +2524,36 @@ guarantees:
 
 	1. All readX() and writeX() accesses to the same peripheral are ordered
 	   with respect to each other. This ensures that MMIO register writes by
-	   the CPU to a particular device will arrive in program order.
-
-	2. A writeX() by the CPU to the peripheral will first wait for the
-	   completion of all prior CPU writes to memory. This ensures that
-	   writes by the CPU to an outbound DMA buffer allocated by
-	   dma_alloc_coherent() will be visible to a DMA engine when the CPU
-	   writes to its MMIO control register to trigger the transfer.
-
-	3. A readX() by the CPU from the peripheral will complete before any
-	   subsequent CPU reads from memory can begin. This ensures that reads
-	   by the CPU from an incoming DMA buffer allocated by
-	   dma_alloc_coherent() will not see stale data after reading from the
-	   DMA engine's MMIO status register to establish that the DMA transfer
-	   has completed.
-
-	4. A readX() by the CPU from the peripheral will complete before any
-	   subsequent delay() loop can begin execution. This ensures that two
-	   MMIO register writes by the CPU to a peripheral will arrive at least
-	   1us apart if the first write is immediately read back with readX()
-	   and udelay(1) is called prior to the second writeX():
+	   the same CPU thread to a particular device will arrive in program
+	   order.
+
+	2. A writeX() issued by a CPU thread holding a spinlock is ordered
+	   before a writeX() to the same peripheral from another CPU thread
+	   issued after a later acquisition of the same spinlock. This ensures
+	   that MMIO register writes to a particular device issued while holding
+	   a spinlock will arrive in an order consistent with acquisitions of
+	   the lock.
+
+	3. A writeX() by a CPU thread to the peripheral will first wait for the
+	   completion of all prior writes to memory either issued by, or
+	   propagated to, the same thread. This ensures that writes by the CPU
+	   to an outbound DMA buffer allocated by dma_alloc_coherent() will be
+	   visible to a DMA engine when the CPU writes to its MMIO control
+	   register to trigger the transfer.
+
+	4. A readX() by a CPU thread from the peripheral will complete before
+	   any subsequent reads from memory by the same thread can begin. This
+	   ensures that reads by the CPU from an incoming DMA buffer allocated
+	   by dma_alloc_coherent() will not see stale data after reading from
+	   the DMA engine's MMIO status register to establish that the DMA
+	   transfer has completed.
+
+	5. A readX() by a CPU thread from the peripheral will complete before
+	   any subsequent delay() loop can begin execution on the same thread.
+	   This ensures that two MMIO register writes by the CPU to a peripheral
+	   will arrive at least 1us apart if the first write is immediately read
+	   back with readX() and udelay(1) is called prior to the second
+	   writeX():
 
 		writel(42, DEVICE_REGISTER_0); // Arrives at the device...
 		readl(DEVICE_REGISTER_0);
@@ -2559,10 +2569,11 @@ guarantees:
 
 	These are similar to readX() and writeX(), but provide weaker memory
 	ordering guarantees. Specifically, they do not guarantee ordering with
-	respect to normal memory accesses or delay() loops (i.e. bullets 2-4
-	above) but they are still guaranteed to be ordered with respect to other
-	accesses to the same peripheral when operating on __iomem pointers
-	mapped with the default I/O attributes.
+	respect to locking, normal memory accesses or delay() loops (i.e.
+	bullets 2-5 above) but they are still guaranteed to be ordered with
+	respect to other accesses from the same CPU thread to the same
+	peripheral when operating on __iomem pointers mapped with the default
+	I/O attributes.
 
  (*) readsX(), writesX():
 
@@ -2600,8 +2611,10 @@ guarantees:
 	These will perform appropriately for the type of access they're actually
 	doing, be it inX()/outX() or readX()/writeX().
 
-All of these accessors assume that the underlying peripheral is little-endian,
-and will therefore perform byte-swapping operations on big-endian architectures.
+With the exception of the string accessors (insX(), outsX(), readsX() and
+writesX()), all of the above assume that the underlying peripheral is
+little-endian and will therefore perform byte-swapping operations on big-endian
+architectures.
 
 
 ========================================
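
As a rough illustration of the DMA pattern that bullets 3 and 4 above are
describing (dev_base, the DMA_* register offsets, the DMA_DONE flag and the
buffer size are made up for the example):

	/* Outbound: fill the coherent buffer, then kick the device.
	 * Per bullet 3, the writel() waits for the memcpy() stores. */
	buf = dma_alloc_coherent(dev, SZ_4K, &dma_handle, GFP_KERNEL);
	memcpy(buf, data, len);
	writel(lower_32_bits(dma_handle), dev_base + DMA_ADDR);
	writel(DMA_START, dev_base + DMA_CTRL);

	/* Inbound: wait for the device to signal completion, then read
	 * the buffer. Per bullet 4, the reads do not see stale data. */
	while (!(readl(dev_base + DMA_STATUS) & DMA_DONE))
		cpu_relax();
	memcpy(data, buf, len);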

^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2019-04-16  9:14 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-05 13:59 [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Will Deacon
2019-04-05 13:59 ` [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section Will Deacon
2019-04-10 10:58   ` Ingo Molnar
2019-04-10 12:28     ` Will Deacon
2019-04-11 11:00       ` Ingo Molnar
2019-04-11 22:12   ` Benjamin Herrenschmidt
2019-04-11 22:34     ` Linus Torvalds
2019-04-12  2:07       ` Benjamin Herrenschmidt
2019-04-12 13:17         ` Will Deacon
2019-04-15  4:05           ` Benjamin Herrenschmidt
2019-04-16  9:13             ` Will Deacon
2019-04-05 13:59 ` [PATCH v2 02/21] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking Will Deacon
2019-04-05 13:59 ` [PATCH v2 03/21] arch: Use asm-generic header for asm/mmiowb.h Will Deacon
2019-04-05 13:59 ` [PATCH v2 04/21] mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors Will Deacon
2019-04-05 13:59 ` [PATCH v2 05/21] ARM/io: Remove useless definition of mmiowb() Will Deacon
2019-04-05 13:59 ` [PATCH v2 06/21] arm64/io: " Will Deacon
2019-04-05 13:59 ` [PATCH v2 07/21] x86/io: " Will Deacon
2019-04-05 14:14   ` Thomas Gleixner
2019-04-05 13:59 ` [PATCH v2 08/21] nds32/io: " Will Deacon
2019-04-05 13:59 ` [PATCH v2 09/21] m68k/io: " Will Deacon
2019-04-05 13:59 ` [PATCH v2 10/21] sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() Will Deacon
2019-04-05 13:59 ` [PATCH v2 11/21] mips/mmiowb: " Will Deacon
2019-04-05 13:59 ` [PATCH v2 12/21] ia64/mmiowb: " Will Deacon
2019-04-05 13:59 ` [PATCH v2 13/21] powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code Will Deacon
2019-04-05 13:59 ` [PATCH v2 14/21] riscv/mmiowb: " Will Deacon
2019-04-05 13:59 ` [PATCH v2 15/21] Documentation: Kill all references to mmiowb() Will Deacon
2019-04-05 13:59 ` [PATCH v2 16/21] drivers: Remove useless trailing comments from mmiowb() invocations Will Deacon
2019-04-05 13:59 ` [PATCH v2 17/21] drivers: Remove explicit invocations of mmiowb() Will Deacon
2019-04-05 15:50   ` Linus Torvalds
2019-04-09  9:00     ` Nicholas Piggin
2019-04-09 13:46       ` Will Deacon
2019-04-10  0:25         ` Nicholas Piggin
2019-04-05 13:59 ` [PATCH v2 18/21] scsi/qla1280: Remove stale comment about mmiowb() Will Deacon
2019-04-05 13:59 ` [PATCH v2 19/21] i40iw: Redefine i40iw_mmiowb() to do nothing Will Deacon
2019-04-05 13:59 ` [PATCH v2 20/21] net/ethernet/silan/sc92031: Remove stale comment about mmiowb() Will Deacon
2019-04-05 13:59 ` [PATCH v2 21/21] arch: Remove dummy mmiowb() definitions from arch code Will Deacon
2019-04-05 15:55 ` [PATCH v2 00/21] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Linus Torvalds
2019-04-05 16:09   ` Will Deacon
2019-04-05 16:15     ` Linus Torvalds
2019-04-05 16:30       ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).