From: Will Deacon <will.deacon@arm.com>
To: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Will Deacon <will.deacon@arm.com>,
"Paul E. McKenney" <paulmck@linux.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Arnd Bergmann <arnd@arndb.de>,
Peter Zijlstra <peterz@infradead.org>,
Andrea Parri <andrea.parri@amarulasolutions.com>,
Palmer Dabbelt <palmer@sifive.com>,
Daniel Lustig <dlustig@nvidia.com>,
David Howells <dhowells@redhat.com>,
Alan Stern <stern@rowland.harvard.edu>,
Linus Torvalds <torvalds@linux-foundation.org>,
"Maciej W. Rozycki" <macro@linux-mips.org>,
Paul Burton <paul.burton@mips.com>,
Ingo Molnar <mingo@kernel.org>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
Rich Felker <dalias@libc.org>, Tony Luck <tony.luck@intel.com>,
Mikulas Patocka <mpatocka@redhat.com>,
Akira Yokosawa <akiyks@gmail.com>,
Luis Chamberlain <mcgrof@kernel.org>,
Nicholas Piggin <npiggin@gmail.com>
Subject: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
Date: Fri, 5 Apr 2019 14:59:16 +0100
Message-ID: <20190405135936.7266-2-will.deacon@arm.com>
In-Reply-To: <20190405135936.7266-1-will.deacon@arm.com>
The "KERNEL I/O BARRIER EFFECTS" section of memory-barriers.txt is vague,
x86-centric, out-of-date, incomplete and demonstrably incorrect in places.
This is largely because I/O ordering is a horrible can of worms, but also
because the document has stagnated as our understanding has evolved.
Attempt to address some of that, by rewriting the section based on
recent(-ish) discussions with Arnd, BenH and others. Maybe one day we'll
find a way to formalise this stuff, but for now let's at least try to
make the English easier to understand.
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
Documentation/memory-barriers.txt | 115 +++++++++++++++++++++++---------------
1 file changed, 70 insertions(+), 45 deletions(-)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1c22b21ae922..5eb6f4c6a133 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2599,72 +2599,97 @@ likely, then interrupt-disabling locks should be used to guarantee ordering.
KERNEL I/O BARRIER EFFECTS
==========================
-When accessing I/O memory, drivers should use the appropriate accessor
-functions:
+Interfacing with peripherals via I/O accesses is deeply architecture and device
+specific. Therefore, drivers which are inherently non-portable may rely on
+specific behaviours of their target systems in order to achieve synchronization
+in the most lightweight manner possible. For drivers intending to be portable
+between multiple architectures and bus implementations, the kernel offers a
+series of accessor functions that provide various degrees of ordering
+guarantees:
- (*) inX(), outX():
+ (*) readX(), writeX():
- These are intended to talk to I/O space rather than memory space, but
- that's primarily a CPU-specific concept. The i386 and x86_64 processors
- do indeed have special I/O space access cycles and instructions, but many
- CPUs don't have such a concept.
+ The readX() and writeX() MMIO accessors take a pointer to the peripheral
+ being accessed as an __iomem * parameter. For pointers mapped with the
+ default I/O attributes (e.g. those returned by ioremap()), the
+ ordering guarantees are as follows:
- The PCI bus, amongst others, defines an I/O space concept which - on such
- CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
- space. However, it may also be mapped as a virtual I/O space in the CPU's
- memory map, particularly on those CPUs that don't support alternate I/O
- spaces.
+ 1. All readX() and writeX() accesses to the same peripheral are ordered
+ with respect to each other. For example, this ensures that MMIO register
+ writes by the CPU to a particular device will arrive in program order.
- Accesses to this space may be fully synchronous (as on i386), but
- intermediary bridges (such as the PCI host bridge) may not fully honour
- that.
+ 2. A writeX() by the CPU to the peripheral will first wait for the
+ completion of all prior CPU writes to memory. For example, this ensures
+ that writes by the CPU to an outbound DMA buffer allocated by
+ dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
+ to its MMIO control register to trigger the transfer.
- They are guaranteed to be fully ordered with respect to each other.
+ 3. A readX() by the CPU from the peripheral will complete before any
+ subsequent CPU reads from memory can begin. For example, this ensures
+ that reads by the CPU from an incoming DMA buffer allocated by
+ dma_alloc_coherent() will not see stale data after reading from the DMA
+ engine's MMIO status register to establish that the DMA transfer has
+ completed.
- They are not guaranteed to be fully ordered with respect to other types of
- memory and I/O operation.
+ 4. A readX() by the CPU from the peripheral will complete before any
+ subsequent delay() loop can begin execution. For example, this ensures
+ that two MMIO register writes by the CPU to a peripheral will arrive at
+ least 1us apart if the first write is immediately read back with readX()
+ and udelay(1) is called prior to the second writeX().
- (*) readX(), writeX():
+ __iomem pointers obtained with non-default attributes (e.g. those returned
+ by ioremap_wc()) are unlikely to provide many of these guarantees.
- Whether these are guaranteed to be fully ordered and uncombined with
- respect to each other on the issuing CPU depends on the characteristics
- defined for the memory window through which they're accessing. On later
- i386 architecture machines, for example, this is controlled by way of the
- MTRR registers.
+ (*) readX_relaxed(), writeX_relaxed():
- Ordinarily, these will be guaranteed to be fully ordered and uncombined,
- provided they're not accessing a prefetchable device.
+ These are similar to readX() and writeX(), but provide weaker memory
+ ordering guarantees. Specifically, they do not guarantee ordering with
+ respect to normal memory accesses or delay() loops (i.e. bullets 2-4 above)
+ but they are still guaranteed to be ordered with respect to other accesses
+ to the same peripheral when operating on __iomem pointers mapped with the
+ default I/O attributes.
- However, intermediary hardware (such as a PCI bridge) may indulge in
- deferral if it so wishes; to flush a store, a load from the same location
- is preferred[*], but a load from the same device or from configuration
- space should suffice for PCI.
+ (*) readsX(), writesX():
- [*] NOTE! attempting to load from the same location as was written to may
- cause a malfunction - consider the 16550 Rx/Tx serial registers for
- example.
+ The readsX() and writesX() MMIO accessors are designed for accessing
+ register-based, memory-mapped FIFOs residing on peripherals that are not
+ capable of performing DMA. Consequently, they provide only the ordering
+ guarantees of readX_relaxed() and writeX_relaxed(), as documented above.
- Used with prefetchable I/O memory, an mmiowb() barrier may be required to
- force stores to be ordered.
+ (*) inX(), outX():
- Please refer to the PCI specification for more information on interactions
- between PCI transactions.
+ The inX() and outX() accessors are intended to access legacy port-mapped
+ I/O peripherals, which may require special instructions on some
+ architectures (notably x86). The port number of the peripheral being
+ accessed is passed as an argument.
- (*) readX_relaxed(), writeX_relaxed()
+ Since many CPU architectures ultimately access these peripherals via an
+ internal virtual memory mapping, the portable ordering guarantees provided
+ by inX() and outX() are the same as those provided by readX() and writeX()
+ respectively when accessing a mapping with the default I/O attributes.
- These are similar to readX() and writeX(), but provide weaker memory
- ordering guarantees. Specifically, they do not guarantee ordering with
- respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
- ordering with respect to LOCK or UNLOCK operations. If the latter is
- required, an mmiowb() barrier can be used. Note that relaxed accesses to
- the same peripheral are guaranteed to be ordered with respect to each
- other.
+ Device drivers may expect outX() to emit a non-posted write transaction
+ that waits for a completion response from the I/O peripheral before
+ returning. This is not guaranteed by all architectures and is therefore
+ not part of the portable ordering semantics.
+
+ (*) insX(), outsX():
+
+ As above, the insX() and outsX() accessors provide the same ordering
+ guarantees as readsX() and writesX() respectively when accessing a mapping
+ with the default I/O attributes.
(*) ioreadX(), iowriteX()
These will perform appropriately for the type of access they're actually
doing, be it inX()/outX() or readX()/writeX().
+All of these accessors assume that the underlying peripheral is little-endian,
+and will therefore perform byte-swapping operations on big-endian architectures.
+
+Composing I/O ordering barriers with SMP ordering barriers and LOCK/UNLOCK
+operations is a dangerous sport which may require the use of mmiowb(). See the
+subsection "Acquires vs I/O accesses" for more information.
========================================
ASSUMED MINIMUM EXECUTION ORDERING MODEL
--
2.11.0
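As an illustration of guarantees 2-4 for readX()/writeX() above, here is a
minimal userspace sketch of the three call sequences the new text describes.
It is not kernel code and not taken from the patch: mock_writel(),
mock_readl() and mock_udelay() are hypothetical stand-ins for writel(),
readl() and udelay(), and the register layout, struct mock_dev and the three
helper functions are invented for the example. A mock like this cannot
reproduce hardware reordering, so it only shows the order in which a driver
would issue the calls in order to rely on each guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace mock, NOT kernel code. The accessors below are hypothetical
 * stand-ins for writel(), readl() and udelay(); the register layout is
 * invented for illustration only.
 */
enum { REG_CTRL, REG_STATUS, REG_RESET, NR_REGS };

#define CTRL_START	0x1u
#define STATUS_DONE	0x1u

struct mock_dev {
	uint32_t regs[NR_REGS];
	unsigned int delay_us;	/* total time requested via mock_udelay() */
};

static inline void mock_writel(struct mock_dev *dev, uint32_t val,
			       unsigned int reg)
{
	dev->regs[reg] = val;
}

static inline uint32_t mock_readl(const struct mock_dev *dev,
				  unsigned int reg)
{
	return dev->regs[reg];
}

static inline void mock_udelay(struct mock_dev *dev, unsigned int us)
{
	dev->delay_us += us;
}

/* Guarantee 2: fill the DMA buffer, then write the control register. */
static inline void start_tx(struct mock_dev *dev, uint8_t *dma_buf,
			    const uint8_t *data, size_t len)
{
	assert(dma_buf && data);
	memcpy(dma_buf, data, len);		/* plain CPU writes to memory */
	mock_writel(dev, CTRL_START, REG_CTRL);	/* issued after the writes above */
}

/* Guarantee 3: read the status register, then read the DMA buffer. */
static inline int finish_rx(const struct mock_dev *dev,
			    const uint8_t *dma_buf, uint8_t *out, size_t len)
{
	if (!(mock_readl(dev, REG_STATUS) & STATUS_DONE))
		return -1;			/* transfer not reported done */
	memcpy(out, dma_buf, len);		/* buffer is read after the status */
	return 0;
}

/* Guarantee 4: write, read back, delay, then write again. */
static inline void pulse_reset(struct mock_dev *dev)
{
	mock_writel(dev, 1, REG_RESET);
	(void)mock_readl(dev, REG_RESET);	/* read-back precedes the delay */
	mock_udelay(dev, 1);
	mock_writel(dev, 0, REG_RESET);
}
```

In a real driver the same sequences would use the actual accessors on an
__iomem pointer with the default I/O attributes; replacing them with the
_relaxed() variants would give up exactly the ordering against memory and
delay() that each helper depends on.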