All of lore.kernel.org
* Re: io_ordering.rst vs. memory-barriers.txt
@ 2022-11-08 23:07 ` Tony Battersby
  2022-11-09  0:28   ` Akira Yokosawa
From: Tony Battersby @ 2022-11-08 23:07 UTC (permalink / raw)
  To: Will Deacon, Jonathan Corbet
  Cc: Alan Stern, Andrea Parri, Peter Zijlstra, Boqun Feng,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	linux-kernel, linux-arch

(resending in plaintext; damn Thunderbird upgrades...)

While looking up documentation for PCI write posting I noticed that the
example in Documentation/driver-api/io_ordering.rst seems to contradict
Documentation/memory-barriers.txt:

-----------

Documentation/driver-api/io_ordering.rst:

On some platforms, so-called memory-mapped I/O is weakly ordered.  On such
platforms, driver writers are responsible for ensuring that I/O writes to
memory-mapped addresses on their device arrive in the order intended.  This is
typically done by reading a 'safe' device or bridge register, causing the I/O
chipset to flush pending writes to the device before any reads are posted.  A
driver would usually use this technique immediately prior to the exit of a
critical section of code protected by spinlocks.  This would ensure that
subsequent writes to I/O space arrived only after all prior writes (much like a
memory barrier op, mb(), only with respect to I/O).

A more concrete example from a hypothetical device driver::

		...
	CPU A:  spin_lock_irqsave(&dev_lock, flags)
	CPU A:  val = readl(my_status);
	CPU A:  ...
	CPU A:  writel(newval, ring_ptr);
	CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
		...
	CPU B:  spin_lock_irqsave(&dev_lock, flags)
	CPU B:  val = readl(my_status);
	CPU B:  ...
	CPU B:  writel(newval2, ring_ptr);
	CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
		...

In the case above, the device may receive newval2 before it receives newval,
which could cause problems.  Fixing it is easy enough though::

		...
	CPU A:  spin_lock_irqsave(&dev_lock, flags)
	CPU A:  val = readl(my_status);
	CPU A:  ...
	CPU A:  writel(newval, ring_ptr);
	CPU A:  (void)readl(safe_register); /* maybe a config register? */
	CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
		...
	CPU B:  spin_lock_irqsave(&dev_lock, flags)
	CPU B:  val = readl(my_status);
	CPU B:  ...
	CPU B:  writel(newval2, ring_ptr);
	CPU B:  (void)readl(safe_register); /* maybe a config register? */
	CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

Here, the reads from safe_register will cause the I/O chipset to flush any
pending writes before actually posting the read to the chipset, preventing
possible data corruption.

-----------

Documentation/memory-barriers.txt:

==========================
KERNEL I/O BARRIER EFFECTS
==========================

Interfacing with peripherals via I/O accesses is deeply architecture and device
specific. Therefore, drivers which are inherently non-portable may rely on
specific behaviours of their target systems in order to achieve synchronization
in the most lightweight manner possible. For drivers intending to be portable
between multiple architectures and bus implementations, the kernel offers a
series of accessor functions that provide various degrees of ordering
guarantees:

 (*) readX(), writeX():

	The readX() and writeX() MMIO accessors take a pointer to the
	peripheral being accessed as an __iomem * parameter. For pointers
	mapped with the default I/O attributes (e.g. those returned by
	ioremap()), the ordering guarantees are as follows:

	1. All readX() and writeX() accesses to the same peripheral are ordered
	   with respect to each other. This ensures that MMIO register accesses
	   by the same CPU thread to a particular device will arrive in program
	   order.

	2. A writeX() issued by a CPU thread holding a spinlock is ordered
	   before a writeX() to the same peripheral from another CPU thread
	   issued after a later acquisition of the same spinlock. This ensures
	   that MMIO register writes to a particular device issued while holding
	   a spinlock will arrive in an order consistent with acquisitions of
	   the lock.

-----------

To summarize:

io_ordering.rst says to use readX() before spin_unlock to order writeX()
calls (on some platforms).

memory-barriers.txt says writeX() calls are already ordered when holding
the same spinlock.

So...which one is correct?


There is another example of flushing posted PCI writes at the bottom of
Documentation/PCI/pci.rst, but that one is consistent with
memory-barriers.txt.

Tony Battersby
Cybernetics



* Re: io_ordering.rst vs. memory-barriers.txt
  2022-11-08 23:07 ` io_ordering.rst vs. memory-barriers.txt Tony Battersby
@ 2022-11-09  0:28   ` Akira Yokosawa
  2022-11-09  9:38     ` Arnd Bergmann
From: Akira Yokosawa @ 2022-11-09  0:28 UTC (permalink / raw)
  To: Tony Battersby, Will Deacon, Jonathan Corbet
  Cc: Alan Stern, Andrea Parri, Peter Zijlstra, Boqun Feng,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Daniel Lustig, Joel Fernandes, linux-kernel,
	linux-arch, Akira Yokosawa

Hi Tony,

On Tue, 8 Nov 2022 18:07:37 -0500, Tony Battersby wrote:
> (resending in plaintext; damn Thunderbird upgrades...)
> 
> While looking up documentation for PCI write posting I noticed that the
> example in Documentation/driver-api/io_ordering.rst seems to contradict
> Documentation/memory-barriers.txt:
> 
> [...]
> 
> To summarize:
> 
> io_ordering.rst says to use readX() before spin_unlock to order writeX()
> calls (on some platforms).
> 
> memory-barriers.txt says writeX() calls are already ordered when holding
> the same spinlock.
> 
> So...which one is correct?

From a quick glance at io_ordering.rst's git history, the contents of
this file have never been updated since the beginning of the Git
history (v2.6.12-rc2).

Which strongly suggests that you can ignore io_ordering.rst.

        Thanks, Akira

PS:
Do we need to keep that outdated document?

I think Documentation/driver-api/device-io.rst is the one that is
properly maintained.



* Re: io_ordering.rst vs. memory-barriers.txt
  2022-11-09  0:28   ` Akira Yokosawa
@ 2022-11-09  9:38     ` Arnd Bergmann
From: Arnd Bergmann @ 2022-11-09  9:38 UTC (permalink / raw)
  To: Akira Yokosawa, Tony Battersby, Will Deacon, Jonathan Corbet
  Cc: Alan Stern, Andrea Parri, Peter Zijlstra, Boqun Feng,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Daniel Lustig, Joel Fernandes, linux-kernel,
	Linux-Arch

On Wed, Nov 9, 2022, at 01:28, Akira Yokosawa wrote:

> From a quick glance at io_ordering.rst's git history, the contents of
> this file have never been updated since the beginning of the Git
> history (v2.6.12-rc2).
>
> Which strongly suggests that you can ignore io_ordering.rst.
>
>
I managed to track the origin of the document back to a BitKeeper
pull request and a patch on the ia64 mailing list:

https://lore.kernel.org/all/200304091927.h39JRob0010157@napali.hpl.hp.com/
https://marc.info/?l=git-commits-head&m=104992658231313&w=1

While the document that was added here (and survives to this day)
talks about MMIO (writel()), the code changes in that patch appear
to be only about PIO (outl()), so I suspect that it already mixed
things up at the time.

> PS:
> Do we need to keep that outdated document?
>
> I think Documentation/driver-api/device-io.rst is the one that is
> properly maintained.

I think we need to have one location that properly documents
what can and cannot go wrong with posted writel() vs spinlock.
At the moment, device-io.rst refers to this one saying:

  "Note that posted writes are not strictly ordered against a spinlock,
  see Documentation/driver-api/io_ordering.rst."

The same document recently gained a section on ioremap_np() that
explains some of this better, and it also contradicts io_ordering.rst
by saying

   "A driver author must issue a read from the same device to ensure
   that writes have occurred in the specific cases the author cares."

which I think implies that the "problem" example in io_ordering.rst
is actually fine as long as "my_status" and "ring_ptr" are part
of the same device: while the writel() can still leak past the
spin_unlock(), it won't ever bypass the spin_lock()+readl() on
another CPU, right?

      Arnd
