* LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Marco Elver @ 2021-08-16 10:12 UTC
  To: Paul E. McKenney, Boqun Feng, Alan Stern, Andrea Parri,
	Will Deacon, Mark Rutland
  Cc: Dmitry Vyukov, kasan-dev, linux-kernel

Hello,

Commit c58a801701693 added a paragraph to the LKMM:

	+Although we said that plain accesses are not linked by the ppo
	+relation, they do contribute to it indirectly.  Namely, when there is
	+an address dependency from a marked load R to a plain store W,
	+followed by smp_wmb() and then a marked store W', the LKMM creates a
	+ppo link from R to W'.

This defines that certain _marked reads_ will also be ordered by
smp_wmb(). But otherwise, other reads (especially plain reads!) will
_never_ be ordered by smp_wmb(). Is my understanding correct?
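
To make that concrete, here is a minimal sketch of the shape described
above (the variable names are made up; x is a pointer into shared
memory):

	int *r0 = READ_ONCE(x);	/* marked load R */
	*r0 = 1;		/* plain store W, address-dependent on R */
	smp_wmb();
	WRITE_ONCE(y, 1);	/* marked store W': LKMM links R ->ppo W' */

If the first load were plain, no such ppo link would be created.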

I am asking because KCSAN is growing limited support for weak memory
modeling and memory barriers, and I'm trying to figure out if I'm seeing
a false positive or genuinely allowed race.

One caveat is that the case I'm trying to understand involves not just
2 CPUs but also a device. And for now, I'm assuming that dma_wmb() is
as strong as smp_wmb() also wrt other CPUs (but my guess is this
assumption is already too strong).

The whole area of the memory model that includes talking to devices and
devices talking back to CPUs seems quite murky, and I need to confirm
whether I got it right or wrong. :-)

The report (explained below):

| assert no accesses to 0xffff8880077b5500 of 232 bytes by interrupt on cpu 1:
|  __cache_free mm/slab.c:3450 [inline]
|  kmem_cache_free+0x4b/0xe0 mm/slab.c:3740
|  kfree_skbmem net/core/skbuff.c:709 [inline]
|  __kfree_skb+0x145/0x190 net/core/skbuff.c:745
|  consume_skb+0x6d/0x190 net/core/skbuff.c:900
|  __dev_kfree_skb_any+0xb8/0xc0 net/core/dev.c:3195
|  dev_kfree_skb_any include/linux/netdevice.h:3979 [inline]
|  e1000_unmap_and_free_tx_resource drivers/net/ethernet/intel/e1000/e1000_main.c:1969 [inline]
|  e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3859 [inline]
|  e1000_clean+0x302/0x2080 drivers/net/ethernet/intel/e1000/e1000_main.c:3800
|  __napi_poll+0x81/0x430 net/core/dev.c:7019
|  napi_poll net/core/dev.c:7086 [inline]
|  net_rx_action+0x2cf/0x6b0 net/core/dev.c:7173
|  __do_softirq+0x12c/0x275 kernel/softirq.c:558
| [...]
| 
| read (reordered) to 0xffff8880077b5570 of 4 bytes by task 1985 on cpu 0:
|  skb_headlen include/linux/skbuff.h:2139 [inline]
|  e1000_tx_map drivers/net/ethernet/intel/e1000/e1000_main.c:2829 [inline]
|  e1000_xmit_frame+0x12fd/0x2720 drivers/net/ethernet/intel/e1000/e1000_main.c:3243
|  __netdev_start_xmit include/linux/netdevice.h:4944 [inline]
|  netdev_start_xmit include/linux/netdevice.h:4958 [inline]
|  xmit_one+0x103/0x2c0 net/core/dev.c:3658
|  dev_hard_start_xmit+0x70/0x130 net/core/dev.c:3674
|  sch_direct_xmit+0x1e5/0x600 net/sched/sch_generic.c:342
|  __dev_xmit_skb net/core/dev.c:3874 [inline]
|  __dev_queue_xmit+0xd26/0x1990 net/core/dev.c:4241
|  dev_queue_xmit+0x1d/0x30 net/core/dev.c:4306
| [...]
|   |
|   +-> reordered to: e1000_xmit_frame+0x2294/0x2720 drivers/net/ethernet/intel/e1000/e1000_main.c:3282

KCSAN is saying there is a potential use-after-free read of an skb due
to the read to 0xffff8880077b5570 potentially being delayed/reordered
until later. If the memory was reallocated and reused concurrently, the
read could return garbage data:

1.	The e1000 driver is being instructed to transmit in
	e1000_xmit_frame(). Here it uses the data in the skb in various
	places (e.g. in skb_headlen() above) to set up a new element in
	the ring buffer to be consumed by the device via DMA.

2.	Eventually it calls e1000_tx_queue(), which seems to publish the
	next entry into the ring buffer and finally calls dma_wmb().
	Until this point I see no other barriers (although there's a
	writel(), but it doesn't always seem to be called).

3.	e1000_clean_tx_irq() is called on another CPU after transmit
	completes, and we know the device has consumed that entry from
	the ring buffer. At this point the driver then says that the
	associated skb can be kfree()'d.

4.	If I interpreted dma_wmb() (and smp_wmb()) right, plain reads
	may be reordered past it, irrespective of whether a write that
	depended on such reads was ordered by the wmb(). That means the
	plain reads accessing the skb before the barrier may in fact
	happen concurrently with the kfree() of the skb if reordered
	past it. For example, reordered to the very end of
	e1000_xmit_frame() (line 3282), as KCSAN simulated in this case
	(see the sketch below).
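
Condensed, the suspected race looks like this (a simplified sketch of
the flow, not the literal driver code):

	CPU 0 (e1000_xmit_frame):
		len = skb_headlen(skb);		/* plain read of skb */
		/* ... use len etc. to fill in the tx descriptor ... */
		dma_wmb();
		tx_ring->next_to_use = i;	/* publish entry */
		/* plain reads of skb may be delayed past this point */

	CPU 1 (e1000_clean_tx_irq, after the device consumed the entry):
		dev_kfree_skb_any(skb);		/* skb freed, may be reused */

If CPU 0's plain reads are delayed until after CPU 1's free, they read
freed (and possibly reused) memory.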

Is the above result allowed by the kernel's memory model?

In practice, my guess is no compiler and architecture combination would
allow this today; or is there an arch where it could?

Thanks,
-- Marco


* Re: LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Alan Stern @ 2021-08-16 14:59 UTC
  To: Marco Elver
  Cc: Paul E. McKenney, Boqun Feng, Andrea Parri, Will Deacon,
	Mark Rutland, Dmitry Vyukov, kasan-dev, linux-kernel

On Mon, Aug 16, 2021 at 12:12:01PM +0200, Marco Elver wrote:
> Hello,
> 
> Commit c58a801701693 added a paragraph to the LKMM:
> 
> 	+Although we said that plain accesses are not linked by the ppo
> 	+relation, they do contribute to it indirectly.  Namely, when there is
> 	+an address dependency from a marked load R to a plain store W,
> 	+followed by smp_wmb() and then a marked store W', the LKMM creates a
> 	+ppo link from R to W'.
> 
> This defines that certain _marked reads_ will also be ordered by
> smp_wmb(). But otherwise, other reads (especially plain reads!) will
> _never_ be ordered by smp_wmb(). Is my understanding correct?

The ordering is indirect, but yes.

> I am asking because KCSAN is growing limited support for weak memory
> modeling and memory barriers, and I'm trying to figure out if I'm seeing
> a false positive or genuinely allowed race.
> 
> One caveat is that the case I'm trying to understand involves not just
> 2 CPUs but also a device. And for now, I'm assuming that dma_wmb() is
> as strong as smp_wmb() also wrt other CPUs (but my guess is this
> assumption is already too strong).

I'm not sure that is right.  dma_wmb affects the visibility of writes to 
a DMA buffer from the point of view of the device, not necessarily from 
the point of view of other CPUs.  At least, there doesn't seem to be any 
claim in memory-barriers.txt that it does so.

> The whole area of the memory model that includes talking to devices and
> devices talking back to CPUs seems quite murky, and I need to confirm
> whether I got it right or wrong. :-)

The LKMM itself doesn't include anything about device I/O.  So you're 
already going beyond the known limits.  :-)

...

> KCSAN is saying there is a potential use-after-free read of an skb due
> to the read to 0xffff8880077b5570 potentially being delayed/reordered
> until later. If the memory was reallocated and reused concurrently, the
> read could return garbage data:
> 
> 1.	The e1000 driver is being instructed to transmit in
> 	e1000_xmit_frame(). Here it uses the data in the skb in various
> 	places (e.g. in skb_headlen() above) to set up a new element in
> 	the ring buffer to be consumed by the device via DMA.

You mean here the driver reads some stuff from the skb, right?  And 
various writes depend on the data that was read, but these dependencies 
aren't evident to the memory model because they all involve plain 
accesses.

> 2.	Eventually it calls e1000_tx_queue(), which seems to publish the
> 	next entry into the ring buffer and finally calls dma_wmb().
> 	Until this point I see no other barriers (although there's a
> 	writel(), but it doesn't always seem to be called).

And potentially those reads from above could be delayed (or repeated) 
after this point.

But you're missing something.  What matters isn't the dma_wmb.  Rather, 
it's the call which transfers ownership of the buffer to the device.  
That call must certainly include its own memory barrier, meaning that 
the reads must complete before the call returns.  We don't depend on a 
dma_wmb which might or might not be present to enforce this ordering.

Unless this buffer mapping is supposed to be coherent, of course, in 
which case there would be no ownership transfers.
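
For a streaming (non-coherent) mapping, the ownership transfers would
look something like this (a sketch with made-up variables, not the
actual e1000 code):

	/* CPU owns buf: fill it with plain accesses */
	dma_addr_t dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	/* device owns buf now; the mapping call contains the barrier
	 * that forces the CPU's prior accesses to complete */
	/* ... device performs the DMA ... */
	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	/* CPU owns buf again */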

> 3.	e1000_clean_tx_irq() is called on another CPU after transmit
> 	completes, and we know the device has consumed that entry from
> 	the ring buffer. At this point the driver then says that the
> 	associated skb can be kfree()'d.
> 
> 4.	If I interpreted dma_wmb() (and smp_wmb()) right, plain reads
> 	may be reordered past it, irrespective of whether a write that
> 	depended on such reads was ordered by the wmb(). That means the
> 	plain reads accessing the skb before the barrier may in fact
> 	happen concurrently with the kfree() of the skb if reordered
> 	past it. For example, reordered to the very end of
> 	e1000_xmit_frame() (line 3282), as KCSAN simulated in this case.
> 
> Is the above result allowed by the kernel's memory model?

This can't happen, for the reason explained above, if the buffer is 
non-coherent.  But if the DMA mapping is coherent, this does sound like 
a bug.

> In practice, my guess is no compiler and architecture combination would
> allow this today; or is there an arch where it could?

Probably not; reordering of reads tends to take place over time 
scales a lot shorter than lengthy I/O operations.

Alan


* Re: LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Marco Elver @ 2021-08-16 17:23 UTC
  To: Alan Stern
  Cc: Paul E. McKenney, Boqun Feng, Andrea Parri, Will Deacon,
	Mark Rutland, Dmitry Vyukov, kasan-dev, linux-kernel

On Mon, Aug 16, 2021 at 10:59AM -0400, Alan Stern wrote:
[...]
> > One caveat is that the case I'm trying to understand involves not just
> > 2 CPUs but also a device. And for now, I'm assuming that dma_wmb() is
> > as strong as smp_wmb() also wrt other CPUs (but my guess is this
> > assumption is already too strong).
> 
> I'm not sure that is right.  dma_wmb affects the visibility of writes to 
> a DMA buffer from the point of view of the device, not necessarily from 
> the point of view of other CPUs.  At least, there doesn't seem to be any 
> claim in memory-barriers.txt that it does so.

Thanks, I thought so.

While I could just not instrument dma_*mb() at all, because KCSAN
obviously can't instrument what devices do, I wonder if the resulting
reports are at all interesting.

For example, if I do not make the assumption that dma_wmb==smp_wmb, and
don't instrument dma_*mb() at all, I also get racy UAF reordered writes:
I could imagine some architecture where dma_wmb() propagates the write
to devices from CPU 0, while CPU 1 does the kfree(), reallocates and
reuses the memory, and then gets its data overwritten by CPU 0.

What would be more useful?

1. Let the architecture decide how it wants KCSAN to instrument non-smp
   barriers, given it's underspecified. This means KCSAN would report
   different races on different architectures, but keep the noise down.

2. Assume the weakest possible model, where non-smp barriers just do
   nothing wrt other CPUs.

> > The whole area of the memory model that includes talking to devices and
> > devices talking back to CPUs seems quite murky, and I need to confirm
> > whether I got it right or wrong. :-)
> 
> The LKMM itself doesn't include anything about device I/O.  So you're 
> already going beyond the known limits.  :-)
> 
> ...
> 
> > KCSAN is saying there is a potential use-after-free read of an skb due
> > to the read to 0xffff8880077b5570 potentially being delayed/reordered
> > until later. If the memory was reallocated and reused concurrently, the
> > read could return garbage data:
> > 
> > 1.	The e1000 driver is being instructed to transmit in
> > 	e1000_xmit_frame(). Here it uses the data in the skb in various
> > 	places (e.g. in skb_headlen() above) to set up a new element in
> > 	the ring buffer to be consumed by the device via DMA.
> 
> You mean here the driver reads some stuff from the skb, right?  And 
> various writes depend on the data that was read, but these dependencies 
> aren't evident to the memory model because they all involve plain 
> accesses.

Yes, correct.

> > 2.	Eventually it calls e1000_tx_queue(), which seems to publish the
> > 	next entry into the ring buffer and finally calls dma_wmb().
> > 	Until this point I see no other barriers (although there's a
> > 	writel(), but it doesn't always seem to be called).
> 
> And potentially those reads from above could be delayed (or repeated) 
> after this point.
> 
> But you're missing something.  What matters isn't the dma_wmb.  Rather, 
> it's the call which transfers ownership of the buffer to the device.  
> That call must certainly include its own memory barrier, meaning that 
> the reads must complete before the call returns.  We don't depend on a 
> dma_wmb which might or might not be present to enforce this ordering.
> 
> Unless this buffer mapping is supposed to be coherent, of course, in 
> which case there would be no ownership transfers.

I think it's coherent:

	txdr->desc = dma_alloc_coherent(&pdev->dev, txdr->size, &txdr->dma,
					GFP_KERNEL);

and then in:

	static void e1000_tx_queue(...)
	{
		... writes to desc ...
		/* Force memory writes to complete before letting h/w
		 * know there are new descriptors to fetch.  (Only
		 * applicable for weak-ordered memory model archs,
		 * such as IA-64).
		 */
		dma_wmb();
		tx_ring->next_to_use = i;
	}

used by

	static netdev_tx_t e1000_xmit_frame(...)
	{
		...
		e1000_tx_queue(..., tx_ring, ...);
		...
		if (!netdev_xmit_more() ||
		    netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {
			writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
		}
		...
	}

My best guess: as long as the device is fetching from the ring, the
driver can just append to it and does not do the writel().

> > 3.	e1000_clean_tx_irq() is called on another CPU after transmit
> > 	completes, and we know the device has consumed that entry from
> > 	the ring buffer. At this point the driver then says that the
> > 	associated skb can be kfree()'d.
> > 
> > 4.	If I interpreted dma_wmb() (and smp_wmb()) right, plain reads
> > 	may be reordered past it, irrespective of whether a write that
> > 	depended on such reads was ordered by the wmb(). That means the
> > 	plain reads accessing the skb before the barrier may in fact
> > 	happen concurrently with the kfree() of the skb if reordered
> > 	past it. For example, reordered to the very end of
> > 	e1000_xmit_frame() (line 3282), as KCSAN simulated in this case.
> > 
> > Is the above result allowed by the kernel's memory model?
> 
> This can't happen, for the reason explained above, if the buffer is 
> non-coherent.  But if the DMA mapping is coherent, this does sound like 
> a bug.

Makes sense.

> > In practice, my guess is no compiler and architecture combination would
> > allow this today; or is there an arch where it could?
> 
> Probably not; reordering of reads tends to take place over time 
> scales a lot shorter than lengthy I/O operations.

Which might be an argument to make KCSAN's non-smp barrier
instrumentation arch-dependent, because some drivers might in fact be
written with some target architectures and their properties in mind. At
least it would help keep the noise down, and those architectures that
want to see such races certainly still could.

Any preferences?

Thanks,
-- Marco


* Re: LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Alan Stern @ 2021-08-16 19:21 UTC
  To: Marco Elver
  Cc: Paul E. McKenney, Boqun Feng, Andrea Parri, Will Deacon,
	Mark Rutland, Dmitry Vyukov, kasan-dev, linux-kernel

On Mon, Aug 16, 2021 at 07:23:51PM +0200, Marco Elver wrote:
> On Mon, Aug 16, 2021 at 10:59AM -0400, Alan Stern wrote:
> [...]
> > > One caveat is that the case I'm trying to understand involves not just
> > > 2 CPUs but also a device. And for now, I'm assuming that dma_wmb() is
> > > as strong as smp_wmb() also wrt other CPUs (but my guess is this
> > > assumption is already too strong).
> > 
> > I'm not sure that is right.  dma_wmb affects the visibility of writes to 
> > a DMA buffer from the point of view of the device, not necessarily from 
> > the point of view of other CPUs.  At least, there doesn't seem to be any 
> > claim in memory-barriers.txt that it does so.
> 
> Thanks, I thought so.
> 
> While I could just not instrument dma_*mb() at all, because KCSAN
> obviously can't instrument what devices do, I wonder if the resulting
> reports are at all interesting.
> 
> For example, if I do not make the assumption that dma_wmb==smp_wmb, and
> don't instrument dma_*mb() at all, I also get racy UAF reordered writes:
> I could imagine some architecture where dma_wmb() propagates the write
> to devices from CPU 0, while CPU 1 does the kfree(), reallocates and
> reuses the memory, and then gets its data overwritten by CPU 0.

Access ordering of devices is difficult to describe.  How do you tell a 
memory model (either a theoretical one or one embedded in code like 
KCSAN) that a particular interrupt handler routine can't be called until 
after a particular write has enabled the device to generate an IRQ?

In the case you mention, how do you tell the memory model that the code 
on CPU 1 can't run until after CPU 0 has executed a particular write, one 
which is forced by some memory barrier to occur _after_ all the potential 
overwrites it's worried about?

> What would be more useful?
> 
> 1. Let the architecture decide how it wants KCSAN to instrument non-smp
>    barriers, given it's underspecified. This means KCSAN would report
>    different races on different architectures, but keep the noise down.
> 
> 2. Assume the weakest possible model, where non-smp barriers just do
>    nothing wrt other CPUs.

I don't think either of those would work out very well.  The problem 
isn't how you handle the non-smp barriers; the problem is how you 
describe to the memory model the way devices behave.

...

> > > In practice, my guess is no compiler and architecture combination would
> > > allow this today; or is there an arch where it could?
> > 
> > Probably not; reordering of reads tends to take place over time 
> > scales a lot shorter than lengthy I/O operations.
> 
> Which might be an argument to make KCSAN's non-smp barrier
> instrumentation arch-dependent, because some drivers might in fact be
> written with some target architectures and their properties in mind. At
> least it would help keep the noise down, and those architectures that
> want to see such races certainly still could.
> 
> Any preferences?

I'm not a good person to ask; I have never used KCSAN.  However...

While some drivers are indeed written for particular architectures or 
systems, I doubt that they rely very heavily on the special properties of 
their target architectures/systems to avoid races.  Rather, they rely on 
the hardware to behave correctly, just as non-arch-specific drivers do.

Furthermore, the kernel tries pretty hard to factor out arch-specific 
synchronization mechanisms and related concepts into general-purpose 
abstractions (in the way that smp_mb() is generally available but is 
defined differently for different architectures, for example).  Drivers 
tend to rely on these abstractions rather than on the arch-specific 
properties directly.

In short, trying to make KCSAN's handling of device I/O into something 
arch-specific doesn't seem (to me) like a particularly advantageous 
approach.  Other people are likely to have different opinions.

Alan


* Re: LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Paul E. McKenney @ 2021-08-16 20:50 UTC
  To: Alan Stern
  Cc: Marco Elver, Boqun Feng, Andrea Parri, Will Deacon, Mark Rutland,
	Dmitry Vyukov, kasan-dev, linux-kernel

On Mon, Aug 16, 2021 at 03:21:09PM -0400, Alan Stern wrote:
> On Mon, Aug 16, 2021 at 07:23:51PM +0200, Marco Elver wrote:
> > On Mon, Aug 16, 2021 at 10:59AM -0400, Alan Stern wrote:
> > [...]
> > > > One caveat is that the case I'm trying to understand involves not just
> > > > 2 CPUs but also a device. And for now, I'm assuming that dma_wmb() is
> > > > as strong as smp_wmb() also wrt other CPUs (but my guess is this
> > > > assumption is already too strong).
> > > 
> > > I'm not sure that is right.  dma_wmb affects the visibility of writes to 
> > > a DMA buffer from the point of view of the device, not necessarily from 
> > > the point of view of other CPUs.  At least, there doesn't seem to be any 
> > > claim in memory-barriers.txt that it does so.
> > 
> > Thanks, I thought so.
> > 
> > While I could just not instrument dma_*mb() at all, because KCSAN
> > obviously can't instrument what devices do, I wonder if the resulting
> > reports are at all interesting.
> > 
> > For example, if I do not make the assumption that dma_wmb==smp_wmb, and
> > don't instrument dma_*mb() at all, I also get racy UAF reordered writes:
> > I could imagine some architecture where dma_wmb() propagates the write
> > to devices from CPU 0, while CPU 1 does the kfree(), reallocates and
> > reuses the memory, and then gets its data overwritten by CPU 0.
> 
> Access ordering of devices is difficult to describe.  How do you tell a 
> memory model (either a theoretical one or one embedded in code like 
> KCSAN) that a particular interrupt handler routine can't be called until 
> after a particular write has enabled the device to generate an IRQ?
> 
> In the case you mention, how do you tell the memory model that the code 
> on CPU 1 can't run until after CPU 0 has executed a particular write, one 
> which is forced by some memory barrier to occur _after_ all the potential 
> overwrites it's worried about?

What Alan said on the difficulty!

However, KCSAN has the advantage of not needing to specify the outcomes,
which is much of the complexity.  For LKMM to do a good job of handling
devices, we would need a model of each device(!).

> > What would be more useful?
> > 
> > 1. Let the architecture decide how it wants KCSAN to instrument non-smp
> >    barriers, given it's underspecified. This means KCSAN would report
> >    different races on different architectures, but keep the noise down.
> > 
> > 2. Assume the weakest possible model, where non-smp barriers just do
> >    nothing wrt other CPUs.
> 
> I don't think either of those would work out very well.  The problem 
> isn't how you handle the non-smp barriers; the problem is how you 
> describe to the memory model the way devices behave.

There are some architecture-independent ordering guarantees for MMIO
which go something like this:

0.	MMIO readX() and writeX() accesses to the same device are
	implicitly ordered, whether relaxed or not.

1.	Locking partitions non-relaxed MMIO accesses in the manner that
	you would expect.  For example, if CPU 0 does an MMIO write,
	then releases a lock, and later CPU 1 acquires that same lock and
	does an MMIO read, CPU 0's MMIO write is guaranteed to happen
	before CPU 1's MMIO read (see the sketch after this list).
	PowerPC has to jump through a few hoops to make this happen.

	Relaxed MMIO accesses such as readb_relaxed() can be reordered
	with locking primitives on some architectures.

2.	smp_*() memory barriers are not guaranteed to affect MMIO
	accesses, especially not in kernels built with CONFIG_SMP=n.

3.	The mb() memory barrier is required to order prior MMIO
	accesses against subsequent MMIO accesses.  The wmb() and rmb()
	memory barriers are required to order prior MMIO writes/reads
	against later MMIO writes/reads, respectively.
	These memory barriers also order normal memory accesses in
	the same way as their smp_*() counterparts.

4.	The mmiowb() memory barrier can be slightly weaker than wmb(),
	as it is in ia64, but I have lost track of the details.

5.	The dma_mb(), dma_rmb(), and dma_wmb() appear to be specific
	to ARMv8.

6.	Non-relaxed MMIO writeX() accesses force ordering of prior
	normal memory writes before any DMA initiated by the writeX().

7.	Non-relaxed MMIO readX() accesses force ordering of later
	normal memory reads after any DMA whose completion is reported
	by the readX().  These readX() accesses are also ordered before
	any subsequent delay loops.
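
For concreteness, here is a sketch of the locking pattern in guarantee
1 (the lock, base pointer, and register offsets are made up):

	/* CPU 0 */
	spin_lock(&dev_lock);
	writel(cmd, base + REG_CMD);	/* non-relaxed MMIO write */
	spin_unlock(&dev_lock);

	/* CPU 1, acquiring dev_lock later */
	spin_lock(&dev_lock);
	status = readl(base + REG_STATUS);	/* ordered after CPU 0's writel() */
	spin_unlock(&dev_lock);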

Some more detail is available in memory-barriers.txt and in this LWN
article:  https://lwn.net/Articles/698014/

I wish I could promise you that these are both fully up to date, but
it is almost certain that updates are needed.

> ...
> 
> > > > In practice, my guess is no compiler and architecture combination would
> > > > allow this today; or is there an arch where it could?
> > > 
> > > Probably not; reordering of reads tends to take place over time 
> > > scales a lot shorter than lengthy I/O operations.
> > 
> > Which might be an argument to make KCSAN's non-smp barrier
> > instrumentation arch-dependent, because some drivers might in fact be
> > written with some target architectures and their properties in mind. At
> > least it would help keep the noise down, and those architectures that
> > want to see such races certainly still could.
> > 
> > Any preferences?
> 
> I'm not a good person to ask; I have never used KCSAN.  However...
> 
> While some drivers are indeed written for particular architectures or 
> systems, I doubt that they rely very heavily on the special properties of 
> their target architectures/systems to avoid races.  Rather, they rely on 
> the hardware to behave correctly, just as non-arch-specific drivers do.
> 
> Furthermore, the kernel tries pretty hard to factor out arch-specific 
> synchronization mechanisms and related concepts into general-purpose 
> abstractions (in the way that smp_mb() is generally available but is 
> defined differently for different architectures, for example).  Drivers 
> tend to rely on these abstractions rather than on the arch-specific 
> properties directly.
> 
> In short, trying to make KCSAN's handling of device I/O into something 
> arch-specific doesn't seem (to me) like a particularly advantageous 
> approach.  Other people are likely to have different opinions.

No preconceived notions here, at least not on this topic.  ;-)

							Thanx, Paul


* Re: LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Marco Elver @ 2021-08-17 12:14 UTC
  To: Paul E. McKenney
  Cc: Alan Stern, Boqun Feng, Andrea Parri, Will Deacon, Mark Rutland,
	Dmitry Vyukov, kasan-dev, linux-kernel

On Mon, Aug 16, 2021 at 01:50PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 16, 2021 at 03:21:09PM -0400, Alan Stern wrote:
[...]
> > Access ordering of devices is difficult to describe.  How do you tell a 
> > memory model (either a theoretical one or one embedded in code like 
> > KCSAN) that a particular interrupt handler routine can't be called until 
> > after a particular write has enabled the device to generate an IRQ?
> > 
> > In the case you mention, how do you tell the memory model that the code 
> > on CPU 1 can't run until after CPU 0 has executed a particular write, one 
> > which is forced by some memory barrier to occur _after_ all the potential 
> > overwrites it's worried about?
> 
> What Alan said on the difficulty!
> 
> However, KCSAN has the advantage of not needing to specify the outcomes,
> which is much of the complexity.  For LKMM to do a good job of handling
> devices, we would need a model of each device(!).

For full models, like the formal LKMM, I agree it's extremely difficult!

KCSAN has the advantage that it's a "dynamic analysis" tool, relying on
merely instrumenting real code and running on the real HW. The real HW
is still in charge of generating interrupts, and real devices (like that
E1000 device, though in this case virtualized by QEMU) aren't in any way
abstracted or modeled.

KCSAN's (and any other sanitizer's) primary goal is to just _detect_
certain classes of bugs by making them detectable via instrumentation,
but otherwise to run real code on real HW.

Thus far, for KCSAN this has been trivial because all it does is keep
an eye on reads and writes and observe whether accesses race; then, per
the rules for data races (it needs to know about plain and marked
accesses), it decides whether something is a reportable data race.

The real HW is entirely in charge of when and if something executes
concurrently.

One problem with instrumentation, however, is that it adds overhead
which makes some effects of the hardware, such as the effects of weak
memory, very unlikely to be observed. Therefore, I'm teaching KCSAN a
limited set of weak memory effects allowed by the LKMM by pretending
the current CPU reordered an access (currently just "load/store
buffering").

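As an example, the classic load-buffering shape (x and y both initially
zero):

	P0: r0 = READ_ONCE(x); WRITE_ONCE(y, 1);
	P1: r1 = READ_ONCE(y); WRITE_ONCE(x, 1);

The outcome r0 == 1 && r1 == 1 is allowed by the LKMM absent barriers
or dependencies; KCSAN simulates it by pretending a CPU's load was
delayed past its subsequent store.
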
To avoid false positives, however, the tool now has to know about memory
barriers, otherwise it might simulate reordering too aggressively.

Because KCSAN relies on compiler instrumentation, we are simply limited
to analyzing what is happening on CPUs; devices are invisible to it, and
we just observe what happens as a result on other CPUs if a device is
involved.

The case with E1000 and dma_wmb() came about because KCSAN can now
detect races between 2 CPUs, given that dma_wmb() doesn't seem to say
anything about ordering among CPUs.

The main points are:

1. KCSAN doesn't need a model for devices because it's still running on
   real HW with real devices that are in charge of generating interrupts.

2. In the case with the E1000 driver, a real device causes CPU 1 to run
   the interrupt, which frees memory that might still be read or
   written if CPU 0 reordered its accesses (simulated by KCSAN). That
   reordering can be inhibited by the right barrier, but we haven't
   found it in the code yet. At least the dma_wmb() isn't required to
   order the writes between the 2 CPUs (AFAIK).

> > > What would be more useful?
> > > 
> > > 1. Let the architecture decide how it wants KCSAN to instrument non-smp
> > >    barriers, given it's underspecified. This means KCSAN would report
> > >    different races on different architectures, but keep the noise down.
> > > 
> > > 2. Assume the weakest possible model, where non-smp barriers just do
> > >    nothing wrt other CPUs.
> > 
> > I don't think either of those would work out very well.  The problem 
> > isn't how you handle the non-smp barriers; the problem is how you 
> > describe to the memory model the way devices behave.
> 
> There are some architecture-independent ordering guarantees for MMIO
> which go something like this:
> 
> 0.	MMIO readX() and writeX() accesses to the same device are
> 	implicitly ordered, whether relaxed or not.
> 
> 1.	Locking partitions non-relaxed MMIO accesses in the manner that
> 	you would expect.  For example, if CPU 0 does an MMIO write,
> 	then releases a lock, and later CPU 1 acquires that same lock and
> 	does an MMIO read, CPU 0's MMIO write is guaranteed to happen
> 	before CPU 1's MMIO read.  PowerPC has to jump through a few
> 	hoops to make this happen.
> 
> 	Relaxed MMIO accesses such as readb_relaxed() can be reordered
> 	with locking primitives on some architectures.
> 
> 2.	smp_*() memory barriers are not guaranteed to affect MMIO
> 	accesses, especially not in kernels built with CONFIG_SMP=n.
> 
> 3.	The mb() memory barrier is required to order prior MMIO
> 	accesses against subsequent MMIO accesses.  The wmb() and rmb()
> 	memory barriers are required to order prior MMIO writes/reads
> 	against later MMIO writes/reads, respectively.
> 	These memory barriers also order normal memory accesses in
> 	the same way as their smp_*() counterparts.
> 
> 4.	The mmiowb() memory barrier can be slightly weaker than wmb(),
> 	as it is in ia64, but I have lost track of the details.
> 
> 5.	The dma_mb(), dma_rmb(), and dma_wmb() appear to be specific
> 	to ARMv8.
> 
> 6.	Non-relaxed MMIO writeX() accesses force ordering of prior
> 	normal memory writes before any DMA initiated by the writeX().
> 
> 7.	Non-relaxed MMIO readX() accesses force ordering of later
> 	normal memory reads after any DMA whose completion is reported
> 	by the readX().  These readX() accesses are also ordered before
> 	any subsequent delay loops.
> 
> Some more detail is available in memory-barriers.txt and in this LWN
> article:  https://lwn.net/Articles/698014/
> 
> I wish I could promise you that these are both fully up to date, but
> it is almost certain that updates are needed.

Thanks, that's useful. What I can tell is that most I/O ops and barriers
have no effect on other CPUs (except for mb() etc.). For KCSAN that's
all that matters.

[...]
> > > Which might be an argument to make KCSAN's non-smp barrier
> > > instrumentation arch-dependent, because some drivers might in fact be
> > > written with some target architectures and their properties in mind. At
> > > least it would help keep the noise down, and those architectures that
> > > want to see such races certainly still could.
> > > 
> > > Any preferences?
> > 
> > I'm not a good person to ask; I have never used KCSAN.  However...
> > 
> > While some drivers are indeed written for particular architectures or 
> > systems, I doubt that they rely very heavily on the special properties of 
> > their target architectures/systems to avoid races.  Rather, they rely on 
> > the hardware to behave correctly, just as non-arch-specific drivers do.
> > 
> > Furthermore, the kernel tries pretty hard to factor out arch-specific 
> > synchronization mechanisms and related concepts into general-purpose 
> > abstractions (in the way that smp_mb() is generally available but is 
> > defined differently for different architectures, for example).  Drivers 
> > tend to rely on these abstractions rather than on the arch-specific 
> > properties directly.
> > 
> > In short, trying to make KCSAN's handling of device I/O into something 
> > arch-specific doesn't seem (to me) like a particularly advantageous 
> > approach.  Other people are likely to have different opinions.

As explained above, KCSAN just instruments C code but still runs on real
HW with real devices. All I'm trying to figure out is what I/O ops and
barriers say about making accesses visible to other CPUs to avoid false
positives.

However, from this discussion I'm starting to conclude that the
E1000 race might in fact be something allowed, although very unlikely.

The main question I was trying to answer is "should such cases be
reported or not?", since KCSAN's goal is not to model the system
faithfully, but to detect bugs. Either way is possible, and I don't have
a preference. I'm leaning towards "no assumptions, report everything"
now, because the "access reordering" mode won't be enabled by default.

Thanks,
-- Marco


* Re: LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Will Deacon @ 2021-08-17 12:28 UTC
  To: Paul E. McKenney
  Cc: Alan Stern, Marco Elver, Boqun Feng, Andrea Parri, Mark Rutland,
	Dmitry Vyukov, kasan-dev, linux-kernel

Just on this bit...

On Mon, Aug 16, 2021 at 01:50:57PM -0700, Paul E. McKenney wrote:
> 5.	The dma_mb(), dma_rmb(), and dma_wmb() appear to be specific
> 	to ARMv8.

These are useful on other architectures too! IIRC, they were added by x86 in
the first place. They're designed to be used with dma_alloc_coherent()
allocations where you're sharing something like a ring buffer with a device
and they guarantee accesses won't be reordered before they become visible
to the device. They _also_ provide the same ordering to other CPUs.
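
The canonical use looks something like this (a sketch; the descriptor
fields and the flag name are made up):

	/* desc points into a ring allocated with dma_alloc_coherent() */
	desc->addr = cpu_to_le64(buf_dma);
	desc->len  = cpu_to_le16(len);

	dma_wmb();	/* descriptor contents visible before the handover */

	desc->flags = cpu_to_le16(DESC_HW_OWNED);	/* device may consume it */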

I gave a talk at LPC about some of this, which might help (or might make
things worse...):

https://www.youtube.com/watch?v=i6DayghhA8Q

Ignore the bits about mmiowb() as we got rid of that.

Will


* Re: LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Marco Elver @ 2021-08-17 13:27 UTC
  To: Will Deacon
  Cc: Paul E. McKenney, Alan Stern, Boqun Feng, Andrea Parri,
	Mark Rutland, Dmitry Vyukov, kasan-dev, linux-kernel

On Tue, 17 Aug 2021 at 14:28, Will Deacon <will@kernel.org> wrote:
> Just on this bit...
>
> On Mon, Aug 16, 2021 at 01:50:57PM -0700, Paul E. McKenney wrote:
> > 5.    The dma_mb(), dma_rmb(), and dma_wmb() appear to be specific
> >       to ARMv8.
>
> These are useful on other architectures too! IIRC, they were added by x86 in
> the first place. They're designed to be used with dma_alloc_coherent()
> allocations where you're sharing something like a ring buffer with a device
> and they guarantee accesses won't be reordered before they become visible
> to the device. They _also_ provide the same ordering to other CPUs.

Ah, good you pointed it out again. Re-reading memory-barriers.txt, it
does indeed also say these provide ordering for other CPUs...

> I gave a talk at LPC about some of this, which might help (or might make
> things worse...):
>
> https://www.youtube.com/watch?v=i6DayghhA8Q

Nice, thank you!

> Ignore the bits about mmiowb() as we got rid of that.
>
> Will


* Re: LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Paul E. McKenney @ 2021-08-17 13:53 UTC
  To: Will Deacon
  Cc: Alan Stern, Marco Elver, Boqun Feng, Andrea Parri, Mark Rutland,
	Dmitry Vyukov, kasan-dev, linux-kernel

On Tue, Aug 17, 2021 at 01:28:16PM +0100, Will Deacon wrote:
> Just on this bit...
> 
> On Mon, Aug 16, 2021 at 01:50:57PM -0700, Paul E. McKenney wrote:
> > 5.	The dma_mb(), dma_rmb(), and dma_wmb() appear to be specific
> > 	to ARMv8.
> 
> These are useful on other architectures too! IIRC, they were added by x86 in
> the first place. They're designed to be used with dma_alloc_coherent()
> allocations where you're sharing something like a ring buffer with a device
> and they guarantee accesses won't be reordered before they become visible
> to the device. They _also_ provide the same ordering to other CPUs.
> 
> I gave a talk at LPC about some of this, which might help (or might make
> things worse...):
> 
> https://www.youtube.com/watch?v=i6DayghhA8Q

The slides are here, correct?  Nice summary and examples!

https://elinux.org/images/a/a8/Uh-oh-Its-IO-Ordering-Will-Deacon-Arm.pdf

And this is all I see for dma_mb():

arch/arm64/include/asm/barrier.h:#define dma_mb()	dmb(osh)
arch/arm64/include/asm/io.h:#define __iomb()		dma_mb()

And then for __iomb():

arch/arm64/include/asm/io.h:#define __iomb()		dma_mb()
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:	__iomb();

But yes, dma_rmb() and dma_wmb() do look to have a few hundred uses
between them, and not just within ARMv8.  I gave up too soon, so
thank you!

> Ignore the bits about mmiowb() as we got rid of that.

Should the leftovers in current mainline be replaced by wmb()?  Or are
patches to that effect on their way in somewhere?

$ git grep 'mmiowb()'
arch/ia64/include/asm/mmiowb.h:#define mmiowb()	ia64_mfa()
arch/ia64/include/asm/spinlock.h:	mmiowb();
arch/mips/include/asm/mmiowb.h:#define mmiowb()	iobarrier_w()
arch/mips/include/asm/spinlock.h:	mmiowb();
arch/mips/kernel/gpio_txx9.c:	mmiowb();
arch/mips/kernel/gpio_txx9.c:	mmiowb();
arch/mips/kernel/gpio_txx9.c:	mmiowb();
arch/mips/kernel/irq_txx9.c:	mmiowb();
arch/mips/loongson2ef/common/bonito-irq.c:	mmiowb();
arch/mips/loongson2ef/common/bonito-irq.c:	mmiowb();
arch/mips/loongson2ef/common/mem.c:		mmiowb();
arch/mips/loongson2ef/common/pm.c:	mmiowb();
arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
arch/mips/pci/ops-bonito64.c:	mmiowb();
arch/mips/pci/ops-loongson2.c:	mmiowb();
arch/mips/txx9/generic/irq_tx4939.c:	mmiowb();
arch/mips/txx9/generic/setup.c:	mmiowb();
arch/mips/txx9/rbtx4927/irq.c:	mmiowb();
arch/mips/txx9/rbtx4938/irq.c:	mmiowb();
arch/mips/txx9/rbtx4938/irq.c:	mmiowb();
arch/mips/txx9/rbtx4938/setup.c:	mmiowb();
arch/mips/txx9/rbtx4939/irq.c:	mmiowb();
arch/powerpc/include/asm/mmiowb.h:#define mmiowb()		mb()
arch/riscv/include/asm/mmiowb.h:#define mmiowb()	__asm__ __volatile__ ("fence o,w" : : : "memory");
arch/s390/include/asm/io.h:#define mmiowb()	zpci_barrier()
arch/sh/include/asm/mmiowb.h:#define mmiowb()			wmb()
arch/sh/include/asm/spinlock-llsc.h:	mmiowb();
include/asm-generic/mmiowb.h: * Generic implementation of mmiowb() tracking for spinlocks.
include/asm-generic/mmiowb.h: * 	1. Implement mmiowb() (and arch_mmiowb_state() if you're fancy)
include/asm-generic/mmiowb.h:		mmiowb();

							Thanx, Paul


* Re: LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Will Deacon @ 2021-08-18 11:39 UTC
  To: Paul E. McKenney
  Cc: Alan Stern, Marco Elver, Boqun Feng, Andrea Parri, Mark Rutland,
	Dmitry Vyukov, kasan-dev, linux-kernel

Hi Paul.

On Tue, Aug 17, 2021 at 06:53:08AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 17, 2021 at 01:28:16PM +0100, Will Deacon wrote:
> > Just on this bit...
> > 
> > On Mon, Aug 16, 2021 at 01:50:57PM -0700, Paul E. McKenney wrote:
> > > 5.	The dma_mb(), dma_rmb(), and dma_wmb() appear to be specific
> > > 	to ARMv8.
> > 
> > These are useful on other architectures too! IIRC, they were added by x86 in
> > the first place. They're designed to be used with dma_alloc_coherent()
> > allocations where you're sharing something like a ring buffer with a device
> > and they guarantee accesses won't be reordered before they become visible
> > to the device. They _also_ provide the same ordering to other CPUs.
> > 
> > I gave a talk at LPC about some of this, which might help (or might make
> > things worse...):
> > 
> > https://www.youtube.com/watch?v=i6DayghhA8Q
> 
> The slides are here, correct?  Nice summary and examples!
> 
> https://elinux.org/images/a/a8/Uh-oh-Its-IO-Ordering-Will-Deacon-Arm.pdf

Yes, that looks like them. I've also put them up here:

https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/slides/elce-2018.pdf

(turns out it was ELCE not LPC!)

> And this is all I see for dma_mb():
> 
> arch/arm64/include/asm/barrier.h:#define dma_mb()	dmb(osh)
> arch/arm64/include/asm/io.h:#define __iomb()		dma_mb()
> 
> And then for __iomb():
> 
> arch/arm64/include/asm/io.h:#define __iomb()		dma_mb()
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:	__iomb();
> 
> But yes, dma_rmb() and dma_wmb() do look to have a few hundred uses
> between them, and not just within ARMv8.  I gave up too soon, so
> thank you!

No problem, and yes, dma_mb() is an arm64-internal thing which we should
probably rename.

> > Ignore the bits about mmiowb() as we got rid of that.
> 
> Should the leftovers in current mainline be replaced by wmb()?  Or are
> patches to that effect on their way in somewhere?

I already got rid of the non-arch usage of mmiowb(), but I wasn't brave
enough to change the arch code as it may well be that they're relying on
some specific instruction semantics.

Despite my earlier comment, mmiowb() still exists, but only as a part of
ARCH_HAS_MMIOWB where it is used to add additional spinlock ordering so
that the rest of the kernel doesn't need to use mmiowb() at all.

So I suppose for these:

> arch/mips/kernel/gpio_txx9.c:	mmiowb();
> arch/mips/kernel/gpio_txx9.c:	mmiowb();
> arch/mips/kernel/gpio_txx9.c:	mmiowb();
> arch/mips/kernel/irq_txx9.c:	mmiowb();
> arch/mips/loongson2ef/common/bonito-irq.c:	mmiowb();
> arch/mips/loongson2ef/common/bonito-irq.c:	mmiowb();
> arch/mips/loongson2ef/common/mem.c:		mmiowb();
> arch/mips/loongson2ef/common/pm.c:	mmiowb();
> arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
> arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
> arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
> arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
> arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
> arch/mips/pci/ops-bonito64.c:	mmiowb();
> arch/mips/pci/ops-loongson2.c:	mmiowb();
> arch/mips/txx9/generic/irq_tx4939.c:	mmiowb();
> arch/mips/txx9/generic/setup.c:	mmiowb();
> arch/mips/txx9/rbtx4927/irq.c:	mmiowb();
> arch/mips/txx9/rbtx4938/irq.c:	mmiowb();
> arch/mips/txx9/rbtx4938/irq.c:	mmiowb();
> arch/mips/txx9/rbtx4938/setup.c:	mmiowb();
> arch/mips/txx9/rbtx4939/irq.c:	mmiowb();

we could replace mmiowb() with iobarrier_w().

Will


* Re: LKMM: Read dependencies of writes ordered by dma_wmb()?
From: Paul E. McKenney @ 2021-08-18 23:17 UTC
  To: Will Deacon
  Cc: Alan Stern, Marco Elver, Boqun Feng, Andrea Parri, Mark Rutland,
	Dmitry Vyukov, kasan-dev, linux-kernel

On Wed, Aug 18, 2021 at 12:39:36PM +0100, Will Deacon wrote:
> Hi Paul.

Hello, Will,

> On Tue, Aug 17, 2021 at 06:53:08AM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 17, 2021 at 01:28:16PM +0100, Will Deacon wrote:

[ . . . ]

> > > Ignore the bits about mmiowb() as we got rid of that.
> > 
> > Should the leftovers in current mainline be replaced by wmb()?  Or are
> > patches to that effect on their way in somewhere?
> 
> I already got rid of the non-arch usage of mmiowb(), but I wasn't brave
> enough to change the arch code as it may well be that they're relying on
> some specific instruction semantics.
> 
> Despite my earlier comment, mmiowb() still exists, but only as a part of
> ARCH_HAS_MMIOWB where it is used to add additional spinlock ordering so
> that the rest of the kernel doesn't need to use mmiowb() at all.
> 
> So I suppose for these:
> 
> > arch/mips/kernel/gpio_txx9.c:	mmiowb();
> > arch/mips/kernel/gpio_txx9.c:	mmiowb();
> > arch/mips/kernel/gpio_txx9.c:	mmiowb();
> > arch/mips/kernel/irq_txx9.c:	mmiowb();
> > arch/mips/loongson2ef/common/bonito-irq.c:	mmiowb();
> > arch/mips/loongson2ef/common/bonito-irq.c:	mmiowb();
> > arch/mips/loongson2ef/common/mem.c:		mmiowb();
> > arch/mips/loongson2ef/common/pm.c:	mmiowb();
> > arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
> > arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
> > arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
> > arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
> > arch/mips/loongson2ef/lemote-2f/reset.c:	mmiowb();
> > arch/mips/pci/ops-bonito64.c:	mmiowb();
> > arch/mips/pci/ops-loongson2.c:	mmiowb();
> > arch/mips/txx9/generic/irq_tx4939.c:	mmiowb();
> > arch/mips/txx9/generic/setup.c:	mmiowb();
> > arch/mips/txx9/rbtx4927/irq.c:	mmiowb();
> > arch/mips/txx9/rbtx4938/irq.c:	mmiowb();
> > arch/mips/txx9/rbtx4938/irq.c:	mmiowb();
> > arch/mips/txx9/rbtx4938/setup.c:	mmiowb();
> > arch/mips/txx9/rbtx4939/irq.c:	mmiowb();
> 
> we could replace mmiowb() with iobarrier_w().

Not having MIPS hardware at my disposal, I will leave these to those
who do.  I would suggest adding iobarrier_*() to memory-barriers.txt,
but they appear to be specific to MIPS and PowerPC.

							Thanx, Paul

