* Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Will Deacon @ 2022-06-06 15:21 UTC
  To: linux-arch
  Cc: catalin.marinas, maz, mark.rutland, robin.murphy, hch, vgupta,
	linux, arnd, bcain, geert, monstr, dinguyen, shorne, mpe, dalias

[CC'ing a bunch of arch maintainers who are not "obviously" immune to
 this issue; please can you check your archs?]

Hi folks,

Some off-list discussions about the arm64 implementation of
arch_sync_dma_for_device() have left me unsure about the following
architectures in this area:

arc
arm
arm64
hexagon
m68k
microblaze
nios2
openrisc
powerpc
sh

The background is that when using the streaming DMA API to map a buffer
prior to inbound non-coherent DMA (i.e. DMA_FROM_DEVICE), it is often
necessary to remove any dirty CPU cache lines so that they can't get
evicted during the transfer and corrupt the buffer contents written by
the DMA. The architectures listed above appear to achieve this using
*invalidation*, which tends to imply discarding of the CPU cache lines
rather than writing them back, and this consequently raises two related
questions:

  (1) What if the DMA transfer doesn't write to every byte in the buffer?
  (2) What if the buffer has a virtual alias in userspace (e.g. because
      the kernel has GUP'd the buffer)?

In both cases, stale data (possibly containing random values written
prior to the initial clearing of the page) can be read from the buffer.
In case (1), this applies to the unwritten bytes after the transfer has
completed and in case (2) it applies to the user alias of the whole
buffer during the narrow window between arch_sync_dma_for_device() and
the DMA writes landing in memory.
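
To make the exposure window concrete, the driver-side sequence is
roughly the following (sketch only; dev/buf/len come from context and
start_device_dma()/wait_for_dma_completion() are stand-ins, not real
APIs):

    dma_addr_t handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
    if (dma_mapping_error(dev, handle))
            return -ENOMEM;
    /*
     * arch_sync_dma_for_device() has now run: on the architectures
     * above it *invalidates*, discarding any dirty CPU cache lines
     * covering buf.
     */

    start_device_dma(handle, len);  /* stand-in: device writes <= len bytes */
    wait_for_dma_completion();      /* stand-in */

    dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
    /*
     * Bytes the device never wrote read back as whatever was in RAM,
     * i.e. potentially stale data (case (1)); a userspace alias of buf
     * sees the same stale lines while the transfer is in flight
     * (case (2)).
     */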

The simplest fix (diff for arm64 below) seems to be changing the
invalidation in this case to be a "clean" in arm(64)-speak so that any
dirty lines are written back, therefore limiting the stale data to the
initial buffer contents. In doing so, this makes the FROM_DEVICE and
BIDIRECTIONAL cases identical which makes some intuitive sense if you
think of FROM_DEVICE as first doing a TO_DEVICE of any dirty CPU cache
lines. One interesting thing I noticed is that the csky implementation
additionally zeroes the buffer prior to the clean, but this seems to be
overkill.
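
In C terms, the change amounts to something like this (rough sketch
only; the real arm64 hook tail-calls the assembly below, and
dcache_clean_poc() is arm64's clean-to-PoC helper):

    void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
                                  enum dma_data_direction dir)
    {
            unsigned long start = (unsigned long)phys_to_virt(paddr);

            /*
             * Clean (writeback) for every direction, including
             * DMA_FROM_DEVICE, instead of invalidating for FROM_DEVICE;
             * "dir" no longer matters on the map path.
             */
            dcache_clean_poc(start, start + size);
    }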

An alternative solution would be to ensure that newly allocated pages
are clean rather than just zeroed; although this would then handle any
variants of case (2) (e.g. where userspace can access a buffer with
non-cacheable attributes), it's likely to have an intolerable
performance cost.

Finally, on arm(64), the DMA mapping code tries to deal with buffers
that are not cacheline aligned by issuing clean-and-invalidate
operations for the overlapping portions at each end of the buffer. I
don't think this makes a tonne of sense, as inevitably one of the
writers (either the CPU or the DMA) is going to win and somebody is
going to run into silent data loss. Having the caller receive
DMA_MAPPING_ERROR in this case would probably be better.

Thoughts? I haven't tried to reproduce the problem above in practice and
this code has been like this since the dawn of time, so we could also
elect to leave it as-is...

Will

--->8

diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 0ea6cc25dc66..21c907987080 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -218,8 +218,6 @@ SYM_FUNC_ALIAS(__dma_flush_area, __pi___dma_flush_area)
  */
 SYM_FUNC_START(__pi___dma_map_area)
        add     x1, x0, x1
-       cmp     w2, #DMA_FROM_DEVICE
-       b.eq    __pi_dcache_inval_poc
        b       __pi_dcache_clean_poc
 SYM_FUNC_END(__pi___dma_map_area)
 SYM_FUNC_ALIAS(__dma_map_area, __pi___dma_map_area)


* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Russell King (Oracle) @ 2022-06-06 15:35 UTC
  To: Will Deacon
  Cc: linux-arch, catalin.marinas, maz, mark.rutland, robin.murphy,
	hch, vgupta, arnd, bcain, geert, monstr, dinguyen, shorne, mpe,
	dalias

On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
>   (1) What if the DMA transfer doesn't write to every byte in the buffer?

The data that is in RAM gets pulled into the cache and is visible to
the CPU - but if DMA doesn't write to every byte in the buffer, isn't
that a DMA failure? Should a buffer that suffers DMA failure be passed
to the user?

>   (2) What if the buffer has a virtual alias in userspace (e.g. because
>       the kernel has GUP'd the buffer)?

Then userspace needs to avoid writing to cachelines that overlap the
buffer to avoid destroying the action of the DMA. It shouldn't be doing
this anyway (what happens if userspace writes to the same location that
is being DMA'd to... who wins?)

However, you're right that invalidating in this case could expose data
that userspace shouldn't see, and I'd suggest in this case that DMA
buffers should be cleaned in this circumstance before they're exposed
to userspace - so userspace only ever gets to see the data that was
there at the point they're mapped, or is subsequently written to
afterwards by DMA.

I don't think there's anything to be worried about if the invalidation
reveals stale data provided the stale data is not older than the data
that was there on first mapping.

> Finally, on arm(64), the DMA mapping code tries to deal with buffers
> that are not cacheline aligned by issuing clean-and-invalidate
> operations for the overlapping portions at each end of the buffer. I
> don't think this makes a tonne of sense, as inevitably one of the
> writers (either the CPU or the DMA) is going to win and somebody is
> going to run into silent data loss. Having the caller receive
> DMA_MAPPING_ERROR in this case would probably be better.

Sadly unavoidable - people really like passing unaligned buffers to the
DMA API, sometimes those buffers contain information that needs to be
preserved. I really wish it wasn't that way, because it would make life
a lot better, but it's what we've had to deal with over the years with
the likes of the SCSI subsystem (and e.g. its sense buffer that was
embedded non-cacheline aligned into other structures that had to be
DMA'd to.)
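
For illustration, the problematic layout was something along these
lines (hypothetical struct, loosely modelled on the old SCSI case):

    struct cmd_with_embedded_sense {        /* hypothetical */
            spinlock_t      lock;           /* written by the CPU */
            unsigned char   sense[18];      /* DMA'd to, not cacheline aligned */
            int             result;         /* written by the CPU */
    };
    /*
     * "sense" shares cache lines with CPU-written fields, so whether
     * the CPU writeback or the DMA lands last decides who loses data.
     */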

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Robin Murphy @ 2022-06-06 16:02 UTC
  To: Russell King (Oracle), Will Deacon
  Cc: linux-arch, catalin.marinas, maz, mark.rutland, hch, vgupta,
	arnd, bcain, geert, monstr, dinguyen, shorne, mpe, dalias

On 2022-06-06 16:35, Russell King (Oracle) wrote:
> On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
>>    (1) What if the DMA transfer doesn't write to every byte in the buffer?
> 
> The data that is in RAM gets pulled into the cache and is visible to
> the CPU - but if DMA doesn't write to every byte in the buffer, isn't
> that a DMA failure? Should a buffer that suffers DMA failure be passed
> to the user?

No, partial DMA writes can sometimes effectively be expected behaviour, 
see the whole SWIOTLB CVE fiasco for the most recent discussion on that:

https://lore.kernel.org/lkml/1812355.tdWV9SEqCh@natalenko.name/

>>    (2) What if the buffer has a virtual alias in userspace (e.g. because
>>        the kernel has GUP'd the buffer)?
> 
> Then userspace needs to avoid writing to cachelines that overlap the
> buffer to avoid destroying the action of the DMA. It shouldn't be doing
> this anyway (what happens if userspace writes to the same location that
> is being DMA'd to... who wins?)
> 
> However, you're right that invalidating in this case could expose data
> that userspace shouldn't see, and I'd suggest in this case that DMA
> buffers should be cleaned in this circumstance before they're exposed
> to userspace - so userspace only ever gets to see the data that was
> there at the point they're mapped, or is subsequently written to
> afterwards by DMA.
> 
> I don't think there's anything to be worried about if the invalidation
> reveals stale data provided the stale data is not older than the data
> that was there on first mapping.

Indeed as above that may actually be required. I think cleaning the 
caches on dma_map_* is the most correct thing to do.

Robin.

>> Finally, on arm(64), the DMA mapping code tries to deal with buffers
>> that are not cacheline aligned by issuing clean-and-invalidate
>> operations for the overlapping portions at each end of the buffer. I
>> don't think this makes a tonne of sense, as inevitably one of the
>> writers (either the CPU or the DMA) is going to win and somebody is
>> going to run into silent data loss. Having the caller receive
>> DMA_MAPPING_ERROR in this case would probably be better.
> 
> Sadly unavoidable - people really like passing unaligned buffers to the
> DMA API, sometimes those buffers contain information that needs to be
> preserved. I really wish it wasn't that way, because it would make life
> a lot better, but it's what we've had to deal with over the years with
> the likes of the SCSI subsystem (and e.g. its sense buffer that was
> embedded non-cacheline aligned into other structures that had to be
> DMA'd to.)
> 

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Catalin Marinas @ 2022-06-06 16:13 UTC
  To: Russell King (Oracle)
  Cc: Will Deacon, linux-arch, maz, mark.rutland, robin.murphy, hch,
	vgupta, arnd, bcain, geert, monstr, dinguyen, shorne, mpe,
	dalias

On Mon, Jun 06, 2022 at 04:35:07PM +0100, Russell King wrote:
> On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
> > Finally, on arm(64), the DMA mapping code tries to deal with buffers
> > that are not cacheline aligned by issuing clean-and-invalidate
> > operations for the overlapping portions at each end of the buffer. I
> > don't think this makes a tonne of sense, as inevitably one of the
> > writers (either the CPU or the DMA) is going to win and somebody is
> > going to run into silent data loss. Having the caller receive
> > DMA_MAPPING_ERROR in this case would probably be better.
> 
> Sadly unavoidable - people really like passing unaligned buffers to the
> DMA API, sometimes those buffers contain information that needs to be
> preserved. I really wish it wasn't that way, because it would make life
> a lot better, but it's what we've had to deal with over the years with
> the likes of the SCSI subsystem (and e.g. its sense buffer that was
> embedded non-cacheline aligned into other structures that had to be
> DMA'd to.)

As Will said, you either corrupt the DMA buffer or the kernel data
sharing the cache line, you can't really win. Current behaviour favours
the kernel data and somehow hopes that there won't be any write to the
cache line by the CPU before the DMA transfer completes. If this hope
holds, the fix to clean (instead of invalidate) in __dma_map_area()
should be sufficient, no need for boundary checks and alignment.

-- 
Catalin

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Ard Biesheuvel @ 2022-06-06 16:15 UTC
  To: Russell King (Oracle)
  Cc: Will Deacon, linux-arch, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Robin Murphy, Christoph Hellwig, vgupta,
	Arnd Bergmann, bcain, Geert Uytterhoeven, Michal Simek,
	Dinh Nguyen, Stafford Horne, Michael Ellerman, Rich Felker

On Mon, 6 Jun 2022 at 17:36, Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
> >   (1) What if the DMA transfer doesn't write to every byte in the buffer?
>
> The data that is in RAM gets pulled into the cache and is visible to
> the CPU - but if DMA doesn't write to every byte in the buffer, isn't
> that a DMA failure? Should a buffer that suffers DMA failure be passed
> to the user?
>
> >   (2) What if the buffer has a virtual alias in userspace (e.g. because
> >       the kernel has GUP'd the buffer)?
>
> Then userspace needs to avoid writing to cachelines that overlap the
> buffer to avoid destroying the action of the DMA. It shouldn't be doing
> this anyway (what happens if userspace writes to the same location that
> is being DMA'd to... who wins?)
>
> However, you're right that invalidating in this case could expose data
> that userspace shouldn't see, and I'd suggest in this case that DMA
> buffers should be cleaned in this circumstance before they're exposed
> to userspace - so userspace only ever gets to see the data that was
> there at the point they're mapped, or is subsequently written to
> afterwards by DMA.
>
> I don't think there's anything to be worried about if the invalidation
> reveals stale data provided the stale data is not older than the data
> that was there on first mapping.
>

Given that cache invalidate without clean could potentially nullify
the effect of, e.g., a memzero_explicit() call, I think a clean is
definitely safer, but OTOH, it is also more costly, and not strictly
necessary for correctness of the DMA operation itself.

So I agree with the suggested change, as long as it is annotated
sufficiently clearly to make the above distinction.
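
To spell the hazard out (sketch; "key" is just some buffer holding a
secret):

    memzero_explicit(key, sizeof(key));  /* zeros sit in dirty cache lines */
    dma_map_single(dev, key, sizeof(key), DMA_FROM_DEVICE);
    /*
     * With invalidate: the dirty, zeroed lines are discarded, so RAM
     * may still hold the old key bytes unless/until the DMA overwrites
     * them.  With clean: the zeros are written back to RAM first.
     */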

> > Finally, on arm(64), the DMA mapping code tries to deal with buffers
> > that are not cacheline aligned by issuing clean-and-invalidate
> > operations for the overlapping portions at each end of the buffer. I
> > don't think this makes a tonne of sense, as inevitably one of the
> > writers (either the CPU or the DMA) is going to win and somebody is
> > going to run into silent data loss. Having the caller receive
> > DMA_MAPPING_ERROR in this case would probably be better.
>
> Sadly unavoidable - people really like passing unaligned buffers to the
> DMA API, sometimes those buffers contain information that needs to be
> preserved. I really wish it wasn't that way, because it would make life
> a lot better, but it's what we've had to deal with over the years with
> > the likes of the SCSI subsystem (and e.g. its sense buffer that was
> embedded non-cacheline aligned into other structures that had to be
> DMA'd to.)
>

As discussed in the thread related to Catalin's DMA granularity
series, this is something I think we should be addressing with bounce
buffering for inbound DMA. This would allow us to reduce the kmalloc
alignment as well.

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Russell King (Oracle) @ 2022-06-06 16:16 UTC
  To: Robin Murphy
  Cc: Will Deacon, linux-arch, catalin.marinas, maz, mark.rutland, hch,
	vgupta, arnd, bcain, geert, monstr, dinguyen, shorne, mpe,
	dalias

On Mon, Jun 06, 2022 at 05:02:50PM +0100, Robin Murphy wrote:
> On 2022-06-06 16:35, Russell King (Oracle) wrote:
> > On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
> > >    (1) What if the DMA transfer doesn't write to every byte in the buffer?
> > 
> > The data that is in RAM gets pulled into the cache and is visible to
> > the CPU - but if DMA doesn't write to every byte in the buffer, isn't
> > that a DMA failure? Should a buffer that suffers DMA failure be passed
> > to the user?
> 
> No, partial DMA writes can sometimes effectively be expected behaviour, see
> the whole SWIOTLB CVE fiasco for the most recent discussion on that:
> 
> https://lore.kernel.org/lkml/1812355.tdWV9SEqCh@natalenko.name/

So we have a CVE'd security hole that was never reported to
maintainers...

> > >    (2) What if the buffer has a virtual alias in userspace (e.g. because
> > >        the kernel has GUP'd the buffer)?
> > 
> > Then userspace needs to avoid writing to cachelines that overlap the
> > buffer to avoid destroying the action of the DMA. It shouldn't be doing
> > this anyway (what happens if userspace writes to the same location that
> > is being DMA'd to... who wins?)
> > 
> > However, you're right that invalidating in this case could expose data
> > that userspace shouldn't see, and I'd suggest in this case that DMA
> > buffers should be cleaned in this circumstance before they're exposed
> > to userspace - so userspace only ever gets to see the data that was
> > there at the point they're mapped, or is subsequently written to
> > afterwards by DMA.
> > 
> > I don't think there's anything to be worried about if the invalidation
> > reveals stale data provided the stale data is not older than the data
> > that was there on first mapping.
> 
> Indeed as above that may actually be required. I think cleaning the caches
> on dma_map_* is the most correct thing to do.

It's also the most expensive thing to do as it can push up the slow
on-bus traffic to memory quite a bit, especially when big buffers
are involved.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Christoph Hellwig @ 2022-06-08  8:48 UTC
  To: Will Deacon
  Cc: linux-arch, catalin.marinas, maz, mark.rutland, robin.murphy,
	hch, vgupta, linux, arnd, bcain, geert, monstr, dinguyen, shorne,
	mpe, dalias

On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
> The simplest fix (diff for arm64 below) seems to be changing the
> invalidation in this case to be a "clean" in arm(64)-speak so that any
> dirty lines are written back, therefore limiting the stale data to the
> initial buffer contents. In doing so, this makes the FROM_DEVICE and
> BIDIRECTIONAL cases identical which makes some intuitive sense if you
> think of FROM_DEVICE as first doing a TO_DEVICE of any dirty CPU cache
> lines. One interesting thing I noticed is that the csky implementation
> additionally zeroes the buffer prior to the clean, but this seems to be
> overkill.

Btw, one thing I'd love to do (and might need some help from the arch
maintainers) is to change how the dma cache maintenance hooks work.

Right now they are high-level and these kinds of decisions need to
be taken in the arch code.  I'd prefer to move over to the architectures
providing very low-level helpers to:

  - writeback
  - invalidate
  - invalidate+writeback

Note arch/arc/mm/dma.c has very nice documentation of what we need to do,
based on a mail from Russell, and we should keep it up to date with any
changes to the status quo and probably move it to common documentation
at least.
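
In shape, something like this (hypothetical names, purely to sketch
the idea):

    void arch_dma_cache_wback(phys_addr_t paddr, size_t size);
    void arch_dma_cache_inv(phys_addr_t paddr, size_t size);
    void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size);
    /*
     * Generic code would then choose which of these to call for each
     * DMA direction in the map/unmap paths, instead of every arch
     * open-coding that policy itself.
     */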

> Finally, on arm(64), the DMA mapping code tries to deal with buffers
> that are not cacheline aligned by issuing clean-and-invalidate
> operations for the overlapping portions at each end of the buffer. I
> don't think this makes a tonne of sense, as inevitably one of the
> writers (either the CPU or the DMA) is going to win and somebody is
> going to run into silent data loss. Having the caller receive
> DMA_MAPPING_ERROR in this case would probably be better.

Yes, the mappings are supposed to be cache line aligned or at least
have padding around them.  But due to the latter case we can't really
easily verify this in dma-debug.

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Christoph Hellwig @ 2022-06-08  8:49 UTC
  To: Robin Murphy
  Cc: Russell King (Oracle),
	Will Deacon, linux-arch, catalin.marinas, maz, mark.rutland, hch,
	vgupta, arnd, bcain, geert, monstr, dinguyen, shorne, mpe,
	dalias

On Mon, Jun 06, 2022 at 05:02:50PM +0100, Robin Murphy wrote:
> No, partial DMA writes can sometimes effectively be expected behaviour, see 
> the whole SWIOTLB CVE fiasco for the most recent discussion on that:

Yes, and I still have a TODO list item for interfaces that deal sanely
with the case of a transfer smaller than the mapping, but
I haven't managed to get to it yet.

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Russell King (Oracle) @ 2022-06-08 12:02 UTC
  To: Christoph Hellwig
  Cc: Will Deacon, linux-arch, catalin.marinas, maz, mark.rutland,
	robin.murphy, vgupta, arnd, bcain, geert, monstr, dinguyen,
	shorne, mpe, dalias

On Wed, Jun 08, 2022 at 10:48:41AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
> > The simplest fix (diff for arm64 below) seems to be changing the
> > invalidation in this case to be a "clean" in arm(64)-speak so that any
> > dirty lines are written back, therefore limiting the stale data to the
> > initial buffer contents. In doing so, this makes the FROM_DEVICE and
> > BIDIRECTIONAL cases identical which makes some intuitive sense if you
> > think of FROM_DEVICE as first doing a TO_DEVICE of any dirty CPU cache
> > lines. One interesting thing I noticed is that the csky implementation
> > additionally zeroes the buffer prior to the clean, but this seems to be
> > overkill.
> 
> Btw, one thing I'd love to do (and might need some help from the arch
> maintainers) is to change how the dma cache maintenance hooks work.
> 
> Right now they are high-level and these kinds of decisions need to
> be taken in the arch code.  I'd prefer to move over to the architectures
> providing very low-level helpers to:
> 
>   - writeback
>   - invalidate
>   - invalidate+writeback
> 
> Note arch/arc/mm/dma.c has very nice documentation of what we need to do,
> based on a mail from Russell, and we should keep it up to date with any
> changes to the status quo and probably move it to common documentation
> at least.

Note that simply devolving the operations to this set is not optimal.
If you notice, both my email and the table that was copied from my
email make two of the invalidate options dependent on the properties
of the CPU cache architecture.

While we could invalidate anyway at that point, this just fuels the
view that generic code == non-optimal, performance degrading code.
This is why the underlying interfaces on 32-bit ARM are not these
cache operations, but are instead based on buffer ownership
transitions - __dma_page_cpu_to_dev() and __dma_page_dev_to_cpu()
are the two things that are really needed.
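
For reference, the 32-bit ARM entry points are (from
arch/arm/mm/dma-mapping.c, give or take details):

    static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
            size_t size, enum dma_data_direction dir);
    static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
            size_t size, enum dma_data_direction dir);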

There's also additional code that deals with the d-cache state
(which is itself architecture specific, and avoids useless
d-cache maintenance when page cache pages get mapped into userspace)
as well as the complexities of dealing with more than one level of
cache - where the order of the inner and outer cache maintenance
can't be expressed as per the simple functions you mention above.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Christoph Hellwig @ 2022-06-08 15:14 UTC
  To: Russell King (Oracle)
  Cc: Christoph Hellwig, Will Deacon, linux-arch, catalin.marinas, maz,
	mark.rutland, robin.murphy, vgupta, arnd, bcain, geert, monstr,
	dinguyen, shorne, mpe, dalias

On Wed, Jun 08, 2022 at 01:02:21PM +0100, Russell King (Oracle) wrote:
> Note that simply devolving the operations to this set is not optimal.
> If you notice, both my email and the table that was copied from my
> email make two of the invalidate options dependent on the properties
> of the CPU cache architecture.

Yes, but that is nothing we can't do with a little optional arch
tunable.  The whole point is that most architectures simply copy
and paste in the best case or get it wrong.  So I'd have a nicely
working piece of generic code which is a win even if we still need
to override it in one or two places.
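
The usual pattern would be a generic default that an arch can
override, e.g. (sketch, with the same hypothetical helper names as
before):

    /*
     * In generic code; an architecture with different cache properties
     * provides its own arch_dma_cache_inv (hypothetical name) instead.
     */
    #ifndef arch_dma_cache_inv
    #define arch_dma_cache_inv(paddr, size) \
            arch_dma_cache_wback_inv(paddr, size)  /* conservative default */
    #endif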

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Catalin Marinas @ 2022-06-08 16:51 UTC
  To: Ard Biesheuvel
  Cc: Russell King (Oracle),
	Will Deacon, linux-arch, Marc Zyngier, Mark Rutland,
	Robin Murphy, Christoph Hellwig, vgupta, Arnd Bergmann, bcain,
	Geert Uytterhoeven, Michal Simek, Dinh Nguyen, Stafford Horne,
	Michael Ellerman, Rich Felker

On Mon, Jun 06, 2022 at 06:15:13PM +0200, Ard Biesheuvel wrote:
> On Mon, 6 Jun 2022 at 17:36, Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> > On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
> > > Finally, on arm(64), the DMA mapping code tries to deal with buffers
> > > that are not cacheline aligned by issuing clean-and-invalidate
> > > operations for the overlapping portions at each end of the buffer. I
> > > don't think this makes a tonne of sense, as inevitably one of the
> > > writers (either the CPU or the DMA) is going to win and somebody is
> > > going to run into silent data loss. Having the caller receive
> > > DMA_MAPPING_ERROR in this case would probably be better.
> >
> > Sadly unavoidable - people really like passing unaligned buffers to the
> > DMA API, sometimes those buffers contain information that needs to be
> > preserved. I really wish it wasn't that way, because it would make life
> > a lot better, but it's what we've had to deal with over the years with
> > the likes of the SCSI subsystem (and e.g. its sense buffer that was
> > embedded non-cacheline aligned into other structures that had to be
> > DMA'd to.)
> 
> As discussed in the thread related to Catalin's DMA granularity
> series, this is something I think we should be addressing with bounce
> buffering for inbound DMA. This would allow us to reduce the kmalloc
> alignment as well.

It depends on the size. My plan was to do bouncing only if the size is
below ARCH_DMA_MINALIGN. For larger buffers, kmalloc() gives us
alignment to a power of two (well, other than 96 and 192) and no
bouncing needed. If some buggy driver allocates a large structure only
to hope it can do DMA into unaligned parts of it while modifying the
adjacent bytes, we should just mark it as broken, we can't fix it.

However, if the driver doesn't modify the cache line while the DMA takes
place (only expecting it to be preserved), the fix from Will to clean on
__dma_map_area() is sufficient.
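
In pseudo-C, the size-based heuristic above is roughly (sketch;
dev_is_dma_coherent() is the existing helper, the two map functions
are made up):

    if (!dev_is_dma_coherent(dev) && size < ARCH_DMA_MINALIGN)
            return bounce_map(dev, paddr, size, dir);  /* made up */
    /*
     * Larger kmalloc() buffers are already aligned to a power of two
     * (96 and 192 aside), so they can safely be mapped in place.
     */
    return map_direct(dev, paddr, size, dir);          /* made up */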

-- 
Catalin

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Will Deacon @ 2022-06-09 13:59 UTC
  To: Christoph Hellwig
  Cc: linux-arch, catalin.marinas, maz, mark.rutland, robin.murphy,
	vgupta, linux, arnd, bcain, geert, monstr, dinguyen, shorne, mpe,
	dalias

On Wed, Jun 08, 2022 at 10:48:41AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
> > The simplest fix (diff for arm64 below) seems to be changing the
> > invalidation in this case to be a "clean" in arm(64)-speak so that any
> > dirty lines are written back, therefore limiting the stale data to the
> > initial buffer contents. In doing so, this makes the FROM_DEVICE and
> > BIDIRECTIONAL cases identical which makes some intuitive sense if you
> > think of FROM_DEVICE as first doing a TO_DEVICE of any dirty CPU cache
> > lines. One interesting thing I noticed is that the csky implementation
> > additionally zeroes the buffer prior to the clean, but this seems to be
> > overkill.
> 
> Btw, one thing I'd love to do (and might need some help from the arch
> maintainers) is to change how the dma cache maintenance hooks work.
> 
> Right now they are high-level and these kinds of decisions need to
> be taken in the arch code.  I'd prefer to move over to the architectures
> providing very low-level helpers to:
> 
>   - writeback
>   - invalidate
>   - invalidate+writeback
> 
> Note arch/arc/mm/dma.c has very nice documentation of what we need to do,
> based on a mail from Russell, and we should keep it up to date with any
> changes to the status quo and probably move it to common documentation
> at least.

That makes sense to me (assuming an opt-out for archs that want it), but
I'd like to make sure that these low-level helpers aren't generally
available for e.g. driver modules to dip into directly; it's pretty common
for folks to request that we EXPORT our cache maintenance routines because
they're violating the DMA API someplace and so far we've been pretty good
at asking them to fix their code instead.

> > Finally, on arm(64), the DMA mapping code tries to deal with buffers
> > that are not cacheline aligned by issuing clean-and-invalidate
> > operations for the overlapping portions at each end of the buffer. I
> > don't think this makes a tonne of sense, as inevitably one of the
> > writers (either the CPU or the DMA) is going to win and somebody is
> > going to run into silent data loss. Having the caller receive
> > DMA_MAPPING_ERROR in this case would probably be better.
> 
> Yes, the mapping are supposed to be cache line aligned or at least
> have padding around them.  But due to the later case we can't really
> easily verify this in dma-debug.

Damn, I hadn't considered padding. That probably means that the csky
implementation that I mentioned (which zeroes the buffer) is buggy...

Will

* Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
From: Christoph Hellwig @ 2022-06-09 14:18 UTC
  To: Will Deacon
  Cc: Christoph Hellwig, linux-arch, catalin.marinas, maz,
	mark.rutland, robin.murphy, vgupta, linux, arnd, bcain, geert,
	monstr, dinguyen, shorne, mpe, dalias

On Thu, Jun 09, 2022 at 02:59:54PM +0100, Will Deacon wrote:
> That makes sense to me (assuming an opt-out for archs that want it), but
> I'd like to make sure that these low-level helpers aren't generally
> available for e.g. driver modules to dip into directly; it's pretty common
> for folks to request that we EXPORT our cache maintenance routines because
> they're violating the DMA API someplace and so far we've been pretty good
> at asking them to fix their code instead.

Yes, they have absolutely no business being available to drivers
directly.
