* [PATCH] riscv: dma-mapping: Use conventional cache operations for dma_sync*()
From: Sergei Miroshnichenko @ 2022-08-18 16:51 UTC
  To: Heiko Stuebner, Samuel Holland
  Cc: linux-riscv, linux, Palmer Dabbelt, Sergei Miroshnichenko

Support for arch_sync_dma_for_{device,cpu}() on RISC-V was added recently.

Currently, for_cpu(FROM_DEVICE) uses the flush (writeback plus invalidate)
cache operation, which can overwrite the data received from a device via
DMA with dirty cache lines. Replace it with inval to avoid data
corruption.

The same goes for for_cpu(BIDIRECTIONAL): it was also flushing the cache,
but it is called once the device is done with the buffer. So, in order not
to lose the received data, stick to inval here as well.

As for for_device(FROM_DEVICE): according to the earlier discussion
about matching the DMA API to cache operations [1], either inval or a
no-op should be used here, which would match most other arches. But in a
later discussion [2] it was decided that the clean (which is used now) is
indeed preferable and safer, though slower. arm64 was recently
changed in the same way [3].
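For context, this is how the hooks above are reached from a driver's
receive path. A minimal sketch, not part of this patch; the function
name and the device-programming step are hypothetical, and a
non-coherent platform is assumed:

#include <linux/dma-mapping.h>

static int rx_one_buffer(struct device *dev, void *buf, size_t len)
{
	/* Hand the buffer to the device: on a non-coherent platform
	 * this reaches arch_sync_dma_for_device(), i.e. the clean
	 * discussed above for DMA_FROM_DEVICE. */
	dma_addr_t dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	if (dma_mapping_error(dev, dma))
		return -EIO;

	/* ... point the device at 'dma' and wait for the transfer to
	 * complete; the CPU must not write to 'buf' meanwhile ... */

	/* Hand the buffer back to the CPU: this reaches
	 * arch_sync_dma_for_cpu(), which this patch switches from
	 * flush to inval for DMA_FROM_DEVICE and DMA_BIDIRECTIONAL. */
	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);

	return 0;
}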

[1] dma_sync_*_for_cpu and direction=TO_DEVICE
Link: https://lkml.org/lkml/2018/5/18/979

[2] Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/

[3] commit c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer")

Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using zicbom extension")
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 arch/riscv/mm/dma-noncoherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index cd2225304c82..2a8f6124021f 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -45,7 +45,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		break;
 	case DMA_FROM_DEVICE:
 	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
 		break;
 	default:
 		break;
-- 
2.37.2



* Re: [PATCH] riscv: dma-mapping: Use conventional cache operations for dma_sync*()
From: Jessica Clarke @ 2022-08-18 18:12 UTC
  To: Sergei Miroshnichenko
  Cc: Heiko Stuebner, Samuel Holland, linux-riscv, linux, Palmer Dabbelt

On 18 Aug 2022, at 17:51, Sergei Miroshnichenko <s.miroshnichenko@yadro.com> wrote:
> 
> Support for arch_sync_dma_for_{device,cpu}() on RISC-V was added recently.
> 
> Currently, for_cpu(FROM_DEVICE) uses the flush (writeback plus invalidate)
> cache operation, which can overwrite the data received from a device via
> DMA with dirty cache lines. Replace it with inval to avoid data
> corruption.
> 
> The same goes for for_cpu(BIDIRECTIONAL): it was also flushing the cache,
> but it is called once the device is done with the buffer. So, in order not
> to lose the received data, stick to inval here as well.
> 
> As for for_device(FROM_DEVICE): according to the earlier discussion
> about matching the DMA API to cache operations [1], either inval or a
> no-op should be used here, which would match most other arches. But in a
> later discussion [2] it was decided that the clean (which is used now) is
> indeed preferable and safer, though slower. arm64 was recently
> changed in the same way [3].

Flush is always a valid way to implement invalidate, and if it does not
work then you have bugs elsewhere, since the CPU is free to write out
the dirty cache lines at any point before an invalidate. So whilst it’s
more efficient to use invalidate and hint to the core that it does not
need to do any writing, it is no more correct, and this patch
description is misguided. You might even wish to use the fact that it
is currently a flush as a convenient sanitiser for whatever is
currently violating the non-coherent DMA rules and fix that before it’s
switched to invalidate and papers over the issue (at least on whatever
implementation you’re running on).
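To make the rule concrete, a sketch (function name hypothetical) of the
kind of violation that a flush in for_cpu() will reliably surface:

#include <linux/dma-mapping.h>
#include <linux/slab.h>

static void broken_rx(struct device *dev, size_t len)
{
	u8 *buf = kmalloc(len, GFP_KERNEL);
	dma_addr_t dma;

	if (!buf)
		return;

	/* Ownership of 'buf' passes to the device here. */
	dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma))
		goto out;

	/* BUG: a CPU store while the device owns the buffer dirties the
	 * cache line, and the hardware may write that line back at any
	 * point before the for_cpu sync.  An inval in for_cpu() does not
	 * make this safe; it only changes when the corruption shows up. */
	buf[0] = 0xff;

	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);
out:
	kfree(buf);
}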

Jess

> [1] dma_sync_*_for_cpu and direction=TO_DEVICE
> Link: https://lkml.org/lkml/2018/5/18/979
> 
> [2] Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
> Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> 
> [3] commit c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer")
> 
> Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using zicbom extension")
> Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
> arch/riscv/mm/dma-noncoherent.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index cd2225304c82..2a8f6124021f 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -45,7 +45,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> 		break;
> 	case DMA_FROM_DEVICE:
> 	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> 		break;
> 	default:
> 		break;
> -- 
> 2.37.2

* Re: [PATCH] riscv: dma-mapping: Use conventional cache operations for dma_sync*()
From: Sergei Miroshnichenko @ 2022-08-18 20:52 UTC
  To: jrtc27; +Cc: linux-riscv, samuel, linux, heiko, palmer

Hi Jess,

On Thu, 2022-08-18 at 19:12 +0100, Jessica Clarke wrote:
> On 18 Aug 2022, at 17:51, Sergei Miroshnichenko
> <s.miroshnichenko@yadro.com> wrote:
> > 
> > Support for arch_sync_dma_for_{device,cpu}() on RISC-V was added
> > recently.
> > 
> > Currently, for_cpu(FROM_DEVICE) uses the flush (writeback plus
> > invalidate) cache operation, which can overwrite the data received
> > from a device via DMA with dirty cache lines. Replace it with inval
> > to avoid data corruption.
> > 
> > The same goes for for_cpu(BIDIRECTIONAL): it was also flushing the
> > cache, but it is called once the device is done with the buffer.
> > So, in order not to lose the received data, stick to inval here as
> > well.
> > 
> > As for for_device(FROM_DEVICE): according to the earlier discussion
> > about matching the DMA API to cache operations [1], either inval or
> > a no-op should be used here, which would match most other arches.
> > But in a later discussion [2] it was decided that the clean (which
> > is used now) is indeed preferable and safer, though slower. arm64
> > was recently changed in the same way [3].
> 
> Flush is always a valid way to implement invalidate, and if it does not
> work then you have bugs elsewhere, since the CPU is free to write out
> the dirty cache lines at any point before an invalidate. So whilst it’s
> more efficient to use invalidate and hint to the core that it does not
> need to do any writing, it is no more correct, and this patch
> description is misguided. You might even wish to use the fact that it
> is currently a flush as a convenient sanitiser for whatever is
> currently violating the non-coherent DMA rules and fix that before it’s
> switched to invalidate and papers over the issue (at least on whatever
> implementation you’re running on).

You are right, thank you, the description *is* misleading: cache lines
can't get dirty before for_cpu(), as they are cleaned during
for_device(), and are not written to by the CPU in between.

I gave it another thought: the reason to invalidate in for_cpu() is
that the CPU can speculatively prefetch the buffer after
map()/for_device() but before the DMA is finished.

Even if we replace the clean in for_device(FROM_DEVICE) with a flush, a
prefetch may still happen, requiring an invalidation.
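Roughly, the hazard is (ordering illustrative):

	CPU					device
	for_device(): clean buf
						starts DMA write to buf
	speculative prefetch pulls stale
	(clean) copies of buf into cache
						finishes DMA write to buf
	for_cpu(): inval buf, discarding
	the stale prefetched lines
	CPU reads fresh data from memory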

Serge

> 
> Jess
> 
> > [1] dma_sync_*_for_cpu and direction=TO_DEVICE
> > Link: https://lkml.org/lkml/2018/5/18/979
> > 
> > [2] Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
> > Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> > 
> > [3] commit c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer")
> > 
> > Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using zicbom extension")
> > Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> > ---
> > arch/riscv/mm/dma-noncoherent.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > index cd2225304c82..2a8f6124021f 100644
> > --- a/arch/riscv/mm/dma-noncoherent.c
> > +++ b/arch/riscv/mm/dma-noncoherent.c
> > @@ -45,7 +45,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> >               break;
> >       case DMA_FROM_DEVICE:
> >       case DMA_BIDIRECTIONAL:
> > -             ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> > +             ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> >               break;
> >       default:
> >               break;
> > --
> > 2.37.2

* Re: [PATCH] riscv: dma-mapping: Use conventional cache operations for dma_sync*()
From: Jessica Clarke @ 2022-08-18 21:35 UTC
  To: Sergei Miroshnichenko; +Cc: linux-riscv, samuel, linux, heiko, palmer

On 18 Aug 2022, at 21:52, Sergei Miroshnichenko <s.miroshnichenko@yadro.com> wrote:
> 
> Hi Jess,
> 
> On Thu, 2022-08-18 at 19:12 +0100, Jessica Clarke wrote:
>> On 18 Aug 2022, at 17:51, Sergei Miroshnichenko
>> <s.miroshnichenko@yadro.com> wrote:
>>> 
>>> Support for arch_sync_dma_for_{device,cpu}() on RISC-V was added
>>> recently.
>>> 
>>> Currently, for_cpu(FROM_DEVICE) uses the flush (writeback plus
>>> invalidate) cache operation, which can overwrite the data received
>>> from a device via DMA with dirty cache lines. Replace it with inval
>>> to avoid data corruption.
>>> 
>>> The same goes for for_cpu(BIDIRECTIONAL): it was also flushing the
>>> cache, but it is called once the device is done with the buffer.
>>> So, in order not to lose the received data, stick to inval here as
>>> well.
>>> 
>>> As for for_device(FROM_DEVICE): according to the earlier discussion
>>> about matching the DMA API to cache operations [1], either inval or
>>> a no-op should be used here, which would match most other arches.
>>> But in a later discussion [2] it was decided that the clean (which
>>> is used now) is indeed preferable and safer, though slower. arm64
>>> was recently changed in the same way [3].
>> 
>> Flush is always a valid way to implement invalidate, and if it does not
>> work then you have bugs elsewhere, since the CPU is free to write out
>> the dirty cache lines at any point before an invalidate. So whilst it’s
>> more efficient to use invalidate and hint to the core that it does not
>> need to do any writing, it is no more correct, and this patch
>> description is misguided. You might even wish to use the fact that it
>> is currently a flush as a convenient sanitiser for whatever is
>> currently violating the non-coherent DMA rules and fix that before it’s
>> switched to invalidate and papers over the issue (at least on whatever
>> implementation you’re running on).
> 
> You are right, thank you, the description *is* misleading: cache lines
> can't get dirty before for_cpu(), as they are cleaned during
> for_device(), and are not written to by the CPU in between.
> 
> I gave it another thought: the reason to invalidate in for_cpu() is
> that the CPU can speculatively prefetch the buffer after
> map()/for_device() but before the DMA is finished.
> 
> Even if we replace the clean in for_device(FROM_DEVICE) with a flush, a
> prefetch may still happen, requiring an invalidation.

In that case the line is still clean, not dirty, so a flush will behave
the same as an invalidate. So no, this is still wrong: flush is always
correct, even if invalidate is potentially faster.
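Informally, per line state (going by the Zicbom semantics; worth
double-checking against the spec):

	line state	cbo.inval	cbo.flush
	clean		discard		discard (nothing to write back)
	dirty		discard		write back, then discard

With no dirty lines in the buffer, which is what the non-coherent DMA
rules require at the point of the for_cpu sync, the two columns
coincide.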

Jess


