dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
       [not found]   ` <bad125dff49f6e49c895e818c9d1abb346a46e8e.camel@synopsys.com>
@ 2018-05-18 17:20     ` Vineet Gupta
  2018-05-18 17:50       ` Russell King - ARM Linux
       [not found]     ` <20180518132731.GA31125@lst.de>
  1 sibling, 1 reply; 7+ messages in thread
From: Vineet Gupta @ 2018-05-18 17:20 UTC (permalink / raw)
  To: Alexey Brodkin, hch
  Cc: linux-arch, linux-xtensa, monstr, linux-snps-arc, linux-c6x-dev,
	linux-parisc, linux-sh, linux-hexagon, linux-kernel, iommu,
	linux-m68k, openrisc, green.hu, linux-alpha, sparclinux,
	nios2-dev, deanbo422, linux-arm-kernel, linux-mm, Andrew Morton

On 05/18/2018 06:11 AM, Alexey Brodkin wrote:
>   void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
>                  size_t size, enum dma_data_direction dir)
>   {
> +       if (dir != DMA_TO_DEVICE){
> +               dump_stack();
> +               printk(" *** %s@%d: DMA direction is %s instead of %s\n",
> +                      __func__, __LINE__, dir_to_str(dir), dir_to_str(DMA_TO_DEVICE));
> +       }
> +
>          return _dma_cache_sync(dev, paddr, size, dir);
>   }
>   
>   void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
>                  size_t size, enum dma_data_direction dir)
>   {
> +       if (dir != DMA_FROM_DEVICE) {
> +               dump_stack();
> +               printk(" *** %s@%d: DMA direction is %s instead of %s\n",
> +                      __func__, __LINE__, dir_to_str(dir), dir_to_str(DMA_FROM_DEVICE));
> +       }
> +
>          return _dma_cache_sync(dev, paddr, size, dir);
>   }

...
> In case of MMC/DW_MCI (AKA DesignWare MobileStorage controller) that's an execution flow:
> 1) __dw_mci_start_request()
> 2) dw_mci_pre_dma_transfer()
> 3) dma_map_sg(..., mmc_get_dma_dir(data))
>
> Note mmc_get_dma_dir() is just "data->flags & MMC_DATA_WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE".
> I.e. if we're preparing for sending data dma_noncoherent_map_sg() will have DMA_TO_DEVICE which
> is quite OK for passing to dma_noncoherent_sync_sg_for_device() but in case of reading we'll have
> DMA_FROM_DEVICE which we'll pass to dma_noncoherent_sync_sg_for_device() in dma_noncoherent_map_sg().
>
> I'd say this is not entirely correct because IMHO arch_sync_dma_for_cpu() is supposed to only be used
> in case of DMA_FROM_DEVICE and arch_sync_dma_for_device() only in case of DMA_TO_DEVICE.

So roughly 10 years ago, some kernel rookie name Vineet Gupta, asked the exact 
same question :-)

http://kernelnewbies.kernelnewbies.narkive.com/aGW1QcDv/query-about-dma-sync-for-cpu-and-direction-to-device

I never understood the need for this direction. And if memory serves me right, at 
that time I was seeing twice the amount of cache flushing !

> But the real fix of my problem is:
> ---------------------------------------->8------------------------------------
> --- a/lib/dma-noncoherent.c
> +++ b/lib/dma-noncoherent.c
> @@ -35,7 +35,7 @@ static dma_addr_t dma_noncoherent_map_page(struct device *dev, struct page *page
>   
>          addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
>          if (!dma_mapping_error(dev, addr) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> -               arch_sync_dma_for_device(dev, page_to_phys(page), size, dir);
> +               arch_sync_dma_for_device(dev, page_to_phys(page) + offset, size, dir);
>          return addr;
>   }
> ---------------------------------------->8------------------------------------
>
> You seem to lost an offset in the page so if we happen to have a buffer not aligned to
> a page boundary then we were obviously corrupting data outside our data :)

Neat !

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation
       [not found]     ` <20180518132731.GA31125@lst.de>
@ 2018-05-18 17:28       ` Vineet Gupta
  0 siblings, 0 replies; 7+ messages in thread
From: Vineet Gupta @ 2018-05-18 17:28 UTC (permalink / raw)
  To: hch, Alexey Brodkin
  Cc: linux-arch, linux-xtensa, monstr, linux-snps-arc, linux-c6x-dev,
	linux-parisc, linux-sh, linux-hexagon, linux-kernel, iommu,
	linux-m68k, openrisc, green.hu, linux-alpha, sparclinux,
	nios2-dev, deanbo422, linux-arm-kernel, linux-mm, Andrew Morton

On 05/18/2018 06:23 AM, hch@lst.de wrote:
>   Fri, May 18, 2018 at 01:03:46PM +0000, Alexey Brodkin wrote:
>> Note mmc_get_dma_dir() is just "data->flags & MMC_DATA_WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE".
>> I.e. if we're preparing for sending data dma_noncoherent_map_sg() will have DMA_TO_DEVICE which
>> is quite OK for passing to dma_noncoherent_sync_sg_for_device() but in case of reading we'll have
>> DMA_FROM_DEVICE which we'll pass to dma_noncoherent_sync_sg_for_device() in dma_noncoherent_map_sg().
>>
>> I'd say this is not entirely correct because IMHO arch_sync_dma_for_cpu() is supposed to only be used
>> in case of DMA_FROM_DEVICE and arch_sync_dma_for_device() only in case of DMA_TO_DEVICE.
> arc overrides the dir paramter of the dma_sync_single_for_device/
> dma_sync_single_for_cpu calls.  My patches dropped that, and I have
> restored that, and audit for the other architectures is pending.

Right, for now lets retain that and do a sweeping audit of @direction - to me it 
seems extraneous (as it did 10 years ago), but I'm not an expert in this are so 
perhaps it is needed for some device / arches and it would be good to understand 
that finally.

> That being said the existing arc code still looks rather odd as it
> didn't do the same thing for the scatterlist versions of the calls.
> I've thrown in a few patches into my new tree to make the sg versions
> make the normal calls, and to clean up the area a bit.

Not calling names or anything here, but it doesn't exist for sg variants, because 
I didn't write that code :-)
It was introduced by your commi:

2016-01-20 052c96dbe33b arc: convert to dma_map_ops

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
  2018-05-18 17:20     ` dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation) Vineet Gupta
@ 2018-05-18 17:50       ` Russell King - ARM Linux
  2018-05-18 19:57         ` Alexey Brodkin
  2018-05-18 20:35         ` Vineet Gupta
  0 siblings, 2 replies; 7+ messages in thread
From: Russell King - ARM Linux @ 2018-05-18 17:50 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Alexey Brodkin, hch, linux-arch, linux-xtensa, monstr, deanbo422,
	linux-c6x-dev, linux-parisc, linux-sh, linux-m68k, linux-hexagon,
	linux-kernel, linux-mm, iommu, openrisc, green.hu, linux-alpha,
	sparclinux, nios2-dev, Andrew Morton, linux-snps-arc,
	linux-arm-kernel

On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote:
> I never understood the need for this direction. And if memory serves me
> right, at that time I was seeing twice the amount of cache flushing !

It's necessary.  Take a moment to think carefully about this:

	dma_map_single(, dir)

	dma_sync_single_for_cpu(, dir)

	dma_sync_single_for_device(, dir)

	dma_unmap_single(, dir)

In the case of a DMA-incoherent architecture, the operations done at each
stage depend on the direction argument:

	map		for_cpu		for_device	unmap
TO_DEV	writeback	none		writeback	none
TO_CPU	invalidate	invalidate*	invalidate	invalidate*
BIDIR	writeback	invalidate	writeback	invalidate

* - only necessary if the CPU speculatively prefetches.

The multiple invalidations for the TO_CPU case handles different
conditions that can result in data corruption, and for some CPUs, all
four are necessary.

This is what is implemented for 32-bit ARM, depending on the CPU
capabilities, as we have DMA incoherent devices and we have CPUs that
speculatively prefetch data, and so may load data into the caches while
DMA is in operation.


Things get more interesting if the implementation behind the DMA API has
to copy data between the buffer supplied to the mapping and some DMA
accessible buffer:

	map		for_cpu		for_device	unmap
TO_DEV	copy to dma	none		copy to dma	none
TO_CPU	none		copy to cpu	none		copy to cpu
BIDIR	copy to dma	copy to cpu	copy to dma	copy to cpu

So, in both cases, the value of the direction argument defines what you
need to do in each call.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
  2018-05-18 17:50       ` Russell King - ARM Linux
@ 2018-05-18 19:57         ` Alexey Brodkin
  2018-05-18 21:33           ` Russell King - ARM Linux
  2018-05-18 20:35         ` Vineet Gupta
  1 sibling, 1 reply; 7+ messages in thread
From: Alexey Brodkin @ 2018-05-18 19:57 UTC (permalink / raw)
  To: linux
  Cc: deanbo422, linux-sh, linux-kernel, nios2-dev, linux-xtensa,
	linux-m68k, linux-mm, linux-hexagon, hch, linux-alpha,
	linux-snps-arc, iommu, green.hu, Vineet.Gupta1, akpm, openrisc,
	linux-arm-kernel, monstr, linux-parisc, linux-c6x-dev,
	linux-arch, sparclinux

Hi Russel,

On Fri, 2018-05-18 at 18:50 +0100, Russell King - ARM Linux wrote:
> It's necessary.  Take a moment to think carefully about this:
> 
>         dma_map_single(, dir)
> 
>         dma_sync_single_for_cpu(, dir)
> 
>         dma_sync_single_for_device(, dir)
> 
>         dma_unmap_single(, dir)
> 
> In the case of a DMA-incoherent architecture, the operations done at each
> stage depend on the direction argument:
> 
>         map             for_cpu         for_device      unmap
> TO_DEV  writeback       none            writeback       none
> TO_CPU  invalidate      invalidate*     invalidate      invalidate*
> BIDIR   writeback       invalidate      writeback       invalidate
> 
> * - only necessary if the CPU speculatively prefetches.

I think invalidation of DMA buffer is required on for_cpu(TO_CPU) even
if CPU doesn't preferch - what if we reuse the same buffer for multiple
reads from DMA device?

> The multiple invalidations for the TO_CPU case handles different
> conditions that can result in data corruption, and for some CPUs, all
> four are necessary.

I would agree that map()/unmap() a quite a special cases and so depending
on direction we need to execute in them either for_cpu() or for_device()
call-backs depending on direction.

As for invalidation in case of for_device(TO_CPU) I still don't see
a rationale behind it. Would be interesting to see a real example where
we benefit from this.

> This is what is implemented for 32-bit ARM, depending on the CPU
> capabilities, as we have DMA incoherent devices and we have CPUs that
> speculatively prefetch data, and so may load data into the caches while
> DMA is in operation.
> 
> 
> Things get more interesting if the implementation behind the DMA API has
> to copy data between the buffer supplied to the mapping and some DMA
> accessible buffer:
> 
>         map             for_cpu         for_device      unmap
> TO_DEV  copy to dma     none            copy to dma     none
> TO_CPU  none            copy to cpu     none            copy to cpu
> BIDIR   copy to dma     copy to cpu     copy to dma     copy to cpu
> 
> So, in both cases, the value of the direction argument defines what you
> need to do in each call.

Interesting enough in your seond table (which describes more complicated
case indeed) you set "none" for for_device(TO_CPU) which looks logical
to me.

So IMHO that's what make sense:
---------------------------->8-----------------------------
        map             for_cpu         for_device      unmap
TO_DEV  writeback       none            writeback       none
TO_CPU  none            invalidate      none            invalidate*
BIDIR   writeback       invalidate      writeback       invalidate*
---------------------------->8-----------------------------

* is the case for prefetching CPU.

-Alexey

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
  2018-05-18 17:50       ` Russell King - ARM Linux
  2018-05-18 19:57         ` Alexey Brodkin
@ 2018-05-18 20:35         ` Vineet Gupta
  2018-05-18 21:55           ` Russell King - ARM Linux
  1 sibling, 1 reply; 7+ messages in thread
From: Vineet Gupta @ 2018-05-18 20:35 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Alexey Brodkin, hch, linux-arch, linux-xtensa, monstr, deanbo422,
	linux-c6x-dev, linux-parisc, linux-sh, linux-m68k, linux-hexagon,
	linux-kernel, linux-mm, iommu, openrisc, green.hu, linux-alpha,
	sparclinux, nios2-dev, Andrew Morton, linux-snps-arc,
	linux-arm-kernel

On 05/18/2018 10:50 AM, Russell King - ARM Linux wrote:
> On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote:
>> I never understood the need for this direction. And if memory serves me
>> right, at that time I was seeing twice the amount of cache flushing !
> It's necessary.  Take a moment to think carefully about this:
>
> 	dma_map_single(, dir)
>
> 	dma_sync_single_for_cpu(, dir)
>
> 	dma_sync_single_for_device(, dir)
>
> 	dma_unmap_single(, dir)

As an aside, do these imply a state machine of sorts - does a driver needs to 
always call map_single first ?

My original point of contention/confusion is the specific combinations of API and 
direction, specifically for_cpu(TO_DEV) and for_device(TO_CPU)

Semantically what does dma_sync_single_for_cpu(TO_DEV) even imply for a non dma 
coherent arch.
Your tables below have "none" for both, implying it is unlikely to be a real 
combination (for ARM and ARC atleast).

The other case, actually @dir TO_CPU, independent of for_{cpu, device}A  implies 
driver intends to touch it after the call, so it would invalidate any stray lines, 
unconditionally (and not just for speculative prefetch case).


> In the case of a DMA-incoherent architecture, the operations done at each
> stage depend on the direction argument:
>
> 	map		for_cpu		for_device	unmap
> TO_DEV	writeback	none		writeback	none
> TO_CPU	invalidate	invalidate*	invalidate	invalidate*
> BIDIR	writeback	invalidate	writeback	invalidate
>
> * - only necessary if the CPU speculatively prefetches.
>
> The multiple invalidations for the TO_CPU case handles different
> conditions that can result in data corruption, and for some CPUs, all
> four are necessary.

Can you please explain in some more detail, TO_CPU row, why invalidate is 
conditional sometimes.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
  2018-05-18 19:57         ` Alexey Brodkin
@ 2018-05-18 21:33           ` Russell King - ARM Linux
  0 siblings, 0 replies; 7+ messages in thread
From: Russell King - ARM Linux @ 2018-05-18 21:33 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: deanbo422, linux-sh, linux-kernel, nios2-dev, linux-xtensa,
	linux-m68k, linux-mm, linux-hexagon, hch, linux-alpha,
	linux-snps-arc, iommu, green.hu, Vineet.Gupta1, akpm, openrisc,
	linux-arm-kernel, monstr, linux-parisc, linux-c6x-dev,
	linux-arch, sparclinux

On Fri, May 18, 2018 at 07:57:34PM +0000, Alexey Brodkin wrote:
> Hi Russel,

That's Russell.

> On Fri, 2018-05-18 at 18:50 +0100, Russell King - ARM Linux wrote:
> > It's necessary.  Take a moment to think carefully about this:
> > 
> >         dma_map_single(, dir)
> > 
> >         dma_sync_single_for_cpu(, dir)
> > 
> >         dma_sync_single_for_device(, dir)
> > 
> >         dma_unmap_single(, dir)
> > 
> > In the case of a DMA-incoherent architecture, the operations done at each
> > stage depend on the direction argument:
> > 
> >         map             for_cpu         for_device      unmap
> > TO_DEV  writeback       none            writeback       none
> > TO_CPU  invalidate      invalidate*     invalidate      invalidate*
> > BIDIR   writeback       invalidate      writeback       invalidate
> > 
> > * - only necessary if the CPU speculatively prefetches.
> 
> I think invalidation of DMA buffer is required on for_cpu(TO_CPU) even
> if CPU doesn't preferch - what if we reuse the same buffer for multiple
> reads from DMA device?

That's fine - for non-coherent DMA, the CPU caches will only end up
containing data for that memory if:
- the CPU speculatively fetches data from that memory, or
- the CPU explicitly touches that memory

> > The multiple invalidations for the TO_CPU case handles different
> > conditions that can result in data corruption, and for some CPUs, all
> > four are necessary.
> 
> I would agree that map()/unmap() a quite a special cases and so depending
> on direction we need to execute in them either for_cpu() or for_device()
> call-backs depending on direction.
> 
> As for invalidation in case of for_device(TO_CPU) I still don't see
> a rationale behind it. Would be interesting to see a real example where
> we benefit from this.

Yes, you could avoid that, but depending how you structure the
architecture implementation, it can turn out to be a corner case.
The above table is precisely how 32-bit ARM is implemented, because
the way we implement them is based on who owns the memory - the "map"
and "for_device" operations translate internally to a cpu-to-device
ownership transition of the buffer.  Similar for "unmap" and "to_cpu".
It basically avoids having to add additional functions at the lower
implementation levels.

> > Things get more interesting if the implementation behind the DMA API has
> > to copy data between the buffer supplied to the mapping and some DMA
> > accessible buffer:
> > 
> >         map             for_cpu         for_device      unmap
> > TO_DEV  copy to dma     none            copy to dma     none
> > TO_CPU  none            copy to cpu     none            copy to cpu
> > BIDIR   copy to dma     copy to cpu     copy to dma     copy to cpu
> > 
> > So, in both cases, the value of the direction argument defines what you
> > need to do in each call.
> 
> Interesting enough in your seond table (which describes more complicated
> case indeed) you set "none" for for_device(TO_CPU) which looks logical
> to me.
> 
> So IMHO that's what make sense:
> ---------------------------->8-----------------------------
>         map             for_cpu         for_device      unmap
> TO_DEV  writeback       none            writeback       none
> TO_CPU  none            invalidate      none            invalidate*
> BIDIR   writeback       invalidate      writeback       invalidate*
> ---------------------------->8-----------------------------

That doesn't make sense for the TO_CPU case.  If the caches contain
dirty cache lines, and you're DMAing from the device to the system
RAM, other system activity can cause the dirty cache lines to be
evicted (written back) to memory which the DMA has already overwritten.
The result is data corruption.  So, you really can't have "none" in
the "map" case there.

Given that, the "for_cpu" case becomes dependent on whether the CPU
speculatively prefetches.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
  2018-05-18 20:35         ` Vineet Gupta
@ 2018-05-18 21:55           ` Russell King - ARM Linux
  0 siblings, 0 replies; 7+ messages in thread
From: Russell King - ARM Linux @ 2018-05-18 21:55 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Alexey Brodkin, hch, linux-arch, linux-xtensa, monstr, deanbo422,
	linux-c6x-dev, linux-parisc, linux-sh, linux-m68k, linux-hexagon,
	linux-kernel, linux-mm, iommu, openrisc, green.hu, linux-alpha,
	sparclinux, nios2-dev, Andrew Morton, linux-snps-arc,
	linux-arm-kernel

On Fri, May 18, 2018 at 01:35:08PM -0700, Vineet Gupta wrote:
> On 05/18/2018 10:50 AM, Russell King - ARM Linux wrote:
> >On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote:
> >>I never understood the need for this direction. And if memory serves me
> >>right, at that time I was seeing twice the amount of cache flushing !
> >It's necessary.  Take a moment to think carefully about this:
> >
> >	dma_map_single(, dir)
> >
> >	dma_sync_single_for_cpu(, dir)
> >
> >	dma_sync_single_for_device(, dir)
> >
> >	dma_unmap_single(, dir)
> 
> As an aside, do these imply a state machine of sorts - does a driver needs
> to always call map_single first ?

Kind-of, but some drivers do omit some of the dma_sync_*() calls.
For example, if a buffer is written to, then mapped with TO_DEVICE,
and then the CPU wishes to write to it, it's fairly common that a
driver omits the dma_sync_single_for_cpu() call.  If you think about
the cases I gave and what cache operations happen, such a scenario
practically turns out to be safe.

> My original point of contention/confusion is the specific combinations of
> API and direction, specifically for_cpu(TO_DEV) and for_device(TO_CPU)

Remember that it is expected that all calls for a mapping use the
same direction argument while that mapping exists.  In other words,
if you call dma_map_single(TO_DEVICE) and then use any of the other
functions, the other functions will also use TO_DEVICE.  The DMA
direction argument describes the direction of the DMA operation
being performed on the buffer, not on the individual dma_* operation.

What isn't expected at arch level is for drivers to do:

	dma_map_single(TO_DEVICE)
	dma_sync_single_for_cpu(FROM_DEVICE)

or vice versa.

> Semantically what does dma_sync_single_for_cpu(TO_DEV) even imply for a non
> dma coherent arch.
> 
> Your tables below have "none" for both, implying it is unlikely to be a real
> combination (for ARM and ARC atleast).

Very little for the cases that I've stated (and as I mentioned
above, some drivers do omit the call in that case.)

> The other case, actually @dir TO_CPU, independent of for_{cpu, device} 
> implies driver intends to touch it after the call, so it would invalidate
> any stray lines, unconditionally (and not just for speculative prefetch
> case).

If you don't have a CPU that speculatively prefetches, and you've
already had to invalidate the cache lines (to avoid write-backs
corrupting DMA'd data) then there's no need for the architecture
to do any work at the for_cpu(TO_CPU) case - the CPU shouldn't
be touching cache lines that are part of the buffer while it is
mapped, which means a non-speculating CPU won't pull in any
cache lines without an explicit access.

Speculating CPUs are different.  The action of the speculation is
to try and guess what data the program wants to access ahead of
the program flow.  That causes the CPU to prefetch data into the
cache.  The point in the program flow that this happens is not
really determinant to the programmer.  This means that if you try
to read from the DMA buffer after the DMA operation has complete
without invalidating the cache between the DMA completing and the
CPU reading, you have no guarantee that you're reading the data
that the DMA operation has been written.  The cache may have
loaded itself with data before the DMA operation completed, and
the CPU may see that stale data.

The difference between non-speculating CPUs and speculating CPUs
is that for non-speculating CPUs, caches work according to explicit
accesses by the program, and the program is stalled while the data
is fetched from external memory.  Speculating CPUs try to predict
ahead of time what data the program will require in the future,
and attempt to load that data into the caches _before_ the program
requires it - which means that the program suffers fewer stalls.

> >In the case of a DMA-incoherent architecture, the operations done at each
> >stage depend on the direction argument:
> >
> >	map		for_cpu		for_device	unmap
> >TO_DEV	writeback	none		writeback	none
> >TO_CPU	invalidate	invalidate*	invalidate	invalidate*
> >BIDIR	writeback	invalidate	writeback	invalidate
> >
> >* - only necessary if the CPU speculatively prefetches.
> >
> >The multiple invalidations for the TO_CPU case handles different
> >conditions that can result in data corruption, and for some CPUs, all
> >four are necessary.
> 
> Can you please explain in some more detail, TO_CPU row, why invalidate is
> conditional sometimes.

See above - I hope my explanation above is sufficient.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-05-18 21:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20180511075945.16548-1-hch@lst.de>
     [not found] ` <20180511075945.16548-3-hch@lst.de>
     [not found]   ` <bad125dff49f6e49c895e818c9d1abb346a46e8e.camel@synopsys.com>
2018-05-18 17:20     ` dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation) Vineet Gupta
2018-05-18 17:50       ` Russell King - ARM Linux
2018-05-18 19:57         ` Alexey Brodkin
2018-05-18 21:33           ` Russell King - ARM Linux
2018-05-18 20:35         ` Vineet Gupta
2018-05-18 21:55           ` Russell King - ARM Linux
     [not found]     ` <20180518132731.GA31125@lst.de>
2018-05-18 17:28       ` [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation Vineet Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).