linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Russell King - ARM Linux <linux@armlinux.org.uk>
To: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>,
	"hch@lst.de" <hch@lst.de>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-xtensa@linux-xtensa.org" <linux-xtensa@linux-xtensa.org>,
	"monstr@monstr.eu" <monstr@monstr.eu>,
	"deanbo422@gmail.com" <deanbo422@gmail.com>,
	"linux-c6x-dev@linux-c6x.org" <linux-c6x-dev@linux-c6x.org>,
	"linux-parisc@vger.kernel.org" <linux-parisc@vger.kernel.org>,
	"linux-sh@vger.kernel.org" <linux-sh@vger.kernel.org>,
	"linux-m68k@lists.linux-m68k.org"
	<linux-m68k@lists.linux-m68k.org>,
	"linux-hexagon@vger.kernel.org" <linux-hexagon@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	openrisc@lis
Subject: Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
Date: Fri, 18 May 2018 22:55:48 +0100	[thread overview]
Message-ID: <20180518215548.GH17671@n2100.armlinux.org.uk> (raw)
In-Reply-To: <cecfe6bd-ef1f-1e25-bfcf-992d1f828efb@synopsys.com>

On Fri, May 18, 2018 at 01:35:08PM -0700, Vineet Gupta wrote:
> On 05/18/2018 10:50 AM, Russell King - ARM Linux wrote:
> >On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote:
> >>I never understood the need for this direction. And if memory serves me
> >>right, at that time I was seeing twice the amount of cache flushing !
> >It's necessary.  Take a moment to think carefully about this:
> >
> >	dma_map_single(, dir)
> >
> >	dma_sync_single_for_cpu(, dir)
> >
> >	dma_sync_single_for_device(, dir)
> >
> >	dma_unmap_single(, dir)
> 
> As an aside, do these imply a state machine of sorts - does a driver needs
> to always call map_single first ?

Kind-of, but some drivers do omit some of the dma_sync_*() calls.
For example, if a buffer is written to, then mapped with TO_DEVICE,
and then the CPU wishes to write to it, it's fairly common that a
driver omits the dma_sync_single_for_cpu() call.  If you think about
the cases I gave and what cache operations happen, such a scenario
practically turns out to be safe.

> My original point of contention/confusion is the specific combinations of
> API and direction, specifically for_cpu(TO_DEV) and for_device(TO_CPU)

Remember that it is expected that all calls for a mapping use the
same direction argument while that mapping exists.  In other words,
if you call dma_map_single(TO_DEVICE) and then use any of the other
functions, the other functions will also use TO_DEVICE.  The DMA
direction argument describes the direction of the DMA operation
being performed on the buffer, not on the individual dma_* operation.

What isn't expected at arch level is for drivers to do:

	dma_map_single(TO_DEVICE)
	dma_sync_single_for_cpu(FROM_DEVICE)

or vice versa.

> Semantically what does dma_sync_single_for_cpu(TO_DEV) even imply for a non
> dma coherent arch.
> 
> Your tables below have "none" for both, implying it is unlikely to be a real
> combination (for ARM and ARC atleast).

Very little for the cases that I've stated (and as I mentioned
above, some drivers do omit the call in that case.)

> The other case, actually @dir TO_CPU, independent of for_{cpu, device} 
> implies driver intends to touch it after the call, so it would invalidate
> any stray lines, unconditionally (and not just for speculative prefetch
> case).

If you don't have a CPU that speculatively prefetches, and you've
already had to invalidate the cache lines (to avoid write-backs
corrupting DMA'd data) then there's no need for the architecture
to do any work at the for_cpu(TO_CPU) case - the CPU shouldn't
be touching cache lines that are part of the buffer while it is
mapped, which means a non-speculating CPU won't pull in any
cache lines without an explicit access.

Speculating CPUs are different.  The action of the speculation is
to try and guess what data the program wants to access ahead of
the program flow.  That causes the CPU to prefetch data into the
cache.  The point in the program flow that this happens is not
really determinant to the programmer.  This means that if you try
to read from the DMA buffer after the DMA operation has complete
without invalidating the cache between the DMA completing and the
CPU reading, you have no guarantee that you're reading the data
that the DMA operation has been written.  The cache may have
loaded itself with data before the DMA operation completed, and
the CPU may see that stale data.

The difference between non-speculating CPUs and speculating CPUs
is that for non-speculating CPUs, caches work according to explicit
accesses by the program, and the program is stalled while the data
is fetched from external memory.  Speculating CPUs try to predict
ahead of time what data the program will require in the future,
and attempt to load that data into the caches _before_ the program
requires it - which means that the program suffers fewer stalls.

> >In the case of a DMA-incoherent architecture, the operations done at each
> >stage depend on the direction argument:
> >
> >	map		for_cpu		for_device	unmap
> >TO_DEV	writeback	none		writeback	none
> >TO_CPU	invalidate	invalidate*	invalidate	invalidate*
> >BIDIR	writeback	invalidate	writeback	invalidate
> >
> >* - only necessary if the CPU speculatively prefetches.
> >
> >The multiple invalidations for the TO_CPU case handles different
> >conditions that can result in data corruption, and for some CPUs, all
> >four are necessary.
> 
> Can you please explain in some more detail, TO_CPU row, why invalidate is
> conditional sometimes.

See above - I hope my explanation above is sufficient.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

WARNING: multiple messages have this Message-ID (diff)
From: Russell King - ARM Linux <linux@armlinux.org.uk>
To: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>,
	"hch@lst.de" <hch@lst.de>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-xtensa@linux-xtensa.org" <linux-xtensa@linux-xtensa.org>,
	"monstr@monstr.eu" <monstr@monstr.eu>,
	"deanbo422@gmail.com" <deanbo422@gmail.com>,
	"linux-c6x-dev@linux-c6x.org" <linux-c6x-dev@linux-c6x.org>,
	"linux-parisc@vger.kernel.org" <linux-parisc@vger.kernel.org>,
	"linux-sh@vger.kernel.org" <linux-sh@vger.kernel.org>,
	"linux-m68k@lists.linux-m68k.org"
	<linux-m68k@lists.linux-m68k.org>,
	"linux-hexagon@vger.kernel.org" <linux-hexagon@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"openrisc@lists.librecores.org" <openrisc@lists.librecores.org>,
	"green.hu@gmail.com" <green.hu@gmail.com>,
	"linux-alpha@vger.kernel.org" <linux-alpha@vger.kernel.org>,
	"sparclinux@vger.kernel.org" <sparclinux@vger.kernel.org>,
	"nios2-dev@lists.rocketboards.org"
	<nios2-dev@lists.rocketboards.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-snps-arc@lists.infradead.org"
	<linux-snps-arc@lists.infradead.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
Date: Fri, 18 May 2018 22:55:48 +0100	[thread overview]
Message-ID: <20180518215548.GH17671@n2100.armlinux.org.uk> (raw)
Message-ID: <20180518215548.BRhzaJYfiDYS02MpjOFb5mzHCbYYObSHEGuohJ7B09k@z> (raw)
In-Reply-To: <cecfe6bd-ef1f-1e25-bfcf-992d1f828efb@synopsys.com>

On Fri, May 18, 2018 at 01:35:08PM -0700, Vineet Gupta wrote:
> On 05/18/2018 10:50 AM, Russell King - ARM Linux wrote:
> >On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote:
> >>I never understood the need for this direction. And if memory serves me
> >>right, at that time I was seeing twice the amount of cache flushing !
> >It's necessary.  Take a moment to think carefully about this:
> >
> >	dma_map_single(, dir)
> >
> >	dma_sync_single_for_cpu(, dir)
> >
> >	dma_sync_single_for_device(, dir)
> >
> >	dma_unmap_single(, dir)
> 
> As an aside, do these imply a state machine of sorts - does a driver needs
> to always call map_single first ?

Kind-of, but some drivers do omit some of the dma_sync_*() calls.
For example, if a buffer is written to, then mapped with TO_DEVICE,
and then the CPU wishes to write to it, it's fairly common that a
driver omits the dma_sync_single_for_cpu() call.  If you think about
the cases I gave and what cache operations happen, such a scenario
practically turns out to be safe.

> My original point of contention/confusion is the specific combinations of
> API and direction, specifically for_cpu(TO_DEV) and for_device(TO_CPU)

Remember that it is expected that all calls for a mapping use the
same direction argument while that mapping exists.  In other words,
if you call dma_map_single(TO_DEVICE) and then use any of the other
functions, the other functions will also use TO_DEVICE.  The DMA
direction argument describes the direction of the DMA operation
being performed on the buffer, not on the individual dma_* operation.

What isn't expected at arch level is for drivers to do:

	dma_map_single(TO_DEVICE)
	dma_sync_single_for_cpu(FROM_DEVICE)

or vice versa.

> Semantically what does dma_sync_single_for_cpu(TO_DEV) even imply for a non
> dma coherent arch.
> 
> Your tables below have "none" for both, implying it is unlikely to be a real
> combination (for ARM and ARC atleast).

Very little for the cases that I've stated (and as I mentioned
above, some drivers do omit the call in that case.)

> The other case, actually @dir TO_CPU, independent of for_{cpu, device} 
> implies driver intends to touch it after the call, so it would invalidate
> any stray lines, unconditionally (and not just for speculative prefetch
> case).

If you don't have a CPU that speculatively prefetches, and you've
already had to invalidate the cache lines (to avoid write-backs
corrupting DMA'd data) then there's no need for the architecture
to do any work at the for_cpu(TO_CPU) case - the CPU shouldn't
be touching cache lines that are part of the buffer while it is
mapped, which means a non-speculating CPU won't pull in any
cache lines without an explicit access.

Speculating CPUs are different.  The action of the speculation is
to try and guess what data the program wants to access ahead of
the program flow.  That causes the CPU to prefetch data into the
cache.  The point in the program flow that this happens is not
really determinant to the programmer.  This means that if you try
to read from the DMA buffer after the DMA operation has complete
without invalidating the cache between the DMA completing and the
CPU reading, you have no guarantee that you're reading the data
that the DMA operation has been written.  The cache may have
loaded itself with data before the DMA operation completed, and
the CPU may see that stale data.

The difference between non-speculating CPUs and speculating CPUs
is that for non-speculating CPUs, caches work according to explicit
accesses by the program, and the program is stalled while the data
is fetched from external memory.  Speculating CPUs try to predict
ahead of time what data the program will require in the future,
and attempt to load that data into the caches _before_ the program
requires it - which means that the program suffers fewer stalls.

> >In the case of a DMA-incoherent architecture, the operations done at each
> >stage depend on the direction argument:
> >
> >	map		for_cpu		for_device	unmap
> >TO_DEV	writeback	none		writeback	none
> >TO_CPU	invalidate	invalidate*	invalidate	invalidate*
> >BIDIR	writeback	invalidate	writeback	invalidate
> >
> >* - only necessary if the CPU speculatively prefetches.
> >
> >The multiple invalidations for the TO_CPU case handles different
> >conditions that can result in data corruption, and for some CPUs, all
> >four are necessary.
> 
> Can you please explain in some more detail, TO_CPU row, why invalidate is
> conditional sometimes.

See above - I hope my explanation above is sufficient.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

  parent reply	other threads:[~2018-05-18 21:55 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-11  7:59 common non-cache coherent direct dma mapping ops Christoph Hellwig
2018-05-11  7:59 ` Christoph Hellwig
2018-05-11  7:59 ` [PATCH 01/20] dma-mapping: simplify Kconfig dependencies Christoph Hellwig
2018-05-11  7:59   ` Christoph Hellwig
     [not found] ` <20180511075945.16548-1-hch-jcswGhMUV9g@public.gmane.org>
2018-05-11  7:59   ` [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
     [not found]     ` <20180511075945.16548-3-hch-jcswGhMUV9g@public.gmane.org>
2018-05-18 13:03       ` Alexey Brodkin
2018-05-18 13:03         ` Alexey Brodkin
     [not found]         ` <bad125dff49f6e49c895e818c9d1abb346a46e8e.camel-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>
2018-05-18 13:27           ` hch-jcswGhMUV9g
2018-05-18 13:27             ` hch
     [not found]             ` <20180518132731.GA31125-jcswGhMUV9g@public.gmane.org>
2018-05-18 14:13               ` Alexey Brodkin
2018-05-18 14:13                 ` Alexey Brodkin
2018-05-18 17:28               ` Vineet Gupta
2018-05-18 17:28                 ` Vineet Gupta
2018-05-18 17:20           ` dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation) Vineet Gupta
2018-05-18 17:20             ` Vineet Gupta
2018-05-18 17:50             ` Russell King - ARM Linux
2018-05-18 17:50               ` Russell King - ARM Linux
     [not found]               ` <20180518175004.GF17671-l+eeeJia6m9URfEZ8mYm6t73F7V6hmMc@public.gmane.org>
2018-05-18 19:57                 ` Alexey Brodkin
2018-05-18 19:57                   ` Alexey Brodkin
     [not found]                   ` <182840dedb4890a88c672b1c5d556920bf89a8fb.camel-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>
2018-05-18 21:33                     ` Russell King - ARM Linux
2018-05-18 21:33                       ` Russell King - ARM Linux
2018-05-18 20:35                 ` Vineet Gupta
2018-05-18 20:35                   ` Vineet Gupta
2018-05-18 21:55                   ` Russell King - ARM Linux [this message]
2018-05-18 21:55                     ` Russell King - ARM Linux
2018-05-18 20:05         ` [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation Helge Deller
2018-05-18 20:05           ` Helge Deller
     [not found]           ` <0c5d27e9-2799-eb38-8b09-47a04c48b5c7-Mmb7MZpHnFY@public.gmane.org>
2018-05-19  6:38             ` hch-jcswGhMUV9g
2018-05-19  6:38               ` hch
2018-05-11  7:59   ` [PATCH 03/20] arc: use generic dma_noncoherent_ops Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
     [not found]     ` <20180511075945.16548-4-hch-jcswGhMUV9g@public.gmane.org>
2018-05-11 12:44       ` Alexey Brodkin
2018-05-11 12:44         ` Alexey Brodkin
2018-05-11  7:59   ` [PATCH 04/20] arm-nommu: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
     [not found]     ` <20180511075945.16548-5-hch-jcswGhMUV9g@public.gmane.org>
2018-05-11  9:11       ` Russell King - ARM Linux
2018-05-11  9:11         ` Russell King - ARM Linux
     [not found]         ` <20180511091114.GA16141-l+eeeJia6m9URfEZ8mYm6t73F7V6hmMc@public.gmane.org>
2018-05-22 11:53           ` Christoph Hellwig
2018-05-22 11:53             ` Christoph Hellwig
2018-05-11 13:56       ` John Garry
2018-05-11 13:56         ` John Garry
2018-05-11  7:59   ` [PATCH 05/20] c6x: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
     [not found]     ` <20180511075945.16548-6-hch-jcswGhMUV9g@public.gmane.org>
2018-05-15  0:25       ` [Linux-c6x-dev] " Mark Salter
2018-05-15  0:25         ` Mark Salter
2018-05-11  7:59   ` [PATCH 06/20] hexagon: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 07/20] m68k: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 08/20] microblaze: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 09/20] microblaze: remove the consistent_sync and consistent_sync_page Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 10/20] nds32: use generic dma_noncoherent_ops Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 11/20] nios2: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 12/20] openrisc: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 13/20] sh: simplify get_arch_dma_ops Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 14/20] sh: introduce a sh_cacheop_vaddr helper Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 15/20] sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 16/20] mm: split arch/sh/mm/consistent.c Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 17/20] sh: use generic dma_noncoherent_ops Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 18/20] xtensa: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 19/20] sparc: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 20/20] parisc: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-13 13:26 ` common non-cache coherent direct dma mapping ops Helge Deller
2018-05-13 13:26   ` Helge Deller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180518215548.GH17671@n2100.armlinux.org.uk \
    --to=linux@armlinux.org.uk \
    --cc=Alexey.Brodkin@synopsys.com \
    --cc=Vineet.Gupta1@synopsys.com \
    --cc=deanbo422@gmail.com \
    --cc=hch@lst.de \
    --cc=iommu@lists.linux-foundation.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-c6x-dev@linux-c6x.org \
    --cc=linux-hexagon@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linux-xtensa@linux-xtensa.org \
    --cc=monstr@monstr.eu \
    --cc=openrisc@lis \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).