From: Russell King - ARM Linux <linux@armlinux.org.uk> To: Vineet Gupta <Vineet.Gupta1@synopsys.com> Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>, "hch@lst.de" <hch@lst.de>, "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>, "linux-xtensa@linux-xtensa.org" <linux-xtensa@linux-xtensa.org>, "monstr@monstr.eu" <monstr@monstr.eu>, "deanbo422@gmail.com" <deanbo422@gmail.com>, "linux-c6x-dev@linux-c6x.org" <linux-c6x-dev@linux-c6x.org>, "linux-parisc@vger.kernel.org" <linux-parisc@vger.kernel.org>, "linux-sh@vger.kernel.org" <linux-sh@vger.kernel.org>, "linux-m68k@lists.linux-m68k.org" <linux-m68k@lists.linux-m68k.org>, "linux-hexagon@vger.kernel.org" <linux-hexagon@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>, "iommu@lists.linux-foundation.org" <iommu@lists.linux-foundation.org>, openrisc@lis Subject: Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation) Date: Fri, 18 May 2018 22:55:48 +0100 [thread overview] Message-ID: <20180518215548.GH17671@n2100.armlinux.org.uk> (raw) In-Reply-To: <cecfe6bd-ef1f-1e25-bfcf-992d1f828efb@synopsys.com> On Fri, May 18, 2018 at 01:35:08PM -0700, Vineet Gupta wrote: > On 05/18/2018 10:50 AM, Russell King - ARM Linux wrote: > >On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote: > >>I never understood the need for this direction. And if memory serves me > >>right, at that time I was seeing twice the amount of cache flushing ! > >It's necessary. Take a moment to think carefully about this: > > > > dma_map_single(, dir) > > > > dma_sync_single_for_cpu(, dir) > > > > dma_sync_single_for_device(, dir) > > > > dma_unmap_single(, dir) > > As an aside, do these imply a state machine of sorts - does a driver needs > to always call map_single first ? Kind-of, but some drivers do omit some of the dma_sync_*() calls. For example, if a buffer is written to, then mapped with TO_DEVICE, and then the CPU wishes to write to it, it's fairly common that a driver omits the dma_sync_single_for_cpu() call. If you think about the cases I gave and what cache operations happen, such a scenario practically turns out to be safe. > My original point of contention/confusion is the specific combinations of > API and direction, specifically for_cpu(TO_DEV) and for_device(TO_CPU) Remember that it is expected that all calls for a mapping use the same direction argument while that mapping exists. In other words, if you call dma_map_single(TO_DEVICE) and then use any of the other functions, the other functions will also use TO_DEVICE. The DMA direction argument describes the direction of the DMA operation being performed on the buffer, not on the individual dma_* operation. What isn't expected at arch level is for drivers to do: dma_map_single(TO_DEVICE) dma_sync_single_for_cpu(FROM_DEVICE) or vice versa. > Semantically what does dma_sync_single_for_cpu(TO_DEV) even imply for a non > dma coherent arch. > > Your tables below have "none" for both, implying it is unlikely to be a real > combination (for ARM and ARC atleast). Very little for the cases that I've stated (and as I mentioned above, some drivers do omit the call in that case.) > The other case, actually @dir TO_CPU, independent of for_{cpu, device} > implies driver intends to touch it after the call, so it would invalidate > any stray lines, unconditionally (and not just for speculative prefetch > case). If you don't have a CPU that speculatively prefetches, and you've already had to invalidate the cache lines (to avoid write-backs corrupting DMA'd data) then there's no need for the architecture to do any work at the for_cpu(TO_CPU) case - the CPU shouldn't be touching cache lines that are part of the buffer while it is mapped, which means a non-speculating CPU won't pull in any cache lines without an explicit access. Speculating CPUs are different. The action of the speculation is to try and guess what data the program wants to access ahead of the program flow. That causes the CPU to prefetch data into the cache. The point in the program flow that this happens is not really determinant to the programmer. This means that if you try to read from the DMA buffer after the DMA operation has complete without invalidating the cache between the DMA completing and the CPU reading, you have no guarantee that you're reading the data that the DMA operation has been written. The cache may have loaded itself with data before the DMA operation completed, and the CPU may see that stale data. The difference between non-speculating CPUs and speculating CPUs is that for non-speculating CPUs, caches work according to explicit accesses by the program, and the program is stalled while the data is fetched from external memory. Speculating CPUs try to predict ahead of time what data the program will require in the future, and attempt to load that data into the caches _before_ the program requires it - which means that the program suffers fewer stalls. > >In the case of a DMA-incoherent architecture, the operations done at each > >stage depend on the direction argument: > > > > map for_cpu for_device unmap > >TO_DEV writeback none writeback none > >TO_CPU invalidate invalidate* invalidate invalidate* > >BIDIR writeback invalidate writeback invalidate > > > >* - only necessary if the CPU speculatively prefetches. > > > >The multiple invalidations for the TO_CPU case handles different > >conditions that can result in data corruption, and for some CPUs, all > >four are necessary. > > Can you please explain in some more detail, TO_CPU row, why invalidate is > conditional sometimes. See above - I hope my explanation above is sufficient. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
WARNING: multiple messages have this Message-ID (diff)
From: Russell King - ARM Linux <linux@armlinux.org.uk> To: Vineet Gupta <Vineet.Gupta1@synopsys.com> Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>, "hch@lst.de" <hch@lst.de>, "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>, "linux-xtensa@linux-xtensa.org" <linux-xtensa@linux-xtensa.org>, "monstr@monstr.eu" <monstr@monstr.eu>, "deanbo422@gmail.com" <deanbo422@gmail.com>, "linux-c6x-dev@linux-c6x.org" <linux-c6x-dev@linux-c6x.org>, "linux-parisc@vger.kernel.org" <linux-parisc@vger.kernel.org>, "linux-sh@vger.kernel.org" <linux-sh@vger.kernel.org>, "linux-m68k@lists.linux-m68k.org" <linux-m68k@lists.linux-m68k.org>, "linux-hexagon@vger.kernel.org" <linux-hexagon@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>, "iommu@lists.linux-foundation.org" <iommu@lists.linux-foundation.org>, "openrisc@lists.librecores.org" <openrisc@lists.librecores.org>, "green.hu@gmail.com" <green.hu@gmail.com>, "linux-alpha@vger.kernel.org" <linux-alpha@vger.kernel.org>, "sparclinux@vger.kernel.org" <sparclinux@vger.kernel.org>, "nios2-dev@lists.rocketboards.org" <nios2-dev@lists.rocketboards.org>, Andrew Morton <akpm@linux-foundation.org>, "linux-snps-arc@lists.infradead.org" <linux-snps-arc@lists.infradead.org>, "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org> Subject: Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation) Date: Fri, 18 May 2018 22:55:48 +0100 [thread overview] Message-ID: <20180518215548.GH17671@n2100.armlinux.org.uk> (raw) Message-ID: <20180518215548.BRhzaJYfiDYS02MpjOFb5mzHCbYYObSHEGuohJ7B09k@z> (raw) In-Reply-To: <cecfe6bd-ef1f-1e25-bfcf-992d1f828efb@synopsys.com> On Fri, May 18, 2018 at 01:35:08PM -0700, Vineet Gupta wrote: > On 05/18/2018 10:50 AM, Russell King - ARM Linux wrote: > >On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote: > >>I never understood the need for this direction. And if memory serves me > >>right, at that time I was seeing twice the amount of cache flushing ! > >It's necessary. Take a moment to think carefully about this: > > > > dma_map_single(, dir) > > > > dma_sync_single_for_cpu(, dir) > > > > dma_sync_single_for_device(, dir) > > > > dma_unmap_single(, dir) > > As an aside, do these imply a state machine of sorts - does a driver needs > to always call map_single first ? Kind-of, but some drivers do omit some of the dma_sync_*() calls. For example, if a buffer is written to, then mapped with TO_DEVICE, and then the CPU wishes to write to it, it's fairly common that a driver omits the dma_sync_single_for_cpu() call. If you think about the cases I gave and what cache operations happen, such a scenario practically turns out to be safe. > My original point of contention/confusion is the specific combinations of > API and direction, specifically for_cpu(TO_DEV) and for_device(TO_CPU) Remember that it is expected that all calls for a mapping use the same direction argument while that mapping exists. In other words, if you call dma_map_single(TO_DEVICE) and then use any of the other functions, the other functions will also use TO_DEVICE. The DMA direction argument describes the direction of the DMA operation being performed on the buffer, not on the individual dma_* operation. What isn't expected at arch level is for drivers to do: dma_map_single(TO_DEVICE) dma_sync_single_for_cpu(FROM_DEVICE) or vice versa. > Semantically what does dma_sync_single_for_cpu(TO_DEV) even imply for a non > dma coherent arch. > > Your tables below have "none" for both, implying it is unlikely to be a real > combination (for ARM and ARC atleast). Very little for the cases that I've stated (and as I mentioned above, some drivers do omit the call in that case.) > The other case, actually @dir TO_CPU, independent of for_{cpu, device} > implies driver intends to touch it after the call, so it would invalidate > any stray lines, unconditionally (and not just for speculative prefetch > case). If you don't have a CPU that speculatively prefetches, and you've already had to invalidate the cache lines (to avoid write-backs corrupting DMA'd data) then there's no need for the architecture to do any work at the for_cpu(TO_CPU) case - the CPU shouldn't be touching cache lines that are part of the buffer while it is mapped, which means a non-speculating CPU won't pull in any cache lines without an explicit access. Speculating CPUs are different. The action of the speculation is to try and guess what data the program wants to access ahead of the program flow. That causes the CPU to prefetch data into the cache. The point in the program flow that this happens is not really determinant to the programmer. This means that if you try to read from the DMA buffer after the DMA operation has complete without invalidating the cache between the DMA completing and the CPU reading, you have no guarantee that you're reading the data that the DMA operation has been written. The cache may have loaded itself with data before the DMA operation completed, and the CPU may see that stale data. The difference between non-speculating CPUs and speculating CPUs is that for non-speculating CPUs, caches work according to explicit accesses by the program, and the program is stalled while the data is fetched from external memory. Speculating CPUs try to predict ahead of time what data the program will require in the future, and attempt to load that data into the caches _before_ the program requires it - which means that the program suffers fewer stalls. > >In the case of a DMA-incoherent architecture, the operations done at each > >stage depend on the direction argument: > > > > map for_cpu for_device unmap > >TO_DEV writeback none writeback none > >TO_CPU invalidate invalidate* invalidate invalidate* > >BIDIR writeback invalidate writeback invalidate > > > >* - only necessary if the CPU speculatively prefetches. > > > >The multiple invalidations for the TO_CPU case handles different > >conditions that can result in data corruption, and for some CPUs, all > >four are necessary. > > Can you please explain in some more detail, TO_CPU row, why invalidate is > conditional sometimes. See above - I hope my explanation above is sufficient. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
next prev parent reply other threads:[~2018-05-18 21:55 UTC|newest] Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-05-11 7:59 common non-cache coherent direct dma mapping ops Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 01/20] dma-mapping: simplify Kconfig dependencies Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig [not found] ` <20180511075945.16548-1-hch-jcswGhMUV9g@public.gmane.org> 2018-05-11 7:59 ` [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig [not found] ` <20180511075945.16548-3-hch-jcswGhMUV9g@public.gmane.org> 2018-05-18 13:03 ` Alexey Brodkin 2018-05-18 13:03 ` Alexey Brodkin [not found] ` <bad125dff49f6e49c895e818c9d1abb346a46e8e.camel-HKixBCOQz3hWk0Htik3J/w@public.gmane.org> 2018-05-18 13:27 ` hch-jcswGhMUV9g 2018-05-18 13:27 ` hch [not found] ` <20180518132731.GA31125-jcswGhMUV9g@public.gmane.org> 2018-05-18 14:13 ` Alexey Brodkin 2018-05-18 14:13 ` Alexey Brodkin 2018-05-18 17:28 ` Vineet Gupta 2018-05-18 17:28 ` Vineet Gupta 2018-05-18 17:20 ` dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation) Vineet Gupta 2018-05-18 17:20 ` Vineet Gupta 2018-05-18 17:50 ` Russell King - ARM Linux 2018-05-18 17:50 ` Russell King - ARM Linux [not found] ` <20180518175004.GF17671-l+eeeJia6m9URfEZ8mYm6t73F7V6hmMc@public.gmane.org> 2018-05-18 19:57 ` Alexey Brodkin 2018-05-18 19:57 ` Alexey Brodkin [not found] ` <182840dedb4890a88c672b1c5d556920bf89a8fb.camel-HKixBCOQz3hWk0Htik3J/w@public.gmane.org> 2018-05-18 21:33 ` Russell King - ARM Linux 2018-05-18 21:33 ` Russell King - ARM Linux 2018-05-18 20:35 ` Vineet Gupta 2018-05-18 20:35 ` Vineet Gupta 2018-05-18 21:55 ` Russell King - ARM Linux [this message] 2018-05-18 21:55 ` Russell King - ARM Linux 2018-05-18 20:05 ` [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation Helge Deller 2018-05-18 20:05 ` Helge Deller [not found] ` <0c5d27e9-2799-eb38-8b09-47a04c48b5c7-Mmb7MZpHnFY@public.gmane.org> 2018-05-19 6:38 ` hch-jcswGhMUV9g 2018-05-19 6:38 ` hch 2018-05-11 7:59 ` [PATCH 03/20] arc: use generic dma_noncoherent_ops Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig [not found] ` <20180511075945.16548-4-hch-jcswGhMUV9g@public.gmane.org> 2018-05-11 12:44 ` Alexey Brodkin 2018-05-11 12:44 ` Alexey Brodkin 2018-05-11 7:59 ` [PATCH 04/20] arm-nommu: " Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig [not found] ` <20180511075945.16548-5-hch-jcswGhMUV9g@public.gmane.org> 2018-05-11 9:11 ` Russell King - ARM Linux 2018-05-11 9:11 ` Russell King - ARM Linux [not found] ` <20180511091114.GA16141-l+eeeJia6m9URfEZ8mYm6t73F7V6hmMc@public.gmane.org> 2018-05-22 11:53 ` Christoph Hellwig 2018-05-22 11:53 ` Christoph Hellwig 2018-05-11 13:56 ` John Garry 2018-05-11 13:56 ` John Garry 2018-05-11 7:59 ` [PATCH 05/20] c6x: " Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig [not found] ` <20180511075945.16548-6-hch-jcswGhMUV9g@public.gmane.org> 2018-05-15 0:25 ` [Linux-c6x-dev] " Mark Salter 2018-05-15 0:25 ` Mark Salter 2018-05-11 7:59 ` [PATCH 06/20] hexagon: " Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 07/20] m68k: " Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 08/20] microblaze: " Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 09/20] microblaze: remove the consistent_sync and consistent_sync_page Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 10/20] nds32: use generic dma_noncoherent_ops Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 11/20] nios2: " Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 12/20] openrisc: " Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 13/20] sh: simplify get_arch_dma_ops Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 14/20] sh: introduce a sh_cacheop_vaddr helper Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 15/20] sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 16/20] mm: split arch/sh/mm/consistent.c Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 17/20] sh: use generic dma_noncoherent_ops Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 18/20] xtensa: " Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 19/20] sparc: " Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-11 7:59 ` [PATCH 20/20] parisc: " Christoph Hellwig 2018-05-11 7:59 ` Christoph Hellwig 2018-05-13 13:26 ` common non-cache coherent direct dma mapping ops Helge Deller 2018-05-13 13:26 ` Helge Deller
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180518215548.GH17671@n2100.armlinux.org.uk \ --to=linux@armlinux.org.uk \ --cc=Alexey.Brodkin@synopsys.com \ --cc=Vineet.Gupta1@synopsys.com \ --cc=deanbo422@gmail.com \ --cc=hch@lst.de \ --cc=iommu@lists.linux-foundation.org \ --cc=linux-arch@vger.kernel.org \ --cc=linux-c6x-dev@linux-c6x.org \ --cc=linux-hexagon@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-m68k@lists.linux-m68k.org \ --cc=linux-mm@kvack.org \ --cc=linux-parisc@vger.kernel.org \ --cc=linux-sh@vger.kernel.org \ --cc=linux-xtensa@linux-xtensa.org \ --cc=monstr@monstr.eu \ --cc=openrisc@lis \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).