From: Robin Murphy <robin.murphy@arm.com>
To: "Toke Høiland-Jørgensen" <toke@toke.dk>, mbizon@freebox.fr, "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>, Oleksandr Natalenko <oleksandr@natalenko.name>, Halil Pasic <pasic@linux.ibm.com>, Marek Szyprowski <m.szyprowski@samsung.com>, Kalle Valo <kvalo@kernel.org>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Olha Cherevyk <olha.cherevyk@gmail.com>, iommu <iommu@lists.linux-foundation.org>, linux-wireless <linux-wireless@vger.kernel.org>, Netdev <netdev@vger.kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, stable <stable@vger.kernel.org>
Subject: Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP
Date: Fri, 25 Mar 2022 16:45:39 +0000
Message-ID: <f4224721-4578-61d3-69a7-9a3a76c50529@arm.com>
In-Reply-To: <87a6de80em.fsf@toke.dk>

On 2022-03-25 16:25, Toke Høiland-Jørgensen wrote:
> Maxime Bizon <mbizon@freebox.fr> writes:
>
>> On Thu, 2022-03-24 at 12:26 -0700, Linus Torvalds wrote:
>>
>>> It's actually very natural in that situation to flush the caches from
>>> the CPU side again. And so dma_sync_single_for_device() is a fairly
>>> reasonable thing to do in that situation.
>>
>> In the non-cache-coherent scenario, and assuming dma_map() did an
>> initial cache invalidation, you can write this:
>>
>> rx_buffer_complete_1(buf)
>> {
>>         invalidate_cache(buf, size)
>>         if (!is_ready(buf))
>>                 return;
>>         <proceed with receive>
>> }
>>
>> or
>>
>> rx_buffer_complete_2(buf)
>> {
>>         if (!is_ready(buf)) {
>>                 invalidate_cache(buf, size)
>>                 return;
>>         }
>>         <proceed with receive>
>> }
>>
>> The latter is preferred for performance because dma_map() did the
>> initial invalidate.
>>
>> Of course you could write:
>>
>> rx_buffer_complete_3(buf)
>> {
>>         invalidate_cache(buf, size)
>>         if (!is_ready(buf)) {
>>                 invalidate_cache(buf, size)
>>                 return;
>>         }
>>         <proceed with receive>
>> }
>>
>> but it's a waste of CPU cycles.
>>
>> So I'd be very cautious assuming sync_for_cpu() and sync_for_device()
>> are both doing invalidation in existing implementations of arch DMA ops;
>> implementers may have taken some liberty around the DMA API to avoid
>> unnecessary cache operations (not to blame them).
>
> I sense an implicit "and the driver can't (or shouldn't) influence
> this" here, right?

Right, drivers don't get a choice of how a given DMA API implementation
works.

>> For example, looking at arch/arm/mm/dma-mapping.c, for DMA_FROM_DEVICE:
>>
>> sync_single_for_device()
>>   => __dma_page_cpu_to_dev()
>>     => dma_cache_maint_page(op=dmac_map_area)
>>       => cpu_cache.dma_map_area()
>>
>> sync_single_for_cpu()
>>   => __dma_page_dev_to_cpu()
>>     => __dma_page_cpu_to_dev(op=dmac_unmap_area)
>>       => cpu_cache.dma_unmap_area()
>>
>> dma_map_area() always does a cache invalidate.
>>
>> But for a couple of CPU variants, dma_unmap_area() is a no-op, so
>> sync_for_cpu() does nothing.
>>
>> Toke's patch will break ath9k on those platforms (mostly silent
>> breakage: rx corruption leading to bad performance).
>
> Okay, so that would be bad obviously. So if I'm reading you correctly
> (cf my question above), we can't fix this properly from the driver side,
> and we should go with the partial SWIOTLB revert instead?

Do you have any other way of telling if DMA is idle, or of temporarily
pausing it before the sync_for_cpu, such that you could honour the
notion of ownership transfer properly?

As mentioned elsewhere, I suspect the only "real" fix if you really do
need to allow concurrent access is to use the coherent DMA API for
buffers rather than streaming mappings, but that's obviously some far
more significant surgery.

Robin.