From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09BB8C77B6F for ; Mon, 27 Mar 2023 12:14:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232671AbjC0MOI (ORCPT ); Mon, 27 Mar 2023 08:14:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33022 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232607AbjC0MOG (ORCPT ); Mon, 27 Mar 2023 08:14:06 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9D1A3C0A; Mon, 27 Mar 2023 05:14:00 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 632BAB81151; Mon, 27 Mar 2023 12:13:59 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A6F8CC433D2; Mon, 27 Mar 2023 12:13:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679919237; bh=GiqSNoh/gnOD0kxoCZ+KxN9c2SO+z1nJBns1fyyi+vo=; h=From:To:Cc:Subject:Date:From; b=pq3BAphM3wHtatYjV+DfN2I8sJHS4IYBYyDE+v6/Qc8ZwCT8gpvS4DAGv7VxPzFGA QTONxH3L5u7xfGpaWF72Vn6IOud15UVhnlOu7Pkpc4MFJc3T+eyi/9YHM2uw4o3Q19 hlFRkjcr9R2SPBbGCiCcG9K2ncI1oTIHchYegylFXUpPuOkFuj5EhA9x60aFehDKxy Qltw/cCcMI8A23tAm4waeZqorYzy7ERYFcrUh9Z5Dt10vOC7n/8mHT1x94BLc8QwMZ RjkfvTiQw9Na09hCmY2JMMUGUbEev+PR6WHG/lHK6hz99r9Xiisa4orl4KnTMuyptQ /3AEVTN69OUkQ== From: Arnd Bergmann To: linux-kernel@vger.kernel.org Cc: Arnd Bergmann , Vineet Gupta , Russell King , Neil Armstrong , Linus Walleij , Catalin Marinas , Will Deacon , Guo Ren , Brian Cain , Geert Uytterhoeven , Michal Simek , Thomas Bogendoerfer 
, Dinh Nguyen , Stafford Horne , Helge Deller , Michael Ellerman , Christophe Leroy , Paul Walmsley , Palmer Dabbelt , Rich Felker , John Paul Adrian Glaubitz , "David S. Miller" , Max Filippov , Christoph Hellwig , Robin Murphy , Lad Prabhakar , Conor Dooley , linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org
Subject: [PATCH 00/21] dma-mapping: unify support for cache flushes
Date: Mon, 27 Mar 2023 14:12:56 +0200
Message-Id: <20230327121317.4081816-1-arnd@kernel.org>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-sh@vger.kernel.org

From: Arnd Bergmann

After a long discussion about adding SoC specific semantics for when
to flush caches in drivers/soc/ drivers that we determined to be
fundamentally flawed[1], I volunteered to try to move that logic into
architecture-independent code and make all existing architectures do
the same thing.

As we had determined earlier, the behavior is wildly different across
architectures, but most of the differences come down to either bugs
(when required flushes are missing) or extra flushes that are harmless
but might hurt performance.

I finally found the time to come up with an implementation of this, which
starts by replacing every outlier with one of the three common options:

 1. architectures without speculative prefetching (hexagon, m68k,
    openrisc, sh, sparc, and certain armv4 and xtensa implementations)
    only flush their caches before a DMA, by cleaning write-back caches
    (if any) before a DMA to the device, and by invalidating the caches
    before a DMA from a device.

 2. arc, microblaze, mips, nios2, sh and later xtensa now follow the
    normal 32-bit arm model and invalidate their writeback caches
    again after a DMA from the device, to remove stale cache lines
    that got prefetched during the DMA. arc, csky and mips used to
    invalidate buffers also before the bidirectional DMA, but this
    is now skipped whenever we know it gets invalidated again
    after the DMA.

 3. parisc, powerpc and riscv already flushed buffers before
    a DMA_FROM_DEVICE, and these get moved to the arm64 behavior
    that does the writeback before and invalidate after both
    DMA_FROM_DEVICE and DMA_BIDIRECTIONAL in order to avoid the
    problem of accidentally leaking stale data if the DMA does
    not actually happen[2].

The last patch in the series replaces the architecture specific code
with a shared version that implements all three based on architecture
specific parameters that are almost always determined at compile time.

The difference between cases 1. and 2. is hardware specific, while between
2. and 3. we need to decide which semantics we want, but I explicitly
avoid this question in my series and leave it to be decided later.

Another difference that I do not address here is what cache invalidation
does for partial cache lines. On arm32, arm64 and powerpc, a partial
cache line always gets written back before invalidation in order to
ensure that data before or after the buffer is not discarded. On all
other architectures, the assumption is that cache lines are never shared
between a DMA buffer and data that is accessed by the CPU. If we end up
always writing back dirty cache lines before a DMA (option 3 above),
then this point becomes moot, otherwise we should probably address this
in a follow-up series to document one behavior or the other and
implement it consistently.

Please review!
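As a rough illustration of the three options listed above, here is a
userspace sketch that only tabulates which cache operation each option
performs before and after a transfer. It is not code from the series:
every enum and function name is hypothetical, and the handling of
bidirectional mappings under option 1 (clean and invalidate up front)
is an assumption, since the description above does not spell it out.

```c
#include <stdio.h>

/* Hypothetical stand-ins for DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL. */
enum dir { DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_BIDIRECTIONAL };

/* The three options from the list above; the names are made up for this sketch. */
enum policy { POLICY_NO_PREFETCH = 1, POLICY_ARM32 = 2, POLICY_ARM64 = 3 };

enum cache_op { OP_NONE, OP_CLEAN, OP_INVALIDATE, OP_CLEAN_INVALIDATE };

/* Cache maintenance performed before the device starts the transfer. */
enum cache_op op_before_dma(enum policy p, enum dir d)
{
	switch (d) {
	case DIR_TO_DEVICE:
		/* All options write dirty lines back so the device sees them. */
		return OP_CLEAN;
	case DIR_FROM_DEVICE:
		/* Option 3 writes back instead of discarding, options 1 and 2 invalidate. */
		return p == POLICY_ARM64 ? OP_CLEAN : OP_INVALIDATE;
	case DIR_BIDIRECTIONAL:
		/*
		 * ASSUMPTION: without a later invalidate (option 1), both
		 * steps have to happen here. Options 2 and 3 only clean,
		 * because the buffer is invalidated again after the DMA.
		 */
		return p == POLICY_NO_PREFETCH ? OP_CLEAN_INVALIDATE : OP_CLEAN;
	}
	return OP_NONE;
}

/* Cache maintenance performed after the transfer, before the CPU reads. */
enum cache_op op_after_dma(enum policy p, enum dir d)
{
	/* Nothing to do if the device never wrote, or if nothing can be prefetched. */
	if (d == DIR_TO_DEVICE || p == POLICY_NO_PREFETCH)
		return OP_NONE;
	/* Drop lines that may have been speculatively prefetched during the DMA. */
	return OP_INVALIDATE;
}

const char *op_name(enum cache_op op)
{
	static const char *const names[] = {
		"none", "clean", "invalidate", "clean+invalidate",
	};
	return names[op];
}

/* Print one row per (option, direction) pair. */
void print_policy_table(void)
{
	static const char *const dirs[] = {
		"to_device", "from_device", "bidirectional",
	};
	int p, d;

	for (p = POLICY_NO_PREFETCH; p <= POLICY_ARM64; p++)
		for (d = DIR_TO_DEVICE; d <= DIR_BIDIRECTIONAL; d++)
			printf("option %d  %-13s  before: %-16s  after: %s\n",
			       p, dirs[d],
			       op_name(op_before_dma((enum policy)p, (enum dir)d)),
			       op_name(op_after_dma((enum policy)p, (enum dir)d)));
}
```

Calling print_policy_table() from a main() prints nine rows. In this
sketch, options 2 and 3 differ only in the "before" column for
from_device, which is the semantic question the series leaves open.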

      Arnd

[1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad.rj@bp.renesas.com/
[2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/

Arnd Bergmann (21):
  openrisc: dma-mapping: flush bidirectional mappings
  xtensa: dma-mapping: use normal cache invalidation rules
  sparc32: flush caches in dma_sync_*for_device
  microblaze: dma-mapping: skip extra DMA flushes
  powerpc: dma-mapping: split out cache operation logic
  powerpc: dma-mapping: minimize for_cpu flushing
  powerpc: dma-mapping: always clean cache in _for_device() op
  riscv: dma-mapping: only invalidate after DMA, not flush
  riscv: dma-mapping: skip invalidation before bidirectional DMA
  csky: dma-mapping: skip invalidating before DMA from device
  mips: dma-mapping: skip invalidating before bidirectional DMA
  mips: dma-mapping: split out cache operation logic
  arc: dma-mapping: skip invalidating before bidirectional DMA
  parisc: dma-mapping: use regular flush/invalidate ops
  ARM: dma-mapping: always invalidate WT caches before DMA
  ARM: dma-mapping: bring back dmac_{clean,inv}_range
  ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
  ARM: drop SMP support for ARM11MPCore
  ARM: dma-mapping: use generic form of arch_sync_dma_* helpers
  ARM: dma-mapping: split out arch_dma_mark_clean() helper
  dma-mapping: replace custom code with generic implementation

 arch/arc/mm/dma.c                          |  66 ++------
 arch/arm/Kconfig                           |   4 +
 arch/arm/include/asm/cacheflush.h          |  21 +++
 arch/arm/include/asm/glue-cache.h          |   4 +
 arch/arm/mach-oxnas/Kconfig                |   4 -
 arch/arm/mach-oxnas/Makefile               |   1 -
 arch/arm/mach-oxnas/headsmp.S              |  23 ---
 arch/arm/mach-oxnas/platsmp.c              |  96 -----------
 arch/arm/mach-versatile/platsmp-realview.c |   4 -
 arch/arm/mm/Kconfig                        |  19 ---
 arch/arm/mm/cache-fa.S                     |   4 +-
 arch/arm/mm/cache-nop.S                    |   6 +
 arch/arm/mm/cache-v4.S                     |  13 +-
 arch/arm/mm/cache-v4wb.S                   |   4 +-
 arch/arm/mm/cache-v4wt.S                   |  22 ++-
 arch/arm/mm/cache-v6.S                     |  35 +---
 arch/arm/mm/cache-v7.S                     |   6 +-
 arch/arm/mm/cache-v7m.S                    |   4 +-
 arch/arm/mm/dma-mapping-nommu.c            |  36 ++--
 arch/arm/mm/dma-mapping.c                  | 181 ++++++++++---------
 arch/arm/mm/proc-arm1020.S                 |   4 +-
 arch/arm/mm/proc-arm1020e.S                |   4 +-
 arch/arm/mm/proc-arm1022.S                 |   4 +-
 arch/arm/mm/proc-arm1026.S                 |   4 +-
 arch/arm/mm/proc-arm920.S                  |   4 +-
 arch/arm/mm/proc-arm922.S                  |   4 +-
 arch/arm/mm/proc-arm925.S                  |   4 +-
 arch/arm/mm/proc-arm926.S                  |   4 +-
 arch/arm/mm/proc-arm940.S                  |   4 +-
 arch/arm/mm/proc-arm946.S                  |   4 +-
 arch/arm/mm/proc-feroceon.S                |   8 +-
 arch/arm/mm/proc-macros.S                  |   2 +
 arch/arm/mm/proc-mohawk.S                  |   4 +-
 arch/arm/mm/proc-xsc3.S                    |   4 +-
 arch/arm/mm/proc-xscale.S                  |   6 +-
 arch/arm64/mm/dma-mapping.c                |  28 ++--
 arch/csky/mm/dma-mapping.c                 |  46 +++---
 arch/hexagon/kernel/dma.c                  |  44 ++---
 arch/m68k/kernel/dma.c                     |  43 +++--
 arch/microblaze/kernel/dma.c               |  38 ++---
 arch/mips/mm/dma-noncoherent.c             |  75 +++------
 arch/nios2/mm/dma-mapping.c                |  57 +++----
 arch/openrisc/kernel/dma.c                 |  62 ++++---
 arch/parisc/include/asm/cacheflush.h       |   6 +-
 arch/parisc/kernel/pci-dma.c               |  33 +++-
 arch/powerpc/mm/dma-noncoherent.c          |  76 +++++----
 arch/riscv/mm/dma-noncoherent.c            |  51 +++---
 arch/sh/kernel/dma-coherent.c              |  43 +++--
 arch/sparc/Kconfig                         |   2 +-
 arch/sparc/kernel/ioport.c                 |  38 +++--
 arch/xtensa/Kconfig                        |   1 -
 arch/xtensa/include/asm/cacheflush.h       |   6 +-
 arch/xtensa/kernel/pci-dma.c               |  47 +++---
 include/linux/dma-sync.h                   | 107 ++++++++++++
 54 files changed, 721 insertions(+), 699 deletions(-)
 delete mode 100644 arch/arm/mach-oxnas/headsmp.S
 delete mode 100644 arch/arm/mach-oxnas/platsmp.c
 create mode 100644 include/linux/dma-sync.h

-- 
2.39.2

Cc: Vineet Gupta
Cc: Russell King
Cc: Neil Armstrong
Cc: Linus Walleij
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Guo Ren
Cc: Brian Cain
Cc: Geert Uytterhoeven
Cc: Michal Simek
Cc: Thomas Bogendoerfer
Cc: Dinh Nguyen
Cc: Stafford Horne
Cc: Helge Deller
Cc: Michael Ellerman
Cc: Christophe Leroy
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Cc: Rich Felker
Cc: John Paul Adrian Glaubitz
Cc: "David S.
Miller" Cc: Max Filippov Cc: Christoph Hellwig Cc: Robin Murphy Cc: Lad Prabhakar Cc: Conor Dooley Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-oxnas@groups.io Cc: linux-csky@vger.kernel.org Cc: linux-hexagon@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@vger.kernel.org Cc: linux-openrisc@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E2A4FC7619A for ; Mon, 27 Mar 2023 12:14:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:Message-Id:Date:Subject:Cc :To:From:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References: List-Owner; bh=t8FREJKmirGGnVfqeUhnD7ak6hJKTY2AtAqVHMnFUpg=; b=GyRSZbKE5W2FRD WyhzukhIMRZMrhSFn8VcNA+AzuVs0hKMxArl/HDaaYiyYKvg6TKr2L28U6xlljXuQo92A3/JpcW5e JqmqkW08hbBoRAJuVXEYm135X3ONfGYrDlX2wr/boNsN1O83CNt4ROwJpcZNV9BVZpc+1T7/dx66r bcy4GnEh27oOVSZ7+x4cWgBFOdhYwWw7ER5mDrA+dLEWLT8nX78qzzO23O2p/EMQsMA5RnkhUylxg 7m2ONiKTmJHrvsHJ/krrqLDC7qf4UNjULxs0KeY4Glxkf0xPWEvGW3Xn2iLWCrueALn/hT0WWJVuO /4Os/jzyUJY3zqZSvnBQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 
(Red Hat Linux)) id 1pgljk-00Aqog-1W; Mon, 27 Mar 2023 12:14:08 +0000 Received: from ams.source.kernel.org ([2604:1380:4601:e00::1]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1pgljf-00Aqkf-23; Mon, 27 Mar 2023 12:14:06 +0000 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 3AEF2B8114B; Mon, 27 Mar 2023 12:13:59 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A6F8CC433D2; Mon, 27 Mar 2023 12:13:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679919237; bh=GiqSNoh/gnOD0kxoCZ+KxN9c2SO+z1nJBns1fyyi+vo=; h=From:To:Cc:Subject:Date:From; b=pq3BAphM3wHtatYjV+DfN2I8sJHS4IYBYyDE+v6/Qc8ZwCT8gpvS4DAGv7VxPzFGA QTONxH3L5u7xfGpaWF72Vn6IOud15UVhnlOu7Pkpc4MFJc3T+eyi/9YHM2uw4o3Q19 hlFRkjcr9R2SPBbGCiCcG9K2ncI1oTIHchYegylFXUpPuOkFuj5EhA9x60aFehDKxy Qltw/cCcMI8A23tAm4waeZqorYzy7ERYFcrUh9Z5Dt10vOC7n/8mHT1x94BLc8QwMZ RjkfvTiQw9Na09hCmY2JMMUGUbEev+PR6WHG/lHK6hz99r9Xiisa4orl4KnTMuyptQ /3AEVTN69OUkQ== From: Arnd Bergmann To: linux-kernel@vger.kernel.org Cc: Arnd Bergmann , Vineet Gupta , Russell King , Neil Armstrong , Linus Walleij , Catalin Marinas , Will Deacon , Guo Ren , Brian Cain , Geert Uytterhoeven , Michal Simek , Thomas Bogendoerfer , Dinh Nguyen , Stafford Horne , Helge Deller , Michael Ellerman , Christophe Leroy , Paul Walmsley , Palmer Dabbelt , Rich Felker , John Paul Adrian Glaubitz , "David S. 
Miller" , Max Filippov , Christoph Hellwig , Robin Murphy , Lad Prabhakar , Conor Dooley , linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org Subject: [PATCH 00/21] dma-mapping: unify support for cache flushes Date: Mon, 27 Mar 2023 14:12:56 +0200 Message-Id: <20230327121317.4081816-1-arnd@kernel.org> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230327_051403_995601_64EB2B1C X-CRM114-Status: GOOD ( 28.54 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-riscv" Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org From: Arnd Bergmann After a long discussion about adding SoC specific semantics for when to flush caches in drivers/soc/ drivers that we determined to be fundamentally flawed[1], I volunteered to try to move that logic into architecture-independent code and make all existing architectures do the same thing. As we had determined earlier, the behavior is wildly different across architectures, but most of the differences come down to either bugs (when required flushes are missing) or extra flushes that are harmless but might hurt performance. I finally found the time to come up with an implementation of this, which starts by replacing every outlier with one of the three common options: 1. 
architectures without speculative prefetching (hegagon, m68k, openrisc, sh, sparc, and certain armv4 and xtensa implementations) only flush their caches before a DMA, by cleaning write-back caches (if any) before a DMA to the device, and by invalidating the caches before a DMA from a device 2. arc, microblaze, mips, nios2, sh and later xtensa now follow the normal 32-bit arm model and invalidate their writeback caches again after a DMA from the device, to remove stale cache lines that got prefetched during the DMA. arc, csky and mips used to invalidate buffers also before the bidirectional DMA, but this is now skipped whenever we know it gets invalidated again after the DMA. 3. parisc, powerpc and riscv already flushed buffers before a DMA_FROM_DEVICE, and these get moved to the arm64 behavior that does the writeback before and invalidate after both DMA_FROM_DEVICE and DMA_BIDIRECTIONAL in order to avoid the problem of accidentally leaking stale data if the DMA does not actually happen[2]. The last patch in the series replaces the architecture specific code with a shared version that implements all three based on architecture specific parameters that are almost always determined at compile time. The difference between cases 1. and 2. is hardware specific, while between 2. and 3. we need to decide which semantics we want, but I explicitly avoid this question in my series and leave it to be decided later. Another difference that I do not address here is what cache invalidation does for partical cache lines. On arm32, arm64 and powerpc, a partial cache line always gets written back before invalidation in order to ensure that data before or after the buffer is not discarded. On all other architectures, the assumption is cache lines are never shared between DMA buffer and data that is accessed by the CPU. 
If we end up always writing back dirty cache lines before a DMA (option 3 above), then this point becomes moot, otherwise we should probably address this in a follow-up series to document one behavior or the other and implement it consistently. Please review! Arnd [1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad.rj@bp.renesas.com/ [2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ Arnd Bergmann (21): openrisc: dma-mapping: flush bidirectional mappings xtensa: dma-mapping: use normal cache invalidation rules sparc32: flush caches in dma_sync_*for_device microblaze: dma-mapping: skip extra DMA flushes powerpc: dma-mapping: split out cache operation logic powerpc: dma-mapping: minimize for_cpu flushing powerpc: dma-mapping: always clean cache in _for_device() op riscv: dma-mapping: only invalidate after DMA, not flush riscv: dma-mapping: skip invalidation before bidirectional DMA csky: dma-mapping: skip invalidating before DMA from device mips: dma-mapping: skip invalidating before bidirectional DMA mips: dma-mapping: split out cache operation logic arc: dma-mapping: skip invalidating before bidirectional DMA parisc: dma-mapping: use regular flush/invalidate ops ARM: dma-mapping: always invalidate WT caches before DMA ARM: dma-mapping: bring back dmac_{clean,inv}_range ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally ARM: drop SMP support for ARM11MPCore ARM: dma-mapping: use generic form of arch_sync_dma_* helpers ARM: dma-mapping: split out arch_dma_mark_clean() helper dma-mapping: replace custom code with generic implementation arch/arc/mm/dma.c | 66 ++------ arch/arm/Kconfig | 4 + arch/arm/include/asm/cacheflush.h | 21 +++ arch/arm/include/asm/glue-cache.h | 4 + arch/arm/mach-oxnas/Kconfig | 4 - arch/arm/mach-oxnas/Makefile | 1 - arch/arm/mach-oxnas/headsmp.S | 23 --- arch/arm/mach-oxnas/platsmp.c | 96 ----------- arch/arm/mach-versatile/platsmp-realview.c | 4 - arch/arm/mm/Kconfig | 19 --- 
arch/arm/mm/cache-fa.S | 4 +- arch/arm/mm/cache-nop.S | 6 + arch/arm/mm/cache-v4.S | 13 +- arch/arm/mm/cache-v4wb.S | 4 +- arch/arm/mm/cache-v4wt.S | 22 ++- arch/arm/mm/cache-v6.S | 35 +--- arch/arm/mm/cache-v7.S | 6 +- arch/arm/mm/cache-v7m.S | 4 +- arch/arm/mm/dma-mapping-nommu.c | 36 ++-- arch/arm/mm/dma-mapping.c | 181 ++++++++++----------- arch/arm/mm/proc-arm1020.S | 4 +- arch/arm/mm/proc-arm1020e.S | 4 +- arch/arm/mm/proc-arm1022.S | 4 +- arch/arm/mm/proc-arm1026.S | 4 +- arch/arm/mm/proc-arm920.S | 4 +- arch/arm/mm/proc-arm922.S | 4 +- arch/arm/mm/proc-arm925.S | 4 +- arch/arm/mm/proc-arm926.S | 4 +- arch/arm/mm/proc-arm940.S | 4 +- arch/arm/mm/proc-arm946.S | 4 +- arch/arm/mm/proc-feroceon.S | 8 +- arch/arm/mm/proc-macros.S | 2 + arch/arm/mm/proc-mohawk.S | 4 +- arch/arm/mm/proc-xsc3.S | 4 +- arch/arm/mm/proc-xscale.S | 6 +- arch/arm64/mm/dma-mapping.c | 28 ++-- arch/csky/mm/dma-mapping.c | 46 +++--- arch/hexagon/kernel/dma.c | 44 ++--- arch/m68k/kernel/dma.c | 43 +++-- arch/microblaze/kernel/dma.c | 38 ++--- arch/mips/mm/dma-noncoherent.c | 75 +++------ arch/nios2/mm/dma-mapping.c | 57 +++---- arch/openrisc/kernel/dma.c | 62 ++++--- arch/parisc/include/asm/cacheflush.h | 6 +- arch/parisc/kernel/pci-dma.c | 33 +++- arch/powerpc/mm/dma-noncoherent.c | 76 +++++---- arch/riscv/mm/dma-noncoherent.c | 51 +++--- arch/sh/kernel/dma-coherent.c | 43 +++-- arch/sparc/Kconfig | 2 +- arch/sparc/kernel/ioport.c | 38 +++-- arch/xtensa/Kconfig | 1 - arch/xtensa/include/asm/cacheflush.h | 6 +- arch/xtensa/kernel/pci-dma.c | 47 +++--- include/linux/dma-sync.h | 107 ++++++++++++ 54 files changed, 721 insertions(+), 699 deletions(-) delete mode 100644 arch/arm/mach-oxnas/headsmp.S delete mode 100644 arch/arm/mach-oxnas/platsmp.c create mode 100644 include/linux/dma-sync.h -- 2.39.2 Cc: Vineet Gupta Cc: Russell King Cc: Neil Armstrong Cc: Linus Walleij Cc: Catalin Marinas Cc: Will Deacon Cc: Guo Ren Cc: Brian Cain Cc: Geert Uytterhoeven Cc: Michal Simek Cc: Thomas 
Bogendoerfer Cc: Dinh Nguyen Cc: Stafford Horne Cc: Helge Deller Cc: Michael Ellerman Cc: Christophe Leroy Cc: Paul Walmsley Cc: Palmer Dabbelt Cc: Rich Felker Cc: John Paul Adrian Glaubitz Cc: "David S. Miller" Cc: Max Filippov Cc: Christoph Hellwig Cc: Robin Murphy Cc: Lad Prabhakar Cc: Conor Dooley Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-oxnas@groups.io Cc: linux-csky@vger.kernel.org Cc: linux-hexagon@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@vger.kernel.org Cc: linux-openrisc@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 91F5AC761A6 for ; Mon, 27 Mar 2023 12:14:56 +0000 (UTC) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4PlWwG6SLmz3fTT for ; Mon, 27 Mar 2023 23:14:54 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256 header.s=k20201202 header.b=pq3BAphM; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=kernel.org (client-ip=2604:1380:4601:e00::1; helo=ams.source.kernel.org; envelope-from=arnd@kernel.org; 
receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256 header.s=k20201202 header.b=pq3BAphM; dkim-atps=neutral Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4PlWvJ15GHz3cJg for ; Mon, 27 Mar 2023 23:14:04 +1100 (AEDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 3AEF2B8114B; Mon, 27 Mar 2023 12:13:59 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A6F8CC433D2; Mon, 27 Mar 2023 12:13:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679919237; bh=GiqSNoh/gnOD0kxoCZ+KxN9c2SO+z1nJBns1fyyi+vo=; h=From:To:Cc:Subject:Date:From; b=pq3BAphM3wHtatYjV+DfN2I8sJHS4IYBYyDE+v6/Qc8ZwCT8gpvS4DAGv7VxPzFGA QTONxH3L5u7xfGpaWF72Vn6IOud15UVhnlOu7Pkpc4MFJc3T+eyi/9YHM2uw4o3Q19 hlFRkjcr9R2SPBbGCiCcG9K2ncI1oTIHchYegylFXUpPuOkFuj5EhA9x60aFehDKxy Qltw/cCcMI8A23tAm4waeZqorYzy7ERYFcrUh9Z5Dt10vOC7n/8mHT1x94BLc8QwMZ RjkfvTiQw9Na09hCmY2JMMUGUbEev+PR6WHG/lHK6hz99r9Xiisa4orl4KnTMuyptQ /3AEVTN69OUkQ== From: Arnd Bergmann To: linux-kernel@vger.kernel.org Subject: [PATCH 00/21] dma-mapping: unify support for cache flushes Date: Mon, 27 Mar 2023 14:12:56 +0200 Message-Id: <20230327121317.4081816-1-arnd@kernel.org> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Rich Felker , linux-sh@vger.kernel.org, Catalin Marinas , Linus Walleij , John 
Paul Adrian Glaubitz , Max Filippov , Conor Dooley , Guo Ren , linux-csky@vger.kernel.org, sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org, Will Deacon , Christoph Hellwig , Helge Deller , Russell King , Geert Uytterhoeven , Vineet Gupta , linux-snps-arc@lists.infradead.org, linux-xtensa@linux-xtensa.org, Arnd Bergmann , Brian Cain , Lad Prabhakar , linux-m68k@lists.linux-m68k.org, Paul Walmsley , Stafford Horne , linux-arm-kernel@lists.infradead.org, Neil Armstrong , Michal Sime k , Thomas Bogendoerfer , linux-parisc@vger.kernel.org, linux-openrisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-mips@vger.kernel.org, Dinh Nguyen , Palmer Dabbelt , linux-hexagon@vger.kernel.org, linux-oxnas@groups.io, Robin Murphy , "David S. Miller" Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" From: Arnd Bergmann After a long discussion about adding SoC specific semantics for when to flush caches in drivers/soc/ drivers that we determined to be fundamentally flawed[1], I volunteered to try to move that logic into architecture-independent code and make all existing architectures do the same thing. As we had determined earlier, the behavior is wildly different across architectures, but most of the differences come down to either bugs (when required flushes are missing) or extra flushes that are harmless but might hurt performance. I finally found the time to come up with an implementation of this, which starts by replacing every outlier with one of the three common options: 1. architectures without speculative prefetching (hegagon, m68k, openrisc, sh, sparc, and certain armv4 and xtensa implementations) only flush their caches before a DMA, by cleaning write-back caches (if any) before a DMA to the device, and by invalidating the caches before a DMA from a device 2. 
arc, microblaze, mips, nios2, sh and later xtensa now follow the normal 32-bit arm model and invalidate their writeback caches again after a DMA from the device, to remove stale cache lines that got prefetched during the DMA. arc, csky and mips used to invalidate buffers also before the bidirectional DMA, but this is now skipped whenever we know it gets invalidated again after the DMA. 3. parisc, powerpc and riscv already flushed buffers before a DMA_FROM_DEVICE, and these get moved to the arm64 behavior that does the writeback before and invalidate after both DMA_FROM_DEVICE and DMA_BIDIRECTIONAL in order to avoid the problem of accidentally leaking stale data if the DMA does not actually happen[2]. The last patch in the series replaces the architecture specific code with a shared version that implements all three based on architecture specific parameters that are almost always determined at compile time. The difference between cases 1. and 2. is hardware specific, while between 2. and 3. we need to decide which semantics we want, but I explicitly avoid this question in my series and leave it to be decided later. Another difference that I do not address here is what cache invalidation does for partical cache lines. On arm32, arm64 and powerpc, a partial cache line always gets written back before invalidation in order to ensure that data before or after the buffer is not discarded. On all other architectures, the assumption is cache lines are never shared between DMA buffer and data that is accessed by the CPU. If we end up always writing back dirty cache lines before a DMA (option 3 above), then this point becomes moot, otherwise we should probably address this in a follow-up series to document one behavior or the other and implement it consistently. Please review! 
     Arnd

[1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad.rj@bp.renesas.com/
[2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/

Arnd Bergmann (21):
  openrisc: dma-mapping: flush bidirectional mappings
  xtensa: dma-mapping: use normal cache invalidation rules
  sparc32: flush caches in dma_sync_*for_device
  microblaze: dma-mapping: skip extra DMA flushes
  powerpc: dma-mapping: split out cache operation logic
  powerpc: dma-mapping: minimize for_cpu flushing
  powerpc: dma-mapping: always clean cache in _for_device() op
  riscv: dma-mapping: only invalidate after DMA, not flush
  riscv: dma-mapping: skip invalidation before bidirectional DMA
  csky: dma-mapping: skip invalidating before DMA from device
  mips: dma-mapping: skip invalidating before bidirectional DMA
  mips: dma-mapping: split out cache operation logic
  arc: dma-mapping: skip invalidating before bidirectional DMA
  parisc: dma-mapping: use regular flush/invalidate ops
  ARM: dma-mapping: always invalidate WT caches before DMA
  ARM: dma-mapping: bring back dmac_{clean,inv}_range
  ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
  ARM: drop SMP support for ARM11MPCore
  ARM: dma-mapping: use generic form of arch_sync_dma_* helpers
  ARM: dma-mapping: split out arch_dma_mark_clean() helper
  dma-mapping: replace custom code with generic implementation

 arch/arc/mm/dma.c                          |  66 ++------
 arch/arm/Kconfig                           |   4 +
 arch/arm/include/asm/cacheflush.h          |  21 +++
 arch/arm/include/asm/glue-cache.h          |   4 +
 arch/arm/mach-oxnas/Kconfig                |   4 -
 arch/arm/mach-oxnas/Makefile               |   1 -
 arch/arm/mach-oxnas/headsmp.S              |  23 ---
 arch/arm/mach-oxnas/platsmp.c              |  96 -----------
 arch/arm/mach-versatile/platsmp-realview.c |   4 -
 arch/arm/mm/Kconfig                        |  19 ---
 arch/arm/mm/cache-fa.S                     |   4 +-
 arch/arm/mm/cache-nop.S                    |   6 +
 arch/arm/mm/cache-v4.S                     |  13 +-
 arch/arm/mm/cache-v4wb.S                   |   4 +-
 arch/arm/mm/cache-v4wt.S                   |  22 ++-
 arch/arm/mm/cache-v6.S                     |  35 +---
 arch/arm/mm/cache-v7.S                     |   6 +-
 arch/arm/mm/cache-v7m.S                    |   4 +-
 arch/arm/mm/dma-mapping-nommu.c            |  36 ++--
 arch/arm/mm/dma-mapping.c                  | 181 ++++++++++-----------
 arch/arm/mm/proc-arm1020.S                 |   4 +-
 arch/arm/mm/proc-arm1020e.S                |   4 +-
 arch/arm/mm/proc-arm1022.S                 |   4 +-
 arch/arm/mm/proc-arm1026.S                 |   4 +-
 arch/arm/mm/proc-arm920.S                  |   4 +-
 arch/arm/mm/proc-arm922.S                  |   4 +-
 arch/arm/mm/proc-arm925.S                  |   4 +-
 arch/arm/mm/proc-arm926.S                  |   4 +-
 arch/arm/mm/proc-arm940.S                  |   4 +-
 arch/arm/mm/proc-arm946.S                  |   4 +-
 arch/arm/mm/proc-feroceon.S                |   8 +-
 arch/arm/mm/proc-macros.S                  |   2 +
 arch/arm/mm/proc-mohawk.S                  |   4 +-
 arch/arm/mm/proc-xsc3.S                    |   4 +-
 arch/arm/mm/proc-xscale.S                  |   6 +-
 arch/arm64/mm/dma-mapping.c                |  28 ++--
 arch/csky/mm/dma-mapping.c                 |  46 +++---
 arch/hexagon/kernel/dma.c                  |  44 ++---
 arch/m68k/kernel/dma.c                     |  43 +++--
 arch/microblaze/kernel/dma.c               |  38 ++---
 arch/mips/mm/dma-noncoherent.c             |  75 +++------
 arch/nios2/mm/dma-mapping.c                |  57 +++----
 arch/openrisc/kernel/dma.c                 |  62 ++++---
 arch/parisc/include/asm/cacheflush.h       |   6 +-
 arch/parisc/kernel/pci-dma.c               |  33 +++-
 arch/powerpc/mm/dma-noncoherent.c          |  76 +++++----
 arch/riscv/mm/dma-noncoherent.c            |  51 +++---
 arch/sh/kernel/dma-coherent.c              |  43 +++--
 arch/sparc/Kconfig                         |   2 +-
 arch/sparc/kernel/ioport.c                 |  38 +++--
 arch/xtensa/Kconfig                        |   1 -
 arch/xtensa/include/asm/cacheflush.h       |   6 +-
 arch/xtensa/kernel/pci-dma.c               |  47 +++---
 include/linux/dma-sync.h                   | 107 ++++++++++++
 54 files changed, 721 insertions(+), 699 deletions(-)
 delete mode 100644 arch/arm/mach-oxnas/headsmp.S
 delete mode 100644 arch/arm/mach-oxnas/platsmp.c
 create mode 100644 include/linux/dma-sync.h

-- 
2.39.2

Cc: Vineet Gupta
Cc: Russell King
Cc: Neil Armstrong
Cc: Linus Walleij
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Guo Ren
Cc: Brian Cain
Cc: Geert Uytterhoeven
Cc: Michal Simek
Cc: Thomas Bogendoerfer
Cc: Dinh Nguyen
Cc: Stafford Horne
Cc: Helge Deller
Cc: Michael Ellerman
Cc: Christophe Leroy
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Cc: Rich Felker
Cc: John Paul Adrian Glaubitz
Cc: "David S. Miller"
Cc: Max Filippov
Cc: Christoph Hellwig
Cc: Robin Murphy
Cc: Lad Prabhakar
Cc: Conor Dooley
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-oxnas@groups.io
Cc: linux-csky@vger.kernel.org
Cc: linux-hexagon@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-openrisc@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org