iommu.lists.linux-foundation.org archive mirror
* [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
@ 2023-05-24 17:18 Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
                   ` (15 more replies)
  0 siblings, 16 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

Hi,

Another version of the series reducing the kmalloc() minimum alignment
on arm64 to 8 (from 128). Other architectures can easily opt in by
defining ARCH_KMALLOC_MINALIGN as 8 and selecting
DMA_BOUNCE_UNALIGNED_KMALLOC.

The first 10 patches decouple ARCH_KMALLOC_MINALIGN from
ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those
aligned to the run-time probed cache_line_size(). On arm64 we gain the
kmalloc-{64,192} caches.

The subsequent patches (11 to 15) further reduce the kmalloc() caches to
kmalloc-{8,16,32,96} if the default swiotlb is present by bouncing small
buffers in the DMA API.
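
As a (hypothetical) illustration of the effect on a driver, assuming a
non-coherent device "dev" and an illustrative example_rx() function, a
small kmalloc() buffer mapped for device-to-memory DMA is transparently
bounced via swiotlb rather than relying on a 128-byte kmalloc() alignment:

#include <linux/slab.h>
#include <linux/dma-mapping.h>

/* Illustrative only, not part of this series */
static int example_rx(struct device *dev)
{
	void *buf = kmalloc(32, GFP_KERNEL);	/* may now come from kmalloc-32 */
	dma_addr_t addr;

	if (!buf)
		return -ENOMEM;

	/* the small, potentially unaligned size is bounced via swiotlb */
	addr = dma_map_single(dev, buf, 32, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, addr)) {
		kfree(buf);
		return -ENOMEM;
	}

	/* ... run the transfer ... */

	dma_unmap_single(dev, addr, 32, DMA_FROM_DEVICE);
	kfree(buf);
	return 0;
}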

Changes since v4:

- Following Robin's suggestions, reworked the iommu handling so that the
  buffer size checks are done in the dev_use_swiotlb() and
  dev_use_sg_swiotlb() functions (together with dev_is_untrusted()). The
  sync operations can now check for the SG_DMA_USE_SWIOTLB flag. Since
  this flag is no longer specific to kmalloc() bouncing (covers
  dev_is_untrusted() as well), the sg_is_dma_use_swiotlb() and
  sg_dma_mark_use_swiotlb() functions are always defined if
  CONFIG_SWIOTLB.

- Dropped ARCH_WANT_KMALLOC_DMA_BOUNCE, only left the
  DMA_BOUNCE_UNALIGNED_KMALLOC option, selectable by the arch code. The
  NEED_SG_DMA_FLAGS is now selected by IOMMU_DMA if SWIOTLB.

- Rather than adding another config option, allow
  dma_get_cache_alignment() to be overridden by the arch code
  (Christoph's suggestion).

- Added a comment to the dma_kmalloc_needs_bounce() function on the
  heuristics behind the bouncing.

- Added acked-by/reviewed-by tags (not adding Ard's tested-by yet as
  there were some changes).

The updated patches are also available on this branch:

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/kmalloc-minalign

Thanks.

Catalin Marinas (14):
  mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  dma: Allow dma_get_cache_alignment() to be overridden by the arch code
  mm/slab: Simplify create_kmalloc_cache() args and make it static
  mm/slab: Limit kmalloc() minimum alignment to
    dma_get_cache_alignment()
  drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/gpu: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/usb: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/spi: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/md: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  arm64: Allow kmalloc() caches aligned to the smaller cache_line_size()
  dma-mapping: Force bouncing if the kmalloc() size is not
    cache-line-aligned
  iommu/dma: Force bouncing if the size is not cacheline-aligned
  mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing
    possible
  arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64

Robin Murphy (1):
  scatterlist: Add dedicated config for DMA flags

 arch/arm64/Kconfig             |  1 +
 arch/arm64/include/asm/cache.h |  3 ++
 arch/arm64/mm/init.c           |  7 +++-
 drivers/base/devres.c          |  6 ++--
 drivers/gpu/drm/drm_managed.c  |  6 ++--
 drivers/iommu/Kconfig          |  1 +
 drivers/iommu/dma-iommu.c      | 50 +++++++++++++++++++++++-----
 drivers/md/dm-crypt.c          |  2 +-
 drivers/pci/Kconfig            |  1 +
 drivers/spi/spidev.c           |  2 +-
 drivers/usb/core/buffer.c      |  8 ++---
 include/linux/dma-map-ops.h    | 61 ++++++++++++++++++++++++++++++++++
 include/linux/dma-mapping.h    |  4 ++-
 include/linux/scatterlist.h    | 29 +++++++++++++---
 include/linux/slab.h           | 14 ++++++--
 kernel/dma/Kconfig             |  7 ++++
 kernel/dma/direct.h            |  3 +-
 mm/slab.c                      |  6 +---
 mm/slab.h                      |  5 ++-
 mm/slab_common.c               | 46 +++++++++++++++++++------
 20 files changed, 213 insertions(+), 49 deletions(-)


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v5 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
@ 2023-05-24 17:18 ` Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 02/15] dma: Allow dma_get_cache_alignment() to be overridden by the arch code Catalin Marinas
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

In preparation for supporting a kmalloc() minimum alignment smaller than
the arch DMA alignment, decouple the two definitions. This requires that
either the kmalloc() caches are aligned to a (run-time) cache-line size
or the DMA API bounces unaligned kmalloc() allocations. Subsequent
patches will implement both options.

After this patch, ARCH_DMA_MINALIGN is expected to be used in static
alignment annotations and defined by an architecture to be the maximum
alignment for all supported configurations/SoCs in a single Image.
Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to
define its value in the arch headers.
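
For example (illustrative only, not part of this patch), a driver
structure embedding a DMA buffer keeps using the static annotation:

#include <linux/types.h>
#include <linux/slab.h>		/* ARCH_DMA_MINALIGN is now always defined */

/* hypothetical structure, mirroring the devres/drmres pattern */
struct example_buf {
	int	id;
	/* DMA into data[] needs the worst-case (static) alignment */
	u8	__aligned(ARCH_DMA_MINALIGN) data[];
};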

Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in
dma_get_cache_alignment() so that there is no change for architectures
not requiring a minimum DMA alignment.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 include/linux/dma-mapping.h |  2 +-
 include/linux/slab.h        | 14 +++++++++++---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0ee20b764000..3288a1339271 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -545,7 +545,7 @@ static inline int dma_set_min_align_mask(struct device *dev,
 
 static inline int dma_get_cache_alignment(void)
 {
-#ifdef ARCH_DMA_MINALIGN
+#ifdef ARCH_HAS_DMA_MINALIGN
 	return ARCH_DMA_MINALIGN;
 #endif
 	return 1;
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 6b3e155b70bf..50dcf9cfbf62 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -235,12 +235,20 @@ void kmem_dump_obj(void *object);
  * alignment larger than the alignment of a 64-bit integer.
  * Setting ARCH_DMA_MINALIGN in arch headers allows that.
  */
-#if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
+#ifdef ARCH_DMA_MINALIGN
+#define ARCH_HAS_DMA_MINALIGN
+#if ARCH_DMA_MINALIGN > 8 && !defined(ARCH_KMALLOC_MINALIGN)
 #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
-#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN
-#define KMALLOC_SHIFT_LOW ilog2(ARCH_DMA_MINALIGN)
+#endif
 #else
+#define ARCH_DMA_MINALIGN __alignof__(unsigned long long)
+#endif
+
+#ifndef ARCH_KMALLOC_MINALIGN
 #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
+#elif ARCH_KMALLOC_MINALIGN > 8
+#define KMALLOC_MIN_SIZE ARCH_KMALLOC_MINALIGN
+#define KMALLOC_SHIFT_LOW ilog2(KMALLOC_MIN_SIZE)
 #endif
 
 /*

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 02/15] dma: Allow dma_get_cache_alignment() to be overridden by the arch code
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
@ 2023-05-24 17:18 ` Catalin Marinas
  2023-05-25 13:59   ` Christoph Hellwig
  2023-05-24 17:18 ` [PATCH v5 03/15] mm/slab: Simplify create_kmalloc_cache() args and make it static Catalin Marinas
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On arm64, ARCH_DMA_MINALIGN is larger than the cache line size on most
deployed configurations. Allow an architecture to override
dma_get_cache_alignment() in order to return a run-time probed value
(e.g. cache_line_size()).
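
A sketch of such an override (arm64 does the equivalent in a later patch
of this series):

/* arch/<arch>/include/asm/cache.h (illustrative) */
int cache_line_size(void);

#define dma_get_cache_alignment	cache_line_size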

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/dma-mapping.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 3288a1339271..c41019289223 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -543,6 +543,7 @@ static inline int dma_set_min_align_mask(struct device *dev,
 	return 0;
 }
 
+#ifndef dma_get_cache_alignment
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_HAS_DMA_MINALIGN
@@ -550,6 +551,7 @@ static inline int dma_get_cache_alignment(void)
 #endif
 	return 1;
 }
+#endif
 
 static inline void *dmam_alloc_coherent(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp)

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 03/15] mm/slab: Simplify create_kmalloc_cache() args and make it static
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 02/15] dma: Allow dma_get_cache_alignment() to be overridden by the arch code Catalin Marinas
@ 2023-05-24 17:18 ` Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 04/15] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment() Catalin Marinas
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

In the slab variant of kmem_cache_init(), call new_kmalloc_cache()
instead of initialising the kmalloc_caches array directly. With this,
create_kmalloc_cache() is now only called from new_kmalloc_cache() in
the same file, so make it static. In addition, the useroffset argument
is always 0 while usersize is the same as size. Remove them.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c        |  6 +-----
 mm/slab.h        |  5 ++---
 mm/slab_common.c | 14 ++++++--------
 3 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index bb57f7fdbae1..b7817dcba63e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1240,11 +1240,7 @@ void __init kmem_cache_init(void)
 	 * Initialize the caches that provide memory for the  kmem_cache_node
 	 * structures first.  Without this, further allocations will bug.
 	 */
-	kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache(
-				kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL],
-				kmalloc_info[INDEX_NODE].size,
-				ARCH_KMALLOC_FLAGS, 0,
-				kmalloc_info[INDEX_NODE].size);
+	new_kmalloc_cache(INDEX_NODE, KMALLOC_NORMAL, ARCH_KMALLOC_FLAGS);
 	slab_state = PARTIAL_NODE;
 	setup_kmalloc_cache_index_table();
 
diff --git a/mm/slab.h b/mm/slab.h
index f01ac256a8f5..592590fcddae 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -255,9 +255,8 @@ gfp_t kmalloc_fix_flags(gfp_t flags);
 /* Functions provided by the slab allocators */
 int __kmem_cache_create(struct kmem_cache *, slab_flags_t flags);
 
-struct kmem_cache *create_kmalloc_cache(const char *name, unsigned int size,
-			slab_flags_t flags, unsigned int useroffset,
-			unsigned int usersize);
+void __init new_kmalloc_cache(int idx, enum kmalloc_cache_type type,
+			      slab_flags_t flags);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
 			unsigned int size, slab_flags_t flags,
 			unsigned int useroffset, unsigned int usersize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 607249785c07..7f069159aee2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -658,17 +658,16 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
 	s->refcount = -1;	/* Exempt from merging for now */
 }
 
-struct kmem_cache *__init create_kmalloc_cache(const char *name,
-		unsigned int size, slab_flags_t flags,
-		unsigned int useroffset, unsigned int usersize)
+static struct kmem_cache *__init create_kmalloc_cache(const char *name,
+						      unsigned int size,
+						      slab_flags_t flags)
 {
 	struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
 
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags | SLAB_KMALLOC, useroffset,
-								usersize);
+	create_boot_cache(s, name, size, flags | SLAB_KMALLOC, 0, size);
 	list_add(&s->list, &slab_caches);
 	s->refcount = 1;
 	return s;
@@ -863,7 +862,7 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
-static void __init
+void __init
 new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 {
 	if ((KMALLOC_RECLAIM != KMALLOC_NORMAL) && (type == KMALLOC_RECLAIM)) {
@@ -880,8 +879,7 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 
 	kmalloc_caches[type][idx] = create_kmalloc_cache(
 					kmalloc_info[idx].name[type],
-					kmalloc_info[idx].size, flags, 0,
-					kmalloc_info[idx].size);
+					kmalloc_info[idx].size, flags);
 
 	/*
 	 * If CONFIG_MEMCG_KMEM is enabled, disable cache merging for

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 04/15] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment()
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (2 preceding siblings ...)
  2023-05-24 17:18 ` [PATCH v5 03/15] mm/slab: Simplify create_kmalloc_cache() args and make it static Catalin Marinas
@ 2023-05-24 17:18 ` Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 05/15] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN Catalin Marinas
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

Do not create kmalloc() caches which are not aligned to
dma_get_cache_alignment(). There is no functional change since for
current architectures defining ARCH_DMA_MINALIGN, ARCH_KMALLOC_MINALIGN
equals ARCH_DMA_MINALIGN (and dma_get_cache_alignment()). On
architectures without a specific ARCH_DMA_MINALIGN,
dma_get_cache_alignment() is 1, so no change to the kmalloc() caches.

If an architecture overrides dma_get_cache_alignment() to return a
run-time probed cache line size (as allowed by a previous patch), the
kmalloc() caches will be aligned to the cache line size.
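
For example, with a probed cache line size (and dma_get_cache_alignment())
of 64 bytes, new_kmalloc_cache() rounds the kmalloc-{8,16,32} sizes up to
64 and points their kmalloc_caches[] entries at the kmalloc-64 cache,
kmalloc-96 falls back to kmalloc-128, while kmalloc-192 is already
64-byte aligned and is kept.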

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 mm/slab_common.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7f069159aee2..7c6475847fdf 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -17,6 +17,7 @@
 #include <linux/cpu.h>
 #include <linux/uaccess.h>
 #include <linux/seq_file.h>
+#include <linux/dma-mapping.h>
 #include <linux/proc_fs.h>
 #include <linux/debugfs.h>
 #include <linux/kasan.h>
@@ -862,9 +863,18 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
+static unsigned int __kmalloc_minalign(void)
+{
+	return dma_get_cache_alignment();
+}
+
 void __init
 new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 {
+	unsigned int minalign = __kmalloc_minalign();
+	unsigned int aligned_size = kmalloc_info[idx].size;
+	int aligned_idx = idx;
+
 	if ((KMALLOC_RECLAIM != KMALLOC_NORMAL) && (type == KMALLOC_RECLAIM)) {
 		flags |= SLAB_RECLAIM_ACCOUNT;
 	} else if (IS_ENABLED(CONFIG_MEMCG_KMEM) && (type == KMALLOC_CGROUP)) {
@@ -877,9 +887,17 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 		flags |= SLAB_CACHE_DMA;
 	}
 
-	kmalloc_caches[type][idx] = create_kmalloc_cache(
-					kmalloc_info[idx].name[type],
-					kmalloc_info[idx].size, flags);
+	if (minalign > ARCH_KMALLOC_MINALIGN) {
+		aligned_size = ALIGN(aligned_size, minalign);
+		aligned_idx = __kmalloc_index(aligned_size, false);
+	}
+
+	if (!kmalloc_caches[type][aligned_idx])
+		kmalloc_caches[type][aligned_idx] = create_kmalloc_cache(
+					kmalloc_info[aligned_idx].name[type],
+					aligned_size, flags);
+	if (idx != aligned_idx)
+		kmalloc_caches[type][idx] = kmalloc_caches[type][aligned_idx];
 
 	/*
 	 * If CONFIG_MEMCG_KMEM is enabled, disable cache merging for

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 05/15] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (3 preceding siblings ...)
  2023-05-24 17:18 ` [PATCH v5 04/15] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment() Catalin Marinas
@ 2023-05-24 17:18 ` Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 06/15] drivers/gpu: " Catalin Marinas
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
---
 drivers/base/devres.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 5c998cfac335..3df0025d12aa 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -29,10 +29,10 @@ struct devres {
 	 * Some archs want to perform DMA into kmalloc caches
 	 * and need a guaranteed alignment larger than
 	 * the alignment of a 64-bit integer.
-	 * Thus we use ARCH_KMALLOC_MINALIGN here and get exactly the same
-	 * buffer alignment as if it was allocated by plain kmalloc().
+	 * Thus we use ARCH_DMA_MINALIGN for data[] which will force the same
+	 * alignment for struct devres when allocated by kmalloc().
 	 */
-	u8 __aligned(ARCH_KMALLOC_MINALIGN) data[];
+	u8 __aligned(ARCH_DMA_MINALIGN) data[];
 };
 
 struct devres_group {

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 06/15] drivers/gpu: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (4 preceding siblings ...)
  2023-05-24 17:18 ` [PATCH v5 05/15] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN Catalin Marinas
@ 2023-05-24 17:18 ` Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 07/15] drivers/usb: " Catalin Marinas
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
---
 drivers/gpu/drm/drm_managed.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c
index 4cf214de50c4..3a5802f60e65 100644
--- a/drivers/gpu/drm/drm_managed.c
+++ b/drivers/gpu/drm/drm_managed.c
@@ -49,10 +49,10 @@ struct drmres {
 	 * Some archs want to perform DMA into kmalloc caches
 	 * and need a guaranteed alignment larger than
 	 * the alignment of a 64-bit integer.
-	 * Thus we use ARCH_KMALLOC_MINALIGN here and get exactly the same
-	 * buffer alignment as if it was allocated by plain kmalloc().
+	 * Thus we use ARCH_DMA_MINALIGN for data[] which will force the same
+	 * alignment for struct drmres when allocated by kmalloc().
 	 */
-	u8 __aligned(ARCH_KMALLOC_MINALIGN) data[];
+	u8 __aligned(ARCH_DMA_MINALIGN) data[];
 };
 
 static void free_dr(struct drmres *dr)

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 07/15] drivers/usb: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (5 preceding siblings ...)
  2023-05-24 17:18 ` [PATCH v5 06/15] drivers/gpu: " Catalin Marinas
@ 2023-05-24 17:18 ` Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 08/15] drivers/spi: " Catalin Marinas
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/core/buffer.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
index fbb087b728dc..e21d8d106977 100644
--- a/drivers/usb/core/buffer.c
+++ b/drivers/usb/core/buffer.c
@@ -34,13 +34,13 @@ void __init usb_init_pool_max(void)
 {
 	/*
 	 * The pool_max values must never be smaller than
-	 * ARCH_KMALLOC_MINALIGN.
+	 * ARCH_DMA_MINALIGN.
 	 */
-	if (ARCH_KMALLOC_MINALIGN <= 32)
+	if (ARCH_DMA_MINALIGN <= 32)
 		;			/* Original value is okay */
-	else if (ARCH_KMALLOC_MINALIGN <= 64)
+	else if (ARCH_DMA_MINALIGN <= 64)
 		pool_max[0] = 64;
-	else if (ARCH_KMALLOC_MINALIGN <= 128)
+	else if (ARCH_DMA_MINALIGN <= 128)
 		pool_max[0] = 0;	/* Don't use this pool */
 	else
 		BUILD_BUG();		/* We don't allow this */

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 08/15] drivers/spi: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (6 preceding siblings ...)
  2023-05-24 17:18 ` [PATCH v5 07/15] drivers/usb: " Catalin Marinas
@ 2023-05-24 17:18 ` Catalin Marinas
  2023-05-24 17:18 ` [PATCH v5 09/15] drivers/md: " Catalin Marinas
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Brown <broonie@kernel.org>
---
 drivers/spi/spidev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 39d94c850839..8d009275a59d 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -237,7 +237,7 @@ static int spidev_message(struct spidev_data *spidev,
 		/* Ensure that also following allocations from rx_buf/tx_buf will meet
 		 * DMA alignment requirements.
 		 */
-		unsigned int len_aligned = ALIGN(u_tmp->len, ARCH_KMALLOC_MINALIGN);
+		unsigned int len_aligned = ALIGN(u_tmp->len, ARCH_DMA_MINALIGN);
 
 		k_tmp->len = u_tmp->len;
 

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 09/15] drivers/md: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (7 preceding siblings ...)
  2023-05-24 17:18 ` [PATCH v5 08/15] drivers/spi: " Catalin Marinas
@ 2023-05-24 17:18 ` Catalin Marinas
  2023-05-25 14:00   ` Christoph Hellwig
  2023-05-24 17:18 ` [PATCH v5 10/15] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size() Catalin Marinas
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-crypt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 8b47b913ee83..ebbd8f7db880 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3256,7 +3256,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
 	cc->per_bio_data_size = ti->per_io_data_size =
 		ALIGN(sizeof(struct dm_crypt_io) + cc->dmreq_start + additional_req_size,
-		      ARCH_KMALLOC_MINALIGN);
+		      ARCH_DMA_MINALIGN);
 
 	ret = mempool_init(&cc->page_pool, BIO_MAX_VECS, crypt_page_alloc, crypt_page_free, cc);
 	if (ret) {

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 10/15] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size()
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (8 preceding siblings ...)
  2023-05-24 17:18 ` [PATCH v5 09/15] drivers/md: " Catalin Marinas
@ 2023-05-24 17:18 ` Catalin Marinas
  2023-05-24 17:19 ` [PATCH v5 11/15] scatterlist: Add dedicated config for DMA flags Catalin Marinas
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On arm64, ARCH_DMA_MINALIGN is 128, larger than the cache line size on
most of the current platforms (typically 64). Define
ARCH_KMALLOC_MINALIGN to 8 (the default for architectures without their
own ARCH_DMA_MINALIGN) and override dma_get_cache_alignment() to return
cache_line_size(), probed at run-time. The kmalloc() caches will be
limited to the cache line size. This will allow the additional
kmalloc-{64,192} caches on most arm64 platforms.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/cache.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index a51e6e8f3171..ceb368d33bf4 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -33,6 +33,7 @@
  * the CPU.
  */
 #define ARCH_DMA_MINALIGN	(128)
+#define ARCH_KMALLOC_MINALIGN	(8)
 
 #ifndef __ASSEMBLY__
 
@@ -90,6 +91,8 @@ static inline int cache_line_size_of_cpu(void)
 
 int cache_line_size(void);
 
+#define dma_get_cache_alignment	cache_line_size
+
 /*
  * Read the effective value of CTR_EL0.
  *

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 11/15] scatterlist: Add dedicated config for DMA flags
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (9 preceding siblings ...)
  2023-05-24 17:18 ` [PATCH v5 10/15] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size() Catalin Marinas
@ 2023-05-24 17:19 ` Catalin Marinas
  2023-05-24 17:19 ` [PATCH v5 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned Catalin Marinas
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:19 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

From: Robin Murphy <robin.murphy@arm.com>

The DMA flags field will be useful for users beyond PCI P2P, so upgrade
to its own dedicated config option.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/pci/Kconfig         | 1 +
 include/linux/scatterlist.h | 4 ++--
 kernel/dma/Kconfig          | 3 +++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 9309f2469b41..3c07d8d214b3 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -168,6 +168,7 @@ config PCI_P2PDMA
 	#
 	depends on 64BIT
 	select GENERIC_ALLOCATOR
+	select NEED_SG_DMA_FLAGS
 	help
	  Enables drivers to do PCI peer-to-peer transactions to and from
 	  BARs that are exposed in other devices that are the part of
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 375a5e90d86a..87aaf8b5cdb4 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -16,7 +16,7 @@ struct scatterlist {
 #ifdef CONFIG_NEED_SG_DMA_LENGTH
 	unsigned int	dma_length;
 #endif
-#ifdef CONFIG_PCI_P2PDMA
+#ifdef CONFIG_NEED_SG_DMA_FLAGS
 	unsigned int    dma_flags;
 #endif
 };
@@ -249,7 +249,7 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 }
 
 /*
- * CONFGI_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
+ * CONFIG_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
  * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
  * Use this padding for DMA flags bits to indicate when a specific
  * dma address is a bus address.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 6677d0e64d27..acc6f231259c 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -24,6 +24,9 @@ config DMA_OPS_BYPASS
 config ARCH_HAS_DMA_MAP_DIRECT
 	bool
 
+config NEED_SG_DMA_FLAGS
+	bool
+
 config NEED_SG_DMA_LENGTH
 	bool
 

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (10 preceding siblings ...)
  2023-05-24 17:19 ` [PATCH v5 11/15] scatterlist: Add dedicated config for DMA flags Catalin Marinas
@ 2023-05-24 17:19 ` Catalin Marinas
  2023-05-25 15:53   ` Robin Murphy
  2023-05-24 17:19 ` [PATCH v5 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:19 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

For direct DMA, if the size is small enough to have originated from a
kmalloc() cache below ARCH_DMA_MINALIGN, check its alignment against
dma_get_cache_alignment() and bounce if necessary. For larger sizes, it
is the responsibility of the DMA API caller to ensure proper alignment.

At this point, the kmalloc() caches are properly aligned but this will
change in a subsequent patch.

Architectures can opt in by selecting DMA_BOUNCE_UNALIGNED_KMALLOC.
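
As an example of the heuristics, with dma_get_cache_alignment() returning
128: a 32-byte buffer rounds up to a 32-byte slab object, which is not
128-byte aligned, so it is bounced (for a non-coherent device and a
direction other than DMA_TO_DEVICE); a 208-byte buffer rounds up to
kmalloc-256, which is 128-byte aligned, so it is left in place; any size
of at least 2 * ARCH_DMA_MINALIGN is always treated as safe.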

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 include/linux/dma-map-ops.h | 61 +++++++++++++++++++++++++++++++++++++
 kernel/dma/Kconfig          |  4 +++
 kernel/dma/direct.h         |  3 +-
 3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 31f114f486c4..9bf19b5bf755 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -8,6 +8,7 @@
 
 #include <linux/dma-mapping.h>
 #include <linux/pgtable.h>
+#include <linux/slab.h>
 
 struct cma;
 
@@ -277,6 +278,66 @@ static inline bool dev_is_dma_coherent(struct device *dev)
 }
 #endif /* CONFIG_ARCH_HAS_DMA_COHERENCE_H */
 
+/*
+ * Check whether potential kmalloc() buffers are safe for non-coherent DMA.
+ */
+static inline bool dma_kmalloc_safe(struct device *dev,
+				    enum dma_data_direction dir)
+{
+	/*
+	 * If DMA bouncing of kmalloc() buffers is disabled, the kmalloc()
+	 * caches have already been aligned to a DMA-safe size.
+	 */
+	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC))
+		return true;
+
+	/*
+	 * kmalloc() buffers are DMA-safe irrespective of size if the device
+	 * is coherent or the direction is DMA_TO_DEVICE (non-destructive
+	 * cache maintenance and benign cache line evictions).
+	 */
+	if (dev_is_dma_coherent(dev) || dir == DMA_TO_DEVICE)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check whether the given size, assuming it is for a kmalloc()'ed buffer, is
+ * sufficiently aligned for non-coherent DMA.
+ */
+static inline bool dma_kmalloc_size_aligned(size_t size)
+{
+	/*
+	 * Larger kmalloc() sizes are guaranteed to be aligned to
+	 * ARCH_DMA_MINALIGN.
+	 */
+	if (size >= 2 * ARCH_DMA_MINALIGN ||
+	    IS_ALIGNED(kmalloc_size_roundup(size), dma_get_cache_alignment()))
+		return true;
+
+	return false;
+}
+
+/*
+ * Check whether the given object size may have originated from a kmalloc()
+ * buffer with a slab alignment below the DMA-safe alignment and needs
+ * bouncing for non-coherent DMA. The pointer alignment is not considered and
+ * in-structure DMA-safe offsets are the responsibility of the caller. Such
+ * code should use the static ARCH_DMA_MINALIGN for compiler annotations.
+ *
+ * The heuristics can have false positives, bouncing unnecessarily, though the
+ * buffers would be small. False negatives are theoretically possible if, for
+ * example, multiple small kmalloc() buffers are coalesced into a larger
+ * buffer that passes the alignment check. There are no such known constructs
+ * in the kernel.
+ */
+static inline bool dma_kmalloc_needs_bounce(struct device *dev, size_t size,
+					    enum dma_data_direction dir)
+{
+	return !dma_kmalloc_safe(dev, dir) && !dma_kmalloc_size_aligned(size);
+}
+
 void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index acc6f231259c..abea1823fe21 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -90,6 +90,10 @@ config SWIOTLB
 	bool
 	select NEED_DMA_MAP_STATE
 
+config DMA_BOUNCE_UNALIGNED_KMALLOC
+	bool
+	depends on SWIOTLB
+
 config DMA_RESTRICTED_POOL
 	bool "DMA Restricted Pool"
 	depends on OF && OF_RESERVED_MEM && SWIOTLB
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index e38ffc5e6bdd..97ec892ea0b5 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -94,7 +94,8 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
 		return swiotlb_map(dev, phys, size, dir, attrs);
 	}
 
-	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
+	if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
+	    dma_kmalloc_needs_bounce(dev, size, dir)) {
 		if (is_pci_p2pdma_page(page))
 			return DMA_MAPPING_ERROR;
 		if (is_swiotlb_active(dev))

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (11 preceding siblings ...)
  2023-05-24 17:19 ` [PATCH v5 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned Catalin Marinas
@ 2023-05-24 17:19 ` Catalin Marinas
  2023-05-25 15:57   ` Robin Murphy
  2023-05-26 16:36   ` Jisheng Zhang
  2023-05-24 17:19 ` [PATCH v5 14/15] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible Catalin Marinas
                   ` (2 subsequent siblings)
  15 siblings, 2 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:19 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

Similarly to direct DMA, bounce small allocations as they may have
originated from a kmalloc() cache that is not safe for DMA. Unlike direct
DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
non-coherent devices as this would break some cases where the iova is
expected to be contiguous (dmabuf). Instead, scan the scatterlist for
any small sizes and only take the swiotlb path if any element of the
list needs bouncing (note that iommu_dma_map_page() would still only
bounce those buffers which are not DMA-aligned).

To avoid scanning the scatterlist on the 'sync' operations, introduce an
SG_DMA_USE_SWIOTLB flag set by iommu_dma_map_sg_swiotlb(). The
dev_use_swiotlb() function together with the newly added
dev_use_sg_swiotlb() now check for both untrusted devices and unaligned
kmalloc() buffers (suggested by Robin Murphy).
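
As a (hypothetical) illustration, assuming "dev" is a non-coherent device
behind an IOMMU and an illustrative example_map() function, a scatterlist
mixing a page-sized buffer with a small kmalloc() header is now bounced
as a whole:

#include <linux/slab.h>
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>

/* Illustrative only, not part of this patch */
static int example_map(struct device *dev)
{
	struct scatterlist sgl[2];
	void *hdr = kmalloc(16, GFP_KERNEL);	/* smaller than a cache line */
	void *data = kmalloc(4096, GFP_KERNEL);
	int nents = 0;

	if (!hdr || !data)
		goto out_free;

	sg_init_table(sgl, 2);
	sg_set_buf(&sgl[0], hdr, 16);
	sg_set_buf(&sgl[1], data, 4096);

	/*
	 * dev_use_sg_swiotlb() notices the 16-byte element, so the whole
	 * list is mapped via iommu_dma_map_sg_swiotlb() and the sync/unmap
	 * paths rely on the SG_DMA_USE_SWIOTLB flag.
	 */
	nents = dma_map_sg(dev, sgl, 2, DMA_FROM_DEVICE);
	if (nents <= 0)
		goto out_free;

	/* ... run the transfer ... */

	dma_unmap_sg(dev, sgl, 2, DMA_FROM_DEVICE);
out_free:
	kfree(data);
	kfree(hdr);
	return nents > 0 ? 0 : -ENOMEM;
}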

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/Kconfig       |  1 +
 drivers/iommu/dma-iommu.c   | 50 ++++++++++++++++++++++++++++++-------
 include/linux/scatterlist.h | 25 +++++++++++++++++--
 3 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index db98c3f86e8c..670eff7a8e11 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -152,6 +152,7 @@ config IOMMU_DMA
 	select IOMMU_IOVA
 	select IRQ_MSI_IOMMU
 	select NEED_SG_DMA_LENGTH
+	select NEED_SG_DMA_FLAGS if SWIOTLB
 
 # Shared Virtual Addressing
 config IOMMU_SVA
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7a9f0b0bddbd..24a8b8c2368c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -520,9 +520,38 @@ static bool dev_is_untrusted(struct device *dev)
 	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
 }
 
-static bool dev_use_swiotlb(struct device *dev)
+static bool dev_use_swiotlb(struct device *dev, size_t size,
+			    enum dma_data_direction dir)
 {
-	return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
+	return IS_ENABLED(CONFIG_SWIOTLB) &&
+		(dev_is_untrusted(dev) ||
+		 dma_kmalloc_needs_bounce(dev, size, dir));
+}
+
+static bool dev_use_sg_swiotlb(struct device *dev, struct scatterlist *sg,
+			       int nents, enum dma_data_direction dir)
+{
+	struct scatterlist *s;
+	int i;
+
+	if (!IS_ENABLED(CONFIG_SWIOTLB))
+		return false;
+
+	if (dev_is_untrusted(dev))
+		return true;
+
+	/*
+	 * If kmalloc() buffers are not DMA-safe for this device and
+	 * direction, check the individual lengths in the sg list. If any
+	 * element is deemed unsafe, use the swiotlb for bouncing.
+	 */
+	if (!dma_kmalloc_safe(dev, dir)) {
+		for_each_sg(sg, s, nents, i)
+			if (!dma_kmalloc_size_aligned(s->length))
+				return true;
+	}
+
+	return false;
 }
 
 /**
@@ -922,7 +951,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
 {
 	phys_addr_t phys;
 
-	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
+	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -938,7 +967,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
 {
 	phys_addr_t phys;
 
-	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
+	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -956,7 +985,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_use_swiotlb(dev))
+	if (sg_is_dma_use_swiotlb(sgl))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
 						      sg->length, dir);
@@ -972,7 +1001,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_use_swiotlb(dev))
+	if (sg_is_dma_use_swiotlb(sgl))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_device(dev,
 							 sg_dma_address(sg),
@@ -998,7 +1027,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	 * If both the physical buffer start address and size are
 	 * page aligned, we don't need to use a bounce page.
 	 */
-	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
+	if (dev_use_swiotlb(dev, size, dir) &&
+	    iova_offset(iovad, phys | size)) {
 		void *padding_start;
 		size_t padding_size, aligned_size;
 
@@ -1166,6 +1196,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
 	struct scatterlist *s;
 	int i;
 
+	sg_dma_mark_use_swiotlb(sg);
+
 	for_each_sg(sg, s, nents, i) {
 		sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
 				s->offset, s->length, dir, attrs);
@@ -1210,7 +1242,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 			goto out;
 	}
 
-	if (dev_use_swiotlb(dev))
+	if (dev_use_sg_swiotlb(dev, sg, nents, dir))
 		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
@@ -1315,7 +1347,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 	struct scatterlist *tmp;
 	int i;
 
-	if (dev_use_swiotlb(dev)) {
+	if (sg_is_dma_use_swiotlb(sg)) {
 		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
 		return;
 	}
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 87aaf8b5cdb4..330a157c5501 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -248,6 +248,29 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 	sg->page_link &= ~SG_END;
 }
 
+#define SG_DMA_BUS_ADDRESS	(1 << 0)
+#define SG_DMA_USE_SWIOTLB	(1 << 1)
+
+#ifdef CONFIG_SWIOTLB
+static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
+{
+	return sg->dma_flags & SG_DMA_USE_SWIOTLB;
+}
+
+static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
+{
+	sg->dma_flags |= SG_DMA_USE_SWIOTLB;
+}
+#else
+static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
+{
+	return false;
+}
+static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
+{
+}
+#endif
+
 /*
  * CONFIG_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
  * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
@@ -256,8 +279,6 @@ static inline void sg_unmark_end(struct scatterlist *sg)
  */
 #ifdef CONFIG_PCI_P2PDMA
 
-#define SG_DMA_BUS_ADDRESS (1 << 0)
-
 /**
  * sg_dma_is_bus address - Return whether a given segment was marked
  *			   as a bus address

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 14/15] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (12 preceding siblings ...)
  2023-05-24 17:19 ` [PATCH v5 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
@ 2023-05-24 17:19 ` Catalin Marinas
  2023-05-24 17:19 ` [PATCH v5 15/15] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 Catalin Marinas
  2023-05-25 12:31 ` [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Jonathan Cameron
  15 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:19 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

If an architecture opted in to DMA bouncing of unaligned kmalloc()
buffers (DMA_BOUNCE_UNALIGNED_KMALLOC) and the default swiotlb buffer is
available, reduce the minimum kmalloc() cache alignment below the
cache-line size to ARCH_KMALLOC_MINALIGN.
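
For example, on arm64 (which selects DMA_BOUNCE_UNALIGNED_KMALLOC in a
later patch), the minimum drops to ARCH_KMALLOC_MINALIGN (8) when the
default swiotlb buffer was allocated at boot; if it was not (e.g. booting
with swiotlb=noforce), the kmalloc() caches stay aligned to
dma_get_cache_alignment().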

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 mm/slab_common.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7c6475847fdf..fe46459a8b77 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>
 #include <linux/seq_file.h>
 #include <linux/dma-mapping.h>
+#include <linux/swiotlb.h>
 #include <linux/proc_fs.h>
 #include <linux/debugfs.h>
 #include <linux/kasan.h>
@@ -863,10 +864,19 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
+#ifdef CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC
+static unsigned int __kmalloc_minalign(void)
+{
+	if (io_tlb_default_mem.nslabs)
+		return ARCH_KMALLOC_MINALIGN;
+	return dma_get_cache_alignment();
+}
+#else
 static unsigned int __kmalloc_minalign(void)
 {
 	return dma_get_cache_alignment();
 }
+#endif
 
 void __init
 new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 15/15] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (13 preceding siblings ...)
  2023-05-24 17:19 ` [PATCH v5 14/15] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible Catalin Marinas
@ 2023-05-24 17:19 ` Catalin Marinas
  2023-05-25 16:12   ` Robin Murphy
  2023-05-25 12:31 ` [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Jonathan Cameron
  15 siblings, 1 reply; 31+ messages in thread
From: Catalin Marinas @ 2023-05-24 17:19 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

With the DMA bouncing of unaligned kmalloc() buffers now in place,
enable it for arm64 to allow the kmalloc-{8,16,32,96} caches. In
addition, always create the swiotlb buffer even when the end of RAM is
within the 32-bit physical address range (the swiotlb buffer can still
be disabled on the kernel command line).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig   | 1 +
 arch/arm64/mm/init.c | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1201d25a8a4..af42871431c0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -120,6 +120,7 @@ config ARM64
 	select CRC32
 	select DCACHE_WORD_ACCESS
 	select DYNAMIC_FTRACE if FUNCTION_TRACER
+	select DMA_BOUNCE_UNALIGNED_KMALLOC
 	select DMA_DIRECT_REMAP
 	select EDAC_SUPPORT
 	select FRAME_POINTER
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 66e70ca47680..3ac2e9d79ce4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -442,7 +442,12 @@ void __init bootmem_init(void)
  */
 void __init mem_init(void)
 {
-	swiotlb_init(max_pfn > PFN_DOWN(arm64_dma_phys_limit), SWIOTLB_VERBOSE);
+	bool swiotlb = max_pfn > PFN_DOWN(arm64_dma_phys_limit);
+
+	if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC))
+		swiotlb = true;
+
+	swiotlb_init(swiotlb, SWIOTLB_VERBOSE);
 
 	/* this will put all unused low memory onto the freelists */
 	memblock_free_all();

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (14 preceding siblings ...)
  2023-05-24 17:19 ` [PATCH v5 15/15] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 Catalin Marinas
@ 2023-05-25 12:31 ` Jonathan Cameron
  2023-05-25 14:31   ` Catalin Marinas
  15 siblings, 1 reply; 31+ messages in thread
From: Jonathan Cameron @ 2023-05-25 12:31 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Wed, 24 May 2023 18:18:49 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

> Hi,
> 
> Another version of the series reducing the kmalloc() minimum alignment
> on arm64 to 8 (from 128). Other architectures can easily opt in by
> defining ARCH_KMALLOC_MINALIGN as 8 and selecting
> DMA_BOUNCE_UNALIGNED_KMALLOC.
> 
> The first 10 patches decouple ARCH_KMALLOC_MINALIGN from
> ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those
> aligned to the run-time probed cache_line_size(). On arm64 we gain the
> kmalloc-{64,192} caches.
> 
> The subsequent patches (11 to 15) further reduce the kmalloc() caches to
> kmalloc-{8,16,32,96} if the default swiotlb is present by bouncing small
> buffers in the DMA API.

Hi Catalin,

I think IIO_DMA_MINALIGN needs to switch to ARCH_DMA_MINALIGN as well.

It's used to force static alignment of buffers within larger structures,
to make them suitable for non-coherent DMA, similar to your other cases.
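
i.e. presumably just switching the define in include/linux/iio/iio.h
(untested):

-#define IIO_DMA_MINALIGN ARCH_KMALLOC_MINALIGN
+#define IIO_DMA_MINALIGN ARCH_DMA_MINALIGN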

Thanks,

Jonathan


> 
> Changes since v4:
> 
> - Following Robin's suggestions, reworked the iommu handling so that the
>   buffer size checks are done in the dev_use_swiotlb() and
>   dev_use_sg_swiotlb() functions (together with dev_is_untrusted()). The
>   sync operations can now check for the SG_DMA_USE_SWIOTLB flag. Since
>   this flag is no longer specific to kmalloc() bouncing (covers
>   dev_is_untrusted() as well), the sg_is_dma_use_swiotlb() and
>   sg_dma_mark_use_swiotlb() functions are always defined if
>   CONFIG_SWIOTLB.
> 
> - Dropped ARCH_WANT_KMALLOC_DMA_BOUNCE, only left the
>   DMA_BOUNCE_UNALIGNED_KMALLOC option, selectable by the arch code. The
>   NEED_SG_DMA_FLAGS is now selected by IOMMU_DMA if SWIOTLB.
> 
> - Rather than adding another config option, allow
>   dma_get_cache_alignment() to be overridden by the arch code
>   (Christoph's suggestion).
> 
> - Added a comment to the dma_kmalloc_needs_bounce() function on the
>   heuristics behind the bouncing.
> 
> - Added acked-by/reviewed-by tags (not adding Ard's tested-by yet as
>   there were some changes).
> 
> The updated patches are also available on this branch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/kmalloc-minalign
> 
> Thanks.
> 
> Catalin Marinas (14):
>   mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
>   dma: Allow dma_get_cache_alignment() to be overridden by the arch code
>   mm/slab: Simplify create_kmalloc_cache() args and make it static
>   mm/slab: Limit kmalloc() minimum alignment to
>     dma_get_cache_alignment()
>   drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
>   drivers/gpu: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
>   drivers/usb: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
>   drivers/spi: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
>   drivers/md: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
>   arm64: Allow kmalloc() caches aligned to the smaller cache_line_size()
>   dma-mapping: Force bouncing if the kmalloc() size is not
>     cache-line-aligned
>   iommu/dma: Force bouncing if the size is not cacheline-aligned
>   mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing
>     possible
>   arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64
> 
> Robin Murphy (1):
>   scatterlist: Add dedicated config for DMA flags
> 
>  arch/arm64/Kconfig             |  1 +
>  arch/arm64/include/asm/cache.h |  3 ++
>  arch/arm64/mm/init.c           |  7 +++-
>  drivers/base/devres.c          |  6 ++--
>  drivers/gpu/drm/drm_managed.c  |  6 ++--
>  drivers/iommu/Kconfig          |  1 +
>  drivers/iommu/dma-iommu.c      | 50 +++++++++++++++++++++++-----
>  drivers/md/dm-crypt.c          |  2 +-
>  drivers/pci/Kconfig            |  1 +
>  drivers/spi/spidev.c           |  2 +-
>  drivers/usb/core/buffer.c      |  8 ++---
>  include/linux/dma-map-ops.h    | 61 ++++++++++++++++++++++++++++++++++
>  include/linux/dma-mapping.h    |  4 ++-
>  include/linux/scatterlist.h    | 29 +++++++++++++---
>  include/linux/slab.h           | 14 ++++++--
>  kernel/dma/Kconfig             |  7 ++++
>  kernel/dma/direct.h            |  3 +-
>  mm/slab.c                      |  6 +---
>  mm/slab.h                      |  5 ++-
>  mm/slab_common.c               | 46 +++++++++++++++++++------
>  20 files changed, 213 insertions(+), 49 deletions(-)
> 
> 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 02/15] dma: Allow dma_get_cache_alignment() to be overridden by the arch code
  2023-05-24 17:18 ` [PATCH v5 02/15] dma: Allow dma_get_cache_alignment() to be overridden by the arch code Catalin Marinas
@ 2023-05-25 13:59   ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-05-25 13:59 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Wed, May 24, 2023 at 06:18:51PM +0100, Catalin Marinas wrote:
> On arm64, ARCH_DMA_MINALIGN is larger than most cache line size
> configurations deployed. Allow an architecture to override
> dma_get_cache_alignment() in order to return a run-time probed value
> (e.g. cache_line_size()).
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Will Deacon <will@kernel.org>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 09/15] drivers/md: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-24 17:18 ` [PATCH v5 09/15] drivers/md: " Catalin Marinas
@ 2023-05-25 14:00   ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-05-25 14:00 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

Nit: the subject might better read dm-crypt: ... as it's just touching
that driver.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-25 12:31 ` [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Jonathan Cameron
@ 2023-05-25 14:31   ` Catalin Marinas
  2023-05-26 16:07     ` Jonathan Cameron
  0 siblings, 1 reply; 31+ messages in thread
From: Catalin Marinas @ 2023-05-25 14:31 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Thu, May 25, 2023 at 01:31:38PM +0100, Jonathan Cameron wrote:
> On Wed, 24 May 2023 18:18:49 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > Another version of the series reducing the kmalloc() minimum alignment
> > on arm64 to 8 (from 128). Other architectures can easily opt in by
> > defining ARCH_KMALLOC_MINALIGN as 8 and selecting
> > DMA_BOUNCE_UNALIGNED_KMALLOC.
> > 
> > The first 10 patches decouple ARCH_KMALLOC_MINALIGN from
> > ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those
> > aligned to the run-time probed cache_line_size(). On arm64 we gain the
> > kmalloc-{64,192} caches.
> > 
> > The subsequent patches (11 to 15) further reduce the kmalloc() caches to
> > kmalloc-{8,16,32,96} if the default swiotlb is present by bouncing small
> > buffers in the DMA API.
> 
> I think IIO_DMA_MINALIGN needs to switch to ARCH_DMA_MINALIGN as well.
> 
> It's used to force static alignement of buffers with larger structures,
> to make them suitable for non coherent DMA, similar to your other cases.

Ah, I forgot that you introduced that macro. However, at a quick grep, I
don't think this forced alignment always works as intended (irrespective
of this series). Let's take an example:

struct ltc2496_driverdata {
	/* this must be the first member */
	struct ltc2497core_driverdata common_ddata;
	struct spi_device *spi;

	/*
	 * DMA (thus cache coherency maintenance) may require the
	 * transfer buffers to live in their own cache lines.
	 */
	unsigned char rxbuf[3] __aligned(IIO_DMA_MINALIGN);
	unsigned char txbuf[3];
};

The rxbuf is aligned to IIO_DMA_MINALIGN, the structure and its size as
well but txbuf is at an offset of 3 bytes from the aligned
IIO_DMA_MINALIGN. So basically any cache maintenance on rxbuf would
corrupt txbuf. You need rxbuf to be the only resident of a
cache line, therefore the next member needs such alignment as well.
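
For illustration only (a sketch, not a tested change), that would mean
something like:

struct ltc2496_driverdata {
	/* this must be the first member */
	struct ltc2497core_driverdata common_ddata;
	struct spi_device *spi;

	/*
	 * Each DMA buffer gets its own cache line(s): the member
	 * following rxbuf is aligned as well, and the structure
	 * padding after txbuf (implied by the member alignment)
	 * covers the rest of its line.
	 */
	unsigned char rxbuf[3] __aligned(IIO_DMA_MINALIGN);
	unsigned char txbuf[3] __aligned(IIO_DMA_MINALIGN);
};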

With this series and SWIOTLB enabled, however, if you try to transfer 3
bytes, they will be bounced, so the missing alignment won't matter much.

-- 
Catalin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned
  2023-05-24 17:19 ` [PATCH v5 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned Catalin Marinas
@ 2023-05-25 15:53   ` Robin Murphy
  0 siblings, 0 replies; 31+ messages in thread
From: Robin Murphy @ 2023-05-25 15:53 UTC (permalink / raw)
  To: Catalin Marinas, Linus Torvalds, Christoph Hellwig
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On 24/05/2023 6:19 pm, Catalin Marinas wrote:
> For direct DMA, if the size is small enough to have originated from a
> kmalloc() cache below ARCH_DMA_MINALIGN, check its alignment against
> dma_get_cache_alignment() and bounce if necessary. For larger sizes, it
> is the responsibility of the DMA API caller to ensure proper alignment.
> 
> At this point, the kmalloc() caches are properly aligned but this will
> change in a subsequent patch.
> 
> Architectures can opt in by selecting ARCH_WANT_KMALLOC_DMA_BOUNCE.

Thanks for the additional comment, that's a great summary for future 
reference.
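
To make the heuristic concrete (an illustrative example only, assuming
cache_line_size() probes as 64 while ARCH_DMA_MINALIGN stays 128):

	dma_kmalloc_size_aligned(56);  /* kmalloc-64: 64 % 64 == 0       -> true  (no bounce)  */
	dma_kmalloc_size_aligned(96);  /* kmalloc-96: 96 % 64 != 0       -> false (may bounce) */
	dma_kmalloc_size_aligned(300); /* >= 2 * ARCH_DMA_MINALIGN (256) -> true  (no bounce)  */

(and a false result only leads to actual bouncing when dma_kmalloc_safe()
also fails, i.e. a non-coherent device and a direction other than
DMA_TO_DEVICE).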

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> ---
>   include/linux/dma-map-ops.h | 61 +++++++++++++++++++++++++++++++++++++
>   kernel/dma/Kconfig          |  4 +++
>   kernel/dma/direct.h         |  3 +-
>   3 files changed, 67 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> index 31f114f486c4..9bf19b5bf755 100644
> --- a/include/linux/dma-map-ops.h
> +++ b/include/linux/dma-map-ops.h
> @@ -8,6 +8,7 @@
>   
>   #include <linux/dma-mapping.h>
>   #include <linux/pgtable.h>
> +#include <linux/slab.h>
>   
>   struct cma;
>   
> @@ -277,6 +278,66 @@ static inline bool dev_is_dma_coherent(struct device *dev)
>   }
>   #endif /* CONFIG_ARCH_HAS_DMA_COHERENCE_H */
>   
> +/*
> + * Check whether potential kmalloc() buffers are safe for non-coherent DMA.
> + */
> +static inline bool dma_kmalloc_safe(struct device *dev,
> +				    enum dma_data_direction dir)
> +{
> +	/*
> +	 * If DMA bouncing of kmalloc() buffers is disabled, the kmalloc()
> +	 * caches have already been aligned to a DMA-safe size.
> +	 */
> +	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC))
> +		return true;
> +
> +	/*
> +	 * kmalloc() buffers are DMA-safe irrespective of size if the device
> +	 * is coherent or the direction is DMA_TO_DEVICE (non-desctructive
> +	 * cache maintenance and benign cache line evictions).
> +	 */
> +	if (dev_is_dma_coherent(dev) || dir == DMA_TO_DEVICE)
> +		return true;
> +
> +	return false;
> +}
> +
> +/*
> + * Check whether the given size, assuming it is for a kmalloc()'ed buffer, is
> + * sufficiently aligned for non-coherent DMA.
> + */
> +static inline bool dma_kmalloc_size_aligned(size_t size)
> +{
> +	/*
> +	 * Larger kmalloc() sizes are guaranteed to be aligned to
> +	 * ARCH_DMA_MINALIGN.
> +	 */
> +	if (size >= 2 * ARCH_DMA_MINALIGN ||
> +	    IS_ALIGNED(kmalloc_size_roundup(size), dma_get_cache_alignment()))
> +		return true;
> +
> +	return false;
> +}
> +
> +/*
> + * Check whether the given object size may have originated from a kmalloc()
> + * buffer with a slab alignment below the DMA-safe alignment and needs
> + * bouncing for non-coherent DMA. The pointer alignment is not considered and
> + * in-structure DMA-safe offsets are the responsibility of the caller. Such
> + * code should use the static ARCH_DMA_MINALIGN for compiler annotations.
> + *
> + * The heuristics can have false positives, bouncing unnecessarily, though the
> + * buffers would be small. False negatives are theoretically possible if, for
> + * example, multiple small kmalloc() buffers are coalesced into a larger
> + * buffer that passes the alignment check. There are no such known constructs
> + * in the kernel.
> + */
> +static inline bool dma_kmalloc_needs_bounce(struct device *dev, size_t size,
> +					    enum dma_data_direction dir)
> +{
> +	return !dma_kmalloc_safe(dev, dir) && !dma_kmalloc_size_aligned(size);
> +}
> +
>   void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
>   		gfp_t gfp, unsigned long attrs);
>   void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> index acc6f231259c..abea1823fe21 100644
> --- a/kernel/dma/Kconfig
> +++ b/kernel/dma/Kconfig
> @@ -90,6 +90,10 @@ config SWIOTLB
>   	bool
>   	select NEED_DMA_MAP_STATE
>   
> +config DMA_BOUNCE_UNALIGNED_KMALLOC
> +	bool
> +	depends on SWIOTLB
> +
>   config DMA_RESTRICTED_POOL
>   	bool "DMA Restricted Pool"
>   	depends on OF && OF_RESERVED_MEM && SWIOTLB
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index e38ffc5e6bdd..97ec892ea0b5 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -94,7 +94,8 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
>   		return swiotlb_map(dev, phys, size, dir, attrs);
>   	}
>   
> -	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
> +	if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
> +	    dma_kmalloc_needs_bounce(dev, size, dir)) {
>   		if (is_pci_p2pdma_page(page))
>   			return DMA_MAPPING_ERROR;
>   		if (is_swiotlb_active(dev))

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-24 17:19 ` [PATCH v5 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
@ 2023-05-25 15:57   ` Robin Murphy
  2023-05-26 16:36   ` Jisheng Zhang
  1 sibling, 0 replies; 31+ messages in thread
From: Robin Murphy @ 2023-05-25 15:57 UTC (permalink / raw)
  To: Catalin Marinas, Linus Torvalds, Christoph Hellwig
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On 24/05/2023 6:19 pm, Catalin Marinas wrote:
> Similarly to the direct DMA, bounce small allocations as they may have
> originated from a kmalloc() cache not safe for DMA. Unlike the direct
> DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
> non-coherent devices as this would break some cases where the iova is
> expected to be contiguous (dmabuf). Instead, scan the scatterlist for
> any small sizes and only go the swiotlb path if any element of the list
> needs bouncing (note that iommu_dma_map_page() would still only bounce
> those buffers which are not DMA-aligned).
> 
> To avoid scanning the scatterlist on the 'sync' operations, introduce an
> SG_DMA_USE_SWIOTLB flag set by iommu_dma_map_sg_swiotlb(). The
> dev_use_swiotlb() function together with the newly added
> dev_use_sg_swiotlb() now check for both untrusted devices and unaligned
> kmalloc() buffers (suggested by Robin Murphy).
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> ---
[...]
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 87aaf8b5cdb4..330a157c5501 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -248,6 +248,29 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>   	sg->page_link &= ~SG_END;
>   }
>   
> +#define SG_DMA_BUS_ADDRESS	(1 << 0)
> +#define SG_DMA_USE_SWIOTLB	(1 << 1)
> +
> +#ifdef CONFIG_SWIOTLB
> +static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)

Nit: can we decide whether this API is named sg_<verb>_<flag name> or 
sg_dma_<verb>_<flag suffix>? I'm leaning towards the latter, which would 
be consistent with the existing kerneldoc if not all the code (which I 
shall probably now send a patch to fix).

However, my internal grammar parser just cannot cope with the "is use" 
double-verb construct, so I would be inclined to collapse this 
particular one to "sg_dma_use_swiotlb" either way. Yes, it breaks the 
pattern, but that's what the English language does best :)
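
i.e. keeping the body from the patch but collapsing the name, something
like:

	static inline bool sg_dma_use_swiotlb(struct scatterlist *sg)
	{
		return sg->dma_flags & SG_DMA_USE_SWIOTLB;
	}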

Otherwise,

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

I don't seem to have got round to giving this a spin on my Juno today, 
but will try to do so soon.

Cheers,
Robin.

> +{
> +	return sg->dma_flags & SG_DMA_USE_SWIOTLB;
> +}
> +
> +static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
> +{
> +	sg->dma_flags |= SG_DMA_USE_SWIOTLB;
> +}
> +#else
> +static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
> +{
> +	return false;
> +}
> +static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
> +{
> +}
> +#endif
> +
>   /*
>    * CONFIG_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
>    * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
> @@ -256,8 +279,6 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>    */
>   #ifdef CONFIG_PCI_P2PDMA
>   
> -#define SG_DMA_BUS_ADDRESS (1 << 0)
> -
>   /**
>    * sg_dma_is_bus address - Return whether a given segment was marked
>    *			   as a bus address

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 15/15] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64
  2023-05-24 17:19 ` [PATCH v5 15/15] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 Catalin Marinas
@ 2023-05-25 16:12   ` Robin Murphy
  2023-05-25 17:08     ` Catalin Marinas
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Murphy @ 2023-05-25 16:12 UTC (permalink / raw)
  To: Catalin Marinas, Linus Torvalds, Christoph Hellwig
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On 24/05/2023 6:19 pm, Catalin Marinas wrote:
> With the DMA bouncing of unaligned kmalloc() buffers now in place,
> enable it for arm64 to allow the kmalloc-{8,16,32,48,96} caches. In
> addition, always create the swiotlb buffer even when the end of RAM is
> within the 32-bit physical address range (the swiotlb buffer can still
> be disabled on the kernel command line).
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/Kconfig   | 1 +
>   arch/arm64/mm/init.c | 7 ++++++-
>   2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index b1201d25a8a4..af42871431c0 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -120,6 +120,7 @@ config ARM64
>   	select CRC32
>   	select DCACHE_WORD_ACCESS
>   	select DYNAMIC_FTRACE if FUNCTION_TRACER
> +	select DMA_BOUNCE_UNALIGNED_KMALLOC

We may want to give the embedded folks an easier way of turning this 
off, since IIRC one of the reasons for the existing automatic behaviour 
was people not wanting to have to depend on the command line. Things 
with 256MB or so of RAM seem unlikely to get enough memory efficiency 
back from the smaller kmem caches to pay off the SWIOTLB allocation :)

Cheers,
Robin.

>   	select DMA_DIRECT_REMAP
>   	select EDAC_SUPPORT
>   	select FRAME_POINTER
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 66e70ca47680..3ac2e9d79ce4 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -442,7 +442,12 @@ void __init bootmem_init(void)
>    */
>   void __init mem_init(void)
>   {
> -	swiotlb_init(max_pfn > PFN_DOWN(arm64_dma_phys_limit), SWIOTLB_VERBOSE);
> +	bool swiotlb = max_pfn > PFN_DOWN(arm64_dma_phys_limit);
> +
> +	if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC))
> +		swiotlb = true;
> +
> +	swiotlb_init(swiotlb, SWIOTLB_VERBOSE);
>   
>   	/* this will put all unused low memory onto the freelists */
>   	memblock_free_all();

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 15/15] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64
  2023-05-25 16:12   ` Robin Murphy
@ 2023-05-25 17:08     ` Catalin Marinas
  0 siblings, 0 replies; 31+ messages in thread
From: Catalin Marinas @ 2023-05-25 17:08 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Linus Torvalds, Christoph Hellwig, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Thu, May 25, 2023 at 05:12:37PM +0100, Robin Murphy wrote:
> On 24/05/2023 6:19 pm, Catalin Marinas wrote:
> > With the DMA bouncing of unaligned kmalloc() buffers now in place,
> > enable it for arm64 to allow the kmalloc-{8,16,32,48,96} caches. In
> > addition, always create the swiotlb buffer even when the end of RAM is
> > within the 32-bit physical address range (the swiotlb buffer can still
> > be disabled on the kernel command line).
> > 
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > ---
> >   arch/arm64/Kconfig   | 1 +
> >   arch/arm64/mm/init.c | 7 ++++++-
> >   2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index b1201d25a8a4..af42871431c0 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -120,6 +120,7 @@ config ARM64
> >   	select CRC32
> >   	select DCACHE_WORD_ACCESS
> >   	select DYNAMIC_FTRACE if FUNCTION_TRACER
> > +	select DMA_BOUNCE_UNALIGNED_KMALLOC
> 
> We may want to give the embedded folks an easier way of turning this off,
> since IIRC one of the reasons for the existing automatic behaviour was
> people not wanting to have to depend on the command line. Things with 256MB
> or so of RAM seem unlikely to get enough memory efficiency back from the
> smaller kmem caches to pay off the SWIOTLB allocation :)

I thought about this initially and that's why I had two options
(ARCH_WANT_* and this one). But we already select SWIOTLB on arm64, so
for the embedded folk the only option is swiotlb=noforce on the cmdline
which, in turn, limits the kmalloc caches to kmalloc-64 (or whatever the
cache line size is) irrespective of this new select.

-- 
Catalin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-25 14:31   ` Catalin Marinas
@ 2023-05-26 16:07     ` Jonathan Cameron
  2023-05-26 16:29       ` Jonathan Cameron
  0 siblings, 1 reply; 31+ messages in thread
From: Jonathan Cameron @ 2023-05-26 16:07 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Thu, 25 May 2023 15:31:34 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

> On Thu, May 25, 2023 at 01:31:38PM +0100, Jonathan Cameron wrote:
> > On Wed, 24 May 2023 18:18:49 +0100
> > Catalin Marinas <catalin.marinas@arm.com> wrote:  
> > > Another version of the series reducing the kmalloc() minimum alignment
> > > on arm64 to 8 (from 128). Other architectures can easily opt in by
> > > defining ARCH_KMALLOC_MINALIGN as 8 and selecting
> > > DMA_BOUNCE_UNALIGNED_KMALLOC.
> > > 
> > > The first 10 patches decouple ARCH_KMALLOC_MINALIGN from
> > > ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those
> > > aligned to the run-time probed cache_line_size(). On arm64 we gain the
> > > kmalloc-{64,192} caches.
> > > 
> > > The subsequent patches (11 to 15) further reduce the kmalloc() caches to
> > > kmalloc-{8,16,32,96} if the default swiotlb is present by bouncing small
> > > buffers in the DMA API.  
> > 
> > I think IIO_DMA_MINALIGN needs to switch to ARCH_DMA_MINALIGN as well.
> > 
> > It's used to force static alignement of buffers with larger structures,
> > to make them suitable for non coherent DMA, similar to your other cases.  
> 
> Ah, I forgot that you introduced that macro. However, at a quick grep, I
> don't think this forced alignment always works as intended (irrespective
> of this series). Let's take an example:
> 
> struct ltc2496_driverdata {
> 	/* this must be the first member */
> 	struct ltc2497core_driverdata common_ddata;
> 	struct spi_device *spi;
> 
> 	/*
> 	 * DMA (thus cache coherency maintenance) may require the
> 	 * transfer buffers to live in their own cache lines.
> 	 */
> 	unsigned char rxbuf[3] __aligned(IIO_DMA_MINALIGN);
> 	unsigned char txbuf[3];
> };
> 
> The rxbuf is aligned to IIO_DMA_MINALIGN, the structure and its size as
> well but txbuf is at an offset of 3 bytes from the aligned
> IIO_DMA_MINALIGN. So basically any cache maintenance on rxbuf would
> corrupt txbuf.

That was intentional (though possibly wrong if I've misunderstood
the underlying issue).

For SPI controllers at least, my understanding was that it is safe to
assume that they won't trample on themselves.  The driver doesn't
touch the buffers when DMA is in flight - to do so would indeed result
in corruption.

So whilst we could end up with the SPI master writing stale data back
to txbuf after the transfer, it will never matter (as the value is
unchanged). Any flushes in the other direction will end up flushing
both rxbuf and txbuf anyway, which is also harmless.

> You need rxbuf to be the only resident of a
> cache line, therefore the next member needs such alignment as well.
> 
> With this series and SWIOTLB enabled, however, if you try to transfer 3
> bytes, they will be bounced, so the missing alignment won't matter much.
> 

Only on arm64?  If the above is wrong, it might cause trouble on
some other architectures.

As a side note, spi_write_then_read() goes through a bounce-buffer dance
to avoid using DMA-unsafe buffers. Superficially that looks to me like it
might now end up with an undersized buffer and hence end up bouncing, which
rather defeats the point of it.  It uses SMP_CACHE_BYTES for the size.

Jonathan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-26 16:07     ` Jonathan Cameron
@ 2023-05-26 16:29       ` Jonathan Cameron
  2023-05-30 13:38         ` Catalin Marinas
  0 siblings, 1 reply; 31+ messages in thread
From: Jonathan Cameron @ 2023-05-26 16:29 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Fri, 26 May 2023 17:07:40 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Thu, 25 May 2023 15:31:34 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> > On Thu, May 25, 2023 at 01:31:38PM +0100, Jonathan Cameron wrote:  
> > > On Wed, 24 May 2023 18:18:49 +0100
> > > Catalin Marinas <catalin.marinas@arm.com> wrote:    
> > > > Another version of the series reducing the kmalloc() minimum alignment
> > > > on arm64 to 8 (from 128). Other architectures can easily opt in by
> > > > defining ARCH_KMALLOC_MINALIGN as 8 and selecting
> > > > DMA_BOUNCE_UNALIGNED_KMALLOC.
> > > > 
> > > > The first 10 patches decouple ARCH_KMALLOC_MINALIGN from
> > > > ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those
> > > > aligned to the run-time probed cache_line_size(). On arm64 we gain the
> > > > kmalloc-{64,192} caches.
> > > > 
> > > > The subsequent patches (11 to 15) further reduce the kmalloc() caches to
> > > > kmalloc-{8,16,32,96} if the default swiotlb is present by bouncing small
> > > > buffers in the DMA API.    
> > > 
> > > I think IIO_DMA_MINALIGN needs to switch to ARCH_DMA_MINALIGN as well.
> > > 
> > > It's used to force static alignement of buffers with larger structures,
> > > to make them suitable for non coherent DMA, similar to your other cases.    
> > 
> > Ah, I forgot that you introduced that macro. However, at a quick grep, I
> > don't think this forced alignment always works as intended (irrespective
> > of this series). Let's take an example:
> > 
> > struct ltc2496_driverdata {
> > 	/* this must be the first member */
> > 	struct ltc2497core_driverdata common_ddata;
> > 	struct spi_device *spi;
> > 
> > 	/*
> > 	 * DMA (thus cache coherency maintenance) may require the
> > 	 * transfer buffers to live in their own cache lines.
> > 	 */
> > 	unsigned char rxbuf[3] __aligned(IIO_DMA_MINALIGN);
> > 	unsigned char txbuf[3];
> > };
> > 
> > The rxbuf is aligned to IIO_DMA_MINALIGN, the structure and its size as
> > well but txbuf is at an offset of 3 bytes from the aligned
> > IIO_DMA_MINALIGN. So basically any cache maintenance on rxbuf would
> > corrupt txbuf.  
> 
> That was intentional (though possibly wrong if I've misunderstood
> the underlying issue).
> 
> For SPI controllers at least my understanding was that it is safe to
> assume that they won't trample on themselves.  The driver doesn't
> touch the buffers when DMA is in flight - to do so would indeed result
> in corruption.
> 
> So whilst we could end up with the SPI master writing stale data back
> to txbuf after the transfer it will never matter (as the value is unchanged).
> Any flushes in the other direction will end up flushing both rxbuf and
> txbuf anyway which is also harmless.

Adding missing detail: as the driver never writes txbuf whilst any DMA
is going on, the second cache eviction (to flush out any lines that have
crept back into the cache after the pre-DMA flush and write-backs) will
find a clean line and drop it without writing back - thus no corruption.



> 
> > You need rxbuf to be the only resident of a
> > cache line, therefore the next member needs such alignment as well.
> > 
> > With this series and SWIOTLB enabled, however, if you try to transfer 3
> > bytes, they will be bounced, so the missing alignment won't matter much.
> >   
> 
> Only on arm64?  If the above is wrong, it might cause trouble on
> some other architectures.
> 
> As a side note spi_write_then_read() goes through a bounce buffer dance
> to avoid using dma unsafe buffers. Superficially that looks to me like it might
> now end up with an undersized buffer and hence up bouncing which rather
> defeats the point of it.  It uses SMP_CACHE_BYTES for the size.
> 
> Jonathan
> 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-24 17:19 ` [PATCH v5 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
  2023-05-25 15:57   ` Robin Murphy
@ 2023-05-26 16:36   ` Jisheng Zhang
  2023-05-26 19:22     ` Catalin Marinas
  1 sibling, 1 reply; 31+ messages in thread
From: Jisheng Zhang @ 2023-05-26 16:36 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Wed, May 24, 2023 at 06:19:02PM +0100, Catalin Marinas wrote:
> Similarly to the direct DMA, bounce small allocations as they may have
> originated from a kmalloc() cache not safe for DMA. Unlike the direct
> DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
> non-coherent devices as this would break some cases where the iova is
> expected to be contiguous (dmabuf). Instead, scan the scatterlist for
> any small sizes and only go the swiotlb path if any element of the list
> needs bouncing (note that iommu_dma_map_page() would still only bounce
> those buffers which are not DMA-aligned).
> 
> To avoid scanning the scatterlist on the 'sync' operations, introduce an
> SG_DMA_USE_SWIOTLB flag set by iommu_dma_map_sg_swiotlb(). The
> dev_use_swiotlb() function together with the newly added
> dev_use_sg_swiotlb() now check for both untrusted devices and unaligned
> kmalloc() buffers (suggested by Robin Murphy).
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/Kconfig       |  1 +
>  drivers/iommu/dma-iommu.c   | 50 ++++++++++++++++++++++++++++++-------
>  include/linux/scatterlist.h | 25 +++++++++++++++++--
>  3 files changed, 65 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index db98c3f86e8c..670eff7a8e11 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -152,6 +152,7 @@ config IOMMU_DMA
>  	select IOMMU_IOVA
>  	select IRQ_MSI_IOMMU
>  	select NEED_SG_DMA_LENGTH
> +	select NEED_SG_DMA_FLAGS if SWIOTLB
>  
>  # Shared Virtual Addressing
>  config IOMMU_SVA
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7a9f0b0bddbd..24a8b8c2368c 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -520,9 +520,38 @@ static bool dev_is_untrusted(struct device *dev)
>  	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
>  }
>  
> -static bool dev_use_swiotlb(struct device *dev)
> +static bool dev_use_swiotlb(struct device *dev, size_t size,
> +			    enum dma_data_direction dir)
>  {
> -	return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
> +	return IS_ENABLED(CONFIG_SWIOTLB) &&
> +		(dev_is_untrusted(dev) ||
> +		 dma_kmalloc_needs_bounce(dev, size, dir));
> +}
> +
> +static bool dev_use_sg_swiotlb(struct device *dev, struct scatterlist *sg,
> +			       int nents, enum dma_data_direction dir)
> +{
> +	struct scatterlist *s;
> +	int i;
> +
> +	if (!IS_ENABLED(CONFIG_SWIOTLB))
> +		return false;
> +
> +	if (dev_is_untrusted(dev))
> +		return true;
> +
> +	/*
> +	 * If kmalloc() buffers are not DMA-safe for this device and
> +	 * direction, check the individual lengths in the sg list. If any
> +	 * element is deemed unsafe, use the swiotlb for bouncing.
> +	 */
> +	if (!dma_kmalloc_safe(dev, dir)) {
> +		for_each_sg(sg, s, nents, i)
> +			if (!dma_kmalloc_size_aligned(s->length))
> +				return true;
> +	}
> +
> +	return false;
>  }
>  
>  /**
> @@ -922,7 +951,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>  {
>  	phys_addr_t phys;
>  
> -	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
> +	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
>  		return;
>  
>  	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> @@ -938,7 +967,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
>  {
>  	phys_addr_t phys;
>  
> -	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
> +	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
>  		return;
>  
>  	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> @@ -956,7 +985,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>  	struct scatterlist *sg;
>  	int i;
>  
> -	if (dev_use_swiotlb(dev))
> +	if (sg_is_dma_use_swiotlb(sgl))
>  		for_each_sg(sgl, sg, nelems, i)
>  			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
>  						      sg->length, dir);
> @@ -972,7 +1001,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
>  	struct scatterlist *sg;
>  	int i;
>  
> -	if (dev_use_swiotlb(dev))
> +	if (sg_is_dma_use_swiotlb(sgl))
>  		for_each_sg(sgl, sg, nelems, i)
>  			iommu_dma_sync_single_for_device(dev,
>  							 sg_dma_address(sg),
> @@ -998,7 +1027,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>  	 * If both the physical buffer start address and size are
>  	 * page aligned, we don't need to use a bounce page.
>  	 */
> -	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
> +	if (dev_use_swiotlb(dev, size, dir) &&
> +	    iova_offset(iovad, phys | size)) {
>  		void *padding_start;
>  		size_t padding_size, aligned_size;
>  
> @@ -1166,6 +1196,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
>  	struct scatterlist *s;
>  	int i;
>  
> +	sg_dma_mark_use_swiotlb(sg);
> +
>  	for_each_sg(sg, s, nents, i) {
>  		sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
>  				s->offset, s->length, dir, attrs);
> @@ -1210,7 +1242,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  			goto out;
>  	}
>  
> -	if (dev_use_swiotlb(dev))
> +	if (dev_use_sg_swiotlb(dev, sg, nents, dir))
>  		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
>  
>  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> @@ -1315,7 +1347,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>  	struct scatterlist *tmp;
>  	int i;
>  
> -	if (dev_use_swiotlb(dev)) {
> +	if (sg_is_dma_use_swiotlb(sg)) {
>  		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
>  		return;
>  	}
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 87aaf8b5cdb4..330a157c5501 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -248,6 +248,29 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>  	sg->page_link &= ~SG_END;
>  }
>  
> +#define SG_DMA_BUS_ADDRESS	(1 << 0)
> +#define SG_DMA_USE_SWIOTLB	(1 << 1)
> +
> +#ifdef CONFIG_SWIOTLB

s/CONFIG_SWIOTLB/CONFIG_NEED_SG_DMA_FLAGS ?
Otherwise, there's a compiler error if SWIOTLB=y but IOMMU=n.

Thanks

> +static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
> +{
> +	return sg->dma_flags & SG_DMA_USE_SWIOTLB;
> +}
> +
> +static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
> +{
> +	sg->dma_flags |= SG_DMA_USE_SWIOTLB;
> +}
> +#else
> +static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
> +{
> +	return false;
> +}
> +static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
> +{
> +}
> +#endif
> +
>  /*
>   * CONFIG_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
>   * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
> @@ -256,8 +279,6 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>   */
>  #ifdef CONFIG_PCI_P2PDMA
>  
> -#define SG_DMA_BUS_ADDRESS (1 << 0)
> -
>  /**
>   * sg_dma_is_bus address - Return whether a given segment was marked
>   *			   as a bus address
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-26 16:36   ` Jisheng Zhang
@ 2023-05-26 19:22     ` Catalin Marinas
  2023-05-30 13:01       ` Robin Murphy
  0 siblings, 1 reply; 31+ messages in thread
From: Catalin Marinas @ 2023-05-26 19:22 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Sat, May 27, 2023 at 12:36:30AM +0800, Jisheng Zhang wrote:
> On Wed, May 24, 2023 at 06:19:02PM +0100, Catalin Marinas wrote:
> > diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> > index 87aaf8b5cdb4..330a157c5501 100644
> > --- a/include/linux/scatterlist.h
> > +++ b/include/linux/scatterlist.h
> > @@ -248,6 +248,29 @@ static inline void sg_unmark_end(struct scatterlist *sg)
> >  	sg->page_link &= ~SG_END;
> >  }
> >  
> > +#define SG_DMA_BUS_ADDRESS	(1 << 0)
> > +#define SG_DMA_USE_SWIOTLB	(1 << 1)
> > +
> > +#ifdef CONFIG_SWIOTLB
> 
> s/CONFIG_SWIOTLB/CONFIG_NEED_SG_DMA_FLAGS ?
> Otherwise, there's compiler error if SWIOTLB=y but IOMMU=n

Yes, I pushed a fixup to the kmalloc-minalign branch.

> > +static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
> > +{
> > +	return sg->dma_flags & SG_DMA_USE_SWIOTLB;
> > +}
> > +
> > +static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
> > +{
> > +	sg->dma_flags |= SG_DMA_USE_SWIOTLB;
> > +}
> > +#else
> > +static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
> > +{
> > +	return false;
> > +}
> > +static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
> > +{
> > +}
> > +#endif

And I wonder whether we should keep these accessors in the scatterlist.h
file. They are only used by the dma-iommu.c code.

-- 
Catalin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-26 19:22     ` Catalin Marinas
@ 2023-05-30 13:01       ` Robin Murphy
  0 siblings, 0 replies; 31+ messages in thread
From: Robin Murphy @ 2023-05-30 13:01 UTC (permalink / raw)
  To: Catalin Marinas, Jisheng Zhang
  Cc: Linus Torvalds, Christoph Hellwig, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On 26/05/2023 8:22 pm, Catalin Marinas wrote:
> On Sat, May 27, 2023 at 12:36:30AM +0800, Jisheng Zhang wrote:
>> On Wed, May 24, 2023 at 06:19:02PM +0100, Catalin Marinas wrote:
>>> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
>>> index 87aaf8b5cdb4..330a157c5501 100644
>>> --- a/include/linux/scatterlist.h
>>> +++ b/include/linux/scatterlist.h
>>> @@ -248,6 +248,29 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>>>   	sg->page_link &= ~SG_END;
>>>   }
>>>   
>>> +#define SG_DMA_BUS_ADDRESS	(1 << 0)
>>> +#define SG_DMA_USE_SWIOTLB	(1 << 1)
>>> +
>>> +#ifdef CONFIG_SWIOTLB
>>
>> s/CONFIG_SWIOTLB/CONFIG_NEED_SG_DMA_FLAGS ?
>> Otherwise, there's compiler error if SWIOTLB=y but IOMMU=n
> 
> Yes, I pushed a fixup to the kmalloc-minalign branch.

I'd agree that CONFIG_NEED_SG_DMA_FLAGS is the better option.

>>> +static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
>>> +{
>>> +	return sg->dma_flags & SG_DMA_USE_SWIOTLB;
>>> +}
>>> +
>>> +static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
>>> +{
>>> +	sg->dma_flags |= SG_DMA_USE_SWIOTLB;
>>> +}
>>> +#else
>>> +static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
>>> +{
>>> +	return false;
>>> +}
>>> +static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
>>> +{
>>> +}
>>> +#endif
> 
> And I wonder whether we should keep these accessors in the scatterlist.h
> file. They are only used by the dma-iommu.c code.

Nah, logically they belong with the definition of the flag itself, which 
certainly should be here. Also, once this does land, dma-direct and 
possibly others will be able to take advantage of it too (as a small win 
over repeating the is_swiotlb_buffer() check a lot).

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-26 16:29       ` Jonathan Cameron
@ 2023-05-30 13:38         ` Catalin Marinas
  2023-05-30 16:31           ` Jonathan Cameron
  0 siblings, 1 reply; 31+ messages in thread
From: Catalin Marinas @ 2023-05-30 13:38 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Fri, May 26, 2023 at 05:29:30PM +0100, Jonathan Cameron wrote:
> On Fri, 26 May 2023 17:07:40 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> > On Thu, 25 May 2023 15:31:34 +0100
> > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Thu, May 25, 2023 at 01:31:38PM +0100, Jonathan Cameron wrote:  
> > > > On Wed, 24 May 2023 18:18:49 +0100
> > > > Catalin Marinas <catalin.marinas@arm.com> wrote:    
> > > > > Another version of the series reducing the kmalloc() minimum alignment
> > > > > on arm64 to 8 (from 128). Other architectures can easily opt in by
> > > > > defining ARCH_KMALLOC_MINALIGN as 8 and selecting
> > > > > DMA_BOUNCE_UNALIGNED_KMALLOC.
> > > > > 
> > > > > The first 10 patches decouple ARCH_KMALLOC_MINALIGN from
> > > > > ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those
> > > > > aligned to the run-time probed cache_line_size(). On arm64 we gain the
> > > > > kmalloc-{64,192} caches.
> > > > > 
> > > > > The subsequent patches (11 to 15) further reduce the kmalloc() caches to
> > > > > kmalloc-{8,16,32,96} if the default swiotlb is present by bouncing small
> > > > > buffers in the DMA API.    
> > > > 
> > > > I think IIO_DMA_MINALIGN needs to switch to ARCH_DMA_MINALIGN as well.
> > > > 
> > > > It's used to force static alignement of buffers with larger structures,
> > > > to make them suitable for non coherent DMA, similar to your other cases.    
> > > 
> > > Ah, I forgot that you introduced that macro. However, at a quick grep, I
> > > don't think this forced alignment always works as intended (irrespective
> > > of this series). Let's take an example:
> > > 
> > > struct ltc2496_driverdata {
> > > 	/* this must be the first member */
> > > 	struct ltc2497core_driverdata common_ddata;
> > > 	struct spi_device *spi;
> > > 
> > > 	/*
> > > 	 * DMA (thus cache coherency maintenance) may require the
> > > 	 * transfer buffers to live in their own cache lines.
> > > 	 */
> > > 	unsigned char rxbuf[3] __aligned(IIO_DMA_MINALIGN);
> > > 	unsigned char txbuf[3];
> > > };
> > > 
> > > The rxbuf is aligned to IIO_DMA_MINALIGN, the structure and its size as
> > > well but txbuf is at an offset of 3 bytes from the aligned
> > > IIO_DMA_MINALIGN. So basically any cache maintenance on rxbuf would
> > > corrupt txbuf.  
> > 
> > That was intentional (though possibly wrong if I've misunderstood
> > the underlying issue).
> > 
> > For SPI controllers at least my understanding was that it is safe to
> > assume that they won't trample on themselves.  The driver doesn't
> > touch the buffers when DMA is in flight - to do so would indeed result
> > in corruption.
> > 
> > So whilst we could end up with the SPI master writing stale data back
> > to txbuf after the transfer it will never matter (as the value is unchanged).
> > Any flushes in the other direction will end up flushing both rxbuf and
> > txbuf anyway which is also harmless.
> 
> Adding missing detail.  As the driver never writes txbuf whilst any DMA
> is going on, the second cache evict (to flush out any lines that have
> crept back into cache after the flush - and write backs - pre DMA) will
> find a clean line and will drop it without writing back - thus no corruption.

Thanks for the clarification. One more thing: can txbuf be written
prior to the DMA_FROM_DEVICE transfer into rxbuf? Or is a write to
txbuf always followed by a DMA_TO_DEVICE mapping (which would flush
the cache line)?

-- 
Catalin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-30 13:38         ` Catalin Marinas
@ 2023-05-30 16:31           ` Jonathan Cameron
  0 siblings, 0 replies; 31+ messages in thread
From: Jonathan Cameron @ 2023-05-30 16:31 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Tue, 30 May 2023 14:38:55 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

> On Fri, May 26, 2023 at 05:29:30PM +0100, Jonathan Cameron wrote:
> > On Fri, 26 May 2023 17:07:40 +0100
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:  
> > > On Thu, 25 May 2023 15:31:34 +0100
> > > Catalin Marinas <catalin.marinas@arm.com> wrote:  
> > > > On Thu, May 25, 2023 at 01:31:38PM +0100, Jonathan Cameron wrote:    
> > > > > On Wed, 24 May 2023 18:18:49 +0100
> > > > > Catalin Marinas <catalin.marinas@arm.com> wrote:      
> > > > > > Another version of the series reducing the kmalloc() minimum alignment
> > > > > > on arm64 to 8 (from 128). Other architectures can easily opt in by
> > > > > > defining ARCH_KMALLOC_MINALIGN as 8 and selecting
> > > > > > DMA_BOUNCE_UNALIGNED_KMALLOC.
> > > > > > 
> > > > > > The first 10 patches decouple ARCH_KMALLOC_MINALIGN from
> > > > > > ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those
> > > > > > aligned to the run-time probed cache_line_size(). On arm64 we gain the
> > > > > > kmalloc-{64,192} caches.
> > > > > > 
> > > > > > The subsequent patches (11 to 15) further reduce the kmalloc() caches to
> > > > > > kmalloc-{8,16,32,96} if the default swiotlb is present by bouncing small
> > > > > > buffers in the DMA API.      
> > > > > 
> > > > > I think IIO_DMA_MINALIGN needs to switch to ARCH_DMA_MINALIGN as well.
> > > > > 
> > > > > It's used to force static alignement of buffers with larger structures,
> > > > > to make them suitable for non coherent DMA, similar to your other cases.      
> > > > 
> > > > Ah, I forgot that you introduced that macro. However, at a quick grep, I
> > > > don't think this forced alignment always works as intended (irrespective
> > > > of this series). Let's take an example:
> > > > 
> > > > struct ltc2496_driverdata {
> > > > 	/* this must be the first member */
> > > > 	struct ltc2497core_driverdata common_ddata;
> > > > 	struct spi_device *spi;
> > > > 
> > > > 	/*
> > > > 	 * DMA (thus cache coherency maintenance) may require the
> > > > 	 * transfer buffers to live in their own cache lines.
> > > > 	 */
> > > > 	unsigned char rxbuf[3] __aligned(IIO_DMA_MINALIGN);
> > > > 	unsigned char txbuf[3];
> > > > };
> > > > 
> > > > The rxbuf is aligned to IIO_DMA_MINALIGN, the structure and its size as
> > > > well but txbuf is at an offset of 3 bytes from the aligned
> > > > IIO_DMA_MINALIGN. So basically any cache maintenance on rxbuf would
> > > > corrupt txbuf.    
> > > 
> > > That was intentional (though possibly wrong if I've misunderstood
> > > the underlying issue).
> > > 
> > > For SPI controllers at least my understanding was that it is safe to
> > > assume that they won't trample on themselves.  The driver doesn't
> > > touch the buffers when DMA is in flight - to do so would indeed result
> > > in corruption.
> > > 
> > > So whilst we could end up with the SPI master writing stale data back
> > > to txbuf after the transfer it will never matter (as the value is unchanged).
> > > Any flushes in the other direction will end up flushing both rxbuf and
> > > txbuf anyway which is also harmless.  
> > 
> > Adding missing detail.  As the driver never writes txbuf whilst any DMA
> > is going on, the second cache evict (to flush out any lines that have
> > crept back into cache after the flush - and write backs - pre DMA) will
> > find a clean line and will drop it without writing back - thus no corruption.  
> 
> Thanks for the clarification. One more thing, can the txbuf be written
> prior to the DMA_FROM_DEVICE transfer into rxbuf? Or the txbuf writing
> is always followed by a DMA_TO_DEVICE mapping (which would flush the
> cache line).
> 

In practice, the driver should never write the txbuf unless it's about to
use it for a transfer (and a lock prevents anything racing against that -
the lock covers both the tx and rx buffers), and the SPI controller then
has to follow that write with a DMA_TO_DEVICE mapping.

Having said that, there might be examples in tree where a really weird
sequence occurs:
1. tx is written with a default value (say in probe()).
2. An rx-only transfer occurs first.
That would require something like a device that needs a read to wake it up
before it'll take any transmitted data, or one that resets on a particularly
long read (yuk - the ad7476 does that, but it has no tx buffer so that's fine).
I don't think we have that case, but it may be worth looking out for in future.

In more detail:
The two buffers are either both used for a given call to the SPI core driver
or they are used individually.
In any case where the data is used by host or device the mappings should be
fine.

TX and RX pair:
 - Take lock
 - Write tx from the CPU
 - Pass both tx and rx to the SPI controller, which maps tx DMA_TO_DEVICE and
   rx DMA_FROM_DEVICE - the lines are written back to memory
   (dcache_clean_poc() for both the tx and rx maps)
 - On interrupt (or similar) the SPI controller unmaps the rx buffer
   DMA_FROM_DEVICE - the caches drop what they think are clean lines, so the
   read will then be from memory (dcache_inval_poc() for the rx unmap, a no-op
   for the tx unmap - but the line is taken out anyway by the rx one, which is
   fine)
 - Read the rx content from the CPU
 - Release lock

TX only:
 - Take lock
 - Write tx from the CPU
 - Only tx is passed to the SPI controller ... DMA_TO_DEVICE ... the line is
   written back to memory
 - No DMA_FROM_DEVICE unmap needed as nothing is updated by the device (the
   device only reads), so there is no need to flush anything; any cached lines
   are still correct
 - Mapping torn down
 - Release lock

RX only:
 - Take lock
 - Only rx is passed to the SPI controller ... DMA_FROM_DEVICE ... cleans rx
   (probably clean anyway)
 - Device fills rx in memory
 - The rx DMA_FROM_DEVICE unmap drops any clean lines from the cache and
   ensures we read rx from memory
 - Mapping torn down
 - CPU reads rx
 - Release lock

I've probably missed some mappings in there, but the short version is that rx
and tx are treated as one resource by the driver, with all access serialized.
In some cases we will get bonus flushes, but that shouldn't hurt anything.
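
In DMA API terms, the TX and RX pair case above boils down to roughly the
following (a sketch only, ignoring error handling and the scatterlist path):

	/* map: clean both lines out to memory (non-coherent case) */
	tx_dma = dma_map_single(dev, txbuf, len, DMA_TO_DEVICE);
	rx_dma = dma_map_single(dev, rxbuf, len, DMA_FROM_DEVICE);

	/* hardware transfer in flight: the driver touches neither buffer */

	/* unmap: invalidate the rx line so the CPU then reads from memory */
	dma_unmap_single(dev, rx_dma, len, DMA_FROM_DEVICE);
	dma_unmap_single(dev, tx_dma, len, DMA_TO_DEVICE);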

Jonathan

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2023-05-30 16:31 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-24 17:18 [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
2023-05-24 17:18 ` [PATCH v5 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
2023-05-24 17:18 ` [PATCH v5 02/15] dma: Allow dma_get_cache_alignment() to be overridden by the arch code Catalin Marinas
2023-05-25 13:59   ` Christoph Hellwig
2023-05-24 17:18 ` [PATCH v5 03/15] mm/slab: Simplify create_kmalloc_cache() args and make it static Catalin Marinas
2023-05-24 17:18 ` [PATCH v5 04/15] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment() Catalin Marinas
2023-05-24 17:18 ` [PATCH v5 05/15] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN Catalin Marinas
2023-05-24 17:18 ` [PATCH v5 06/15] drivers/gpu: " Catalin Marinas
2023-05-24 17:18 ` [PATCH v5 07/15] drivers/usb: " Catalin Marinas
2023-05-24 17:18 ` [PATCH v5 08/15] drivers/spi: " Catalin Marinas
2023-05-24 17:18 ` [PATCH v5 09/15] drivers/md: " Catalin Marinas
2023-05-25 14:00   ` Christoph Hellwig
2023-05-24 17:18 ` [PATCH v5 10/15] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size() Catalin Marinas
2023-05-24 17:19 ` [PATCH v5 11/15] scatterlist: Add dedicated config for DMA flags Catalin Marinas
2023-05-24 17:19 ` [PATCH v5 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned Catalin Marinas
2023-05-25 15:53   ` Robin Murphy
2023-05-24 17:19 ` [PATCH v5 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
2023-05-25 15:57   ` Robin Murphy
2023-05-26 16:36   ` Jisheng Zhang
2023-05-26 19:22     ` Catalin Marinas
2023-05-30 13:01       ` Robin Murphy
2023-05-24 17:19 ` [PATCH v5 14/15] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible Catalin Marinas
2023-05-24 17:19 ` [PATCH v5 15/15] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 Catalin Marinas
2023-05-25 16:12   ` Robin Murphy
2023-05-25 17:08     ` Catalin Marinas
2023-05-25 12:31 ` [PATCH v5 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Jonathan Cameron
2023-05-25 14:31   ` Catalin Marinas
2023-05-26 16:07     ` Jonathan Cameron
2023-05-26 16:29       ` Jonathan Cameron
2023-05-30 13:38         ` Catalin Marinas
2023-05-30 16:31           ` Jonathan Cameron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).