linux-mm.kvack.org archive mirror
* [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
@ 2023-05-18 17:33 Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
                   ` (15 more replies)
  0 siblings, 16 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

Hi,

That's the fourth version of the series reducing the kmalloc() minimum
alignment on arm64 to 8 (from 128).

The first 10 patches decouple ARCH_KMALLOC_MINALIGN from
ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those
aligned to the run-time probed cache_line_size(). The advantage on
arm64 is that we gain the kmalloc-{64,192} caches.

The subsequent patches (11 to 15) further reduce the kmalloc() minimum
alignment, adding the kmalloc-{8,16,32,96} caches when the default
swiotlb is present, by bouncing small buffers in the DMA API. For the
iommu case, following discussions with Robin, we concluded that it is
still simpler to walk the sg list if the device is non-coherent and
follow the bouncing path when any of the elements may originate from a
small kmalloc() allocation.
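
For reference, the decision the DMA API ends up making for a
kmalloc()'ed buffer roughly boils down to the sketch below; it
condenses the dma_kmalloc_safe()/dma_kmalloc_size_aligned() helpers
added in patch 12 and is not the exact kernel code:

#include <linux/dma-map-ops.h>
#include <linux/dma-mapping.h>
#include <linux/slab.h>

/* Condensed sketch only; the real helpers live in dma-map-ops.h (patch 12). */
static bool kmalloc_buf_needs_bounce(struct device *dev, size_t size,
				     enum dma_data_direction dir)
{
	/* Coherent devices and DMA_TO_DEVICE are always safe. */
	if (dev_is_dma_coherent(dev) || dir == DMA_TO_DEVICE)
		return false;

	/* Larger sizes always come from ARCH_DMA_MINALIGN-aligned caches. */
	if (size >= 2 * ARCH_DMA_MINALIGN)
		return false;

	/* Bounce unless the rounded-up allocation is cacheline-aligned. */
	return !IS_ALIGNED(kmalloc_size_roundup(size),
			   dma_get_cache_alignment());
}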

Main changes since v3:

- Reorganise the series so that the first 10 patches can be applied
  before the DMA bouncing ones. They are still useful on arm64,
  reducing the kmalloc() alignment to 64.

- There is no longer a dma_sg_kmalloc_needs_bounce() function; its
  logic has been unrolled into the iommu_dma_sync_sg_for_device()
  function.

- No crypto changes needed following Herbert's reworking of the crypto
  code (thanks!).

The patches are also available on this branch:

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/kmalloc-minalign

Thanks.

Catalin Marinas (14):
  mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  dma: Allow dma_get_cache_alignment() to return the smaller
    cache_line_size()
  mm/slab: Simplify create_kmalloc_cache() args and make it static
  mm/slab: Limit kmalloc() minimum alignment to
    dma_get_cache_alignment()
  drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/gpu: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/usb: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/spi: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/md: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  arm64: Allow kmalloc() caches aligned to the smaller cache_line_size()
  dma-mapping: Force bouncing if the kmalloc() size is not
    cache-line-aligned
  iommu/dma: Force bouncing if the size is not cacheline-aligned
  mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing
    possible
  arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64

Robin Murphy (1):
  scatterlist: Add dedicated config for DMA flags

 arch/arm64/Kconfig             |  2 ++
 arch/arm64/include/asm/cache.h |  1 +
 arch/arm64/mm/init.c           |  7 ++++-
 drivers/base/devres.c          |  6 ++---
 drivers/gpu/drm/drm_managed.c  |  6 ++---
 drivers/iommu/dma-iommu.c      | 25 ++++++++++++++----
 drivers/md/dm-crypt.c          |  2 +-
 drivers/pci/Kconfig            |  1 +
 drivers/spi/spidev.c           |  2 +-
 drivers/usb/core/buffer.c      |  8 +++---
 include/linux/dma-map-ops.h    | 48 ++++++++++++++++++++++++++++++++++
 include/linux/dma-mapping.h    |  4 ++-
 include/linux/scatterlist.h    | 29 +++++++++++++++++---
 include/linux/slab.h           | 16 +++++++++---
 kernel/dma/Kconfig             | 19 ++++++++++++++
 kernel/dma/direct.h            |  3 ++-
 mm/slab.c                      |  6 +----
 mm/slab.h                      |  5 ++--
 mm/slab_common.c               | 43 +++++++++++++++++++++++-------
 19 files changed, 188 insertions(+), 45 deletions(-)



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v4 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-19 15:49   ` Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 02/15] dma: Allow dma_get_cache_alignment() to return the smaller cache_line_size() Catalin Marinas
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

In preparation for supporting a kmalloc() minimum alignment smaller than
the arch DMA alignment, decouple the two definitions. This requires that
either the kmalloc() caches are aligned to a (run-time) cache-line size
or the DMA API bounces unaligned kmalloc() allocations. Subsequent
patches will implement both options.

After this patch, ARCH_DMA_MINALIGN is expected to be used in static
alignment annotations and defined by an architecture to be the maximum
alignment for all supported configurations/SoCs in a single Image.
Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to
define its value in the arch headers.
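
For example, a driver embedding a DMA buffer in a kmalloc()'ed object
keeps using the static value (hypothetical structure below, mirroring
what the devres and drm_managed patches later in this series do):

#include <linux/cache.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct my_dma_ctx {			/* made-up example structure */
	spinlock_t lock;
	/*
	 * Align data[] (and hence the kmalloc()'ed object containing it) to
	 * the static ARCH_DMA_MINALIGN rather than the potentially smaller
	 * ARCH_KMALLOC_MINALIGN.
	 */
	u8 __aligned(ARCH_DMA_MINALIGN) data[];
};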

Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in
dma_get_cache_alignment() so that there is no change for architectures
not requiring a minimum DMA alignment.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 include/linux/dma-mapping.h |  2 +-
 include/linux/slab.h        | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0ee20b764000..3288a1339271 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -545,7 +545,7 @@ static inline int dma_set_min_align_mask(struct device *dev,
 
 static inline int dma_get_cache_alignment(void)
 {
-#ifdef ARCH_DMA_MINALIGN
+#ifdef ARCH_HAS_DMA_MINALIGN
 	return ARCH_DMA_MINALIGN;
 #endif
 	return 1;
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 6b3e155b70bf..3f76e7c53ada 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -235,14 +235,24 @@ void kmem_dump_obj(void *object);
  * alignment larger than the alignment of a 64-bit integer.
  * Setting ARCH_DMA_MINALIGN in arch headers allows that.
  */
-#if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
+#ifdef ARCH_DMA_MINALIGN
+#define ARCH_HAS_DMA_MINALIGN
+#if ARCH_DMA_MINALIGN > 8 && !defined(ARCH_KMALLOC_MINALIGN)
 #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
-#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN
-#define KMALLOC_SHIFT_LOW ilog2(ARCH_DMA_MINALIGN)
+#endif
 #else
+#define ARCH_DMA_MINALIGN __alignof__(unsigned long long)
+#endif
+
+#ifndef ARCH_KMALLOC_MINALIGN
 #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
 #endif
 
+#if ARCH_KMALLOC_MINALIGN > 8
+#define KMALLOC_MIN_SIZE ARCH_KMALLOC_MINALIGN
+#define KMALLOC_SHIFT_LOW ilog2(KMALLOC_MIN_SIZE)
+#endif
+
 /*
  * Setting ARCH_SLAB_MINALIGN in arch headers allows a different alignment.
  * Intended for arches that get misalignment faults even for 64 bit integer


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 02/15] dma: Allow dma_get_cache_alignment() to return the smaller cache_line_size()
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-20  5:42   ` Christoph Hellwig
  2023-05-18 17:33 ` [PATCH v4 03/15] mm/slab: Simplify create_kmalloc_cache() args and make it static Catalin Marinas
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

On architectures like arm64, ARCH_DMA_MINALIGN is larger than the
cache line size on most of the deployed configurations. However, the
single kernel Image requirement does not allow a smaller
ARCH_DMA_MINALIGN. Permit an architecture to opt in to
dma_get_cache_alignment() returning cache_line_size(), which can be
probed at run-time.
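
A driver sizing its own DMA buffers can then use the run-time value
rather than the compile-time constant, e.g. (hypothetical helper;
xfer_len is assumed to be the requested transfer length):

#include <linux/dma-mapping.h>
#include <linux/slab.h>

/* Hypothetical helper: allocate a buffer that is DMA-safe on its own. */
static void *example_alloc_dma_buf(size_t xfer_len, size_t *out_len)
{
	size_t dma_align = dma_get_cache_alignment();

	/* Round up so the allocation fully covers the cache lines it touches. */
	*out_len = ALIGN(xfer_len, dma_align);
	return kmalloc(*out_len, GFP_KERNEL);
}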

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/dma-mapping.h | 2 ++
 kernel/dma/Kconfig          | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 3288a1339271..b29124341317 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -545,6 +545,8 @@ static inline int dma_set_min_align_mask(struct device *dev,
 
 static inline int dma_get_cache_alignment(void)
 {
+	if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_CACHE_LINE_SIZE))
+		return cache_line_size();
 #ifdef ARCH_HAS_DMA_MINALIGN
 	return ARCH_DMA_MINALIGN;
 #endif
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 6677d0e64d27..cc750062c412 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -76,6 +76,13 @@ config ARCH_HAS_DMA_PREP_COHERENT
 config ARCH_HAS_FORCE_DMA_UNENCRYPTED
 	bool
 
+config ARCH_HAS_DMA_CACHE_LINE_SIZE
+	bool
+	help
+	  Select if the architecture has non-coherent DMA and
+	  defines ARCH_DMA_MINALIGN larger than a run-time
+	  cache_line_size().
+
 #
 # Select this option if the architecture assumes DMA devices are coherent
 # by default.


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 03/15] mm/slab: Simplify create_kmalloc_cache() args and make it static
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 02/15] dma: Allow dma_get_cache_alignment() to return the smaller cache_line_size() Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 04/15] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment() Catalin Marinas
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

In the slab variant of kmem_cache_init(), call new_kmalloc_cache()
instead of initialising the kmalloc_caches array directly. With this,
create_kmalloc_cache() is now only called from new_kmalloc_cache() in
the same file, so make it static. In addition, the useroffset argument
is always 0 while usersize is the same as size. Remove them.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c        |  6 +-----
 mm/slab.h        |  5 ++---
 mm/slab_common.c | 14 ++++++--------
 3 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index bb57f7fdbae1..b7817dcba63e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1240,11 +1240,7 @@ void __init kmem_cache_init(void)
 	 * Initialize the caches that provide memory for the  kmem_cache_node
 	 * structures first.  Without this, further allocations will bug.
 	 */
-	kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache(
-				kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL],
-				kmalloc_info[INDEX_NODE].size,
-				ARCH_KMALLOC_FLAGS, 0,
-				kmalloc_info[INDEX_NODE].size);
+	new_kmalloc_cache(INDEX_NODE, KMALLOC_NORMAL, ARCH_KMALLOC_FLAGS);
 	slab_state = PARTIAL_NODE;
 	setup_kmalloc_cache_index_table();
 
diff --git a/mm/slab.h b/mm/slab.h
index f01ac256a8f5..592590fcddae 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -255,9 +255,8 @@ gfp_t kmalloc_fix_flags(gfp_t flags);
 /* Functions provided by the slab allocators */
 int __kmem_cache_create(struct kmem_cache *, slab_flags_t flags);
 
-struct kmem_cache *create_kmalloc_cache(const char *name, unsigned int size,
-			slab_flags_t flags, unsigned int useroffset,
-			unsigned int usersize);
+void __init new_kmalloc_cache(int idx, enum kmalloc_cache_type type,
+			      slab_flags_t flags);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
 			unsigned int size, slab_flags_t flags,
 			unsigned int useroffset, unsigned int usersize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 607249785c07..7f069159aee2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -658,17 +658,16 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
 	s->refcount = -1;	/* Exempt from merging for now */
 }
 
-struct kmem_cache *__init create_kmalloc_cache(const char *name,
-		unsigned int size, slab_flags_t flags,
-		unsigned int useroffset, unsigned int usersize)
+static struct kmem_cache *__init create_kmalloc_cache(const char *name,
+						      unsigned int size,
+						      slab_flags_t flags)
 {
 	struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
 
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags | SLAB_KMALLOC, useroffset,
-								usersize);
+	create_boot_cache(s, name, size, flags | SLAB_KMALLOC, 0, size);
 	list_add(&s->list, &slab_caches);
 	s->refcount = 1;
 	return s;
@@ -863,7 +862,7 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
-static void __init
+void __init
 new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 {
 	if ((KMALLOC_RECLAIM != KMALLOC_NORMAL) && (type == KMALLOC_RECLAIM)) {
@@ -880,8 +879,7 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 
 	kmalloc_caches[type][idx] = create_kmalloc_cache(
 					kmalloc_info[idx].name[type],
-					kmalloc_info[idx].size, flags, 0,
-					kmalloc_info[idx].size);
+					kmalloc_info[idx].size, flags);
 
 	/*
 	 * If CONFIG_MEMCG_KMEM is enabled, disable cache merging for


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 04/15] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment()
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (2 preceding siblings ...)
  2023-05-18 17:33 ` [PATCH v4 03/15] mm/slab: Simplify create_kmalloc_cache() args and make it static Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 05/15] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN Catalin Marinas
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

Do not create kmalloc() caches which are not aligned to
dma_get_cache_alignment(). There is no functional change since, for
the current architectures defining ARCH_DMA_MINALIGN,
ARCH_KMALLOC_MINALIGN equals ARCH_DMA_MINALIGN (and hence
dma_get_cache_alignment()). On architectures without a specific
ARCH_DMA_MINALIGN, dma_get_cache_alignment() is 1, so the kmalloc()
caches are unchanged.

If an architecture selects ARCH_HAS_DMA_CACHE_LINE_SIZE (introduced
previously), the kmalloc() caches will be aligned to a cache line size.
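
As a worked example (assuming __kmalloc_minalign() returns 64 and
ARCH_KMALLOC_MINALIGN is 8), creating the kmalloc-32 cache follows the
hunk below with these values:

	/* idx corresponds to kmalloc-32, run-time minimum alignment is 64 */
	unsigned int minalign = __kmalloc_minalign();		/* 64 */
	unsigned int aligned_size = ALIGN(32, minalign);	/* 64 */
	int aligned_idx = __kmalloc_index(aligned_size, false);	/* kmalloc-64 slot */

	/*
	 * The kmalloc-64 cache is created (if it does not exist yet) and the
	 * kmalloc-32 slot points at it, so kmalloc(32) returns 64-byte
	 * aligned, DMA-safe objects.
	 */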

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 mm/slab_common.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7f069159aee2..7c6475847fdf 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -17,6 +17,7 @@
 #include <linux/cpu.h>
 #include <linux/uaccess.h>
 #include <linux/seq_file.h>
+#include <linux/dma-mapping.h>
 #include <linux/proc_fs.h>
 #include <linux/debugfs.h>
 #include <linux/kasan.h>
@@ -862,9 +863,18 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
+static unsigned int __kmalloc_minalign(void)
+{
+	return dma_get_cache_alignment();
+}
+
 void __init
 new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 {
+	unsigned int minalign = __kmalloc_minalign();
+	unsigned int aligned_size = kmalloc_info[idx].size;
+	int aligned_idx = idx;
+
 	if ((KMALLOC_RECLAIM != KMALLOC_NORMAL) && (type == KMALLOC_RECLAIM)) {
 		flags |= SLAB_RECLAIM_ACCOUNT;
 	} else if (IS_ENABLED(CONFIG_MEMCG_KMEM) && (type == KMALLOC_CGROUP)) {
@@ -877,9 +887,17 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 		flags |= SLAB_CACHE_DMA;
 	}
 
-	kmalloc_caches[type][idx] = create_kmalloc_cache(
-					kmalloc_info[idx].name[type],
-					kmalloc_info[idx].size, flags);
+	if (minalign > ARCH_KMALLOC_MINALIGN) {
+		aligned_size = ALIGN(aligned_size, minalign);
+		aligned_idx = __kmalloc_index(aligned_size, false);
+	}
+
+	if (!kmalloc_caches[type][aligned_idx])
+		kmalloc_caches[type][aligned_idx] = create_kmalloc_cache(
+					kmalloc_info[aligned_idx].name[type],
+					aligned_size, flags);
+	if (idx != aligned_idx)
+		kmalloc_caches[type][idx] = kmalloc_caches[type][aligned_idx];
 
 	/*
 	 * If CONFIG_MEMCG_KMEM is enabled, disable cache merging for


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 05/15] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (3 preceding siblings ...)
  2023-05-18 17:33 ` [PATCH v4 04/15] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment() Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-19  9:41   ` Greg Kroah-Hartman
  2023-05-18 17:33 ` [PATCH v4 06/15] drivers/gpu: " Catalin Marinas
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
---
 drivers/base/devres.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 5c998cfac335..3df0025d12aa 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -29,10 +29,10 @@ struct devres {
 	 * Some archs want to perform DMA into kmalloc caches
 	 * and need a guaranteed alignment larger than
 	 * the alignment of a 64-bit integer.
-	 * Thus we use ARCH_KMALLOC_MINALIGN here and get exactly the same
-	 * buffer alignment as if it was allocated by plain kmalloc().
+	 * Thus we use ARCH_DMA_MINALIGN for data[] which will force the same
+	 * alignment for struct devres when allocated by kmalloc().
 	 */
-	u8 __aligned(ARCH_KMALLOC_MINALIGN) data[];
+	u8 __aligned(ARCH_DMA_MINALIGN) data[];
 };
 
 struct devres_group {


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 06/15] drivers/gpu: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (4 preceding siblings ...)
  2023-05-18 17:33 ` [PATCH v4 05/15] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 07/15] drivers/usb: " Catalin Marinas
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
---
 drivers/gpu/drm/drm_managed.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c
index 4cf214de50c4..3a5802f60e65 100644
--- a/drivers/gpu/drm/drm_managed.c
+++ b/drivers/gpu/drm/drm_managed.c
@@ -49,10 +49,10 @@ struct drmres {
 	 * Some archs want to perform DMA into kmalloc caches
 	 * and need a guaranteed alignment larger than
 	 * the alignment of a 64-bit integer.
-	 * Thus we use ARCH_KMALLOC_MINALIGN here and get exactly the same
-	 * buffer alignment as if it was allocated by plain kmalloc().
+	 * Thus we use ARCH_DMA_MINALIGN for data[] which will force the same
+	 * alignment for struct drmres when allocated by kmalloc().
 	 */
-	u8 __aligned(ARCH_KMALLOC_MINALIGN) data[];
+	u8 __aligned(ARCH_DMA_MINALIGN) data[];
 };
 
 static void free_dr(struct drmres *dr)


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 07/15] drivers/usb: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (5 preceding siblings ...)
  2023-05-18 17:33 ` [PATCH v4 06/15] drivers/gpu: " Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-19  9:41   ` Greg Kroah-Hartman
  2023-05-18 17:33 ` [PATCH v4 08/15] drivers/spi: " Catalin Marinas
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/core/buffer.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
index fbb087b728dc..e21d8d106977 100644
--- a/drivers/usb/core/buffer.c
+++ b/drivers/usb/core/buffer.c
@@ -34,13 +34,13 @@ void __init usb_init_pool_max(void)
 {
 	/*
 	 * The pool_max values must never be smaller than
-	 * ARCH_KMALLOC_MINALIGN.
+	 * ARCH_DMA_MINALIGN.
 	 */
-	if (ARCH_KMALLOC_MINALIGN <= 32)
+	if (ARCH_DMA_MINALIGN <= 32)
 		;			/* Original value is okay */
-	else if (ARCH_KMALLOC_MINALIGN <= 64)
+	else if (ARCH_DMA_MINALIGN <= 64)
 		pool_max[0] = 64;
-	else if (ARCH_KMALLOC_MINALIGN <= 128)
+	else if (ARCH_DMA_MINALIGN <= 128)
 		pool_max[0] = 0;	/* Don't use this pool */
 	else
 		BUILD_BUG();		/* We don't allow this */


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 08/15] drivers/spi: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (6 preceding siblings ...)
  2023-05-18 17:33 ` [PATCH v4 07/15] drivers/usb: " Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 09/15] drivers/md: " Catalin Marinas
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Brown <broonie@kernel.org>
---
 drivers/spi/spidev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 39d94c850839..8d009275a59d 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -237,7 +237,7 @@ static int spidev_message(struct spidev_data *spidev,
 		/* Ensure that also following allocations from rx_buf/tx_buf will meet
 		 * DMA alignment requirements.
 		 */
-		unsigned int len_aligned = ALIGN(u_tmp->len, ARCH_KMALLOC_MINALIGN);
+		unsigned int len_aligned = ALIGN(u_tmp->len, ARCH_DMA_MINALIGN);
 
 		k_tmp->len = u_tmp->len;
 


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 09/15] drivers/md: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (7 preceding siblings ...)
  2023-05-18 17:33 ` [PATCH v4 08/15] drivers/spi: " Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 10/15] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size() Catalin Marinas
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-crypt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 8b47b913ee83..ebbd8f7db880 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3256,7 +3256,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
 	cc->per_bio_data_size = ti->per_io_data_size =
 		ALIGN(sizeof(struct dm_crypt_io) + cc->dmreq_start + additional_req_size,
-		      ARCH_KMALLOC_MINALIGN);
+		      ARCH_DMA_MINALIGN);
 
 	ret = mempool_init(&cc->page_pool, BIO_MAX_VECS, crypt_page_alloc, crypt_page_free, cc);
 	if (ret) {


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 10/15] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size()
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (8 preceding siblings ...)
  2023-05-18 17:33 ` [PATCH v4 09/15] drivers/md: " Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-18 17:33 ` [PATCH v4 11/15] scatterlist: Add dedicated config for DMA flags Catalin Marinas
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

On arm64, ARCH_DMA_MINALIGN is 128, larger than the cache line size on
most of the current platforms (typically 64). Select
ARCH_HAS_DMA_CACHE_LINE_SIZE and define ARCH_KMALLOC_MINALIGN as 8 (the
default for architectures without their own ARCH_DMA_MINALIGN). The
kmalloc() caches will be limited to those aligned to the run-time
cache_line_size(). Typically, this allows the additional
kmalloc-{64,192} caches on most arm64 platforms.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig             | 1 +
 arch/arm64/include/asm/cache.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1201d25a8a4..d11340b41703 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -23,6 +23,7 @@ config ARM64
 	select ARCH_HAS_CURRENT_STACK_POINTER
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEBUG_VM_PGTABLE
+	select ARCH_HAS_DMA_CACHE_LINE_SIZE
 	select ARCH_HAS_DMA_PREP_COHERENT
 	select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
 	select ARCH_HAS_FAST_MULTIPLIER
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index a51e6e8f3171..e24c10192636 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -33,6 +33,7 @@
  * the CPU.
  */
 #define ARCH_DMA_MINALIGN	(128)
+#define ARCH_KMALLOC_MINALIGN	(8)
 
 #ifndef __ASSEMBLY__
 


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 11/15] scatterlist: Add dedicated config for DMA flags
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (9 preceding siblings ...)
  2023-05-18 17:33 ` [PATCH v4 10/15] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size() Catalin Marinas
@ 2023-05-18 17:33 ` Catalin Marinas
  2023-05-20  5:42   ` Christoph Hellwig
  2023-05-18 17:34 ` [PATCH v4 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned Catalin Marinas
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:33 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

From: Robin Murphy <robin.murphy@arm.com>

The DMA flags field will be useful for users beyond PCI P2P, so upgrade
to its own dedicated config option.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 drivers/pci/Kconfig         | 1 +
 include/linux/scatterlist.h | 4 ++--
 kernel/dma/Kconfig          | 3 +++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 9309f2469b41..3c07d8d214b3 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -168,6 +168,7 @@ config PCI_P2PDMA
 	#
 	depends on 64BIT
 	select GENERIC_ALLOCATOR
+	select NEED_SG_DMA_FLAGS
 	help
 	  Enableѕ drivers to do PCI peer-to-peer transactions to and from
 	  BARs that are exposed in other devices that are the part of
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 375a5e90d86a..87aaf8b5cdb4 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -16,7 +16,7 @@ struct scatterlist {
 #ifdef CONFIG_NEED_SG_DMA_LENGTH
 	unsigned int	dma_length;
 #endif
-#ifdef CONFIG_PCI_P2PDMA
+#ifdef CONFIG_NEED_SG_DMA_FLAGS
 	unsigned int    dma_flags;
 #endif
 };
@@ -249,7 +249,7 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 }
 
 /*
- * CONFGI_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
+ * CONFIG_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
  * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
  * Use this padding for DMA flags bits to indicate when a specific
  * dma address is a bus address.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index cc750062c412..3e2aab296986 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -24,6 +24,9 @@ config DMA_OPS_BYPASS
 config ARCH_HAS_DMA_MAP_DIRECT
 	bool
 
+config NEED_SG_DMA_FLAGS
+	bool
+
 config NEED_SG_DMA_LENGTH
 	bool
 


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (10 preceding siblings ...)
  2023-05-18 17:33 ` [PATCH v4 11/15] scatterlist: Add dedicated config for DMA flags Catalin Marinas
@ 2023-05-18 17:34 ` Catalin Marinas
  2023-05-20  5:44   ` Christoph Hellwig
  2023-05-18 17:34 ` [PATCH v4 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:34 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

For direct DMA, if the size is small enough to have originated from a
kmalloc() cache with alignment below ARCH_DMA_MINALIGN, check its
alignment against dma_get_cache_alignment() and bounce if necessary.
For larger sizes, it is the responsibility of the DMA API caller to
ensure proper alignment.

At this point, the kmalloc() caches are properly aligned but this will
change in a subsequent patch.

Architectures can opt in by selecting ARCH_WANT_KMALLOC_DMA_BOUNCE.
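
From the driver's point of view nothing changes; a small FROM_DEVICE
mapping like the hypothetical snippet below is simply bounced behind
the scenes on a non-coherent device:

#include <linux/dma-mapping.h>
#include <linux/slab.h>

/* Hypothetical example: map a small kmalloc() buffer for device writes. */
static void *example_map_small_buf(struct device *dev, dma_addr_t *dma_addr)
{
	u8 *buf = kmalloc(24, GFP_KERNEL);	/* may come from kmalloc-32 */

	if (!buf)
		return NULL;

	/*
	 * kmalloc_size_roundup(24) == 32, not a multiple of a typical 64-byte
	 * cache line, so on a non-coherent device dma_map_single()
	 * transparently bounces this buffer via swiotlb.
	 */
	*dma_addr = dma_map_single(dev, buf, 24, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, *dma_addr)) {
		kfree(buf);
		return NULL;
	}
	return buf;
}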

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 include/linux/dma-map-ops.h | 48 +++++++++++++++++++++++++++++++++++++
 kernel/dma/Kconfig          |  9 +++++++
 kernel/dma/direct.h         |  3 ++-
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 31f114f486c4..43bf50c35e14 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -8,6 +8,7 @@
 
 #include <linux/dma-mapping.h>
 #include <linux/pgtable.h>
+#include <linux/slab.h>
 
 struct cma;
 
@@ -277,6 +278,53 @@ static inline bool dev_is_dma_coherent(struct device *dev)
 }
 #endif /* CONFIG_ARCH_HAS_DMA_COHERENCE_H */
 
+/*
+ * Check whether potential kmalloc() buffers are safe for non-coherent DMA.
+ */
+static inline bool dma_kmalloc_safe(struct device *dev,
+				    enum dma_data_direction dir)
+{
+	/*
+	 * If DMA bouncing of kmalloc() buffers is disabled, the kmalloc()
+	 * caches have already been aligned to a DMA-safe size.
+	 */
+	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC))
+		return true;
+
+	/*
+	 * kmalloc() buffers are DMA-safe irrespective of size if the device
+	 * is coherent or the direction is DMA_TO_DEVICE (non-desctructive
+	 * cache maintenance and benign cache line evictions).
+	 */
+	if (dev_is_dma_coherent(dev) || dir == DMA_TO_DEVICE)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check whether the given size, assuming it is for a kmalloc()'ed buffer, is
+ * sufficiently aligned for non-coherent DMA.
+ */
+static inline bool dma_kmalloc_size_aligned(size_t size)
+{
+	/*
+	 * Larger kmalloc() sizes are guaranteed to be aligned to
+	 * ARCH_DMA_MINALIGN.
+	 */
+	if (size >= 2 * ARCH_DMA_MINALIGN ||
+	    IS_ALIGNED(kmalloc_size_roundup(size), dma_get_cache_alignment()))
+		return true;
+
+	return false;
+}
+
+static inline bool dma_kmalloc_needs_bounce(struct device *dev, size_t size,
+					    enum dma_data_direction dir)
+{
+	return !dma_kmalloc_safe(dev, dir) && !dma_kmalloc_size_aligned(size);
+}
+
 void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 3e2aab296986..18dd03c74734 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -97,6 +97,15 @@ config SWIOTLB
 	bool
 	select NEED_DMA_MAP_STATE
 
+config ARCH_WANT_KMALLOC_DMA_BOUNCE
+	bool
+
+config DMA_BOUNCE_UNALIGNED_KMALLOC
+	def_bool y
+	depends on ARCH_WANT_KMALLOC_DMA_BOUNCE
+	depends on SWIOTLB
+	select NEED_SG_DMA_FLAGS
+
 config DMA_RESTRICTED_POOL
 	bool "DMA Restricted Pool"
 	depends on OF && OF_RESERVED_MEM && SWIOTLB
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index e38ffc5e6bdd..97ec892ea0b5 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -94,7 +94,8 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
 		return swiotlb_map(dev, phys, size, dir, attrs);
 	}
 
-	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
+	if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
+	    dma_kmalloc_needs_bounce(dev, size, dir)) {
 		if (is_pci_p2pdma_page(page))
 			return DMA_MAPPING_ERROR;
 		if (is_swiotlb_active(dev))


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (11 preceding siblings ...)
  2023-05-18 17:34 ` [PATCH v4 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned Catalin Marinas
@ 2023-05-18 17:34 ` Catalin Marinas
  2023-05-19 12:29   ` Robin Murphy
  2023-05-18 17:34 ` [PATCH v4 14/15] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible Catalin Marinas
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:34 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

Similarly to direct DMA, bounce small allocations as they may have
originated from a kmalloc() cache that is not safe for DMA. Unlike
direct DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb()
for all non-coherent devices as this would break cases where the iova
is expected to be contiguous (dmabuf). Instead, scan the scatterlist
for small sizes and only take the swiotlb path if any element of the
list needs bouncing (note that iommu_dma_map_page() would still only
bounce those buffers which are not DMA-aligned).

To avoid scanning the scatterlist on the 'sync' operations, introduce a
SG_DMA_BOUNCED flag set during the iommu_dma_map_sg() call (suggested by
Robin Murphy).
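
As an illustration (hypothetical driver snippet; big_buf and small_buf
are assumed allocations), a list where only one element comes from a
small kmalloc() cache makes the whole list take the bouncing path on a
non-coherent device:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Hypothetical example: big_buf is page-sized, small_buf is a small kmalloc(). */
static int example_map_mixed_sg(struct device *dev, void *big_buf,
				void *small_buf)
{
	struct scatterlist sgl[2];

	sg_init_table(sgl, 2);
	sg_set_buf(&sgl[0], big_buf, PAGE_SIZE);	/* already DMA-safe */
	sg_set_buf(&sgl[1], small_buf, 24);		/* e.g. from kmalloc-32 */

	/*
	 * dma_kmalloc_size_aligned(24) fails, so on a non-coherent device
	 * iommu_dma_map_sg() marks the list SG_DMA_BOUNCED and routes it to
	 * iommu_dma_map_sg_swiotlb().
	 */
	if (dma_map_sg(dev, sgl, 2, DMA_FROM_DEVICE) == 0)
		return -EIO;
	return 0;
}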

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/dma-iommu.c   | 25 ++++++++++++++++++++-----
 include/linux/scatterlist.h | 25 +++++++++++++++++++++++--
 2 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7a9f0b0bddbd..ab1c1681c06e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -956,7 +956,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_use_swiotlb(dev))
+	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sgl))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
 						      sg->length, dir);
@@ -972,7 +972,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_use_swiotlb(dev))
+	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sgl))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_device(dev,
 							 sg_dma_address(sg),
@@ -998,7 +998,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	 * If both the physical buffer start address and size are
 	 * page aligned, we don't need to use a bounce page.
 	 */
-	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
+	if ((dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) ||
+	    dma_kmalloc_needs_bounce(dev, size, dir)) {
 		void *padding_start;
 		size_t padding_size, aligned_size;
 
@@ -1210,7 +1211,21 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 			goto out;
 	}
 
-	if (dev_use_swiotlb(dev))
+	/*
+	 * If kmalloc() buffers are not DMA-safe for this device and
+	 * direction, check the individual lengths in the sg list. If one of
+	 * the buffers is deemed unsafe, follow the iommu_dma_map_sg_swiotlb()
+	 * path for potential bouncing.
+	 */
+	if (!dma_kmalloc_safe(dev, dir)) {
+		for_each_sg(sg, s, nents, i)
+			if (!dma_kmalloc_size_aligned(s->length)) {
+				sg_dma_mark_bounced(sg);
+				break;
+			}
+	}
+
+	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sg))
 		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
@@ -1315,7 +1330,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 	struct scatterlist *tmp;
 	int i;
 
-	if (dev_use_swiotlb(dev)) {
+	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sg)) {
 		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
 		return;
 	}
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 87aaf8b5cdb4..9306880cae1c 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -248,6 +248,29 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 	sg->page_link &= ~SG_END;
 }
 
+#define SG_DMA_BUS_ADDRESS	(1 << 0)
+#define SG_DMA_BOUNCED		(1 << 1)
+
+#ifdef CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC
+static inline bool sg_is_dma_bounced(struct scatterlist *sg)
+{
+	return sg->dma_flags & SG_DMA_BOUNCED;
+}
+
+static inline void sg_dma_mark_bounced(struct scatterlist *sg)
+{
+	sg->dma_flags |= SG_DMA_BOUNCED;
+}
+#else
+static inline bool sg_is_dma_bounced(struct scatterlist *sg)
+{
+	return false;
+}
+static inline void sg_dma_mark_bounced(struct scatterlist *sg)
+{
+}
+#endif
+
 /*
  * CONFIG_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
  * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
@@ -256,8 +279,6 @@ static inline void sg_unmark_end(struct scatterlist *sg)
  */
 #ifdef CONFIG_PCI_P2PDMA
 
-#define SG_DMA_BUS_ADDRESS (1 << 0)
-
 /**
  * sg_dma_is_bus address - Return whether a given segment was marked
  *			   as a bus address


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 14/15] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (12 preceding siblings ...)
  2023-05-18 17:34 ` [PATCH v4 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
@ 2023-05-18 17:34 ` Catalin Marinas
  2023-05-19 11:00   ` Catalin Marinas
  2023-05-18 17:34 ` [PATCH v4 15/15] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 Catalin Marinas
  2023-05-18 17:56 ` [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Linus Torvalds
  15 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:34 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

If an architecture opted in to DMA bouncing of unaligned kmalloc()
buffers (ARCH_WANT_KMALLOC_DMA_BOUNCE), reduce the minimum kmalloc()
cache alignment below the cache-line size, down to
ARCH_KMALLOC_MINALIGN.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 mm/slab_common.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7c6475847fdf..84e5a5e435d6 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>
 #include <linux/seq_file.h>
 #include <linux/dma-mapping.h>
+#include <linux/swiotlb.h>
 #include <linux/proc_fs.h>
 #include <linux/debugfs.h>
 #include <linux/kasan.h>
@@ -865,7 +866,13 @@ void __init setup_kmalloc_cache_index_table(void)
 
 static unsigned int __kmalloc_minalign(void)
 {
-	return dma_get_cache_alignment();
+	int cache_align = dma_get_cache_alignment();
+
+	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) ||
+	    io_tlb_default_mem.nslabs == 0)
+		return cache_align;
+
+	return ARCH_KMALLOC_MINALIGN;
 }
 
 void __init


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 15/15] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (13 preceding siblings ...)
  2023-05-18 17:34 ` [PATCH v4 14/15] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible Catalin Marinas
@ 2023-05-18 17:34 ` Catalin Marinas
  2023-05-18 17:56 ` [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Linus Torvalds
  15 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 17:34 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

With the DMA bouncing of unaligned kmalloc() buffers now in place,
enable it for arm64 to allow the kmalloc-{8,16,32,96} caches. In
addition, always create the swiotlb buffer even when the end of RAM is
within the 32-bit physical address range (the swiotlb buffer can still
be disabled on the kernel command line).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig   | 1 +
 arch/arm64/mm/init.c | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index d11340b41703..1dcfa78f131f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -102,6 +102,7 @@ config ARM64
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
+	select ARCH_WANT_KMALLOC_DMA_BOUNCE
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 66e70ca47680..3ac2e9d79ce4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -442,7 +442,12 @@ void __init bootmem_init(void)
  */
 void __init mem_init(void)
 {
-	swiotlb_init(max_pfn > PFN_DOWN(arm64_dma_phys_limit), SWIOTLB_VERBOSE);
+	bool swiotlb = max_pfn > PFN_DOWN(arm64_dma_phys_limit);
+
+	if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC))
+		swiotlb = true;
+
+	swiotlb_init(swiotlb, SWIOTLB_VERBOSE);
 
 	/* this will put all unused low memory onto the freelists */
 	memblock_free_all();


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-18 17:33 [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (14 preceding siblings ...)
  2023-05-18 17:34 ` [PATCH v4 15/15] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 Catalin Marinas
@ 2023-05-18 17:56 ` Linus Torvalds
  2023-05-18 18:13   ` Ard Biesheuvel
  2023-05-18 18:46   ` Catalin Marinas
  15 siblings, 2 replies; 35+ messages in thread
From: Linus Torvalds @ 2023-05-18 17:56 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman,
	Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

On Thu, May 18, 2023 at 10:34 AM Catalin Marinas
<catalin.marinas@arm.com> wrote:
>
> That's the fourth version of the series reducing the kmalloc() minimum
> alignment on arm64 to 8 (from 128).

Lovely. On my M2 Macbook air, I right now have about 24MB in the
kmalloc-128 bucket, and most of it is presumably just 16 byte
allocations (judging by my x86-64 slabinfo).

I guess it doesn't really matter when I have 16GB in the machine, but
this has annoyed me for a while.

It feels like this is ready for 6.5, no?

                 Linus


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-18 17:56 ` [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Linus Torvalds
@ 2023-05-18 18:13   ` Ard Biesheuvel
  2023-05-18 18:50     ` Catalin Marinas
  2023-05-18 18:46   ` Catalin Marinas
  1 sibling, 1 reply; 35+ messages in thread
From: Ard Biesheuvel @ 2023-05-18 18:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Catalin Marinas, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Isaac Manjarres, Saravana Kannan, Alasdair Kergon,
	Daniel Vetter, Joerg Roedel, Mark Brown, Mike Snitzer,
	Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

On Thu, 18 May 2023 at 19:56, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, May 18, 2023 at 10:34 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> >
> > That's the fourth version of the series reducing the kmalloc() minimum
> > alignment on arm64 to 8 (from 128).

For the series:

Tested-by: Ard Biesheuvel <ardb@kernel.org> # tx2

and I am seeing lots of smaller allocations, all of which would have
otherwise taken up 128 or 256 bytes:

kmalloc-192         6426   6426    192   42    2 : tunables    0    0    0 : slabdata    153    153      0
kmalloc-128         9472   9472    128   64    2 : tunables    0    0    0 : slabdata    148    148      0
kmalloc-96         15666  15666     96   42    1 : tunables    0    0    0 : slabdata    373    373      0
kmalloc-64         21952  21952     64   64    1 : tunables    0    0    0 : slabdata    343    343      0
kmalloc-32         23424  23424     32  128    1 : tunables    0    0    0 : slabdata    183    183      0
kmalloc-16         41216  41216     16  256    1 : tunables    0    0    0 : slabdata    161    161      0
kmalloc-8          77846  80384      8  512    1 : tunables    0    0    0 : slabdata    157    157      0

The box is fully DMA coherent, of course, so this doesn't really tell
us whether the swiotlb DMA bouncing stuff works or not.

>
> Lovely. On my M2 Macbook air, I right now have about 24MB in the
> kmalloc-128 bucket, and most of it is presumably just 16 byte
> allocations (judging by my x86-64 slabinfo).
>
> I guess it doesn't really matter when I have 16GB in the machine, but
> this has annoyed me for a while.
>

Yeah but surely the overhead in terms of D-cache footprint is a factor here too?

> It feels like this is ready for 6.5, no?
>

Yes, please...


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-18 17:56 ` [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Linus Torvalds
  2023-05-18 18:13   ` Ard Biesheuvel
@ 2023-05-18 18:46   ` Catalin Marinas
  1 sibling, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 18:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman,
	Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

On Thu, May 18, 2023 at 10:56:24AM -0700, Linus Torvalds wrote:
> On Thu, May 18, 2023 at 10:34 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> >
> > That's the fourth version of the series reducing the kmalloc() minimum
> > alignment on arm64 to 8 (from 128).
> 
> Lovely. On my M2 Macbook air, I right now have about 24MB in the
> kmalloc-128 bucket, and most of it is presumably just 16 byte
> allocations (judging by my x86-64 slabinfo).
> 
> I guess it doesn't really matter when I have 16GB in the machine, but
> this has annoyed me for a while.
> 
> It feels like this is ready for 6.5, no?

From an implementation approach perspective, I definitely target 6.5.
But I need help with testing this, especially the iommu bits (can buy
Robin some beers ;)).

-- 
Catalin


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/15] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-18 18:13   ` Ard Biesheuvel
@ 2023-05-18 18:50     ` Catalin Marinas
  0 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-18 18:50 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Isaac Manjarres, Saravana Kannan, Alasdair Kergon,
	Daniel Vetter, Joerg Roedel, Mark Brown, Mike Snitzer,
	Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

On Thu, May 18, 2023 at 08:13:08PM +0200, Ard Biesheuvel wrote:
> On Thu, 18 May 2023 at 19:56, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Thu, May 18, 2023 at 10:34 AM Catalin Marinas
> > <catalin.marinas@arm.com> wrote:
> > >
> > > That's the fourth version of the series reducing the kmalloc() minimum
> > > alignment on arm64 to 8 (from 128).
> 
> For the series:
> 
> Tested-by: Ard Biesheuvel <ardb@kernel.org> # tx2
[...]
> The box is fully DMA coherent, of course, so this doesn't really tell
> us whether the swiotlb DMA bouncing stuff works or not.

Thanks.

On TX2, I forced the bouncing with:

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 43bf50c35e14..9006bf680db0 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -296,7 +296,7 @@ static inline bool dma_kmalloc_safe(struct device *dev,
	 * is coherent or the direction is DMA_TO_DEVICE (non-destructive
	 * cache maintenance and benign cache line evictions).
	 */
-	if (dev_is_dma_coherent(dev) || dir == DMA_TO_DEVICE)
+	if (/*dev_is_dma_coherent(dev) || */dir == DMA_TO_DEVICE)
		return true;
 
	return false;

-- 
Catalin


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 07/15] drivers/usb: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-18 17:33 ` [PATCH v4 07/15] drivers/usb: " Catalin Marinas
@ 2023-05-19  9:41   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 35+ messages in thread
From: Greg Kroah-Hartman @ 2023-05-19  9:41 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Will Deacon,
	Marc Zyngier, Andrew Morton, Herbert Xu, Ard Biesheuvel,
	Isaac Manjarres, Saravana Kannan, Alasdair Kergon, Daniel Vetter,
	Joerg Roedel, Mark Brown, Mike Snitzer, Rafael J. Wysocki,
	Robin Murphy, linux-mm, iommu, linux-arm-kernel

On Thu, May 18, 2023 at 06:33:55PM +0100, Catalin Marinas wrote:
> ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
> operations while ARCH_KMALLOC_MINALIGN is the minimum kmalloc() objects
> alignment.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 05/15] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-18 17:33 ` [PATCH v4 05/15] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN Catalin Marinas
@ 2023-05-19  9:41   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 35+ messages in thread
From: Greg Kroah-Hartman @ 2023-05-19  9:41 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Will Deacon,
	Marc Zyngier, Andrew Morton, Herbert Xu, Ard Biesheuvel,
	Isaac Manjarres, Saravana Kannan, Alasdair Kergon, Daniel Vetter,
	Joerg Roedel, Mark Brown, Mike Snitzer, Rafael J. Wysocki,
	Robin Murphy, linux-mm, iommu, linux-arm-kernel

On Thu, May 18, 2023 at 06:33:53PM +0100, Catalin Marinas wrote:
> ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
> operations while ARCH_KMALLOC_MINALIGN is the minimum kmalloc() objects
> alignment.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 14/15] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible
  2023-05-18 17:34 ` [PATCH v4 14/15] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible Catalin Marinas
@ 2023-05-19 11:00   ` Catalin Marinas
  0 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-19 11:00 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

On Thu, May 18, 2023 at 06:34:02PM +0100, Catalin Marinas wrote:
> If an architecture opted in to DMA bouncing of unaligned kmalloc()
> buffers (ARCH_WANT_KMALLOC_DMA_BOUNCE), reduce the minimum kmalloc()
> cache alignment below cache-line size to ARCH_KMALLOC_MINALIGN.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> ---
>  mm/slab_common.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 7c6475847fdf..84e5a5e435d6 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -18,6 +18,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/seq_file.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/swiotlb.h>
>  #include <linux/proc_fs.h>
>  #include <linux/debugfs.h>
>  #include <linux/kasan.h>
> @@ -865,7 +866,13 @@ void __init setup_kmalloc_cache_index_table(void)
>  
>  static unsigned int __kmalloc_minalign(void)
>  {
> -	return dma_get_cache_alignment();
> +	int cache_align = dma_get_cache_alignment();
> +
> +	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) ||
> +	    io_tlb_default_mem.nslabs == 0)
> +		return cache_align;
> +
> +	return ARCH_KMALLOC_MINALIGN;
>  }

This gives a build error if the architecture doesn't select SWIOTLB (I
had this done properly in v3 but for some reason I rewrote it here). The
fixup is to add #ifdefs. I'll fold this in for v5:

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 84e5a5e435d6..fe46459a8b77 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -864,16 +864,19 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
+#ifdef CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC
 static unsigned int __kmalloc_minalign(void)
 {
-	int cache_align = dma_get_cache_alignment();
-
-	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) ||
-	    io_tlb_default_mem.nslabs == 0)
-		return cache_align;
-
-	return ARCH_KMALLOC_MINALIGN;
+	if (io_tlb_default_mem.nslabs)
+		return ARCH_KMALLOC_MINALIGN;
+	return dma_get_cache_alignment();
 }
+#else
+static unsigned int __kmalloc_minalign(void)
+{
+	return dma_get_cache_alignment();
+}
+#endif
 
 void __init
 new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)

-- 
Catalin


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-18 17:34 ` [PATCH v4 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
@ 2023-05-19 12:29   ` Robin Murphy
  2023-05-19 14:02     ` Catalin Marinas
  0 siblings, 1 reply; 35+ messages in thread
From: Robin Murphy @ 2023-05-19 12:29 UTC (permalink / raw)
  To: Catalin Marinas, Linus Torvalds, Arnd Bergmann,
	Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On 2023-05-18 18:34, Catalin Marinas wrote:
> Similarly to the direct DMA, bounce small allocations as they may have
> originated from a kmalloc() cache not safe for DMA. Unlike the direct
> DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
> non-coherent devices as this would break some cases where the iova is
> expected to be contiguous (dmabuf). Instead, scan the scatterlist for
> any small sizes and only go the swiotlb path if any element of the list
> needs bouncing (note that iommu_dma_map_page() would still only bounce
> those buffers which are not DMA-aligned).
> 
> To avoid scanning the scatterlist on the 'sync' operations, introduce a
> SG_DMA_BOUNCED flag set during the iommu_dma_map_sg() call (suggested by
> Robin Murphy).
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> ---
>   drivers/iommu/dma-iommu.c   | 25 ++++++++++++++++++++-----
>   include/linux/scatterlist.h | 25 +++++++++++++++++++++++--
>   2 files changed, 43 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7a9f0b0bddbd..ab1c1681c06e 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -956,7 +956,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>   	struct scatterlist *sg;
>   	int i;
>   
> -	if (dev_use_swiotlb(dev))
> +	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sgl))
>   		for_each_sg(sgl, sg, nelems, i)
>   			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
>   						      sg->length, dir);
> @@ -972,7 +972,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
>   	struct scatterlist *sg;
>   	int i;
>   
> -	if (dev_use_swiotlb(dev))
> +	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sgl))
>   		for_each_sg(sgl, sg, nelems, i)
>   			iommu_dma_sync_single_for_device(dev,
>   							 sg_dma_address(sg),
> @@ -998,7 +998,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   	 * If both the physical buffer start address and size are
>   	 * page aligned, we don't need to use a bounce page.
>   	 */
> -	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
> +	if ((dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) ||
> +	    dma_kmalloc_needs_bounce(dev, size, dir)) {
>   		void *padding_start;
>   		size_t padding_size, aligned_size;
>   
> @@ -1210,7 +1211,21 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   			goto out;
>   	}
>   
> -	if (dev_use_swiotlb(dev))
> +	/*
> +	 * If kmalloc() buffers are not DMA-safe for this device and
> +	 * direction, check the individual lengths in the sg list. If one of
> +	 * the buffers is deemed unsafe, follow the iommu_dma_map_sg_swiotlb()
> +	 * path for potential bouncing.
> +	 */
> +	if (!dma_kmalloc_safe(dev, dir)) {
> +		for_each_sg(sg, s, nents, i)
> +			if (!dma_kmalloc_size_aligned(s->length)) {

Just to remind myself, we're not checking s->offset on the grounds that 
if anyone wants to DMA into an unaligned part of a larger allocation 
that remains at their own risk, is that right?

Do we care about the (probably theoretical) case where someone might 
build a scatterlist for multiple small allocations such that ones which 
happen to be adjacent might get combined into a single segment of 
apparently "safe" length but still at "unsafe" alignment?

> +				sg_dma_mark_bounced(sg);

I'd prefer to have iommu_dma_map_sg_swiotlb() mark the segments, since 
that's in charge of the actual bouncing. Then we can fold the alignment 
check into dev_use_swiotlb() (with the dev_is_untrusted() condition 
taking priority), and sync/unmap can simply rely on sg_is_dma_bounced() 
alone.

(ultimately I'd like to merge the two separate paths back together and 
handle bouncing per-segment, but that can wait)

Thanks,
Robin.

> +				break;
> +			}
> +	}
> +
> +	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sg))
>   		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
>   
>   	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> @@ -1315,7 +1330,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>   	struct scatterlist *tmp;
>   	int i;
>   
> -	if (dev_use_swiotlb(dev)) {
> +	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sg)) {
>   		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
>   		return;
>   	}
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 87aaf8b5cdb4..9306880cae1c 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -248,6 +248,29 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>   	sg->page_link &= ~SG_END;
>   }
>   
> +#define SG_DMA_BUS_ADDRESS	(1 << 0)
> +#define SG_DMA_BOUNCED		(1 << 1)
> +
> +#ifdef CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC
> +static inline bool sg_is_dma_bounced(struct scatterlist *sg)
> +{
> +	return sg->dma_flags & SG_DMA_BOUNCED;
> +}
> +
> +static inline void sg_dma_mark_bounced(struct scatterlist *sg)
> +{
> +	sg->dma_flags |= SG_DMA_BOUNCED;
> +}
> +#else
> +static inline bool sg_is_dma_bounced(struct scatterlist *sg)
> +{
> +	return false;
> +}
> +static inline void sg_dma_mark_bounced(struct scatterlist *sg)
> +{
> +}
> +#endif
> +
>   /*
>    * CONFIG_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
>    * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
> @@ -256,8 +279,6 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>    */
>   #ifdef CONFIG_PCI_P2PDMA
>   
> -#define SG_DMA_BUS_ADDRESS (1 << 0)
> -
>   /**
>    * sg_dma_is_bus address - Return whether a given segment was marked
>    *			   as a bus address


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-19 12:29   ` Robin Murphy
@ 2023-05-19 14:02     ` Catalin Marinas
  2023-05-19 15:46       ` Catalin Marinas
  2023-05-19 17:09       ` Robin Murphy
  0 siblings, 2 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-19 14:02 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Fri, May 19, 2023 at 01:29:38PM +0100, Robin Murphy wrote:
> On 2023-05-18 18:34, Catalin Marinas wrote:
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 7a9f0b0bddbd..ab1c1681c06e 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -956,7 +956,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
> >   	struct scatterlist *sg;
> >   	int i;
> > -	if (dev_use_swiotlb(dev))
> > +	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sgl))
> >   		for_each_sg(sgl, sg, nelems, i)
> >   			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
> >   						      sg->length, dir);
> > @@ -972,7 +972,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
> >   	struct scatterlist *sg;
> >   	int i;
> > -	if (dev_use_swiotlb(dev))
> > +	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sgl))
> >   		for_each_sg(sgl, sg, nelems, i)
> >   			iommu_dma_sync_single_for_device(dev,
> >   							 sg_dma_address(sg),
> > @@ -998,7 +998,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> >   	 * If both the physical buffer start address and size are
> >   	 * page aligned, we don't need to use a bounce page.
> >   	 */
> > -	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
> > +	if ((dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) ||
> > +	    dma_kmalloc_needs_bounce(dev, size, dir)) {
> >   		void *padding_start;
> >   		size_t padding_size, aligned_size;
> > @@ -1210,7 +1211,21 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
> >   			goto out;
> >   	}
> > -	if (dev_use_swiotlb(dev))
> > +	/*
> > +	 * If kmalloc() buffers are not DMA-safe for this device and
> > +	 * direction, check the individual lengths in the sg list. If one of
> > +	 * the buffers is deemed unsafe, follow the iommu_dma_map_sg_swiotlb()
> > +	 * path for potential bouncing.
> > +	 */
> > +	if (!dma_kmalloc_safe(dev, dir)) {
> > +		for_each_sg(sg, s, nents, i)
> > +			if (!dma_kmalloc_size_aligned(s->length)) {
> 
> Just to remind myself, we're not checking s->offset on the grounds that if
> anyone wants to DMA into an unaligned part of a larger allocation that
> remains at their own risk, is that right?

Right. That's the case currently as well and those users that were
relying on ARCH_KMALLOC_MINALIGN for this have either been migrated to
ARCH_DMA_MINALIGN in this series or the logic rewritten (as in the
crypto code).

> Do we care about the (probably theoretical) case where someone might build a
> scatterlist for multiple small allocations such that ones which happen to be
> adjacent might get combined into a single segment of apparently "safe"
> length but still at "unsafe" alignment?

I'd say that's theoretical only. One could write such code but normally
you'd go for an array rather than relying on the randomness of the
kmalloc pointers to figure out adjacent objects. It also only works if
the individual struct size is exactly one of the kmalloc cache sizes, so
not generic enough.

> > +				sg_dma_mark_bounced(sg);
> 
> I'd prefer to have iommu_dma_map_sg_swiotlb() mark the segments, since
> that's in charge of the actual bouncing. Then we can fold the alignment
> check into dev_use_swiotlb() (with the dev_is_untrusted() condition taking
> priority), and sync/unmap can simply rely on sg_is_dma_bounced() alone.

With this patch we only set the SG_DMA_BOUNCED on the first element of
the sglist. Do you want to set this flag only on individual elements
being bounced? It makes some sense in principle but the
iommu_dma_unmap_sg() path would need to scan the list again to decide
whether to go the swiotlb path.

If we keep the SG_DMA_BOUNCED flag only on the first element, I can
change it to your suggestion, assuming I understood it.

-- 
Catalin


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-19 14:02     ` Catalin Marinas
@ 2023-05-19 15:46       ` Catalin Marinas
  2023-05-19 17:09       ` Robin Murphy
  1 sibling, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-19 15:46 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Fri, May 19, 2023 at 03:02:24PM +0100, Catalin Marinas wrote:
> On Fri, May 19, 2023 at 01:29:38PM +0100, Robin Murphy wrote:
> > On 2023-05-18 18:34, Catalin Marinas wrote:
> > > +				sg_dma_mark_bounced(sg);
> > 
> > I'd prefer to have iommu_dma_map_sg_swiotlb() mark the segments, since
> > that's in charge of the actual bouncing. Then we can fold the alignment
> > check into dev_use_swiotlb() (with the dev_is_untrusted() condition taking
> > priority), and sync/unmap can simply rely on sg_is_dma_bounced() alone.
> 
> With this patch we only set the SG_DMA_BOUNCED on the first element of
> the sglist. Do you want to set this flag only on individual elements
> being bounced? It makes some sense in principle but the
> iommu_dma_unmap_sg() path would need to scan the list again to decide
> whether to go the swiotlb path.
> 
> If we keep the SG_DMA_BOUNCED flag only on the first element, I can
> change it to your suggestion, assuming I understood it.

Can one call:

	iommu_dma_map_sg(sg, nents);
	...
	iommu_dma_unmap_sg(sg + n, nents - n);

(i.e. unmap it in multiple steps)

If yes, setting SG_DMA_BOUNCED on the first element only won't work. I
don't find this an unlikely scenario, so maybe we do have to walk the
list again in unmap to search for the flag.

-- 
Catalin


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  2023-05-18 17:33 ` [PATCH v4 01/15] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
@ 2023-05-19 15:49   ` Catalin Marinas
  0 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-19 15:49 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman
  Cc: Will Deacon, Marc Zyngier, Andrew Morton, Herbert Xu,
	Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

On Thu, May 18, 2023 at 06:33:49PM +0100, Catalin Marinas wrote:
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 6b3e155b70bf..3f76e7c53ada 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -235,14 +235,24 @@ void kmem_dump_obj(void *object);
>   * alignment larger than the alignment of a 64-bit integer.
>   * Setting ARCH_DMA_MINALIGN in arch headers allows that.
>   */
> -#if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
> +#ifdef ARCH_DMA_MINALIGN
> +#define ARCH_HAS_DMA_MINALIGN
> +#if ARCH_DMA_MINALIGN > 8 && !defined(ARCH_KMALLOC_MINALIGN)
>  #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
> -#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN
> -#define KMALLOC_SHIFT_LOW ilog2(ARCH_DMA_MINALIGN)
> +#endif
>  #else
> +#define ARCH_DMA_MINALIGN __alignof__(unsigned long long)
> +#endif
> +
> +#ifndef ARCH_KMALLOC_MINALIGN
>  #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
>  #endif
>  
> +#if ARCH_KMALLOC_MINALIGN > 8
> +#define KMALLOC_MIN_SIZE ARCH_KMALLOC_MINALIGN
> +#define KMALLOC_SHIFT_LOW ilog2(KMALLOC_MIN_SIZE)
> +#endif

And another fixup here (reported by the test robot; I pushed the fixups
to the git branch):

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 3f76e7c53ada..50dcf9cfbf62 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -246,9 +246,7 @@ void kmem_dump_obj(void *object);

 #ifndef ARCH_KMALLOC_MINALIGN
 #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
-#endif
-
-#if ARCH_KMALLOC_MINALIGN > 8
+#elif ARCH_KMALLOC_MINALIGN > 8
 #define KMALLOC_MIN_SIZE ARCH_KMALLOC_MINALIGN
 #define KMALLOC_SHIFT_LOW ilog2(KMALLOC_MIN_SIZE)
 #endif

-- 
Catalin


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-19 14:02     ` Catalin Marinas
  2023-05-19 15:46       ` Catalin Marinas
@ 2023-05-19 17:09       ` Robin Murphy
  2023-05-22  7:27         ` Catalin Marinas
  1 sibling, 1 reply; 35+ messages in thread
From: Robin Murphy @ 2023-05-19 17:09 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On 19/05/2023 3:02 pm, Catalin Marinas wrote:
> On Fri, May 19, 2023 at 01:29:38PM +0100, Robin Murphy wrote:
>> On 2023-05-18 18:34, Catalin Marinas wrote:
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index 7a9f0b0bddbd..ab1c1681c06e 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -956,7 +956,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>>>    	struct scatterlist *sg;
>>>    	int i;
>>> -	if (dev_use_swiotlb(dev))
>>> +	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sgl))
>>>    		for_each_sg(sgl, sg, nelems, i)
>>>    			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
>>>    						      sg->length, dir);
>>> @@ -972,7 +972,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
>>>    	struct scatterlist *sg;
>>>    	int i;
>>> -	if (dev_use_swiotlb(dev))
>>> +	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sgl))
>>>    		for_each_sg(sgl, sg, nelems, i)
>>>    			iommu_dma_sync_single_for_device(dev,
>>>    							 sg_dma_address(sg),
>>> @@ -998,7 +998,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>>>    	 * If both the physical buffer start address and size are
>>>    	 * page aligned, we don't need to use a bounce page.
>>>    	 */
>>> -	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
>>> +	if ((dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) ||
>>> +	    dma_kmalloc_needs_bounce(dev, size, dir)) {
>>>    		void *padding_start;
>>>    		size_t padding_size, aligned_size;
>>> @@ -1210,7 +1211,21 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>>    			goto out;
>>>    	}
>>> -	if (dev_use_swiotlb(dev))
>>> +	/*
>>> +	 * If kmalloc() buffers are not DMA-safe for this device and
>>> +	 * direction, check the individual lengths in the sg list. If one of
>>> +	 * the buffers is deemed unsafe, follow the iommu_dma_map_sg_swiotlb()
>>> +	 * path for potential bouncing.
>>> +	 */
>>> +	if (!dma_kmalloc_safe(dev, dir)) {
>>> +		for_each_sg(sg, s, nents, i)
>>> +			if (!dma_kmalloc_size_aligned(s->length)) {
>>
>> Just to remind myself, we're not checking s->offset on the grounds that if
>> anyone wants to DMA into an unaligned part of a larger allocation that
>> remains at their own risk, is that right?
> 
> Right. That's the case currently as well and those users that were
> relying on ARCH_KMALLOC_MINALIGN for this have either been migrated to
> ARCH_DMA_MINALIGN in this series or the logic rewritten (as in the
> crypto code).

OK, I did manage to summon a vague memory of this being discussed 
before, which at least stopped me asking "Should we be checking..." - 
perhaps a comment on dma_kmalloc_safe() to help remember that reasoning 
might not go amiss?

>> Do we care about the (probably theoretical) case where someone might build a
>> scatterlist for multiple small allocations such that ones which happen to be
>> adjacent might get combined into a single segment of apparently "safe"
>> length but still at "unsafe" alignment?
> 
> I'd say that's theoretical only. One could write such code but normally
> you'd go for an array rather than relying on the randomness of the
> kmalloc pointers to figure out adjacent objects. It also only works if
> the individual struct size is exactly one of the kmalloc cache sizes, so
> not generic enough.

FWIW I was imagining something like sg_alloc_table_from_pages() but at a 
smaller scale, queueing up some list/array of, say, 32-byte buffers into 
a scatterlist to submit as a single DMA job. I'm not aware that such a 
thing exists though, and I'm inclined to agree that it probably is 
sufficiently unrealistic to be concerned about. As usual I just want to 
feel comfortable that we've explored all the possibilities :)
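
Something along these lines, say - entirely made up, just to illustrate
the pattern (pack_small_bufs() doesn't exist anywhere that I know of):

#include <linux/scatterlist.h>

/*
 * Pack an array of small kmalloc() buffers into as few scatterlist
 * segments as possible by merging virtually contiguous neighbours. Two
 * adjacent 32-byte objects would end up as one 64-byte segment at a
 * potentially cacheline-unaligned offset.
 */
static int pack_small_bufs(struct scatterlist *sgl, void **bufs, int n,
			   size_t len)
{
	struct scatterlist *sg = NULL;
	int i, nents = 0;

	sg_init_table(sgl, n);
	for (i = 0; i < n; i++) {
		if (sg && sg_virt(sg) + sg->length == bufs[i]) {
			/* happens to be adjacent: grow the segment */
			sg->length += len;
			continue;
		}
		sg = sg ? sg_next(sg) : sgl;
		sg_set_buf(sg, bufs[i], len);
		nents++;
	}
	if (sg)
		sg_mark_end(sg);
	return nents;
}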

>>> +				sg_dma_mark_bounced(sg);
>>
>> I'd prefer to have iommu_dma_map_sg_swiotlb() mark the segments, since
>> that's in charge of the actual bouncing. Then we can fold the alignment
>> check into dev_use_swiotlb() (with the dev_is_untrusted() condition taking
>> priority), and sync/unmap can simply rely on sg_is_dma_bounced() alone.
> 
> With this patch we only set the SG_DMA_BOUNCED on the first element of
> the sglist. Do you want to set this flag only on individual elements
> being bounced? It makes some sense in principle but the
> iommu_dma_unmap_sg() path would need to scan the list again to decide
> whether to go the swiotlb path.
> 
> If we keep the SG_DMA_BOUNCED flag only on the first element, I can
> change it to your suggestion, assuming I understood it.

Indeed that should be fine - sync_sg/unmap_sg always have to be given 
the same arguments which were passed to map_sg (and note that in the 
normal case, the DMA address/length will often end up concatenated 
entirely into the first element), so while we still have the two 
distinct flows internally, I don't think there's any issue with only 
tagging the head of the list to steer between them. Of course if it then 
works out to be trivial enough to tag *all* the segments for good 
measure, there should be no harm in that either - at the moment the flag 
is destined to have more of a "this might be bounced, so needs checking" 
meaning than "this definitely is bounced" either way.

Cheers,
Robin.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 02/15] dma: Allow dma_get_cache_alignment() to return the smaller cache_line_size()
  2023-05-18 17:33 ` [PATCH v4 02/15] dma: Allow dma_get_cache_alignment() to return the smaller cache_line_size() Catalin Marinas
@ 2023-05-20  5:42   ` Christoph Hellwig
  2023-05-20  6:14     ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2023-05-20  5:42 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

On Thu, May 18, 2023 at 06:33:50PM +0100, Catalin Marinas wrote:
> On architectures like arm64, ARCH_DMA_MINALIGN is larger than most cache
> line size configurations deployed. However, the single kernel binary
> requirement doesn't allow the smaller ARCH_DMA_MINALIGN. Permit an
> architecture to opt in to dma_get_cache_alignment() returning
> cache_line_size() which can be probed at run-time.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  include/linux/dma-mapping.h | 2 ++
>  kernel/dma/Kconfig          | 7 +++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 3288a1339271..b29124341317 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -545,6 +545,8 @@ static inline int dma_set_min_align_mask(struct device *dev,
>  
>  static inline int dma_get_cache_alignment(void)
>  {
> +	if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_CACHE_LINE_SIZE))
> +		return cache_line_size();
>  #ifdef ARCH_HAS_DMA_MINALIGN
>  	return ARCH_DMA_MINALIGN;
>  #endif

Maybe allowing architectures to simply override
dma_get_cache_alignment would be a little cleaner than adding
yet another abstraction?  That might also be able to replace
ARCH_DMA_MINALIGN in a follow-on cleanup.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 11/15] scatterlist: Add dedicated config for DMA flags
  2023-05-18 17:33 ` [PATCH v4 11/15] scatterlist: Add dedicated config for DMA flags Catalin Marinas
@ 2023-05-20  5:42   ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2023-05-20  5:42 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned
  2023-05-18 17:34 ` [PATCH v4 12/15] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned Catalin Marinas
@ 2023-05-20  5:44   ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2023-05-20  5:44 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 02/15] dma: Allow dma_get_cache_alignment() to return the smaller cache_line_size()
  2023-05-20  5:42   ` Christoph Hellwig
@ 2023-05-20  6:14     ` Christoph Hellwig
  2023-05-20 10:34       ` Catalin Marinas
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2023-05-20  6:14 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Robin Murphy, linux-mm, iommu,
	linux-arm-kernel

On Sat, May 20, 2023 at 07:42:09AM +0200, Christoph Hellwig wrote:
> yet another abstraction?  That might also be able to replace
> ARCH_DMA_MINALIGN in a follow-on cleanup.

Looking at the rest of the series, this part is obviously not going to
work..


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 02/15] dma: Allow dma_get_cache_alignment() to return the smaller cache_line_size()
  2023-05-20  6:14     ` Christoph Hellwig
@ 2023-05-20 10:34       ` Catalin Marinas
  0 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2023-05-20 10:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Arnd Bergmann, Greg Kroah-Hartman, Will Deacon,
	Marc Zyngier, Andrew Morton, Herbert Xu, Ard Biesheuvel,
	Isaac Manjarres, Saravana Kannan, Alasdair Kergon, Daniel Vetter,
	Joerg Roedel, Mark Brown, Mike Snitzer, Rafael J. Wysocki,
	Robin Murphy, linux-mm, iommu, linux-arm-kernel

On Sat, May 20, 2023 at 08:14:41AM +0200, Christoph Hellwig wrote:
> On Sat, May 20, 2023 at 07:42:09AM +0200, Christoph Hellwig wrote:
> > yet another abstraction?  That might also be able to replace
> > ARCH_DMA_MINALIGN in a follow-on cleanup.
> 
> Looking at the rest of the series, this part is obviously not going to
> work..

ARCH_DMA_MINALIGN needs to remain a constant (taking over from the
original ARCH_KMALLOC_MINALIGN). But dma_get_cache_alignment() can
indeed be overridden by the arch code; that's a good idea.
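
Something like this maybe (sketch only, untested, and assuming the arch
header is pulled in before linux/dma-mapping.h):

/* include/linux/dma-mapping.h - keep the current default as a fallback */
#ifndef dma_get_cache_alignment
static inline int dma_get_cache_alignment(void)
{
#ifdef ARCH_HAS_DMA_MINALIGN
	return ARCH_DMA_MINALIGN;
#endif
	return 1;
}
#endif

/* arch/arm64/include/asm/cache.h - override with the run-time probed value */
#define dma_get_cache_alignment	dma_get_cache_alignment
static inline int dma_get_cache_alignment(void)
{
	return cache_line_size();
}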

-- 
Catalin


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-19 17:09       ` Robin Murphy
@ 2023-05-22  7:27         ` Catalin Marinas
  2023-05-23 15:47           ` Robin Murphy
  0 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2023-05-22  7:27 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On Fri, May 19, 2023 at 06:09:45PM +0100, Robin Murphy wrote:
> On 19/05/2023 3:02 pm, Catalin Marinas wrote:
> > On Fri, May 19, 2023 at 01:29:38PM +0100, Robin Murphy wrote:
> > > On 2023-05-18 18:34, Catalin Marinas wrote:
> > > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > > > index 7a9f0b0bddbd..ab1c1681c06e 100644
> > > > --- a/drivers/iommu/dma-iommu.c
> > > > +++ b/drivers/iommu/dma-iommu.c
[...]
> > > > @@ -1210,7 +1211,21 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
> > > >    			goto out;
> > > >    	}
> > > > -	if (dev_use_swiotlb(dev))
> > > > +	/*
> > > > +	 * If kmalloc() buffers are not DMA-safe for this device and
> > > > +	 * direction, check the individual lengths in the sg list. If one of
> > > > +	 * the buffers is deemed unsafe, follow the iommu_dma_map_sg_swiotlb()
> > > > +	 * path for potential bouncing.
> > > > +	 */
> > > > +	if (!dma_kmalloc_safe(dev, dir)) {
> > > > +		for_each_sg(sg, s, nents, i)
> > > > +			if (!dma_kmalloc_size_aligned(s->length)) {
> > > 
> > > Just to remind myself, we're not checking s->offset on the grounds that if
> > > anyone wants to DMA into an unaligned part of a larger allocation that
> > > remains at their own risk, is that right?
> > 
> > Right. That's the case currently as well and those users that were
> > relying on ARCH_KMALLOC_MINALIGN for this have either been migrated to
> > ARCH_DMA_MINALIGN in this series or the logic rewritten (as in the
> > crypto code).
> 
> OK, I did manage to summon a vague memory of this being discussed before,
> which at least stopped me asking "Should we be checking..." - perhaps a
> comment on dma_kmalloc_safe() to help remember that reasoning might not go
> amiss?

I'll add some notes in the comment.
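
Something along these lines probably, as an extra note in the
dma_kmalloc_safe() comment (exact wording to be polished):

	/*
	 * Note: only the kmalloc() size is checked here. Callers doing DMA
	 * to/from an unaligned offset within a larger, otherwise DMA-safe
	 * allocation do so at their own risk, as was already the case with
	 * ARCH_KMALLOC_MINALIGN.
	 */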

> > > Do we care about the (probably theoretical) case where someone might build a
> > > scatterlist for multiple small allocations such that ones which happen to be
> > > adjacent might get combined into a single segment of apparently "safe"
> > > length but still at "unsafe" alignment?
> > 
> > I'd say that's theoretical only. One could write such code but normally
> > you'd go for an array rather than relying on the randomness of the
> > kmalloc pointers to figure out adjacent objects. It also only works if
> > the individual struct size is exactly one of the kmalloc cache sizes, so
> > not generic enough.
> 
> FWIW I was imagining something like sg_alloc_table_from_pages() but at a
> smaller scale, queueing up some list/array of, say, 32-byte buffers into a
> scatterlist to submit as a single DMA job. I'm not aware that such a thing
> exists though, and I'm inclined to agree that it probably is sufficiently
> unrealistic to be concerned about. As usual I just want to feel comfortable
> that we've explored all the possibilities :)

The strict approach would be to check each pointer and size (not just
small ones) and, if unaligned, test whether it comes from a slab
allocation and what its actual alignment is, something similar to
ksize(). But this adds too many checks for (I think) a theoretical
issue. We discussed this in previous iterations of this series and
concluded to only check the size and bounce accordingly (even if we may
bounce fully aligned slabs or miss cases like the one you mentioned).
Anyway, we have a backup plan if we trip over something like this, just
slightly more expensive.
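
For the record, a rough sketch of what that stricter per-element check
could look like (dma_sg_needs_bounce() is a made-up name and this is not
something I intend to post):

#include <linux/dma-map-ops.h>
#include <linux/dma-mapping.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>

static bool dma_sg_needs_bounce(struct device *dev, struct scatterlist *s,
				enum dma_data_direction dir)
{
	void *va = sg_virt(s);
	struct folio *folio;

	if (dma_kmalloc_safe(dev, dir))
		return false;

	/* both start and size cacheline-aligned, no bouncing needed */
	if (IS_ALIGNED((unsigned long)va | s->length,
		       dma_get_cache_alignment()))
		return false;

	/* page/folio allocations are naturally cacheline-aligned */
	folio = virt_to_folio(va);
	if (!folio_test_slab(folio))
		return false;

	/*
	 * A ksize()-like look-up of the actual slab object boundaries
	 * would go here; for the sketch, conservatively bounce any
	 * unaligned slab object.
	 */
	return true;
}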

> > > > +				sg_dma_mark_bounced(sg);
> > > 
> > > I'd prefer to have iommu_dma_map_sg_swiotlb() mark the segments, since
> > > that's in charge of the actual bouncing. Then we can fold the alignment
> > > check into dev_use_swiotlb() (with the dev_is_untrusted() condition taking
> > > priority), and sync/unmap can simply rely on sg_is_dma_bounced() alone.
> > 
> > With this patch we only set the SG_DMA_BOUNCED on the first element of
> > the sglist. Do you want to set this flag only on individual elements
> > being bounced? It makes some sense in principle but the
> > iommu_dma_unmap_sg() path would need to scan the list again to decide
> > whether to go the swiotlb path.
> > 
> > If we keep the SG_DMA_BOUNCED flag only on the first element, I can
> > change it to your suggestion, assuming I understood it.
> 
> Indeed that should be fine - sync_sg/unmap_sg always have to be given the
> same arguments which were passed to map_sg (and note that in the normal
> case, the DMA address/length will often end up concatenated entirely into
> the first element), so while we still have the two distinct flows
> internally, I don't think there's any issue with only tagging the head of
> the list to steer between them. Of course if it then works out to be trivial
> enough to tag *all* the segments for good measure, there should be no harm
> in that either - at the moment the flag is destined to have more of a "this
> might be bounced, so needs checking" meaning than "this definitely is
> bounced" either way.

I renamed SG_DMA_BOUNCED to SG_DMA_USE_SWIOTLB (to match
dev_use_swiotlb()). The past participle of bounce does make you think
that it was definitely bounced.

Before I post a v5, does this resemble what you suggested:

------8<------------------------------
From 6558c2bc242ea8598d16b842c8cc77105ce1d5fa Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Tue, 8 Nov 2022 11:19:31 +0000
Subject: [PATCH] iommu/dma: Force bouncing if the size is not
 cacheline-aligned

Similarly to the direct DMA, bounce small allocations as they may have
originated from a kmalloc() cache not safe for DMA. Unlike the direct
DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
non-coherent devices as this would break some cases where the iova is
expected to be contiguous (dmabuf). Instead, scan the scatterlist for
any small sizes and only go the swiotlb path if any element of the list
needs bouncing (note that iommu_dma_map_page() would still only bounce
those buffers which are not DMA-aligned).

To avoid scanning the scatterlist on the 'sync' operations, introduce a
SG_DMA_USE_SWIOTLB flag set during the iommu_dma_map_sg_swiotlb() call
(suggested by Robin Murphy).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/dma-iommu.c   | 50 ++++++++++++++++++++++++++++++-------
 include/linux/scatterlist.h | 25 +++++++++++++++++--
 2 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7a9f0b0bddbd..24a8b8c2368c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -520,9 +520,38 @@ static bool dev_is_untrusted(struct device *dev)
 	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
 }
 
-static bool dev_use_swiotlb(struct device *dev)
+static bool dev_use_swiotlb(struct device *dev, size_t size,
+			    enum dma_data_direction dir)
 {
-	return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
+	return IS_ENABLED(CONFIG_SWIOTLB) &&
+		(dev_is_untrusted(dev) ||
+		 dma_kmalloc_needs_bounce(dev, size, dir));
+}
+
+static bool dev_use_sg_swiotlb(struct device *dev, struct scatterlist *sg,
+			       int nents, enum dma_data_direction dir)
+{
+	struct scatterlist *s;
+	int i;
+
+	if (!IS_ENABLED(CONFIG_SWIOTLB))
+		return false;
+
+	if (dev_is_untrusted(dev))
+		return true;
+
+	/*
+	 * If kmalloc() buffers are not DMA-safe for this device and
+	 * direction, check the individual lengths in the sg list. If any
+	 * element is deemed unsafe, use the swiotlb for bouncing.
+	 */
+	if (!dma_kmalloc_safe(dev, dir)) {
+		for_each_sg(sg, s, nents, i)
+			if (!dma_kmalloc_size_aligned(s->length))
+				return true;
+	}
+
+	return false;
 }
 
 /**
@@ -922,7 +951,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
 {
 	phys_addr_t phys;
 
-	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
+	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -938,7 +967,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
 {
 	phys_addr_t phys;
 
-	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
+	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -956,7 +985,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_use_swiotlb(dev))
+	if (sg_is_dma_use_swiotlb(sgl))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
 						      sg->length, dir);
@@ -972,7 +1001,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_use_swiotlb(dev))
+	if (sg_is_dma_use_swiotlb(sgl))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_device(dev,
 							 sg_dma_address(sg),
@@ -998,7 +1027,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	 * If both the physical buffer start address and size are
 	 * page aligned, we don't need to use a bounce page.
 	 */
-	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
+	if (dev_use_swiotlb(dev, size, dir) &&
+	    iova_offset(iovad, phys | size)) {
 		void *padding_start;
 		size_t padding_size, aligned_size;
 
@@ -1166,6 +1196,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
 	struct scatterlist *s;
 	int i;
 
+	sg_dma_mark_use_swiotlb(sg);
+
 	for_each_sg(sg, s, nents, i) {
 		sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
 				s->offset, s->length, dir, attrs);
@@ -1210,7 +1242,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 			goto out;
 	}
 
-	if (dev_use_swiotlb(dev))
+	if (dev_use_sg_swiotlb(dev, sg, nents, dir))
 		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
@@ -1315,7 +1347,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 	struct scatterlist *tmp;
 	int i;
 
-	if (dev_use_swiotlb(dev)) {
+	if (sg_is_dma_use_swiotlb(sg)) {
 		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
 		return;
 	}
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 87aaf8b5cdb4..e0f9fea456c1 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -248,6 +248,29 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 	sg->page_link &= ~SG_END;
 }
 
+#define SG_DMA_BUS_ADDRESS	(1 << 0)
+#define SG_DMA_USE_SWIOTLB	(1 << 1)
+
+#ifdef CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC
+static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
+{
+	return sg->dma_flags & SG_DMA_USE_SWIOTLB;
+}
+
+static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
+{
+	sg->dma_flags |= SG_DMA_USE_SWIOTLB;
+}
+#else
+static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
+{
+	return false;
+}
+static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
+{
+}
+#endif
+
 /*
  * CONFIG_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
  * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
@@ -256,8 +279,6 @@ static inline void sg_unmark_end(struct scatterlist *sg)
  */
 #ifdef CONFIG_PCI_P2PDMA
 
-#define SG_DMA_BUS_ADDRESS (1 << 0)
-
 /**
  * sg_dma_is_bus address - Return whether a given segment was marked
  *			   as a bus address


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 13/15] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-22  7:27         ` Catalin Marinas
@ 2023-05-23 15:47           ` Robin Murphy
  0 siblings, 0 replies; 35+ messages in thread
From: Robin Murphy @ 2023-05-23 15:47 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, linux-mm, iommu,
	linux-arm-kernel

On 22/05/2023 8:27 am, Catalin Marinas wrote:
> On Fri, May 19, 2023 at 06:09:45PM +0100, Robin Murphy wrote:
>> On 19/05/2023 3:02 pm, Catalin Marinas wrote:
>>> On Fri, May 19, 2023 at 01:29:38PM +0100, Robin Murphy wrote:
>>>> On 2023-05-18 18:34, Catalin Marinas wrote:
>>>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>>>> index 7a9f0b0bddbd..ab1c1681c06e 100644
>>>>> --- a/drivers/iommu/dma-iommu.c
>>>>> +++ b/drivers/iommu/dma-iommu.c
> [...]
>>>>> @@ -1210,7 +1211,21 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>>>>     			goto out;
>>>>>     	}
>>>>> -	if (dev_use_swiotlb(dev))
>>>>> +	/*
>>>>> +	 * If kmalloc() buffers are not DMA-safe for this device and
>>>>> +	 * direction, check the individual lengths in the sg list. If one of
>>>>> +	 * the buffers is deemed unsafe, follow the iommu_dma_map_sg_swiotlb()
>>>>> +	 * path for potential bouncing.
>>>>> +	 */
>>>>> +	if (!dma_kmalloc_safe(dev, dir)) {
>>>>> +		for_each_sg(sg, s, nents, i)
>>>>> +			if (!dma_kmalloc_size_aligned(s->length)) {
>>>>
>>>> Just to remind myself, we're not checking s->offset on the grounds that if
>>>> anyone wants to DMA into an unaligned part of a larger allocation that
>>>> remains at their own risk, is that right?
>>>
>>> Right. That's the case currently as well and those users that were
>>> relying on ARCH_KMALLOC_MINALIGN for this have either been migrated to
>>> ARCH_DMA_MINALIGN in this series or the logic rewritten (as in the
>>> crypto code).
>>
>> OK, I did manage to summon a vague memory of this being discussed before,
>> which at least stopped me asking "Should we be checking..." - perhaps a
>> comment on dma_kmalloc_safe() to help remember that reasoning might not go
>> amiss?
> 
> I'll add some notes in the comment.
> 
>>>> Do we care about the (probably theoretical) case where someone might build a
>>>> scatterlist for multiple small allocations such that ones which happen to be
>>>> adjacent might get combined into a single segment of apparently "safe"
>>>> length but still at "unsafe" alignment?
>>>
>>> I'd say that's theoretical only. One could write such code but normally
>>> you'd go for an array rather than relying on the randomness of the
>>> kmalloc pointers to figure out adjacent objects. It also only works if
>>> the individual struct size is exactly one of the kmalloc cache sizes, so
>>> not generic enough.
>>
>> FWIW I was imagining something like sg_alloc_table_from_pages() but at a
>> smaller scale, queueing up some list/array of, say, 32-byte buffers into a
>> scatterlist to submit as a single DMA job. I'm not aware that such a thing
>> exists though, and I'm inclined to agree that it probably is sufficiently
>> unrealistic to be concerned about. As usual I just want to feel comfortable
>> that we've explored all the possibilities :)
> 
> The strict approach would be to check each pointer and size (not just
> small ones) and, if unaligned, test whether it comes from a slab
> allocation and what its actual alignment is, something similar to
> ksize(). But this adds too many checks for (I think) a theoretical
> issue. We discussed this in previous iterations of this series and
> concluded to only check the size and bounce accordingly (even if we may
> bounce fully aligned slabs or miss cases like the one you mentioned).
> Anyway, we have a backup plan if we trip over something like this, just
> slightly more expensive.
> 
>>>>> +				sg_dma_mark_bounced(sg);
>>>>
>>>> I'd prefer to have iommu_dma_map_sg_swiotlb() mark the segments, since
>>>> that's in charge of the actual bouncing. Then we can fold the alignment
>>>> check into dev_use_swiotlb() (with the dev_is_untrusted() condition taking
>>>> priority), and sync/unmap can simply rely on sg_is_dma_bounced() alone.
>>>
>>> With this patch we only set the SG_DMA_BOUNCED on the first element of
>>> the sglist. Do you want to set this flag only on individual elements
>>> being bounced? It makes some sense in principle but the
>>> iommu_dma_unmap_sg() path would need to scan the list again to decide
>>> whether to go the swiotlb path.
>>>
>>> If we keep the SG_DMA_BOUNCED flag only on the first element, I can
>>> change it to your suggestion, assuming I understood it.
>>
>> Indeed that should be fine - sync_sg/unmap_sg always have to be given the
>> same arguments which were passed to map_sg (and note that in the normal
>> case, the DMA address/length will often end up concatenated entirely into
>> the first element), so while we still have the two distinct flows
>> internally, I don't think there's any issue with only tagging the head of
>> the list to steer between them. Of course if it then works out to be trivial
>> enough to tag *all* the segments for good measure, there should be no harm
>> in that either - at the moment the flag is destined to have more of a "this
>> might be bounced, so needs checking" meaning than "this definitely is
>> bounced" either way.
> 
> I renamed SG_DMA_BOUNCED to SG_DMA_USE_SWIOTLB (to match
> dev_use_swiotlb()). The past participle of bounce does make you think
> that it was definitely bounced.
> 
> Before I post a v5, does this resemble what you suggested:

Indeed; I hadn't got as far as considering optimising checks for the sg 
case, but the overall shape looks like what I was imagining. Possibly 
some naming nitpicks, but I'm not sure how much I can be bothered :)

Thanks,
Robin.

> ------8<------------------------------
>  From 6558c2bc242ea8598d16b842c8cc77105ce1d5fa Mon Sep 17 00:00:00 2001
> From: Catalin Marinas <catalin.marinas@arm.com>
> Date: Tue, 8 Nov 2022 11:19:31 +0000
> Subject: [PATCH] iommu/dma: Force bouncing if the size is not
>   cacheline-aligned
> 
> Similarly to the direct DMA, bounce small allocations as they may have
> originated from a kmalloc() cache not safe for DMA. Unlike the direct
> DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
> non-coherent devices as this would break some cases where the iova is
> expected to be contiguous (dmabuf). Instead, scan the scatterlist for
> any small sizes and only go the swiotlb path if any element of the list
> needs bouncing (note that iommu_dma_map_page() would still only bounce
> those buffers which are not DMA-aligned).
> 
> To avoid scanning the scatterlist on the 'sync' operations, introduce a
> SG_DMA_USE_SWIOTLB flag set during the iommu_dma_map_sg_swiotlb() call
> (suggested by Robin Murphy).
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> ---
>   drivers/iommu/dma-iommu.c   | 50 ++++++++++++++++++++++++++++++-------
>   include/linux/scatterlist.h | 25 +++++++++++++++++--
>   2 files changed, 64 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7a9f0b0bddbd..24a8b8c2368c 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -520,9 +520,38 @@ static bool dev_is_untrusted(struct device *dev)
>   	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
>   }
>   
> -static bool dev_use_swiotlb(struct device *dev)
> +static bool dev_use_swiotlb(struct device *dev, size_t size,
> +			    enum dma_data_direction dir)
>   {
> -	return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
> +	return IS_ENABLED(CONFIG_SWIOTLB) &&
> +		(dev_is_untrusted(dev) ||
> +		 dma_kmalloc_needs_bounce(dev, size, dir));
> +}
> +
> +static bool dev_use_sg_swiotlb(struct device *dev, struct scatterlist *sg,
> +			       int nents, enum dma_data_direction dir)
> +{
> +	struct scatterlist *s;
> +	int i;
> +
> +	if (!IS_ENABLED(CONFIG_SWIOTLB))
> +		return false;
> +
> +	if (dev_is_untrusted(dev))
> +		return true;
> +
> +	/*
> +	 * If kmalloc() buffers are not DMA-safe for this device and
> +	 * direction, check the individual lengths in the sg list. If any
> +	 * element is deemed unsafe, use the swiotlb for bouncing.
> +	 */
> +	if (!dma_kmalloc_safe(dev, dir)) {
> +		for_each_sg(sg, s, nents, i)
> +			if (!dma_kmalloc_size_aligned(s->length))
> +				return true;
> +	}
> +
> +	return false;
>   }
>   
>   /**
> @@ -922,7 +951,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>   {
>   	phys_addr_t phys;
>   
> -	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
> +	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
>   		return;
>   
>   	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> @@ -938,7 +967,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
>   {
>   	phys_addr_t phys;
>   
> -	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
> +	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
>   		return;
>   
>   	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> @@ -956,7 +985,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>   	struct scatterlist *sg;
>   	int i;
>   
> -	if (dev_use_swiotlb(dev))
> +	if (sg_is_dma_use_swiotlb(sgl))
>   		for_each_sg(sgl, sg, nelems, i)
>   			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
>   						      sg->length, dir);
> @@ -972,7 +1001,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
>   	struct scatterlist *sg;
>   	int i;
>   
> -	if (dev_use_swiotlb(dev))
> +	if (sg_is_dma_use_swiotlb(sgl))
>   		for_each_sg(sgl, sg, nelems, i)
>   			iommu_dma_sync_single_for_device(dev,
>   							 sg_dma_address(sg),
> @@ -998,7 +1027,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   	 * If both the physical buffer start address and size are
>   	 * page aligned, we don't need to use a bounce page.
>   	 */
> -	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
> +	if (dev_use_swiotlb(dev, size, dir) &&
> +	    iova_offset(iovad, phys | size)) {
>   		void *padding_start;
>   		size_t padding_size, aligned_size;
>   
> @@ -1166,6 +1196,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
>   	struct scatterlist *s;
>   	int i;
>   
> +	sg_dma_mark_use_swiotlb(sg);
> +
>   	for_each_sg(sg, s, nents, i) {
>   		sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
>   				s->offset, s->length, dir, attrs);
> @@ -1210,7 +1242,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   			goto out;
>   	}
>   
> -	if (dev_use_swiotlb(dev))
> +	if (dev_use_sg_swiotlb(dev, sg, nents, dir))
>   		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
>   
>   	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> @@ -1315,7 +1347,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>   	struct scatterlist *tmp;
>   	int i;
>   
> -	if (dev_use_swiotlb(dev)) {
> +	if (sg_is_dma_use_swiotlb(sg)) {
>   		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
>   		return;
>   	}
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 87aaf8b5cdb4..e0f9fea456c1 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -248,6 +248,29 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>   	sg->page_link &= ~SG_END;
>   }
>   
> +#define SG_DMA_BUS_ADDRESS	(1 << 0)
> +#define SG_DMA_USE_SWIOTLB	(1 << 1)
> +
> +#ifdef CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC
> +static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
> +{
> +	return sg->dma_flags & SG_DMA_USE_SWIOTLB;
> +}
> +
> +static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
> +{
> +	sg->dma_flags |= SG_DMA_USE_SWIOTLB;
> +}
> +#else
> +static inline bool sg_is_dma_use_swiotlb(struct scatterlist *sg)
> +{
> +	return false;
> +}
> +static inline void sg_dma_mark_use_swiotlb(struct scatterlist *sg)
> +{
> +}
> +#endif
> +
>   /*
>    * CONFIG_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
>    * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
> @@ -256,8 +279,6 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>    */
>   #ifdef CONFIG_PCI_P2PDMA
>   
> -#define SG_DMA_BUS_ADDRESS (1 << 0)
> -
>   /**
>    * sg_dma_is_bus address - Return whether a given segment was marked
>    *			   as a bus address
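
For clarity, a minimal stand-alone sketch of the decision that dev_use_sg_swiotlb()
above makes: bouncing is only needed when the device is non-coherent and the
transfer may write to memory (i.e. not DMA_TO_DEVICE), and then only if some sg
element length could come from a kmalloc() allocation that is not cache-line
aligned. The names below (CACHE_ALIGN, kmalloc_size_aligned(), sg_needs_bounce())
are simplified stand-ins for illustration only, not the series' actual
dma_kmalloc_safe()/dma_kmalloc_size_aligned() helpers, which also account for
kmalloc() size rounding and the runtime dma_get_cache_alignment(). In the patch
itself the result of this decision is recorded per scatterlist with
sg_dma_mark_use_swiotlb() at map time, so the sync and unmap paths can simply
test sg_is_dma_use_swiotlb() instead of repeating the per-element walk.

    #include <stdbool.h>
    #include <stddef.h>

    enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

    #define CACHE_ALIGN 64	/* assumed stand-in for the runtime cache_line_size() */

    /*
     * Stand-in for dma_kmalloc_size_aligned(): a buffer whose size is a
     * multiple of the cache line cannot share a cache line with a
     * neighbouring kmalloc() object, so cache invalidation around it is safe.
     */
    static bool kmalloc_size_aligned(size_t size)
    {
    	return (size & (CACHE_ALIGN - 1)) == 0;
    }

    /*
     * Stand-in for the dev_use_sg_swiotlb() logic above: bounce only if the
     * device is non-coherent, the transfer may write to memory, and at least
     * one sg element has a potentially unaligned kmalloc() length.
     */
    static bool sg_needs_bounce(bool dev_coherent, enum dma_data_direction dir,
    			    const size_t *lengths, int nents)
    {
    	int i;

    	/* coherent devices and to-device transfers need no destructive
    	 * cache maintenance, so small kmalloc() buffers are already safe */
    	if (dev_coherent || dir == DMA_TO_DEVICE)
    		return false;

    	for (i = 0; i < nents; i++)
    		if (!kmalloc_size_aligned(lengths[i]))
    			return true;

    	return false;
    }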


