iommu.lists.linux-foundation.org archive mirror
* [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
@ 2023-05-31 15:48 Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 01/17] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
                   ` (17 more replies)
  0 siblings, 18 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

Hi,

Here's version 6 of the series reducing the kmalloc() minimum alignment
on arm64 to 8 (from 128). There are already patches to do the same for
riscv (pretty straightforward after this series).

The first 11 patches decouple ARCH_KMALLOC_MINALIGN from
ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those
aligned to the run-time probed cache_line_size(). On arm64 we gain the
kmalloc-{64,192} caches.

The subsequent patches (12 to 17) further reduce the kmalloc() minimum
alignment, enabling the kmalloc-{8,16,32,96} caches when the default
swiotlb is present, by bouncing small unaligned buffers in the DMA API.
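For illustration, the bouncing decision can be modelled in user space (a
sketch only, not the kernel implementation; the size table and helper
names below are made up for the example). A buffer is DMA-safe when the
kmalloc() size class it lands in is a multiple of the DMA cache-line
alignment; with a 64-byte cache line this leaves exactly
kmalloc-{8,16,32,96} needing the bounce:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard kmalloc size classes up to 256 bytes. */
static const size_t kmalloc_sizes[] = { 8, 16, 32, 64, 96, 128, 192, 256 };

/* Round a request up to the size class that would serve it. */
static size_t kmalloc_size_roundup(size_t n)
{
	for (size_t i = 0; i < sizeof(kmalloc_sizes) / sizeof(kmalloc_sizes[0]); i++)
		if (kmalloc_sizes[i] >= n)
			return kmalloc_sizes[i];
	return n;	/* larger allocations are page-based; ignored here */
}

/* An object must be bounced if its size class can share a cache line
 * with a neighbouring object, i.e. is not cache-line-aligned. */
static bool needs_bounce(size_t n, size_t dma_align)
{
	return kmalloc_size_roundup(n) % dma_align != 0;
}
```

With dma_align == 64, needs_bounce() is true for 8, 16, 32 and 96 and
false for 64, 128, 192 and 256, matching the cache set above.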

Changes since v5:

- Renaming of the sg_* accessors for consistency.

- IIO_DMA_MINALIGN defined to ARCH_DMA_MINALIGN (missed it in previous
  versions).

- Modified Robin's patch 11 to use #ifdef CONFIG_NEED_SG_DMA_FLAGS
  instead of CONFIG_PCI_P2PDMA in scatterlist.h.

- Added the new sg_dma_*_swiotlb() under the same #ifdef as above.

The updated patches are also available on this branch:

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/kmalloc-minalign

Thanks.

Catalin Marinas (15):
  mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  dma: Allow dma_get_cache_alignment() to be overridden by the arch code
  mm/slab: Simplify create_kmalloc_cache() args and make it static
  mm/slab: Limit kmalloc() minimum alignment to
    dma_get_cache_alignment()
  drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/gpu: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/usb: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  drivers/spi: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  dm-crypt: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  iio: core: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  arm64: Allow kmalloc() caches aligned to the smaller cache_line_size()
  dma-mapping: Force bouncing if the kmalloc() size is not
    cache-line-aligned
  iommu/dma: Force bouncing if the size is not cacheline-aligned
  mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing
    possible
  arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64

Robin Murphy (2):
  scatterlist: Add dedicated config for DMA flags
  dma-mapping: Name SG DMA flag helpers consistently

 arch/arm64/Kconfig             |  1 +
 arch/arm64/include/asm/cache.h |  3 ++
 arch/arm64/mm/init.c           |  7 +++-
 drivers/base/devres.c          |  6 ++--
 drivers/gpu/drm/drm_managed.c  |  6 ++--
 drivers/iommu/Kconfig          |  1 +
 drivers/iommu/dma-iommu.c      | 58 ++++++++++++++++++++++++--------
 drivers/iommu/iommu.c          |  2 +-
 drivers/md/dm-crypt.c          |  2 +-
 drivers/pci/Kconfig            |  1 +
 drivers/spi/spidev.c           |  2 +-
 drivers/usb/core/buffer.c      |  8 ++---
 include/linux/dma-map-ops.h    | 61 ++++++++++++++++++++++++++++++++++
 include/linux/dma-mapping.h    |  4 ++-
 include/linux/iio/iio.h        |  2 +-
 include/linux/scatterlist.h    | 60 ++++++++++++++++++++++++++-------
 include/linux/slab.h           | 14 ++++++--
 kernel/dma/Kconfig             |  7 ++++
 kernel/dma/direct.c            |  2 +-
 kernel/dma/direct.h            |  3 +-
 mm/slab.c                      |  6 +---
 mm/slab.h                      |  5 ++-
 mm/slab_common.c               | 46 +++++++++++++++++++------
 23 files changed, 243 insertions(+), 64 deletions(-)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v6 01/17] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-06-09 12:32   ` Vlastimil Babka
  2023-05-31 15:48 ` [PATCH v6 02/17] dma: Allow dma_get_cache_alignment() to be overridden by the arch code Catalin Marinas
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

In preparation for supporting a kmalloc() minimum alignment smaller than
the arch DMA alignment, decouple the two definitions. This requires that
either the kmalloc() caches are aligned to a (run-time) cache-line size
or the DMA API bounces unaligned kmalloc() allocations. Subsequent
patches will implement both options.

After this patch, ARCH_DMA_MINALIGN is expected to be used in static
alignment annotations and defined by an architecture to be the maximum
alignment for all supported configurations/SoCs in a single Image.
Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to
define its value in the arch headers.

Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in
dma_get_cache_alignment() so that there is no change for architectures
not requiring a minimum DMA alignment.
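The resulting fallback chain can be modelled in user space (an
illustrative sketch, not the header itself; the struct and function are
invented for the example, with 0 standing in for "not defined by the
arch"):

```c
#include <assert.h>

struct minaligns { unsigned int dma; unsigned int kmalloc; };

/* Model of the new slab.h logic: ARCH_DMA_MINALIGN always ends up
 * defined, and ARCH_KMALLOC_MINALIGN follows it only when the arch
 * did not opt in to a smaller value of its own. */
static struct minaligns resolve(unsigned int arch_dma, unsigned int arch_kmalloc)
{
	struct minaligns m;

	m.dma = arch_dma ? arch_dma : 8;	/* __alignof__(unsigned long long) */
	if (arch_kmalloc)
		m.kmalloc = arch_kmalloc;	/* arch header defined it */
	else if (arch_dma > 8)
		m.kmalloc = arch_dma;		/* old behaviour preserved */
	else
		m.kmalloc = 8;
	return m;
}
```

So an arch with ARCH_DMA_MINALIGN 128 and no ARCH_KMALLOC_MINALIGN is
unchanged (both 128), while one opting in to 8 keeps the 128-byte DMA
alignment for static annotations.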

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 include/linux/dma-mapping.h |  2 +-
 include/linux/slab.h        | 14 +++++++++++---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0ee20b764000..3288a1339271 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -545,7 +545,7 @@ static inline int dma_set_min_align_mask(struct device *dev,
 
 static inline int dma_get_cache_alignment(void)
 {
-#ifdef ARCH_DMA_MINALIGN
+#ifdef ARCH_HAS_DMA_MINALIGN
 	return ARCH_DMA_MINALIGN;
 #endif
 	return 1;
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 6b3e155b70bf..50dcf9cfbf62 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -235,12 +235,20 @@ void kmem_dump_obj(void *object);
  * alignment larger than the alignment of a 64-bit integer.
  * Setting ARCH_DMA_MINALIGN in arch headers allows that.
  */
-#if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
+#ifdef ARCH_DMA_MINALIGN
+#define ARCH_HAS_DMA_MINALIGN
+#if ARCH_DMA_MINALIGN > 8 && !defined(ARCH_KMALLOC_MINALIGN)
 #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
-#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN
-#define KMALLOC_SHIFT_LOW ilog2(ARCH_DMA_MINALIGN)
+#endif
 #else
+#define ARCH_DMA_MINALIGN __alignof__(unsigned long long)
+#endif
+
+#ifndef ARCH_KMALLOC_MINALIGN
 #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
+#elif ARCH_KMALLOC_MINALIGN > 8
+#define KMALLOC_MIN_SIZE ARCH_KMALLOC_MINALIGN
+#define KMALLOC_SHIFT_LOW ilog2(KMALLOC_MIN_SIZE)
 #endif
 
 /*


* [PATCH v6 02/17] dma: Allow dma_get_cache_alignment() to be overridden by the arch code
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 01/17] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 03/17] mm/slab: Simplify create_kmalloc_cache() args and make it static Catalin Marinas
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

On arm64, ARCH_DMA_MINALIGN is larger than the cache line size on most
deployed platforms. Allow an architecture to override
dma_get_cache_alignment() in order to return a run-time probed value
(e.g. cache_line_size()).
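The override mechanism is the usual "#ifndef the function name" pattern,
sketched here in a self-contained user-space form (the value 64 stands
in for a hypothetical run-time probed cache line size):

```c
#include <assert.h>

/* "arch header": provide an implementation and define the macro so the
 * generic default below is compiled out. */
static int cache_line_size(void) { return 64; }	/* hypothetical probe */
#define dma_get_cache_alignment cache_line_size

/* "generic header": default used only when no arch override exists. */
#ifndef dma_get_cache_alignment
static int dma_get_cache_alignment(void) { return 1; }
#endif
```

Callers keep using dma_get_cache_alignment() unchanged; on an
overriding arch it resolves to the run-time probe instead of the
compile-time constant.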

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/dma-mapping.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 3288a1339271..c41019289223 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -543,6 +543,7 @@ static inline int dma_set_min_align_mask(struct device *dev,
 	return 0;
 }
 
+#ifndef dma_get_cache_alignment
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_HAS_DMA_MINALIGN
@@ -550,6 +551,7 @@ static inline int dma_get_cache_alignment(void)
 #endif
 	return 1;
 }
+#endif
 
 static inline void *dmam_alloc_coherent(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp)


* [PATCH v6 03/17] mm/slab: Simplify create_kmalloc_cache() args and make it static
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 01/17] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 02/17] dma: Allow dma_get_cache_alignment() to be overridden by the arch code Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-06-09 13:03   ` Vlastimil Babka
  2023-05-31 15:48 ` [PATCH v6 04/17] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment() Catalin Marinas
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

In the slab variant of kmem_cache_init(), call new_kmalloc_cache()
instead of initialising the kmalloc_caches array directly. With this,
create_kmalloc_cache() is now only called from new_kmalloc_cache() in
the same file, so make it static. In addition, the useroffset argument
is always 0 while usersize is the same as size. Remove them.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slab.c        |  6 +-----
 mm/slab.h        |  5 ++---
 mm/slab_common.c | 14 ++++++--------
 3 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index bb57f7fdbae1..b7817dcba63e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1240,11 +1240,7 @@ void __init kmem_cache_init(void)
 	 * Initialize the caches that provide memory for the  kmem_cache_node
 	 * structures first.  Without this, further allocations will bug.
 	 */
-	kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache(
-				kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL],
-				kmalloc_info[INDEX_NODE].size,
-				ARCH_KMALLOC_FLAGS, 0,
-				kmalloc_info[INDEX_NODE].size);
+	new_kmalloc_cache(INDEX_NODE, KMALLOC_NORMAL, ARCH_KMALLOC_FLAGS);
 	slab_state = PARTIAL_NODE;
 	setup_kmalloc_cache_index_table();
 
diff --git a/mm/slab.h b/mm/slab.h
index f01ac256a8f5..592590fcddae 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -255,9 +255,8 @@ gfp_t kmalloc_fix_flags(gfp_t flags);
 /* Functions provided by the slab allocators */
 int __kmem_cache_create(struct kmem_cache *, slab_flags_t flags);
 
-struct kmem_cache *create_kmalloc_cache(const char *name, unsigned int size,
-			slab_flags_t flags, unsigned int useroffset,
-			unsigned int usersize);
+void __init new_kmalloc_cache(int idx, enum kmalloc_cache_type type,
+			      slab_flags_t flags);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
 			unsigned int size, slab_flags_t flags,
 			unsigned int useroffset, unsigned int usersize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 607249785c07..7f069159aee2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -658,17 +658,16 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
 	s->refcount = -1;	/* Exempt from merging for now */
 }
 
-struct kmem_cache *__init create_kmalloc_cache(const char *name,
-		unsigned int size, slab_flags_t flags,
-		unsigned int useroffset, unsigned int usersize)
+static struct kmem_cache *__init create_kmalloc_cache(const char *name,
+						      unsigned int size,
+						      slab_flags_t flags)
 {
 	struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
 
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags | SLAB_KMALLOC, useroffset,
-								usersize);
+	create_boot_cache(s, name, size, flags | SLAB_KMALLOC, 0, size);
 	list_add(&s->list, &slab_caches);
 	s->refcount = 1;
 	return s;
@@ -863,7 +862,7 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
-static void __init
+void __init
 new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 {
 	if ((KMALLOC_RECLAIM != KMALLOC_NORMAL) && (type == KMALLOC_RECLAIM)) {
@@ -880,8 +879,7 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 
 	kmalloc_caches[type][idx] = create_kmalloc_cache(
 					kmalloc_info[idx].name[type],
-					kmalloc_info[idx].size, flags, 0,
-					kmalloc_info[idx].size);
+					kmalloc_info[idx].size, flags);
 
 	/*
 	 * If CONFIG_MEMCG_KMEM is enabled, disable cache merging for


* [PATCH v6 04/17] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment()
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (2 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 03/17] mm/slab: Simplify create_kmalloc_cache() args and make it static Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-06-09 14:33   ` Vlastimil Babka
  2023-05-31 15:48 ` [PATCH v6 05/17] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN Catalin Marinas
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

Do not create kmalloc() caches which are not aligned to
dma_get_cache_alignment(). There is no functional change since, for the
current architectures defining ARCH_DMA_MINALIGN, ARCH_KMALLOC_MINALIGN
equals ARCH_DMA_MINALIGN (and dma_get_cache_alignment()). On
architectures without a specific ARCH_DMA_MINALIGN,
dma_get_cache_alignment() is 1, so the kmalloc() caches are unchanged.

If an architecture selects ARCH_HAS_DMA_CACHE_LINE_SIZE (introduced
previously), the kmalloc() caches will be aligned to a cache line size.
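The folding can be sketched as follows (a user-space model of the
new_kmalloc_cache() hunk below, not the kernel code; the helper name is
invented and ALIGN() mirrors the kernel macro for power-of-two values):

```c
#include <assert.h>
#include <stddef.h>

#define ALIGN(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* When the run-time DMA alignment exceeds ARCH_KMALLOC_MINALIGN, a
 * small size class is rounded up and its slot becomes an alias of the
 * larger, suitably aligned cache. */
static size_t aliased_size(size_t size, size_t minalign,
			   size_t arch_kmalloc_minalign)
{
	if (minalign > arch_kmalloc_minalign)
		return ALIGN(size, minalign);	/* e.g. kmalloc-8 -> kmalloc-64 */
	return size;
}
```

With a 64-byte minimum, kmalloc-8 requests are served from kmalloc-64
and kmalloc-96 from kmalloc-128, while kmalloc-192 keeps its own cache.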

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 mm/slab_common.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7f069159aee2..7c6475847fdf 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -17,6 +17,7 @@
 #include <linux/cpu.h>
 #include <linux/uaccess.h>
 #include <linux/seq_file.h>
+#include <linux/dma-mapping.h>
 #include <linux/proc_fs.h>
 #include <linux/debugfs.h>
 #include <linux/kasan.h>
@@ -862,9 +863,18 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
+static unsigned int __kmalloc_minalign(void)
+{
+	return dma_get_cache_alignment();
+}
+
 void __init
 new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 {
+	unsigned int minalign = __kmalloc_minalign();
+	unsigned int aligned_size = kmalloc_info[idx].size;
+	int aligned_idx = idx;
+
 	if ((KMALLOC_RECLAIM != KMALLOC_NORMAL) && (type == KMALLOC_RECLAIM)) {
 		flags |= SLAB_RECLAIM_ACCOUNT;
 	} else if (IS_ENABLED(CONFIG_MEMCG_KMEM) && (type == KMALLOC_CGROUP)) {
@@ -877,9 +887,17 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 		flags |= SLAB_CACHE_DMA;
 	}
 
-	kmalloc_caches[type][idx] = create_kmalloc_cache(
-					kmalloc_info[idx].name[type],
-					kmalloc_info[idx].size, flags);
+	if (minalign > ARCH_KMALLOC_MINALIGN) {
+		aligned_size = ALIGN(aligned_size, minalign);
+		aligned_idx = __kmalloc_index(aligned_size, false);
+	}
+
+	if (!kmalloc_caches[type][aligned_idx])
+		kmalloc_caches[type][aligned_idx] = create_kmalloc_cache(
+					kmalloc_info[aligned_idx].name[type],
+					aligned_size, flags);
+	if (idx != aligned_idx)
+		kmalloc_caches[type][idx] = kmalloc_caches[type][aligned_idx];
 
 	/*
 	 * If CONFIG_MEMCG_KMEM is enabled, disable cache merging for


* [PATCH v6 05/17] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (3 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 04/17] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment() Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 06/17] drivers/gpu: " Catalin Marinas
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
---
 drivers/base/devres.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 5c998cfac335..3df0025d12aa 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -29,10 +29,10 @@ struct devres {
 	 * Some archs want to perform DMA into kmalloc caches
 	 * and need a guaranteed alignment larger than
 	 * the alignment of a 64-bit integer.
-	 * Thus we use ARCH_KMALLOC_MINALIGN here and get exactly the same
-	 * buffer alignment as if it was allocated by plain kmalloc().
+	 * Thus we use ARCH_DMA_MINALIGN for data[] which will force the same
+	 * alignment for struct devres when allocated by kmalloc().
 	 */
-	u8 __aligned(ARCH_KMALLOC_MINALIGN) data[];
+	u8 __aligned(ARCH_DMA_MINALIGN) data[];
 };
 
 struct devres_group {


* [PATCH v6 06/17] drivers/gpu: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (4 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 05/17] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 07/17] drivers/usb: " Catalin Marinas
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
---
 drivers/gpu/drm/drm_managed.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c
index 4cf214de50c4..3a5802f60e65 100644
--- a/drivers/gpu/drm/drm_managed.c
+++ b/drivers/gpu/drm/drm_managed.c
@@ -49,10 +49,10 @@ struct drmres {
 	 * Some archs want to perform DMA into kmalloc caches
 	 * and need a guaranteed alignment larger than
 	 * the alignment of a 64-bit integer.
-	 * Thus we use ARCH_KMALLOC_MINALIGN here and get exactly the same
-	 * buffer alignment as if it was allocated by plain kmalloc().
+	 * Thus we use ARCH_DMA_MINALIGN for data[] which will force the same
+	 * alignment for struct drmres when allocated by kmalloc().
 	 */
-	u8 __aligned(ARCH_KMALLOC_MINALIGN) data[];
+	u8 __aligned(ARCH_DMA_MINALIGN) data[];
 };
 
 static void free_dr(struct drmres *dr)


* [PATCH v6 07/17] drivers/usb: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (5 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 06/17] drivers/gpu: " Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 08/17] drivers/spi: " Catalin Marinas
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/core/buffer.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
index fbb087b728dc..e21d8d106977 100644
--- a/drivers/usb/core/buffer.c
+++ b/drivers/usb/core/buffer.c
@@ -34,13 +34,13 @@ void __init usb_init_pool_max(void)
 {
 	/*
 	 * The pool_max values must never be smaller than
-	 * ARCH_KMALLOC_MINALIGN.
+	 * ARCH_DMA_MINALIGN.
 	 */
-	if (ARCH_KMALLOC_MINALIGN <= 32)
+	if (ARCH_DMA_MINALIGN <= 32)
 		;			/* Original value is okay */
-	else if (ARCH_KMALLOC_MINALIGN <= 64)
+	else if (ARCH_DMA_MINALIGN <= 64)
 		pool_max[0] = 64;
-	else if (ARCH_KMALLOC_MINALIGN <= 128)
+	else if (ARCH_DMA_MINALIGN <= 128)
 		pool_max[0] = 0;	/* Don't use this pool */
 	else
 		BUILD_BUG();		/* We don't allow this */


* [PATCH v6 08/17] drivers/spi: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (6 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 07/17] drivers/usb: " Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 09/17] dm-crypt: " Catalin Marinas
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Brown <broonie@kernel.org>
---
 drivers/spi/spidev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 39d94c850839..8d009275a59d 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -237,7 +237,7 @@ static int spidev_message(struct spidev_data *spidev,
 		/* Ensure that also following allocations from rx_buf/tx_buf will meet
 		 * DMA alignment requirements.
 		 */
-		unsigned int len_aligned = ALIGN(u_tmp->len, ARCH_KMALLOC_MINALIGN);
+		unsigned int len_aligned = ALIGN(u_tmp->len, ARCH_DMA_MINALIGN);
 
 		k_tmp->len = u_tmp->len;
 


* [PATCH v6 09/17] dm-crypt: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (7 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 08/17] drivers/spi: " Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 10/17] iio: core: " Catalin Marinas
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-crypt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 8b47b913ee83..ebbd8f7db880 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3256,7 +3256,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
 	cc->per_bio_data_size = ti->per_io_data_size =
 		ALIGN(sizeof(struct dm_crypt_io) + cc->dmreq_start + additional_req_size,
-		      ARCH_KMALLOC_MINALIGN);
+		      ARCH_DMA_MINALIGN);
 
 	ret = mempool_init(&cc->page_pool, BIO_MAX_VECS, crypt_page_alloc, crypt_page_free, cc);
 	if (ret) {


* [PATCH v6 10/17] iio: core: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (8 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 09/17] dm-crypt: " Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-06-02 11:19   ` Jonathan Cameron
  2023-05-31 15:48 ` [PATCH v6 11/17] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size() Catalin Marinas
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel, Lars-Peter Clausen

ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
operations, while ARCH_KMALLOC_MINALIGN is the minimum alignment of
kmalloc() objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
---
 include/linux/iio/iio.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h
index 81413cd3a3e7..d28a5e8097e4 100644
--- a/include/linux/iio/iio.h
+++ b/include/linux/iio/iio.h
@@ -722,7 +722,7 @@ static inline void *iio_device_get_drvdata(const struct iio_dev *indio_dev)
  * must not share  cachelines with the rest of the structure, thus making
  * them safe for use with non-coherent DMA.
  */
-#define IIO_DMA_MINALIGN ARCH_KMALLOC_MINALIGN
+#define IIO_DMA_MINALIGN ARCH_DMA_MINALIGN
 struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv);
 
 /* The information at the returned address is guaranteed to be cacheline aligned */


* [PATCH v6 11/17] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size()
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (9 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 10/17] iio: core: " Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 12/17] scatterlist: Add dedicated config for DMA flags Catalin Marinas
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

On arm64, ARCH_DMA_MINALIGN is 128, larger than the cache line size on
most of the current platforms (typically 64). Define
ARCH_KMALLOC_MINALIGN to 8 (the default for architectures without their
own ARCH_DMA_MINALIGN) and override dma_get_cache_alignment() to return
cache_line_size(), probed at run-time. The kmalloc() caches will be
limited to those aligned to the cache line size, which allows the
additional kmalloc-{64,192} caches on most arm64 platforms.
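A quick numeric check of which caches appear (a sketch under the
assumption of a typical 64-byte cache line; the helper is invented for
illustration): a size class gets its own cache when it is a multiple of
the minimum alignment, so dropping from the static 128 to a probed 64
gains exactly kmalloc-64 and kmalloc-192:

```c
#include <assert.h>
#include <stddef.h>

/* A size class can have its own cache iff it is a multiple of the
 * minimum kmalloc() alignment in force. */
static int has_own_cache(size_t size, size_t align)
{
	return size % align == 0;
}
```

64 and 192 qualify under a 64-byte alignment but not under 128, while
128 and 256 qualify under both.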

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/cache.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index a51e6e8f3171..ceb368d33bf4 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -33,6 +33,7 @@
  * the CPU.
  */
 #define ARCH_DMA_MINALIGN	(128)
+#define ARCH_KMALLOC_MINALIGN	(8)
 
 #ifndef __ASSEMBLY__
 
@@ -90,6 +91,8 @@ static inline int cache_line_size_of_cpu(void)
 
 int cache_line_size(void);
 
+#define dma_get_cache_alignment	cache_line_size
+
 /*
  * Read the effective value of CTR_EL0.
  *

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 12/17] scatterlist: Add dedicated config for DMA flags
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (10 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 11/17] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size() Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 13/17] dma-mapping: Name SG DMA flag helpers consistently Catalin Marinas
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

From: Robin Murphy <robin.murphy@arm.com>

The DMA flags field will be useful for users beyond PCI P2P, so upgrade
to its own dedicated config option.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[catalin.marinas@arm.com: use #ifdef CONFIG_NEED_SG_DMA_FLAGS in scatterlist.h]
[catalin.marinas@arm.com: update PCI_P2PDMA dma_flags comment in scatterlist.h]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/pci/Kconfig         |  1 +
 include/linux/scatterlist.h | 13 ++++++-------
 kernel/dma/Kconfig          |  3 +++
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 9309f2469b41..3c07d8d214b3 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -168,6 +168,7 @@ config PCI_P2PDMA
 	#
 	depends on 64BIT
 	select GENERIC_ALLOCATOR
+	select NEED_SG_DMA_FLAGS
 	help
	  Enables drivers to do PCI peer-to-peer transactions to and from
 	  BARs that are exposed in other devices that are the part of
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 375a5e90d86a..19833fd4113b 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -16,7 +16,7 @@ struct scatterlist {
 #ifdef CONFIG_NEED_SG_DMA_LENGTH
 	unsigned int	dma_length;
 #endif
-#ifdef CONFIG_PCI_P2PDMA
+#ifdef CONFIG_NEED_SG_DMA_FLAGS
 	unsigned int    dma_flags;
 #endif
 };
@@ -249,12 +249,11 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 }
 
 /*
- * CONFGI_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
- * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
- * Use this padding for DMA flags bits to indicate when a specific
- * dma address is a bus address.
+ * On 64-bit architectures there is a 4-byte padding in struct scatterlist
+ * (assuming also CONFIG_NEED_SG_DMA_LENGTH is set). Use this padding for DMA
+ * flags bits to indicate when a specific dma address is a bus address.
  */
-#ifdef CONFIG_PCI_P2PDMA
+#ifdef CONFIG_NEED_SG_DMA_FLAGS
 
 #define SG_DMA_BUS_ADDRESS (1 << 0)
 
@@ -312,7 +311,7 @@ static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
 {
 }
 
-#endif
+#endif	/* CONFIG_NEED_SG_DMA_FLAGS */
 
 /**
  * sg_phys - Return physical address of an sg entry
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 6677d0e64d27..acc6f231259c 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -24,6 +24,9 @@ config DMA_OPS_BYPASS
 config ARCH_HAS_DMA_MAP_DIRECT
 	bool
 
+config NEED_SG_DMA_FLAGS
+	bool
+
 config NEED_SG_DMA_LENGTH
 	bool
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 13/17] dma-mapping: Name SG DMA flag helpers consistently
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (11 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 12/17] scatterlist: Add dedicated config for DMA flags Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 14/17] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned Catalin Marinas
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel, Jerry Snitselaar,
	Logan Gunthorpe

From: Robin Murphy <robin.murphy@arm.com>

sg_is_dma_bus_address() is inconsistent with the naming pattern of its
corresponding setters and its own kerneldoc, so take the majority vote
and rename it sg_dma_is_bus_address() (and fix up the missing
underscores in the kerneldoc too). This gives us a nice clear pattern
where SG DMA flags are SG_DMA_<NAME>, and the helpers for acting on them
are sg_dma_<action>_<name>().

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://lore.kernel.org/r/fa2eca2862c7ffc41b50337abffb2dfd2864d3ea.1685036694.git.robin.murphy@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 drivers/iommu/dma-iommu.c   | 8 ++++----
 drivers/iommu/iommu.c       | 2 +-
 include/linux/scatterlist.h | 8 ++++----
 kernel/dma/direct.c         | 2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7a9f0b0bddbd..b8bba4aa196f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1080,7 +1080,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
 		sg_dma_address(s) = DMA_MAPPING_ERROR;
 		sg_dma_len(s) = 0;
 
-		if (sg_is_dma_bus_address(s)) {
+		if (sg_dma_is_bus_address(s)) {
 			if (i > 0)
 				cur = sg_next(cur);
 
@@ -1136,7 +1136,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		if (sg_is_dma_bus_address(s)) {
+		if (sg_dma_is_bus_address(s)) {
 			sg_dma_unmark_bus_address(s);
 		} else {
 			if (sg_dma_address(s) != DMA_MAPPING_ERROR)
@@ -1329,7 +1329,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 	 * just have to be determined.
 	 */
 	for_each_sg(sg, tmp, nents, i) {
-		if (sg_is_dma_bus_address(tmp)) {
+		if (sg_dma_is_bus_address(tmp)) {
 			sg_dma_unmark_bus_address(tmp);
 			continue;
 		}
@@ -1343,7 +1343,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 
 	nents -= i;
 	for_each_sg(tmp, tmp, nents, i) {
-		if (sg_is_dma_bus_address(tmp)) {
+		if (sg_dma_is_bus_address(tmp)) {
 			sg_dma_unmark_bus_address(tmp);
 			continue;
 		}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f1dcfa3f1a1b..eb620552967b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2567,7 +2567,7 @@ ssize_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 			len = 0;
 		}
 
-		if (sg_is_dma_bus_address(sg))
+		if (sg_dma_is_bus_address(sg))
 			goto next;
 
 		if (len) {
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 19833fd4113b..2f06178996ba 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -258,7 +258,7 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 #define SG_DMA_BUS_ADDRESS (1 << 0)
 
 /**
- * sg_dma_is_bus address - Return whether a given segment was marked
+ * sg_dma_is_bus_address - Return whether a given segment was marked
  *			   as a bus address
  * @sg:		 SG entry
  *
@@ -266,13 +266,13 @@ static inline void sg_unmark_end(struct scatterlist *sg)
  *   Returns true if sg_dma_mark_bus_address() has been called on
  *   this segment.
  **/
-static inline bool sg_is_dma_bus_address(struct scatterlist *sg)
+static inline bool sg_dma_is_bus_address(struct scatterlist *sg)
 {
 	return sg->dma_flags & SG_DMA_BUS_ADDRESS;
 }
 
 /**
- * sg_dma_mark_bus address - Mark the scatterlist entry as a bus address
+ * sg_dma_mark_bus_address - Mark the scatterlist entry as a bus address
  * @sg:		 SG entry
  *
  * Description:
@@ -300,7 +300,7 @@ static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
 
 #else
 
-static inline bool sg_is_dma_bus_address(struct scatterlist *sg)
+static inline bool sg_dma_is_bus_address(struct scatterlist *sg)
 {
 	return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 5595d1d5cdcc..d29cade048db 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -463,7 +463,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 	int i;
 
 	for_each_sg(sgl,  sg, nents, i) {
-		if (sg_is_dma_bus_address(sg))
+		if (sg_dma_is_bus_address(sg))
 			sg_dma_unmark_bus_address(sg);
 		else
 			dma_direct_unmap_page(dev, sg->dma_address,

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 14/17] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (12 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 13/17] dma-mapping: Name SG DMA flag helpers consistently Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-05-31 15:48 ` [PATCH v6 15/17] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

For direct DMA, if the size is small enough to have originated from a
kmalloc() cache below ARCH_DMA_MINALIGN, check its alignment against
dma_get_cache_alignment() and bounce if necessary. For larger sizes, it
is the responsibility of the DMA API caller to ensure proper alignment.

At this point, the kmalloc() caches are properly aligned but this will
change in a subsequent patch.

Architectures can opt in by selecting DMA_BOUNCE_UNALIGNED_KMALLOC.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 include/linux/dma-map-ops.h | 61 +++++++++++++++++++++++++++++++++++++
 kernel/dma/Kconfig          |  4 +++
 kernel/dma/direct.h         |  3 +-
 3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 31f114f486c4..9bf19b5bf755 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -8,6 +8,7 @@
 
 #include <linux/dma-mapping.h>
 #include <linux/pgtable.h>
+#include <linux/slab.h>
 
 struct cma;
 
@@ -277,6 +278,66 @@ static inline bool dev_is_dma_coherent(struct device *dev)
 }
 #endif /* CONFIG_ARCH_HAS_DMA_COHERENCE_H */
 
+/*
+ * Check whether potential kmalloc() buffers are safe for non-coherent DMA.
+ */
+static inline bool dma_kmalloc_safe(struct device *dev,
+				    enum dma_data_direction dir)
+{
+	/*
+	 * If DMA bouncing of kmalloc() buffers is disabled, the kmalloc()
+	 * caches have already been aligned to a DMA-safe size.
+	 */
+	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC))
+		return true;
+
+	/*
+	 * kmalloc() buffers are DMA-safe irrespective of size if the device
+	 * is coherent or the direction is DMA_TO_DEVICE (non-destructive
+	 * cache maintenance and benign cache line evictions).
+	 */
+	if (dev_is_dma_coherent(dev) || dir == DMA_TO_DEVICE)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check whether the given size, assuming it is for a kmalloc()'ed buffer, is
+ * sufficiently aligned for non-coherent DMA.
+ */
+static inline bool dma_kmalloc_size_aligned(size_t size)
+{
+	/*
+	 * Larger kmalloc() sizes are guaranteed to be aligned to
+	 * ARCH_DMA_MINALIGN.
+	 */
+	if (size >= 2 * ARCH_DMA_MINALIGN ||
+	    IS_ALIGNED(kmalloc_size_roundup(size), dma_get_cache_alignment()))
+		return true;
+
+	return false;
+}
+
+/*
+ * Check whether the given object size may have originated from a kmalloc()
+ * buffer with a slab alignment below the DMA-safe alignment and needs
+ * bouncing for non-coherent DMA. The pointer alignment is not considered and
+ * in-structure DMA-safe offsets are the responsibility of the caller. Such
+ * code should use the static ARCH_DMA_MINALIGN for compiler annotations.
+ *
+ * The heuristics can have false positives, bouncing unnecessarily, though the
+ * buffers would be small. False negatives are theoretically possible if, for
+ * example, multiple small kmalloc() buffers are coalesced into a larger
+ * buffer that passes the alignment check. There are no such known constructs
+ * in the kernel.
+ */
+static inline bool dma_kmalloc_needs_bounce(struct device *dev, size_t size,
+					    enum dma_data_direction dir)
+{
+	return !dma_kmalloc_safe(dev, dir) && !dma_kmalloc_size_aligned(size);
+}
+
 void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index acc6f231259c..abea1823fe21 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -90,6 +90,10 @@ config SWIOTLB
 	bool
 	select NEED_DMA_MAP_STATE
 
+config DMA_BOUNCE_UNALIGNED_KMALLOC
+	bool
+	depends on SWIOTLB
+
 config DMA_RESTRICTED_POOL
 	bool "DMA Restricted Pool"
 	depends on OF && OF_RESERVED_MEM && SWIOTLB
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index e38ffc5e6bdd..97ec892ea0b5 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -94,7 +94,8 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
 		return swiotlb_map(dev, phys, size, dir, attrs);
 	}
 
-	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
+	if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
+	    dma_kmalloc_needs_bounce(dev, size, dir)) {
 		if (is_pci_p2pdma_page(page))
 			return DMA_MAPPING_ERROR;
 		if (is_swiotlb_active(dev))

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 15/17] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (13 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 14/17] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-06-09 11:52   ` Robin Murphy
  2023-05-31 15:48 ` [PATCH v6 16/17] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible Catalin Marinas
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

As with direct DMA, bounce small allocations as they may have
originated from a kmalloc() cache not safe for DMA. Unlike direct DMA,
iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
non-coherent devices as this would break some cases where the iova is
expected to be contiguous (dmabuf). Instead, scan the scatterlist for
any small sizes and only take the swiotlb path if any element of the
list needs bouncing (note that iommu_dma_map_page() would still only
bounce those buffers which are not DMA-aligned).

To avoid scanning the scatterlist on the 'sync' operations, introduce an
SG_DMA_SWIOTLB flag set by iommu_dma_map_sg_swiotlb(). The
dev_use_swiotlb() function together with the newly added
dev_use_sg_swiotlb() now check for both untrusted devices and unaligned
kmalloc() buffers (suggested by Robin Murphy).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Christoph Hellwig <hch@lst.de>
---
 drivers/iommu/Kconfig       |  1 +
 drivers/iommu/dma-iommu.c   | 50 ++++++++++++++++++++++++++++++-------
 include/linux/scatterlist.h | 41 ++++++++++++++++++++++++++++--
 3 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index db98c3f86e8c..670eff7a8e11 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -152,6 +152,7 @@ config IOMMU_DMA
 	select IOMMU_IOVA
 	select IRQ_MSI_IOMMU
 	select NEED_SG_DMA_LENGTH
+	select NEED_SG_DMA_FLAGS if SWIOTLB
 
 # Shared Virtual Addressing
 config IOMMU_SVA
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b8bba4aa196f..6eaac5123839 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -520,9 +520,38 @@ static bool dev_is_untrusted(struct device *dev)
 	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
 }
 
-static bool dev_use_swiotlb(struct device *dev)
+static bool dev_use_swiotlb(struct device *dev, size_t size,
+			    enum dma_data_direction dir)
 {
-	return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
+	return IS_ENABLED(CONFIG_SWIOTLB) &&
+		(dev_is_untrusted(dev) ||
+		 dma_kmalloc_needs_bounce(dev, size, dir));
+}
+
+static bool dev_use_sg_swiotlb(struct device *dev, struct scatterlist *sg,
+			       int nents, enum dma_data_direction dir)
+{
+	struct scatterlist *s;
+	int i;
+
+	if (!IS_ENABLED(CONFIG_SWIOTLB))
+		return false;
+
+	if (dev_is_untrusted(dev))
+		return true;
+
+	/*
+	 * If kmalloc() buffers are not DMA-safe for this device and
+	 * direction, check the individual lengths in the sg list. If any
+	 * element is deemed unsafe, use the swiotlb for bouncing.
+	 */
+	if (!dma_kmalloc_safe(dev, dir)) {
+		for_each_sg(sg, s, nents, i)
+			if (!dma_kmalloc_size_aligned(s->length))
+				return true;
+	}
+
+	return false;
 }
 
 /**
@@ -922,7 +951,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
 {
 	phys_addr_t phys;
 
-	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
+	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -938,7 +967,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
 {
 	phys_addr_t phys;
 
-	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
+	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -956,7 +985,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_use_swiotlb(dev))
+	if (sg_dma_use_swiotlb(sgl))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
 						      sg->length, dir);
@@ -972,7 +1001,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_use_swiotlb(dev))
+	if (sg_dma_use_swiotlb(sgl))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_device(dev,
 							 sg_dma_address(sg),
@@ -998,7 +1027,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	 * If both the physical buffer start address and size are
 	 * page aligned, we don't need to use a bounce page.
 	 */
-	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
+	if (dev_use_swiotlb(dev, size, dir) &&
+	    iova_offset(iovad, phys | size)) {
 		void *padding_start;
 		size_t padding_size, aligned_size;
 
@@ -1166,6 +1196,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
 	struct scatterlist *s;
 	int i;
 
+	sg_dma_mark_swiotlb(sg);
+
 	for_each_sg(sg, s, nents, i) {
 		sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
 				s->offset, s->length, dir, attrs);
@@ -1210,7 +1242,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 			goto out;
 	}
 
-	if (dev_use_swiotlb(dev))
+	if (dev_use_sg_swiotlb(dev, sg, nents, dir))
 		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
@@ -1315,7 +1347,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 	struct scatterlist *tmp;
 	int i;
 
-	if (dev_use_swiotlb(dev)) {
+	if (sg_dma_use_swiotlb(sg)) {
 		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
 		return;
 	}
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 2f06178996ba..69d87e312263 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -251,11 +251,13 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 /*
  * On 64-bit architectures there is a 4-byte padding in struct scatterlist
  * (assuming also CONFIG_NEED_SG_DMA_LENGTH is set). Use this padding for DMA
- * flags bits to indicate when a specific dma address is a bus address.
+ * flags bits to indicate when a specific dma address is a bus address or the
+ * buffer may have been bounced via SWIOTLB.
  */
 #ifdef CONFIG_NEED_SG_DMA_FLAGS
 
-#define SG_DMA_BUS_ADDRESS (1 << 0)
+#define SG_DMA_BUS_ADDRESS	(1 << 0)
+#define SG_DMA_SWIOTLB		(1 << 1)
 
 /**
  * sg_dma_is_bus_address - Return whether a given segment was marked
@@ -298,6 +300,34 @@ static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
 	sg->dma_flags &= ~SG_DMA_BUS_ADDRESS;
 }
 
+/**
+ * sg_dma_use_swiotlb - Return whether the scatterlist was marked for SWIOTLB
+ *			bouncing
+ * @sg:		SG entry
+ *
+ * Description:
+ *   Returns true if the scatterlist was marked for SWIOTLB bouncing. Not all
+ *   elements may have been bounced, so the caller would have to check
+ *   individual SG entries with is_swiotlb_buffer().
+ */
+static inline bool sg_dma_use_swiotlb(struct scatterlist *sg)
+{
+	return sg->dma_flags & SG_DMA_SWIOTLB;
+}
+
+/**
+ * sg_dma_mark_swiotlb - Mark the scatterlist for SWIOTLB bouncing
+ * @sg:		SG entry
+ *
+ * Description:
+ *   Marks a scatterlist for SWIOTLB bouncing. Not all SG entries may be
+ *   bounced.
+ */
+static inline void sg_dma_mark_swiotlb(struct scatterlist *sg)
+{
+	sg->dma_flags |= SG_DMA_SWIOTLB;
+}
+
 #else
 
 static inline bool sg_dma_is_bus_address(struct scatterlist *sg)
@@ -310,6 +340,13 @@ static inline void sg_dma_mark_bus_address(struct scatterlist *sg)
 static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
 {
 }
+static inline bool sg_dma_use_swiotlb(struct scatterlist *sg)
+{
+	return false;
+}
+static inline void sg_dma_mark_swiotlb(struct scatterlist *sg)
+{
+}
 
 #endif	/* CONFIG_NEED_SG_DMA_FLAGS */
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 16/17] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (14 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 15/17] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-06-09 14:39   ` Vlastimil Babka
  2023-05-31 15:48 ` [PATCH v6 17/17] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 Catalin Marinas
  2023-06-08  5:45 ` [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Isaac Manjarres
  17 siblings, 1 reply; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

If an architecture opted in to DMA bouncing of unaligned kmalloc()
buffers by selecting DMA_BOUNCE_UNALIGNED_KMALLOC and the default
swiotlb buffer is available, reduce the minimum kmalloc() cache
alignment below the cache-line size to ARCH_KMALLOC_MINALIGN.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 mm/slab_common.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7c6475847fdf..fe46459a8b77 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>
 #include <linux/seq_file.h>
 #include <linux/dma-mapping.h>
+#include <linux/swiotlb.h>
 #include <linux/proc_fs.h>
 #include <linux/debugfs.h>
 #include <linux/kasan.h>
@@ -863,10 +864,19 @@ void __init setup_kmalloc_cache_index_table(void)
 	}
 }
 
+#ifdef CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC
+static unsigned int __kmalloc_minalign(void)
+{
+	if (io_tlb_default_mem.nslabs)
+		return ARCH_KMALLOC_MINALIGN;
+	return dma_get_cache_alignment();
+}
+#else
 static unsigned int __kmalloc_minalign(void)
 {
 	return dma_get_cache_alignment();
 }
+#endif
 
 void __init
 new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 17/17] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (15 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 16/17] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible Catalin Marinas
@ 2023-05-31 15:48 ` Catalin Marinas
  2023-06-08  5:45 ` [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Isaac Manjarres
  17 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

With the DMA bouncing of unaligned kmalloc() buffers now in place,
enable it for arm64 to allow the kmalloc-{8,16,32,96} caches. In
addition, always create the swiotlb buffer even when the end of RAM is
within the 32-bit physical address range (the swiotlb buffer can still
be disabled on the kernel command line).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig   | 1 +
 arch/arm64/mm/init.c | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1201d25a8a4..af42871431c0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -120,6 +120,7 @@ config ARM64
 	select CRC32
 	select DCACHE_WORD_ACCESS
 	select DYNAMIC_FTRACE if FUNCTION_TRACER
+	select DMA_BOUNCE_UNALIGNED_KMALLOC
 	select DMA_DIRECT_REMAP
 	select EDAC_SUPPORT
 	select FRAME_POINTER
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 66e70ca47680..3ac2e9d79ce4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -442,7 +442,12 @@ void __init bootmem_init(void)
  */
 void __init mem_init(void)
 {
-	swiotlb_init(max_pfn > PFN_DOWN(arm64_dma_phys_limit), SWIOTLB_VERBOSE);
+	bool swiotlb = max_pfn > PFN_DOWN(arm64_dma_phys_limit);
+
+	if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC))
+		swiotlb = true;
+
+	swiotlb_init(swiotlb, SWIOTLB_VERBOSE);
 
 	/* this will put all unused low memory onto the freelists */
 	memblock_free_all();

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 10/17] iio: core: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
  2023-05-31 15:48 ` [PATCH v6 10/17] iio: core: " Catalin Marinas
@ 2023-06-02 11:19   ` Jonathan Cameron
  0 siblings, 0 replies; 34+ messages in thread
From: Jonathan Cameron @ 2023-06-02 11:19 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron, linux-mm,
	iommu, linux-arm-kernel, Lars-Peter Clausen

On Wed, 31 May 2023 16:48:29 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

> ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
> operations while ARCH_KMALLOC_MINALIGN is the minimum kmalloc() objects
> alignment.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jonathan Cameron <jic23@kernel.org>
> Cc: Lars-Peter Clausen <lars@metafoo.de>

Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks.

Jonathan

> ---
>  include/linux/iio/iio.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h
> index 81413cd3a3e7..d28a5e8097e4 100644
> --- a/include/linux/iio/iio.h
> +++ b/include/linux/iio/iio.h
> @@ -722,7 +722,7 @@ static inline void *iio_device_get_drvdata(const struct iio_dev *indio_dev)
>   * must not share  cachelines with the rest of the structure, thus making
>   * them safe for use with non-coherent DMA.
>   */
> -#define IIO_DMA_MINALIGN ARCH_KMALLOC_MINALIGN
> +#define IIO_DMA_MINALIGN ARCH_DMA_MINALIGN
>  struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv);
>  
>  /* The information at the returned address is guaranteed to be cacheline aligned */
> 
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
                   ` (16 preceding siblings ...)
  2023-05-31 15:48 ` [PATCH v6 17/17] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 Catalin Marinas
@ 2023-06-08  5:45 ` Isaac Manjarres
  2023-06-08  8:05   ` Ard Biesheuvel
  17 siblings, 1 reply; 34+ messages in thread
From: Isaac Manjarres @ 2023-06-08  5:45 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Saravana Kannan, Alasdair Kergon,
	Daniel Vetter, Joerg Roedel, Mark Brown, Mike Snitzer,
	Rafael J. Wysocki, Jonathan Cameron, linux-mm, iommu,
	linux-arm-kernel

On Wed, May 31, 2023 at 8:48 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> Here's version 6 of the series reducing the kmalloc() minimum alignment
> on arm64 to 8 (from 128). There are patches already to do the same for
> riscv (pretty straight-forward after this series).
Thanks, Catalin, for getting these patches out. Please add my
"Tested-by:" tag for the series:

Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>

With the first 11 patches, I observed a reduction of 18.4 MB in the
slab memory footprint on my Pixel 6 device. After applying the rest of
the patches in the series, I observed a total reduction of 26.5 MB in
the slab memory footprint on my device. These are great results!

--Isaac


* Re: [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-06-08  5:45 ` [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Isaac Manjarres
@ 2023-06-08  8:05   ` Ard Biesheuvel
  2023-06-08 21:29     ` Isaac Manjarres
  0 siblings, 1 reply; 34+ messages in thread
From: Ard Biesheuvel @ 2023-06-08  8:05 UTC (permalink / raw)
  To: Isaac Manjarres
  Cc: Catalin Marinas, Linus Torvalds, Christoph Hellwig, Robin Murphy,
	Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Saravana Kannan, Alasdair Kergon,
	Daniel Vetter, Joerg Roedel, Mark Brown, Mike Snitzer,
	Rafael J. Wysocki, Jonathan Cameron, linux-mm, iommu,
	linux-arm-kernel

On Thu, 8 Jun 2023 at 07:45, Isaac Manjarres <isaacmanjarres@google.com> wrote:
>
> On Wed, May 31, 2023 at 8:48 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > Here's version 6 of the series reducing the kmalloc() minimum alignment
> > on arm64 to 8 (from 128). There are patches already to do the same for
> > riscv (pretty straight-forward after this series).
> Thanks, Catalin for getting these patches out. Please add my "Tested-by:" tag
> for the series:
>
> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
>
> With the first 11 patches, I observed a reduction of 18.4 MB
> in the slab memory footprint on my Pixel 6 device. After applying the
> rest of the patches in the series, I observed a total reduction of
> 26.5 MB in the
> slab memory footprint on my device. These are great results!
>

It would also be good to get an insight into how much bouncing is
going on in this case, given that (AFAIK) Pixel 6 uses non-cache
coherent DMA.


* Re: [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-06-08  8:05   ` Ard Biesheuvel
@ 2023-06-08 21:29     ` Isaac Manjarres
  2023-06-09  8:11       ` Petr Tesařík
  0 siblings, 1 reply; 34+ messages in thread
From: Isaac Manjarres @ 2023-06-08 21:29 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Catalin Marinas, Linus Torvalds, Christoph Hellwig, Robin Murphy,
	Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Saravana Kannan, Alasdair Kergon,
	Daniel Vetter, Joerg Roedel, Mark Brown, Mike Snitzer,
	Rafael J. Wysocki, Jonathan Cameron, linux-mm, iommu,
	linux-arm-kernel

On Thu, Jun 08, 2023 at 10:05:58AM +0200, Ard Biesheuvel wrote:
> On Thu, 8 Jun 2023 at 07:45, Isaac Manjarres <isaacmanjarres@google.com> wrote:
> >
> > On Wed, May 31, 2023 at 8:48 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > Here's version 6 of the series reducing the kmalloc() minimum alignment
> > > on arm64 to 8 (from 128). There are patches already to do the same for
> > > riscv (pretty straight-forward after this series).
> > Thanks, Catalin for getting these patches out. Please add my "Tested-by:" tag
> > for the series:
> >
> > Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> >
> > With the first 11 patches, I observed a reduction of 18.4 MB
> > in the slab memory footprint on my Pixel 6 device. After applying the
> > rest of the patches in the series, I observed a total reduction of
> > 26.5 MB in the
> > slab memory footprint on my device. These are great results!
> >
> 
> It would also be good to get an insight into how much bouncing is
> going on in this case, given that (AFAIK) Pixel 6 uses non-cache
> coherent DMA.

I enabled the "swiotlb_bounced" trace event from the kernel command line
to see if anything was being bounced. It turns out that for Pixel 6
there are non-coherent DMA transfers occurring, but none of the
transfers that are in either the DMA_FROM_DEVICE or
DMA_BIDIRECTIONAL directions are small enough to require bouncing.

--Isaac

P.S. I noticed that the trace_swiotlb_bounced() tracepoint may not be
invoked even though bouncing occurs. For example, in the dma-iommu path,
swiotlb_tbl_map_single() is called when bouncing, instead of
swiotlb_map(), which is what ends up calling trace_swiotlb_bounced().

Would it make sense to move the call to trace_swiotlb_bounced() to
swiotlb_tbl_map_single() since that function is always invoked?


* Re: [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-06-08 21:29     ` Isaac Manjarres
@ 2023-06-09  8:11       ` Petr Tesařík
  2023-06-12  7:44         ` Tomonori Fujita
  0 siblings, 1 reply; 34+ messages in thread
From: Petr Tesařík @ 2023-06-09  8:11 UTC (permalink / raw)
  To: Isaac Manjarres
  Cc: Ard Biesheuvel, Catalin Marinas, Linus Torvalds,
	Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Saravana Kannan, Alasdair Kergon, Daniel Vetter,
	Joerg Roedel, Mark Brown, Mike Snitzer, Rafael J. Wysocki,
	Jonathan Cameron, linux-mm, iommu, linux-arm-kernel,
	FUJITA Tomonori, Konrad Rzeszutek Wilk

On Thu, 8 Jun 2023 14:29:45 -0700
Isaac Manjarres <isaacmanjarres@google.com> wrote:

>[...]
> P.S. I noticed that the trace_swiotlb_bounced() tracepoint may not be
> invoked even though bouncing occurs. For example, in the dma-iommu path,
> swiotlb_tbl_map_single() is called when bouncing, instead of
> swiotlb_map(), which is what ends up calling trace_swiotlb_bounced().
> 
> Would it make sense to move the call to trace_swiotlb_bounced() to
> swiotlb_tbl_map_single() since that function is always invoked?

Definitely, if you ask me. I believe the change was merely forgotten in
commit eb605a5754d0 ("swiotlb: add swiotlb_tbl_map_single library
function").

Let me add the author to Cc. Plus Konrad, who built further on that
commit, may also have an opinion.

Petr T


* Re: [PATCH v6 15/17] iommu/dma: Force bouncing if the size is not cacheline-aligned
  2023-05-31 15:48 ` [PATCH v6 15/17] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
@ 2023-06-09 11:52   ` Robin Murphy
  0 siblings, 0 replies; 34+ messages in thread
From: Robin Murphy @ 2023-06-09 11:52 UTC (permalink / raw)
  To: Catalin Marinas, Linus Torvalds, Christoph Hellwig
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

On 2023-05-31 16:48, Catalin Marinas wrote:
[...]
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 2f06178996ba..69d87e312263 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -251,11 +251,13 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>   /*
>    * One 64-bit architectures there is a 4-byte padding in struct scatterlist
>    * (assuming also CONFIG_NEED_SG_DMA_LENGTH is set). Use this padding for DMA
> - * flags bits to indicate when a specific dma address is a bus address.
> + * flags bits to indicate when a specific dma address is a bus address or the
> + * buffer may have been bounced via SWIOTLB.
>    */
>   #ifdef CONFIG_NEED_SG_DMA_FLAGS
>   
> -#define SG_DMA_BUS_ADDRESS (1 << 0)
> +#define SG_DMA_BUS_ADDRESS	(1 << 0)
> +#define SG_DMA_SWIOTLB		(1 << 1)
>   
>   /**
>    * sg_dma_is_bus_address - Return whether a given segment was marked
> @@ -298,6 +300,34 @@ static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
>   	sg->dma_flags &= ~SG_DMA_BUS_ADDRESS;
>   }
>   
> +/**
> + * sg_dma_use_swiotlb - Return whether the scatterlist was marked for SWIOTLB
> + *			bouncing
> + * @sg:		SG entry
> + *
> + * Description:
> + *   Returns true if the scatterlist was marked for SWIOTLB bouncing. Not all
> + *   elements may have been bounced, so the caller would have to check
> + *   individual SG entries with is_swiotlb_buffer().
> + */
> +static inline bool sg_dma_use_swiotlb(struct scatterlist *sg)

Nit: since you tweaked the flag name again, we could happily go back to 
the pattern with s/use/is/ for this one now too.

> +{
> +	return sg->dma_flags & SG_DMA_SWIOTLB;
> +}
> +
> +/**
> + * sg_dma_use_swiotlb - Mark the scatterlist for SWIOTLB bouncing

Oops - s/use/mark/

Feel free to fix those up when applying if there's no other reason for a v7.

Thanks,
Robin.

> + * @sg:		SG entry
> + *
> + * Description:
> + *   Marks a a scatterlist for SWIOTLB bounce. Not all SG entries may be
> + *   bounced.
> + */
> +static inline void sg_dma_mark_swiotlb(struct scatterlist *sg)
> +{
> +	sg->dma_flags |= SG_DMA_SWIOTLB;
> +}
> +
>   #else
>   
>   static inline bool sg_dma_is_bus_address(struct scatterlist *sg)
> @@ -310,6 +340,13 @@ static inline void sg_dma_mark_bus_address(struct scatterlist *sg)
>   static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
>   {
>   }
> +static inline bool sg_dma_use_swiotlb(struct scatterlist *sg)
> +{
> +	return false;
> +}
> +static inline void sg_dma_mark_swiotlb(struct scatterlist *sg)
> +{
> +}
>   
>   #endif	/* CONFIG_NEED_SG_DMA_FLAGS */
>   


* Re: [PATCH v6 01/17] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  2023-05-31 15:48 ` [PATCH v6 01/17] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
@ 2023-06-09 12:32   ` Vlastimil Babka
  2023-06-09 13:44     ` Catalin Marinas
  0 siblings, 1 reply; 34+ messages in thread
From: Vlastimil Babka @ 2023-06-09 12:32 UTC (permalink / raw)
  To: Catalin Marinas, Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel

On 5/31/23 17:48, Catalin Marinas wrote:
> In preparation for supporting a kmalloc() minimum alignment smaller than
> the arch DMA alignment, decouple the two definitions. This requires that
> either the kmalloc() caches are aligned to a (run-time) cache-line size
> or the DMA API bounces unaligned kmalloc() allocations. Subsequent
> patches will implement both options.
> 
> After this patch, ARCH_DMA_MINALIGN is expected to be used in static
> alignment annotations and defined by an architecture to be the maximum
> alignment for all supported configurations/SoCs in a single Image.
> Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to
> define its value in the arch headers.
> 
> Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in
> dma_get_cache_alignment() so that there is no change for architectures
> not requiring a minimum DMA alignment.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> ---
>  include/linux/dma-mapping.h |  2 +-
>  include/linux/slab.h        | 14 +++++++++++---
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 0ee20b764000..3288a1339271 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -545,7 +545,7 @@ static inline int dma_set_min_align_mask(struct device *dev,
>  
>  static inline int dma_get_cache_alignment(void)
>  {
> -#ifdef ARCH_DMA_MINALIGN
> +#ifdef ARCH_HAS_DMA_MINALIGN
>  	return ARCH_DMA_MINALIGN;
>  #endif
>  	return 1;
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 6b3e155b70bf..50dcf9cfbf62 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -235,12 +235,20 @@ void kmem_dump_obj(void *object);
>   * alignment larger than the alignment of a 64-bit integer.
>   * Setting ARCH_DMA_MINALIGN in arch headers allows that.
>   */
> -#if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
> +#ifdef ARCH_DMA_MINALIGN
> +#define ARCH_HAS_DMA_MINALIGN
> +#if ARCH_DMA_MINALIGN > 8 && !defined(ARCH_KMALLOC_MINALIGN)
>  #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
> -#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN
> -#define KMALLOC_SHIFT_LOW ilog2(ARCH_DMA_MINALIGN)
> +#endif
>  #else
> +#define ARCH_DMA_MINALIGN __alignof__(unsigned long long)
> +#endif

It seems weird to make slab.h responsible for this part, especially for
#define ARCH_HAS_DMA_MINALIGN, which dma-mapping.h consumes. Maybe it would
be difficult to do differently due to some dependency hell, but minimally I
don't see dma-mapping.h including slab.h so the result is
include-order-dependent? Maybe it's included transitively, but then it's
fragile and would be better to do explicitly?

> +
> +#ifndef ARCH_KMALLOC_MINALIGN
>  #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
> +#elif ARCH_KMALLOC_MINALIGN > 8
> +#define KMALLOC_MIN_SIZE ARCH_KMALLOC_MINALIGN
> +#define KMALLOC_SHIFT_LOW ilog2(KMALLOC_MIN_SIZE)
>  #endif
>  
>  /*
> 



* Re: [PATCH v6 03/17] mm/slab: Simplify create_kmalloc_cache() args and make it static
  2023-05-31 15:48 ` [PATCH v6 03/17] mm/slab: Simplify create_kmalloc_cache() args and make it static Catalin Marinas
@ 2023-06-09 13:03   ` Vlastimil Babka
  0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2023-06-09 13:03 UTC (permalink / raw)
  To: Catalin Marinas, Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel, David Rientjes,
	Christoph Lameter, Pekka Enberg, Joonsoo Kim, Hyeonggon Yoo,
	Roman Gushchin

On 5/31/23 17:48, Catalin Marinas wrote:
> In the slab variant of kmem_cache_init(), call new_kmalloc_cache()
> instead of initialising the kmalloc_caches array directly. With this,
> create_kmalloc_cache() is now only called from new_kmalloc_cache() in
> the same file, so make it static. In addition, the useroffset argument
> is always 0 while usersize is the same as size. Remove them.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Nice cleanup, thanks!

> ---
>  mm/slab.c        |  6 +-----
>  mm/slab.h        |  5 ++---
>  mm/slab_common.c | 14 ++++++--------
>  3 files changed, 9 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index bb57f7fdbae1..b7817dcba63e 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1240,11 +1240,7 @@ void __init kmem_cache_init(void)
>  	 * Initialize the caches that provide memory for the  kmem_cache_node
>  	 * structures first.  Without this, further allocations will bug.
>  	 */
> -	kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache(
> -				kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL],
> -				kmalloc_info[INDEX_NODE].size,
> -				ARCH_KMALLOC_FLAGS, 0,
> -				kmalloc_info[INDEX_NODE].size);
> +	new_kmalloc_cache(INDEX_NODE, KMALLOC_NORMAL, ARCH_KMALLOC_FLAGS);
>  	slab_state = PARTIAL_NODE;
>  	setup_kmalloc_cache_index_table();
>  
> diff --git a/mm/slab.h b/mm/slab.h
> index f01ac256a8f5..592590fcddae 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -255,9 +255,8 @@ gfp_t kmalloc_fix_flags(gfp_t flags);
>  /* Functions provided by the slab allocators */
>  int __kmem_cache_create(struct kmem_cache *, slab_flags_t flags);
>  
> -struct kmem_cache *create_kmalloc_cache(const char *name, unsigned int size,
> -			slab_flags_t flags, unsigned int useroffset,
> -			unsigned int usersize);
> +void __init new_kmalloc_cache(int idx, enum kmalloc_cache_type type,
> +			      slab_flags_t flags);
>  extern void create_boot_cache(struct kmem_cache *, const char *name,
>  			unsigned int size, slab_flags_t flags,
>  			unsigned int useroffset, unsigned int usersize);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 607249785c07..7f069159aee2 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -658,17 +658,16 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
>  	s->refcount = -1;	/* Exempt from merging for now */
>  }
>  
> -struct kmem_cache *__init create_kmalloc_cache(const char *name,
> -		unsigned int size, slab_flags_t flags,
> -		unsigned int useroffset, unsigned int usersize)
> +static struct kmem_cache *__init create_kmalloc_cache(const char *name,
> +						      unsigned int size,
> +						      slab_flags_t flags)
>  {
>  	struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
>  
>  	if (!s)
>  		panic("Out of memory when creating slab %s\n", name);
>  
> -	create_boot_cache(s, name, size, flags | SLAB_KMALLOC, useroffset,
> -								usersize);
> +	create_boot_cache(s, name, size, flags | SLAB_KMALLOC, 0, size);
>  	list_add(&s->list, &slab_caches);
>  	s->refcount = 1;
>  	return s;
> @@ -863,7 +862,7 @@ void __init setup_kmalloc_cache_index_table(void)
>  	}
>  }
>  
> -static void __init
> +void __init
>  new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
>  {
>  	if ((KMALLOC_RECLAIM != KMALLOC_NORMAL) && (type == KMALLOC_RECLAIM)) {
> @@ -880,8 +879,7 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
>  
>  	kmalloc_caches[type][idx] = create_kmalloc_cache(
>  					kmalloc_info[idx].name[type],
> -					kmalloc_info[idx].size, flags, 0,
> -					kmalloc_info[idx].size);
> +					kmalloc_info[idx].size, flags);
>  
>  	/*
>  	 * If CONFIG_MEMCG_KMEM is enabled, disable cache merging for
> 



* Re: [PATCH v6 01/17] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  2023-06-09 12:32   ` Vlastimil Babka
@ 2023-06-09 13:44     ` Catalin Marinas
  2023-06-09 13:57       ` Catalin Marinas
  0 siblings, 1 reply; 34+ messages in thread
From: Catalin Marinas @ 2023-06-09 13:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron, linux-mm,
	iommu, linux-arm-kernel

On Fri, Jun 09, 2023 at 02:32:57PM +0200, Vlastimil Babka wrote:
> On 5/31/23 17:48, Catalin Marinas wrote:
> > diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> > index 0ee20b764000..3288a1339271 100644
> > --- a/include/linux/dma-mapping.h
> > +++ b/include/linux/dma-mapping.h
> > @@ -545,7 +545,7 @@ static inline int dma_set_min_align_mask(struct device *dev,
> >  
> >  static inline int dma_get_cache_alignment(void)
> >  {
> > -#ifdef ARCH_DMA_MINALIGN
> > +#ifdef ARCH_HAS_DMA_MINALIGN
> >  	return ARCH_DMA_MINALIGN;
> >  #endif
> >  	return 1;
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 6b3e155b70bf..50dcf9cfbf62 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -235,12 +235,20 @@ void kmem_dump_obj(void *object);
> >   * alignment larger than the alignment of a 64-bit integer.
> >   * Setting ARCH_DMA_MINALIGN in arch headers allows that.
> >   */
> > -#if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
> > +#ifdef ARCH_DMA_MINALIGN
> > +#define ARCH_HAS_DMA_MINALIGN
> > +#if ARCH_DMA_MINALIGN > 8 && !defined(ARCH_KMALLOC_MINALIGN)
> >  #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
> > -#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN
> > -#define KMALLOC_SHIFT_LOW ilog2(ARCH_DMA_MINALIGN)
> > +#endif
> >  #else
> > +#define ARCH_DMA_MINALIGN __alignof__(unsigned long long)
> > +#endif
> 
> It seems weird to make slab.h responsible for this part, especially for
> #define ARCH_HAS_DMA_MINALIGN, which dma-mapping.h consumes. Maybe it would
> be difficult to do differently due to some dependency hell, but minimally I
> don't see dma-mapping.h including slab.h so the result is
> include-order-dependent? Maybe it's included transitively, but then it's
> fragile and would be better to do explicitly?

True, there's a risk that it doesn't get included with some future
header refactoring.

What about moving ARCH_DMA_MINALIGN to linux/cache.h? Alternatively, I
could create a new linux/dma-minalign.h file but I feel since this is
about caches, having it in cache.h makes more sense. asm/cache.h is also
where most archs define the constant (apart from mips, sh, microblaze).

-- 
Catalin


* Re: [PATCH v6 01/17] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  2023-06-09 13:44     ` Catalin Marinas
@ 2023-06-09 13:57       ` Catalin Marinas
  2023-06-09 14:13         ` Vlastimil Babka
  0 siblings, 1 reply; 34+ messages in thread
From: Catalin Marinas @ 2023-06-09 13:57 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron, linux-mm,
	iommu, linux-arm-kernel

On Fri, Jun 09, 2023 at 02:44:01PM +0100, Catalin Marinas wrote:
> On Fri, Jun 09, 2023 at 02:32:57PM +0200, Vlastimil Babka wrote:
> > On 5/31/23 17:48, Catalin Marinas wrote:
> > > diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> > > index 0ee20b764000..3288a1339271 100644
> > > --- a/include/linux/dma-mapping.h
> > > +++ b/include/linux/dma-mapping.h
> > > @@ -545,7 +545,7 @@ static inline int dma_set_min_align_mask(struct device *dev,
> > >  
> > >  static inline int dma_get_cache_alignment(void)
> > >  {
> > > -#ifdef ARCH_DMA_MINALIGN
> > > +#ifdef ARCH_HAS_DMA_MINALIGN
> > >  	return ARCH_DMA_MINALIGN;
> > >  #endif
> > >  	return 1;
> > > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > > index 6b3e155b70bf..50dcf9cfbf62 100644
> > > --- a/include/linux/slab.h
> > > +++ b/include/linux/slab.h
> > > @@ -235,12 +235,20 @@ void kmem_dump_obj(void *object);
> > >   * alignment larger than the alignment of a 64-bit integer.
> > >   * Setting ARCH_DMA_MINALIGN in arch headers allows that.
> > >   */
> > > -#if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
> > > +#ifdef ARCH_DMA_MINALIGN
> > > +#define ARCH_HAS_DMA_MINALIGN
> > > +#if ARCH_DMA_MINALIGN > 8 && !defined(ARCH_KMALLOC_MINALIGN)
> > >  #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
> > > -#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN
> > > -#define KMALLOC_SHIFT_LOW ilog2(ARCH_DMA_MINALIGN)
> > > +#endif
> > >  #else
> > > +#define ARCH_DMA_MINALIGN __alignof__(unsigned long long)
> > > +#endif
> > 
> > It seems weird to make slab.h responsible for this part, especially for
> > #define ARCH_HAS_DMA_MINALIGN, which dma-mapping.h consumes. Maybe it would
> > be difficult to do differently due to some dependency hell, but minimally I
> > don't see dma-mapping.h including slab.h so the result is
> > include-order-dependent? Maybe it's included transitively, but then it's
> > fragile and would be better to do explicitly?
> 
> True, there's a risk that it doesn't get included with some future
> header refactoring.
> 
> What about moving ARCH_DMA_MINALIGN to linux/cache.h? Alternatively, I
> could create a new linux/dma-minalign.h file but I feel since this is
> about caches, having it in cache.h makes more sense. asm/cache.h is also
> where most archs define the constant (apart from mips, sh, microblaze).

Something like this (still compiling):

diff --git a/include/linux/cache.h b/include/linux/cache.h
index 5da1bbd96154..9900d20b76c2 100644
--- a/include/linux/cache.h
+++ b/include/linux/cache.h
@@ -98,4 +98,10 @@ struct cacheline_padding {
 #define CACHELINE_PADDING(name)
 #endif
 
+#ifdef ARCH_DMA_MINALIGN
+#define ARCH_HAS_DMA_MINALIGN
+#else
+#define ARCH_DMA_MINALIGN __alignof__(unsigned long long)
+#endif
+
 #endif /* __LINUX_CACHE_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index c41019289223..e13050eb9777 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_DMA_MAPPING_H
 #define _LINUX_DMA_MAPPING_H
 
+#include <linux/cache.h>
 #include <linux/sizes.h>
 #include <linux/string.h>
 #include <linux/device.h>
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 50dcf9cfbf62..9bdfb042d93d 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -12,6 +12,7 @@
 #ifndef _LINUX_SLAB_H
 #define	_LINUX_SLAB_H
 
+#include <linux/cache.h>
 #include <linux/gfp.h>
 #include <linux/overflow.h>
 #include <linux/types.h>
@@ -235,14 +236,10 @@ void kmem_dump_obj(void *object);
  * alignment larger than the alignment of a 64-bit integer.
  * Setting ARCH_DMA_MINALIGN in arch headers allows that.
  */
-#ifdef ARCH_DMA_MINALIGN
-#define ARCH_HAS_DMA_MINALIGN
-#if ARCH_DMA_MINALIGN > 8 && !defined(ARCH_KMALLOC_MINALIGN)
+#if defined(ARCH_HAS_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8 && \
+	!defined(ARCH_KMALLOC_MINALIGN)
 #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
 #endif
-#else
-#define ARCH_DMA_MINALIGN __alignof__(unsigned long long)
-#endif
 
 #ifndef ARCH_KMALLOC_MINALIGN
 #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)

-- 
Catalin


* Re: [PATCH v6 01/17] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN
  2023-06-09 13:57       ` Catalin Marinas
@ 2023-06-09 14:13         ` Vlastimil Babka
  0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2023-06-09 14:13 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Ard Biesheuvel, Isaac Manjarres, Saravana Kannan,
	Alasdair Kergon, Daniel Vetter, Joerg Roedel, Mark Brown,
	Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron, linux-mm,
	iommu, linux-arm-kernel

On 6/9/23 15:57, Catalin Marinas wrote:
> On Fri, Jun 09, 2023 at 02:44:01PM +0100, Catalin Marinas wrote:
>> On Fri, Jun 09, 2023 at 02:32:57PM +0200, Vlastimil Babka wrote:
>> > On 5/31/23 17:48, Catalin Marinas wrote:
>> > > diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
>> > > index 0ee20b764000..3288a1339271 100644
>> > > --- a/include/linux/dma-mapping.h
>> > > +++ b/include/linux/dma-mapping.h
>> > > @@ -545,7 +545,7 @@ static inline int dma_set_min_align_mask(struct device *dev,
>> > >  
>> > >  static inline int dma_get_cache_alignment(void)
>> > >  {
>> > > -#ifdef ARCH_DMA_MINALIGN
>> > > +#ifdef ARCH_HAS_DMA_MINALIGN
>> > >  	return ARCH_DMA_MINALIGN;
>> > >  #endif
>> > >  	return 1;
>> > > diff --git a/include/linux/slab.h b/include/linux/slab.h
>> > > index 6b3e155b70bf..50dcf9cfbf62 100644
>> > > --- a/include/linux/slab.h
>> > > +++ b/include/linux/slab.h
>> > > @@ -235,12 +235,20 @@ void kmem_dump_obj(void *object);
>> > >   * alignment larger than the alignment of a 64-bit integer.
>> > >   * Setting ARCH_DMA_MINALIGN in arch headers allows that.
>> > >   */
>> > > -#if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
>> > > +#ifdef ARCH_DMA_MINALIGN
>> > > +#define ARCH_HAS_DMA_MINALIGN
>> > > +#if ARCH_DMA_MINALIGN > 8 && !defined(ARCH_KMALLOC_MINALIGN)
>> > >  #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
>> > > -#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN
>> > > -#define KMALLOC_SHIFT_LOW ilog2(ARCH_DMA_MINALIGN)
>> > > +#endif
>> > >  #else
>> > > +#define ARCH_DMA_MINALIGN __alignof__(unsigned long long)
>> > > +#endif
>> > 
>> > It seems weird to make slab.h responsible for this part, especially for
>> > #define ARCH_HAS_DMA_MINALIGN, which dma-mapping.h consumes. Maybe it would
>> > be difficult to do differently due to some dependency hell, but minimally I
>> > don't see dma-mapping.h including slab.h so the result is
>> > include-order-dependent? Maybe it's included transitively, but then it's
>> > fragile and would be better to do explicitly?
>> 
>> True, there's a risk that it doesn't get included with some future
>> header refactoring.
>> 
>> What about moving ARCH_DMA_MINALIGN to linux/cache.h? Alternatively, I
>> could create a new linux/dma-minalign.h file but I feel since this is
>> about caches, having it in cache.h makes more sense. asm/cache.h is also
>> where most archs define the constant (apart from mips, sh, microblaze).
> 
> Something like this (still compiling):

Yeah that would be great!



* Re: [PATCH v6 04/17] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment()
  2023-05-31 15:48 ` [PATCH v6 04/17] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment() Catalin Marinas
@ 2023-06-09 14:33   ` Vlastimil Babka
  0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2023-06-09 14:33 UTC (permalink / raw)
  To: Catalin Marinas, Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel, Christoph Lameter,
	David Rientjes, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Hyeonggon Yoo

On 5/31/23 17:48, Catalin Marinas wrote:
> Do not create kmalloc() caches which are not aligned to
> dma_get_cache_alignment(). There is no functional change since for
> current architectures defining ARCH_DMA_MINALIGN, ARCH_KMALLOC_MINALIGN
> equals ARCH_DMA_MINALIGN (and dma_get_cache_alignment()). On
> architectures without a specific ARCH_DMA_MINALIGN,
> dma_get_cache_alignment() is 1, so no change to the kmalloc() caches.
> 
> If an architecture selects ARCH_HAS_DMA_CACHE_LINE_SIZE (introduced
> previously), the kmalloc() caches will be aligned to a cache line size.

Is this part leftover from a previous version?

> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slab_common.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 7f069159aee2..7c6475847fdf 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -17,6 +17,7 @@
>  #include <linux/cpu.h>
>  #include <linux/uaccess.h>
>  #include <linux/seq_file.h>
> +#include <linux/dma-mapping.h>
>  #include <linux/proc_fs.h>
>  #include <linux/debugfs.h>
>  #include <linux/kasan.h>
> @@ -862,9 +863,18 @@ void __init setup_kmalloc_cache_index_table(void)
>  	}
>  }
>  
> +static unsigned int __kmalloc_minalign(void)
> +{
> +	return dma_get_cache_alignment();
> +}
> +
>  void __init
>  new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
>  {
> +	unsigned int minalign = __kmalloc_minalign();
> +	unsigned int aligned_size = kmalloc_info[idx].size;
> +	int aligned_idx = idx;
> +
>  	if ((KMALLOC_RECLAIM != KMALLOC_NORMAL) && (type == KMALLOC_RECLAIM)) {
>  		flags |= SLAB_RECLAIM_ACCOUNT;
>  	} else if (IS_ENABLED(CONFIG_MEMCG_KMEM) && (type == KMALLOC_CGROUP)) {
> @@ -877,9 +887,17 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
>  		flags |= SLAB_CACHE_DMA;
>  	}
>  
> -	kmalloc_caches[type][idx] = create_kmalloc_cache(
> -					kmalloc_info[idx].name[type],
> -					kmalloc_info[idx].size, flags);
> +	if (minalign > ARCH_KMALLOC_MINALIGN) {
> +		aligned_size = ALIGN(aligned_size, minalign);
> +		aligned_idx = __kmalloc_index(aligned_size, false);
> +	}
> +
> +	if (!kmalloc_caches[type][aligned_idx])
> +		kmalloc_caches[type][aligned_idx] = create_kmalloc_cache(
> +					kmalloc_info[aligned_idx].name[type],
> +					aligned_size, flags);
> +	if (idx != aligned_idx)
> +		kmalloc_caches[type][idx] = kmalloc_caches[type][aligned_idx];
>  
>  	/*
>  	 * If CONFIG_MEMCG_KMEM is enabled, disable cache merging for
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 16/17] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible
  2023-05-31 15:48 ` [PATCH v6 16/17] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible Catalin Marinas
@ 2023-06-09 14:39   ` Vlastimil Babka
  0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2023-06-09 14:39 UTC (permalink / raw)
  To: Catalin Marinas, Linus Torvalds, Christoph Hellwig, Robin Murphy
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Ard Biesheuvel, Isaac Manjarres,
	Saravana Kannan, Alasdair Kergon, Daniel Vetter, Joerg Roedel,
	Mark Brown, Mike Snitzer, Rafael J. Wysocki, Jonathan Cameron,
	linux-mm, iommu, linux-arm-kernel, Christoph Lameter,
	David Rientjes, Joonsoo Kim, Pekka Enberg, Hyeonggon Yoo,
	Roman Gushchin

On 5/31/23 17:48, Catalin Marinas wrote:
> If an architecture opted in to DMA bouncing of unaligned kmalloc()
> buffers (ARCH_WANT_KMALLOC_DMA_BOUNCE), reduce the minimum kmalloc()
> cache alignment below cache-line size to ARCH_KMALLOC_MINALIGN.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Nit below:

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> ---
>  mm/slab_common.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 7c6475847fdf..fe46459a8b77 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -18,6 +18,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/seq_file.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/swiotlb.h>
>  #include <linux/proc_fs.h>
>  #include <linux/debugfs.h>
>  #include <linux/kasan.h>
> @@ -863,10 +864,19 @@ void __init setup_kmalloc_cache_index_table(void)
>  	}
>  }
>  
> +#ifdef CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC
> +static unsigned int __kmalloc_minalign(void)
> +{
> +	if (io_tlb_default_mem.nslabs)
> +		return ARCH_KMALLOC_MINALIGN;
> +	return dma_get_cache_alignment();
> +}
> +#else
>  static unsigned int __kmalloc_minalign(void)
>  {
>  	return dma_get_cache_alignment();
>  }
> +#endif

Should be enough to put the #ifdef around the two lines into a single
implementation of the function?

>  void __init
>  new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
> 



* RE: [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-06-09  8:11       ` Petr Tesařík
@ 2023-06-12  7:44         ` Tomonori Fujita
  2023-06-12  7:47           ` Christoph Hellwig
  0 siblings, 1 reply; 34+ messages in thread
From: Tomonori Fujita @ 2023-06-12  7:44 UTC (permalink / raw)
  To: Petr Tesařík, Isaac Manjarres
  Cc: Ard Biesheuvel, Catalin Marinas, Linus Torvalds,
	Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Saravana Kannan, Alasdair Kergon, Daniel Vetter,
	Joerg Roedel, Mark Brown, Mike Snitzer, Rafael J. Wysocki,
	Jonathan Cameron, linux-mm, iommu, linux-arm-kernel,
	FUJITA Tomonori, Konrad Rzeszutek Wilk

Hi,

> On Thu, 8 Jun 2023 14:29:45 -0700
> Isaac Manjarres <isaacmanjarres@google.com> wrote:
>
> > P.S. I noticed that the trace_swiotlb_bounced() tracepoint may not be
> > invoked even though bouncing occurs. For example, in the dma-iommu
> > path, swiotlb_tbl_map_single() is called when bouncing, instead of
> > swiotlb_map(), which is what ends up calling trace_swiotlb_bounced().
> >
> > Would it make sense to move the call to trace_swiotlb_bounced() to
> > swiotlb_tbl_map_single() since that function is always invoked?
>
> Definitely, if you ask me. I believe the change was merely forgotten
> in commit eb605a5754d0 ("swiotlb: add swiotlb_tbl_map_single library
> function").
>
> Let me take the author into Cc. Plus Konrad, who built further on
> that commit, may also have an opinion.

Did trace_swiotlb_bounced() even exist when I wrote the patch?

I cannot recall the patch, but from a quick look, moving
trace_swiotlb_bounced() to swiotlb_tbl_map_single() makes sense.

thanks



* Re: [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-06-12  7:44         ` Tomonori Fujita
@ 2023-06-12  7:47           ` Christoph Hellwig
  2023-06-14 23:55             ` Isaac Manjarres
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-06-12  7:47 UTC (permalink / raw)
  To: Tomonori Fujita
  Cc: Petr Tesařík, Isaac Manjarres, Ard Biesheuvel,
	Catalin Marinas, Linus Torvalds, Christoph Hellwig, Robin Murphy,
	Arnd Bergmann, Greg Kroah-Hartman, Will Deacon, Marc Zyngier,
	Andrew Morton, Herbert Xu, Saravana Kannan, Alasdair Kergon,
	Daniel Vetter, Joerg Roedel, Mark Brown, Mike Snitzer,
	Rafael J. Wysocki, Jonathan Cameron, linux-mm, iommu,
	linux-arm-kernel, FUJITA Tomonori, Konrad Rzeszutek Wilk

On Mon, Jun 12, 2023 at 07:44:46AM +0000, Tomonori Fujita wrote:
> I cannot recall the patch but from quick look, moving trace_swiotlb_bounced() to
> swiotlb_tbl_map_single() makes sense.

Agreed.


* Re: [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
  2023-06-12  7:47           ` Christoph Hellwig
@ 2023-06-14 23:55             ` Isaac Manjarres
  0 siblings, 0 replies; 34+ messages in thread
From: Isaac Manjarres @ 2023-06-14 23:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tomonori Fujita, Petr Tesařík, Ard Biesheuvel,
	Catalin Marinas, Linus Torvalds, Robin Murphy, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Marc Zyngier, Andrew Morton,
	Herbert Xu, Saravana Kannan, Alasdair Kergon, Daniel Vetter,
	Joerg Roedel, Mark Brown, Mike Snitzer, Rafael J. Wysocki,
	Jonathan Cameron, linux-mm, iommu, linux-arm-kernel,
	FUJITA Tomonori, Konrad Rzeszutek Wilk

On Mon, Jun 12, 2023 at 09:47:55AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 12, 2023 at 07:44:46AM +0000, Tomonori Fujita wrote:
> > I cannot recall the patch but from quick look, moving trace_swiotlb_bounced() to
> > swiotlb_tbl_map_single() makes sense.
> 
> Agreed.

There are actually two call sites for trace_swiotlb_bounced():
swiotlb_map() and xen_swiotlb_map_page(). Both of those functions
also invoke swiotlb_tbl_map_single(), so moving the call to
trace_swiotlb_bounced() into swiotlb_tbl_map_single() means that
there would be two traces per bounce-buffering event.

The difference between the two call sites of trace_swiotlb_bounced()
is that the call in swiotlb_map() uses phys_to_dma() for the device
address, while xen_swiotlb_map_page() uses xen_phys_to_dma().

Would it make sense to move the trace_swiotlb_bounced() call to
swiotlb_tbl_map_single() and then introduce a
swiotlb_tbl_map_single_notrace() function which doesn't do the tracing,
and xen_swiotlb_map_page() can call this?

--Isaac


end of thread, other threads:[~2023-06-14 23:55 UTC | newest]

Thread overview: 34+ messages
2023-05-31 15:48 [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 01/17] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Catalin Marinas
2023-06-09 12:32   ` Vlastimil Babka
2023-06-09 13:44     ` Catalin Marinas
2023-06-09 13:57       ` Catalin Marinas
2023-06-09 14:13         ` Vlastimil Babka
2023-05-31 15:48 ` [PATCH v6 02/17] dma: Allow dma_get_cache_alignment() to be overridden by the arch code Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 03/17] mm/slab: Simplify create_kmalloc_cache() args and make it static Catalin Marinas
2023-06-09 13:03   ` Vlastimil Babka
2023-05-31 15:48 ` [PATCH v6 04/17] mm/slab: Limit kmalloc() minimum alignment to dma_get_cache_alignment() Catalin Marinas
2023-06-09 14:33   ` Vlastimil Babka
2023-05-31 15:48 ` [PATCH v6 05/17] drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 06/17] drivers/gpu: " Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 07/17] drivers/usb: " Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 08/17] drivers/spi: " Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 09/17] dm-crypt: " Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 10/17] iio: core: " Catalin Marinas
2023-06-02 11:19   ` Jonathan Cameron
2023-05-31 15:48 ` [PATCH v6 11/17] arm64: Allow kmalloc() caches aligned to the smaller cache_line_size() Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 12/17] scatterlist: Add dedicated config for DMA flags Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 13/17] dma-mapping: Name SG DMA flag helpers consistently Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 14/17] dma-mapping: Force bouncing if the kmalloc() size is not cache-line-aligned Catalin Marinas
2023-05-31 15:48 ` [PATCH v6 15/17] iommu/dma: Force bouncing if the size is not cacheline-aligned Catalin Marinas
2023-06-09 11:52   ` Robin Murphy
2023-05-31 15:48 ` [PATCH v6 16/17] mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing possible Catalin Marinas
2023-06-09 14:39   ` Vlastimil Babka
2023-05-31 15:48 ` [PATCH v6 17/17] arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 Catalin Marinas
2023-06-08  5:45 ` [PATCH v6 00/17] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 Isaac Manjarres
2023-06-08  8:05   ` Ard Biesheuvel
2023-06-08 21:29     ` Isaac Manjarres
2023-06-09  8:11       ` Petr Tesařík
2023-06-12  7:44         ` Tomonori Fujita
2023-06-12  7:47           ` Christoph Hellwig
2023-06-14 23:55             ` Isaac Manjarres
