[PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
@ 2018-11-11  9:03 Nicolas Boichat
  2018-11-11  9:03 ` [PATCH v2 1/3] mm: slab/slub: Add check_slab_flags function to check for valid flags Nicolas Boichat
                   ` (4 more replies)
  0 siblings, 5 replies; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-11  9:03 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, Joerg Roedel, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

This is a follow-up to the discussion in [1], to make sure that the page
tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
physical address space.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html

Fixes since v1:
 - Add support for SLAB_CACHE_DMA32 in slab and slub (patches 1/2)
 - iommu/io-pgtable-arm-v7s (patch 3):
   - Changed approach to use SLAB_CACHE_DMA32 added by the previous
     commit.
   - Use DMA or DMA32 depending on the architecture (DMA for arm,
     DMA32 for arm64).

Nicolas Boichat (3):
  mm: slab/slub: Add check_slab_flags function to check for valid flags
  mm: Add support for SLAB_CACHE_DMA32
  iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

 drivers/iommu/io-pgtable-arm-v7s.c | 20 ++++++++++++++++----
 include/linux/slab.h               |  2 ++
 mm/internal.h                      | 21 +++++++++++++++++++--
 mm/slab.c                          | 10 +++-------
 mm/slab.h                          |  3 ++-
 mm/slab_common.c                   |  2 +-
 mm/slub.c                          | 24 +++++++++++++++++-------
 7 files changed, 60 insertions(+), 22 deletions(-)

-- 
2.19.1.930.g4563a0d9d0-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 1/3] mm: slab/slub: Add check_slab_flags function to check for valid flags
  2018-11-11  9:03 [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables Nicolas Boichat
@ 2018-11-11  9:03 ` Nicolas Boichat
  2018-11-11  9:03 ` [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32 Nicolas Boichat
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-11  9:03 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, Joerg Roedel, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

Remove duplicated code between slab and slub, and will make it
easier to make the test more complicated in the next commits.

Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
---
 mm/internal.h | 17 +++++++++++++++--
 mm/slab.c     |  8 +-------
 mm/slub.c     |  8 +-------
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 3b1ec1412fd2cd..7a500b232e4a43 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -33,8 +33,22 @@
 /* Control allocation cpuset and node placement constraints */
 #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
 
-/* Do not use these with a slab allocator */
-#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
+/* Check for flags that must not be used with a slab allocator */
+static inline gfp_t check_slab_flags(gfp_t flags)
+{
+	gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
+
+	if (unlikely(flags & bug_mask)) {
+		gfp_t invalid_mask = flags & bug_mask;
+
+		flags &= ~bug_mask;
+		pr_warn("Unexpected gfp: %#x (%pGg). Fixing up to gfp: %#x (%pGg). Fix your code!\n",
+				invalid_mask, &invalid_mask, flags, &flags);
+		dump_stack();
+	}
+
+	return flags;
+}
 
 void page_writeback_init(void);
 
diff --git a/mm/slab.c b/mm/slab.c
index 2a5654bb3b3ff3..251e09a5a3ef5c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2656,13 +2656,7 @@ static struct page *cache_grow_begin(struct kmem_cache *cachep,
 	 * Be lazy and only check for valid flags here,  keeping it out of the
 	 * critical path in kmem_cache_alloc().
 	 */
-	if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
-		gfp_t invalid_mask = flags & GFP_SLAB_BUG_MASK;
-		flags &= ~GFP_SLAB_BUG_MASK;
-		pr_warn("Unexpected gfp: %#x (%pGg). Fixing up to gfp: %#x (%pGg). Fix your code!\n",
-				invalid_mask, &invalid_mask, flags, &flags);
-		dump_stack();
-	}
+	flags = check_slab_flags(flags);
 	WARN_ON_ONCE(cachep->ctor && (flags & __GFP_ZERO));
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
diff --git a/mm/slub.c b/mm/slub.c
index e3629cd7aff164..1cca562bebdc8d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1681,13 +1681,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 
 static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
-	if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
-		gfp_t invalid_mask = flags & GFP_SLAB_BUG_MASK;
-		flags &= ~GFP_SLAB_BUG_MASK;
-		pr_warn("Unexpected gfp: %#x (%pGg). Fixing up to gfp: %#x (%pGg). Fix your code!\n",
-				invalid_mask, &invalid_mask, flags, &flags);
-		dump_stack();
-	}
+	flags = check_slab_flags(flags);
 
 	return allocate_slab(s,
 		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
-- 
2.19.1.930.g4563a0d9d0-goog


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32
  2018-11-11  9:03 [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables Nicolas Boichat
  2018-11-11  9:03 ` [PATCH v2 1/3] mm: slab/slub: Add check_slab_flags function to check for valid flags Nicolas Boichat
@ 2018-11-11  9:03 ` Nicolas Boichat
  2018-11-21 18:32   ` Christopher Lameter
  2018-11-11  9:03 ` [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging Nicolas Boichat
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-11  9:03 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, Joerg Roedel, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

SLAB_CACHE_DMA32 is only available after explicit kmem_cache_create calls,
no default cache is created for kmalloc. Add a test in check_slab_flags
for this.

Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
---
 include/linux/slab.h |  2 ++
 mm/internal.h        |  8 ++++++--
 mm/slab.c            |  4 +++-
 mm/slab.h            |  3 ++-
 mm/slab_common.c     |  2 +-
 mm/slub.c            | 18 +++++++++++++++++-
 6 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 918f374e7156f4..afc51ee1dae5d4 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -32,6 +32,8 @@
 #define SLAB_HWCACHE_ALIGN	((slab_flags_t __force)0x00002000U)
 /* Use GFP_DMA memory */
 #define SLAB_CACHE_DMA		((slab_flags_t __force)0x00004000U)
+/* Use GFP_DMA32 memory */
+#define SLAB_CACHE_DMA32	((slab_flags_t __force)0x00008000U)
 /* DEBUG: Store the last owner for bug hunting */
 #define SLAB_STORE_USER		((slab_flags_t __force)0x00010000U)
 /* Panic if kmem_cache_create() fails */
diff --git a/mm/internal.h b/mm/internal.h
index 7a500b232e4a43..2aa9c8491d2ca2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -14,6 +14,7 @@
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
+#include <linux/slab.h>
 #include <linux/tracepoint-defs.h>
 
 /*
@@ -34,9 +35,12 @@
 #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
 
 /* Check for flags that must not be used with a slab allocator */
-static inline gfp_t check_slab_flags(gfp_t flags)
+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t slab_flags)
 {
-	gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
+	gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
+
+	if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & SLAB_CACHE_DMA32))
+		bug_mask |= __GFP_DMA32;
 
 	if (unlikely(flags & bug_mask)) {
 		gfp_t invalid_mask = flags & bug_mask;
diff --git a/mm/slab.c b/mm/slab.c
index 251e09a5a3ef5c..6efcaad6a02b70 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2122,6 +2122,8 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
 	cachep->allocflags = __GFP_COMP;
 	if (flags & SLAB_CACHE_DMA)
 		cachep->allocflags |= GFP_DMA;
+	if (flags & SLAB_CACHE_DMA32)
+		cachep->allocflags |= GFP_DMA32;
 	if (flags & SLAB_RECLAIM_ACCOUNT)
 		cachep->allocflags |= __GFP_RECLAIMABLE;
 	cachep->size = size;
@@ -2656,7 +2658,7 @@ static struct page *cache_grow_begin(struct kmem_cache *cachep,
 	 * Be lazy and only check for valid flags here,  keeping it out of the
 	 * critical path in kmem_cache_alloc().
 	 */
-	flags = check_slab_flags(flags);
+	flags = check_slab_flags(flags, cachep->flags);
 	WARN_ON_ONCE(cachep->ctor && (flags & __GFP_ZERO));
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
diff --git a/mm/slab.h b/mm/slab.h
index 58c6c1c2a78ee3..9632772e14beb2 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -127,7 +127,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
 
 
 /* Legal flag mask for kmem_cache_create(), for various configurations */
-#define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | SLAB_PANIC | \
+#define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
+			 SLAB_CACHE_DMA32 | SLAB_PANIC | \
 			 SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS )
 
 #if defined(CONFIG_DEBUG_SLAB)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7eb8dc136c1cb8..f204385553bbac 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -53,7 +53,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
 		SLAB_FAILSLAB | SLAB_KASAN)
 
 #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
-			 SLAB_ACCOUNT)
+			 SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
 
 /*
  * Merge control. If this is set then no merging of slab caches will occur.
diff --git a/mm/slub.c b/mm/slub.c
index 1cca562bebdc8d..c639bd008e8c11 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1681,7 +1681,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 
 static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
-	flags = check_slab_flags(flags);
+	flags = check_slab_flags(flags, s->flags);
 
 	return allocate_slab(s,
 		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
@@ -3571,6 +3571,9 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	if (s->flags & SLAB_CACHE_DMA)
 		s->allocflags |= GFP_DMA;
 
+	if (s->flags & SLAB_CACHE_DMA32)
+		s->allocflags |= GFP_DMA32;
+
 	if (s->flags & SLAB_RECLAIM_ACCOUNT)
 		s->allocflags |= __GFP_RECLAIMABLE;
 
@@ -5090,6 +5093,14 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
 SLAB_ATTR_RO(cache_dma);
 #endif
 
+#ifdef CONFIG_ZONE_DMA32
+static ssize_t cache_dma32_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%d\n", !!(s->flags & SLAB_CACHE_DMA32));
+}
+SLAB_ATTR_RO(cache_dma32);
+#endif
+
 static ssize_t usersize_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%u\n", s->usersize);
@@ -5430,6 +5441,9 @@ static struct attribute *slab_attrs[] = {
 #ifdef CONFIG_ZONE_DMA
 	&cache_dma_attr.attr,
 #endif
+#ifdef CONFIG_ZONE_DMA32
+	&cache_dma32_attr.attr,
+#endif
 #ifdef CONFIG_NUMA
 	&remote_node_defrag_ratio_attr.attr,
 #endif
@@ -5660,6 +5674,8 @@ static char *create_unique_id(struct kmem_cache *s)
 	 */
 	if (s->flags & SLAB_CACHE_DMA)
 		*p++ = 'd';
+	if (s->flags & SLAB_CACHE_DMA32)
+		*p++ = 'D';
 	if (s->flags & SLAB_RECLAIM_ACCOUNT)
 		*p++ = 'a';
 	if (s->flags & SLAB_CONSISTENCY_CHECKS)
-- 
2.19.1.930.g4563a0d9d0-goog


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
  2018-11-11  9:03 [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables Nicolas Boichat
  2018-11-11  9:03 ` [PATCH v2 1/3] mm: slab/slub: Add check_slab_flags function to check for valid flags Nicolas Boichat
  2018-11-11  9:03 ` [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32 Nicolas Boichat
@ 2018-11-11  9:03 ` Nicolas Boichat
  2018-11-21 16:46   ` Will Deacon
  2018-11-21 18:20 ` [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables Christopher Lameter
  2018-12-04  9:37 ` Nicolas Boichat
  4 siblings, 1 reply; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-11  9:03 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, Joerg Roedel, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
is defined (e.g. on arm64 platforms).

For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.

Also, print an error when the physical address does not fit in
32-bit, to make debugging easier in the future.

Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
---

Changes since v1:
 - Changed approach to use SLAB_CACHE_DMA32 added by the previous
   commit.
 - Use DMA or DMA32 depending on the architecture (DMA for arm,
   DMA32 for arm64).

drivers/iommu/io-pgtable-arm-v7s.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 445c3bde04800c..996f7b6d00b44a 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -161,6 +161,14 @@
 
 #define ARM_V7S_TCR_PD1			BIT(5)
 
+#ifdef CONFIG_ZONE_DMA32
+#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
+#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
+#else
+#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
+#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
+#endif
+
 typedef u32 arm_v7s_iopte;
 
 static bool selftest_running;
@@ -198,13 +206,17 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp,
 	void *table = NULL;
 
 	if (lvl == 1)
-		table = (void *)__get_dma_pages(__GFP_ZERO, get_order(size));
+		table = (void *)__get_free_pages(
+			__GFP_ZERO | ARM_V7S_TABLE_GFP_DMA, get_order(size));
 	else if (lvl == 2)
-		table = kmem_cache_zalloc(data->l2_tables, gfp | GFP_DMA);
+		table = kmem_cache_zalloc(data->l2_tables,
+					  gfp | ARM_V7S_TABLE_GFP_DMA);
 	phys = virt_to_phys(table);
-	if (phys != (arm_v7s_iopte)phys)
+	if (phys != (arm_v7s_iopte)phys) {
 		/* Doesn't fit in PTE */
+		dev_err(dev, "Page table does not fit in PTE: %pa", &phys);
 		goto out_free;
+	}
 	if (table && !(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA)) {
 		dma = dma_map_single(dev, table, size, DMA_TO_DEVICE);
 		if (dma_mapping_error(dev, dma))
@@ -737,7 +749,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 	data->l2_tables = kmem_cache_create("io-pgtable_armv7s_l2",
 					    ARM_V7S_TABLE_SIZE(2),
 					    ARM_V7S_TABLE_SIZE(2),
-					    SLAB_CACHE_DMA, NULL);
+					    ARM_V7S_TABLE_SLAB_CACHE, NULL);
 	if (!data->l2_tables)
 		goto out_free_data;
 
-- 
2.19.1.930.g4563a0d9d0-goog


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
  2018-11-11  9:03 ` [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging Nicolas Boichat
@ 2018-11-21 16:46   ` Will Deacon
  2018-11-21 17:38     ` Christopher Lameter
  2018-11-21 18:02     ` Michal Hocko
  0 siblings, 2 replies; 35+ messages in thread
From: Will Deacon @ 2018-11-21 16:46 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Robin Murphy, Joerg Roedel, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
> For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> is defined (e.g. on arm64 platforms).
> 
> For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
> 
> Also, print an error when the physical address does not fit in
> 32-bit, to make debugging easier in the future.
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> ---
> 
> Changes since v1:
>  - Changed approach to use SLAB_CACHE_DMA32 added by the previous
>    commit.
>  - Use DMA or DMA32 depending on the architecture (DMA for arm,
>    DMA32 for arm64).
> 
> drivers/iommu/io-pgtable-arm-v7s.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 445c3bde04800c..996f7b6d00b44a 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -161,6 +161,14 @@
>  
>  #define ARM_V7S_TCR_PD1			BIT(5)
>  
> +#ifdef CONFIG_ZONE_DMA32
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> +#else
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> +#endif

It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
architectures, since then we wouldn't need this #ifdeffery afaict.

Will

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
  2018-11-21 16:46   ` Will Deacon
@ 2018-11-21 17:38     ` Christopher Lameter
  2018-11-21 17:43       ` Robin Murphy
  2018-11-21 18:02     ` Michal Hocko
  1 sibling, 1 reply; 35+ messages in thread
From: Christopher Lameter @ 2018-11-21 17:38 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nicolas Boichat, Robin Murphy, Joerg Roedel, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On Wed, 21 Nov 2018, Will Deacon wrote:

> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32

SLAB_CACHE_DMA32??? WTH is going on here? We are trying to get rid of
the dma slab array.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
  2018-11-21 17:38     ` Christopher Lameter
@ 2018-11-21 17:43       ` Robin Murphy
  2018-11-21 18:18         ` Christopher Lameter
  0 siblings, 1 reply; 35+ messages in thread
From: Robin Murphy @ 2018-11-21 17:43 UTC (permalink / raw)
  To: Christopher Lameter, Will Deacon
  Cc: Nicolas Boichat, Joerg Roedel, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka, Michal Hocko,
	Mel Gorman, Levin Alexander, Huaisheng Ye, Mike Rapoport,
	linux-arm-kernel, iommu, linux-kernel, linux-mm, Yong Wu,
	Matthias Brugger, Tomasz Figa, yingjoe.chen

On 21/11/2018 17:38, Christopher Lameter wrote:
> On Wed, 21 Nov 2018, Will Deacon wrote:
> 
>>> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> 
> SLAB_CACHE_DMA32??? WTH is going on here? We are trying to get rid of
> the dma slab array.

See the previous two patches in this series. If there's already a 
(better) way to have a kmem_cache which allocates its backing pages with 
GFP_DMA32, please do let us know.

Robin.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
  2018-11-21 16:46   ` Will Deacon
  2018-11-21 17:38     ` Christopher Lameter
@ 2018-11-21 18:02     ` Michal Hocko
  2018-11-22  1:20       ` Nicolas Boichat
  1 sibling, 1 reply; 35+ messages in thread
From: Michal Hocko @ 2018-11-21 18:02 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nicolas Boichat, Robin Murphy, Joerg Roedel, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On Wed 21-11-18 16:46:38, Will Deacon wrote:
> On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
> > For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> > is defined (e.g. on arm64 platforms).
> > 
> > For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
> > 
> > Also, print an error when the physical address does not fit in
> > 32-bit, to make debugging easier in the future.
> > 
> > Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > ---
> > 
> > Changes since v1:
> >  - Changed approach to use SLAB_CACHE_DMA32 added by the previous
> >    commit.
> >  - Use DMA or DMA32 depending on the architecture (DMA for arm,
> >    DMA32 for arm64).
> > 
> > drivers/iommu/io-pgtable-arm-v7s.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> > index 445c3bde04800c..996f7b6d00b44a 100644
> > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > @@ -161,6 +161,14 @@
> >  
> >  #define ARM_V7S_TCR_PD1			BIT(5)
> >  
> > +#ifdef CONFIG_ZONE_DMA32
> > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> > +#else
> > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> > +#endif
> 
> It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
> architectures, since then we wouldn't need this #ifdeffery afaict.

But GFP_DMA32 should map to GFP_KERNEL on 32b, no? Or what exactly is
going on in here?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
  2018-11-21 17:43       ` Robin Murphy
@ 2018-11-21 18:18         ` Christopher Lameter
  0 siblings, 0 replies; 35+ messages in thread
From: Christopher Lameter @ 2018-11-21 18:18 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, Nicolas Boichat, Joerg Roedel, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On Wed, 21 Nov 2018, Robin Murphy wrote:

> On 21/11/2018 17:38, Christopher Lameter wrote:
> > On Wed, 21 Nov 2018, Will Deacon wrote:
> >
> > > > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> >
> > SLAB_CACHE_DMA32??? WTH is going on here? We are trying to get rid of
> > the dma slab array.
>
> See the previous two patches in this series. If there's already a (better) way
> to have a kmem_cache which allocates its backing pages with GFP_DMA32, please
> do let us know.

Was not cced on the whole patchset. Trying to find it. Its best to
allocate DMA memory through the page based allocation functions.
dma_alloc_coherent() and friends.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-11  9:03 [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables Nicolas Boichat
                   ` (2 preceding siblings ...)
  2018-11-11  9:03 ` [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging Nicolas Boichat
@ 2018-11-21 18:20 ` Christopher Lameter
  2018-11-21 21:38   ` Matthew Wilcox
  2018-12-04  9:37 ` Nicolas Boichat
  4 siblings, 1 reply; 35+ messages in thread
From: Christopher Lameter @ 2018-11-21 18:20 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Robin Murphy, Will Deacon, Joerg Roedel, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On Sun, 11 Nov 2018, Nicolas Boichat wrote:

> This is a follow-up to the discussion in [1], to make sure that the page
> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> physical address space.

Page tables? This means you need a page frame? Why go through the slab
allocators?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32
  2018-11-11  9:03 ` [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32 Nicolas Boichat
@ 2018-11-21 18:32   ` Christopher Lameter
  2018-11-22  0:52     ` Nicolas Boichat
  0 siblings, 1 reply; 35+ messages in thread
From: Christopher Lameter @ 2018-11-21 18:32 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Robin Murphy, Will Deacon, Joerg Roedel, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On Sun, 11 Nov 2018, Nicolas Boichat wrote:

> SLAB_CACHE_DMA32 is only available after explicit kmem_cache_create calls,
> no default cache is created for kmalloc. Add a test in check_slab_flags
> for this.

This does not define the dma32 kmalloc array. Is that intentional? In that
case you need to fail any request for GFP_DMA32 coming in via kmalloc.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-21 18:20 ` [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables Christopher Lameter
@ 2018-11-21 21:38   ` Matthew Wilcox
  2018-11-21 22:26     ` Robin Murphy
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2018-11-21 21:38 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Nicolas Boichat, Robin Murphy, Will Deacon, Joerg Roedel,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Michal Hocko, Mel Gorman, Levin Alexander,
	Huaisheng Ye, Mike Rapoport, linux-arm-kernel, iommu,
	linux-kernel, linux-mm, Yong Wu, Matthias Brugger, Tomasz Figa,
	yingjoe.chen

On Wed, Nov 21, 2018 at 06:20:02PM +0000, Christopher Lameter wrote:
> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
> 
> > This is a follow-up to the discussion in [1], to make sure that the page
> > tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> > physical address space.
> 
> Page tables? This means you need a page frame? Why go through the slab
> allocators?

Because this particular architecture has sub-page-size PMD page tables.
We desperately need to hoist page table allocation out of the architectures;
there're a bunch of different implementations and they're mostly bad,
one way or another.

For each level of page table we generally have three cases:

1. single page
2. sub-page, naturally aligned
3. multiple pages, naturally aligned

for 1 and 3, the page allocator will do just fine.
for 2, we should have a per-MM page_frag allocator.  s390 already has
something like this, although it's more complicated.  ppc also has
something a little more complex for the cases when it's configured with
a 64k page size but wants to use a 4k page table entry.

I'd like x86 to be able to simply do:

#define pte_alloc_one(mm, addr)	page_alloc_table(mm, addr, 0)
#define pmd_alloc_one(mm, addr)	page_alloc_table(mm, addr, 0)
#define pud_alloc_one(mm, addr)	page_alloc_table(mm, addr, 0)
#define p4d_alloc_one(mm, addr)	page_alloc_table(mm, addr, 0)

An architecture with 4k page size and needing a 16k PMD would do:

#define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)

while an architecture with a 64k page size needing a 4k PTE would do:

#define ARCH_PAGE_TABLE_FRAG
#define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)

I haven't had time to work on this, but perhaps someone with a problem
that needs fixing would like to, instead of burying yet another awful
implementation away in arch/ somewhere.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-21 21:38   ` Matthew Wilcox
@ 2018-11-21 22:26     ` Robin Murphy
  2018-11-22  1:05       ` Nicolas Boichat
                         ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Robin Murphy @ 2018-11-21 22:26 UTC (permalink / raw)
  To: Matthew Wilcox, Christopher Lameter
  Cc: Nicolas Boichat, Will Deacon, Joerg Roedel, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm-kernel, iommu, linux-kernel, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On 2018-11-21 9:38 pm, Matthew Wilcox wrote:
> On Wed, Nov 21, 2018 at 06:20:02PM +0000, Christopher Lameter wrote:
>> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
>>
>>> This is a follow-up to the discussion in [1], to make sure that the page
>>> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
>>> physical address space.
>>
>> Page tables? This means you need a page frame? Why go through the slab
>> allocators?
> 
> Because this particular architecture has sub-page-size PMD page tables.
> We desperately need to hoist page table allocation out of the architectures;
> there're a bunch of different implementations and they're mostly bad,
> one way or another.

These are IOMMU page tables, rather than CPU ones, so we're already well 
outside arch code - indeed the original motivation of io-pgtable was to 
be entirely independent of the p*d types and arch-specific MM code (this 
Armv7 short-descriptor format is already "non-native" when used by 
drivers in an arm64 kernel).

There are various efficiency reasons for using regular kernel memory 
instead of coherent DMA allocations - for the most part it works well, 
we just have the odd corner case like this one where the 32-bit format 
gets used on 64-bit systems such that the tables themselves still need 
to be allocated below 4GB (although the final output address can point 
at higher memory by virtue of the IOMMU in question not implementing 
permissions and repurposing some of those PTE fields as extra address bits).

TBH, if this DMA32 stuff is going to be contentious we could possibly 
just rip out the offending kmem_cache - it seemed like good practice for 
the use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied 
upon to give the same 1KB alignment and chance of succeeding as the 
equivalent kmem_cache_alloc(), then we could quite easily make do with 
that instead.

Thanks,
Robin.

> For each level of page table we generally have three cases:
> 
> 1. single page
> 2. sub-page, naturally aligned
> 3. multiple pages, naturally aligned
> 
> for 1 and 3, the page allocator will do just fine.
> for 2, we should have a per-MM page_frag allocator.  s390 already has
> something like this, although it's more complicated.  ppc also has
> something a little more complex for the cases when it's configured with
> a 64k page size but wants to use a 4k page table entry.
> 
> I'd like x86 to be able to simply do:
> 
> #define pte_alloc_one(mm, addr)	page_alloc_table(mm, addr, 0)
> #define pmd_alloc_one(mm, addr)	page_alloc_table(mm, addr, 0)
> #define pud_alloc_one(mm, addr)	page_alloc_table(mm, addr, 0)
> #define p4d_alloc_one(mm, addr)	page_alloc_table(mm, addr, 0)
> 
> An architecture with 4k page size and needing a 16k PMD would do:
> 
> #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)
> 
> while an architecture with a 64k page size needing a 4k PTE would do:
> 
> #define ARCH_PAGE_TABLE_FRAG
> #define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)
> 
> I haven't had time to work on this, but perhaps someone with a problem
> that needs fixing would like to, instead of burying yet another awful
> implementation away in arch/ somewhere.
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32
  2018-11-21 18:32   ` Christopher Lameter
@ 2018-11-22  0:52     ` Nicolas Boichat
  0 siblings, 0 replies; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-22  0:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Robin Murphy, Will Deacon, Joerg Roedel, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm Mailing List, iommu, lkml, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On Thu, Nov 22, 2018 at 2:32 AM Christopher Lameter <cl@linux.com> wrote:
>
> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
>
> > SLAB_CACHE_DMA32 is only available after explicit kmem_cache_create calls,
> > no default cache is created for kmalloc. Add a test in check_slab_flags
> > for this.
>
> This does not define the dma32 kmalloc array. Is that intentional?

Yes that's intentional, AFAICT there is no user, so there is no point
creating the cache.

 (okay, I could find one, but it's probably broken:
git grep GFP_DMA32 | grep k[a-z]*alloc
drivers/media/platform/vivid/vivid-osd.c: dev->video_vbase =
kzalloc(dev->video_buffer_size, GFP_KERNEL | GFP_DMA32);
).

> In that
> case you need to fail any request for GFP_DMA32 coming in via kmalloc.

Well, we do check for these in check_slab_flags (aka GFP_SLAB_BUG_MASK
before patch 1/3 of this series), so, with or without this patch,
calls with GFP_DMA32 will end up failing in check_slab_flags.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-21 22:26     ` Robin Murphy
@ 2018-11-22  1:05       ` Nicolas Boichat
  2018-11-22  2:35       ` Matthew Wilcox
  2018-11-22  8:23       ` Christoph Hellwig
  2 siblings, 0 replies; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-22  1:05 UTC (permalink / raw)
  To: Robin Murphy
  Cc: willy, Christoph Lameter, Will Deacon, Joerg Roedel,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Michal Hocko, Mel Gorman, Levin Alexander,
	Huaisheng Ye, Mike Rapoport, linux-arm Mailing List, iommu, lkml,
	linux-mm, Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On Thu, Nov 22, 2018 at 6:27 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2018-11-21 9:38 pm, Matthew Wilcox wrote:
> > On Wed, Nov 21, 2018 at 06:20:02PM +0000, Christopher Lameter wrote:
> >> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
> >>
> >>> This is a follow-up to the discussion in [1], to make sure that the page
> >>> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> >>> physical address space.
> >>
> >> Page tables? This means you need a page frame? Why go through the slab
> >> allocators?
> >
> > Because this particular architecture has sub-page-size PMD page tables.
> > We desperately need to hoist page table allocation out of the architectures;
> > there're a bunch of different implementations and they're mostly bad,
> > one way or another.
>
> These are IOMMU page tables, rather than CPU ones, so we're already well
> outside arch code - indeed the original motivation of io-pgtable was to
> be entirely independent of the p*d types and arch-specific MM code (this
> Armv7 short-descriptor format is already "non-native" when used by
> drivers in an arm64 kernel).
>
> There are various efficiency reasons for using regular kernel memory
> instead of coherent DMA allocations - for the most part it works well,
> we just have the odd corner case like this one where the 32-bit format
> gets used on 64-bit systems such that the tables themselves still need
> to be allocated below 4GB (although the final output address can point
> at higher memory by virtue of the IOMMU in question not implementing
> permissions and repurposing some of those PTE fields as extra address bits).
>
> TBH, if this DMA32 stuff is going to be contentious we could possibly
> just rip out the offending kmem_cache - it seemed like good practice for
> the use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied
> upon to give the same 1KB alignment and chance of succeeding as the
> equivalent kmem_cache_alloc(), then we could quite easily make do with
> that instead.

Yes, but if we want to use kzalloc, we'll need to create
kmalloc_caches for DMA32, which seems wasteful as there are no other
users (see my comment here:
https://patchwork.kernel.org/patch/10677525/#22332697).

Thanks,

> Thanks,
> Robin.
>
> > For each level of page table we generally have three cases:
> >
> > 1. single page
> > 2. sub-page, naturally aligned
> > 3. multiple pages, naturally aligned
> >
> > for 1 and 3, the page allocator will do just fine.
> > for 2, we should have a per-MM page_frag allocator.  s390 already has
> > something like this, although it's more complicated.  ppc also has
> > something a little more complex for the cases when it's configured with
> > a 64k page size but wants to use a 4k page table entry.
> >
> > I'd like x86 to be able to simply do:
> >
> > #define pte_alloc_one(mm, addr)       page_alloc_table(mm, addr, 0)
> > #define pmd_alloc_one(mm, addr)       page_alloc_table(mm, addr, 0)
> > #define pud_alloc_one(mm, addr)       page_alloc_table(mm, addr, 0)
> > #define p4d_alloc_one(mm, addr)       page_alloc_table(mm, addr, 0)
> >
> > An architecture with 4k page size and needing a 16k PMD would do:
> >
> > #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)
> >
> > while an architecture with a 64k page size needing a 4k PTE would do:
> >
> > #define ARCH_PAGE_TABLE_FRAG
> > #define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)
> >
> > I haven't had time to work on this, but perhaps someone with a problem
> > that needs fixing would like to, instead of burying yet another awful
> > implementation away in arch/ somewhere.
> >

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
  2018-11-21 18:02     ` Michal Hocko
@ 2018-11-22  1:20       ` Nicolas Boichat
  2018-11-23 12:15         ` Vlastimil Babka
  0 siblings, 1 reply; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-22  1:20 UTC (permalink / raw)
  To: mhocko
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Levin Alexander, Huaisheng Ye,
	Mike Rapoport, linux-arm Mailing List, iommu, lkml, linux-mm,
	Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On Thu, Nov 22, 2018 at 2:02 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 21-11-18 16:46:38, Will Deacon wrote:
> > On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
> > > For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> > > is defined (e.g. on arm64 platforms).
> > >
> > > For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
> > >
> > > Also, print an error when the physical address does not fit in
> > > 32-bit, to make debugging easier in the future.
> > >
> > > Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > > ---
> > >
> > > Changes since v1:
> > >  - Changed approach to use SLAB_CACHE_DMA32 added by the previous
> > >    commit.
> > >  - Use DMA or DMA32 depending on the architecture (DMA for arm,
> > >    DMA32 for arm64).
> > >
> > > drivers/iommu/io-pgtable-arm-v7s.c | 20 ++++++++++++++++----
> > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> > > index 445c3bde04800c..996f7b6d00b44a 100644
> > > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > > @@ -161,6 +161,14 @@
> > >
> > >  #define ARM_V7S_TCR_PD1                    BIT(5)
> > >
> > > +#ifdef CONFIG_ZONE_DMA32
> > > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> > > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> > > +#else
> > > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> > > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> > > +#endif
> >
> > It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
> > architectures, since then we wouldn't need this #ifdeffery afaict.
>
> But GFP_DMA32 should map to GFP_KERNEL on 32b, no? Or what exactly is
> going on in here?

GFP_DMA32 will fail due to check_slab_flags (aka GFP_SLAB_BUG_MASK
before patch 1/3 of this series)... But yes, it may be neater if there
was transparent remapping of GFP_DMA32/SLAB_CACHE_DMA32 to
GFP_DMA/SLAB_CACHE_DMA on 32-bit arch...

> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-21 22:26     ` Robin Murphy
  2018-11-22  1:05       ` Nicolas Boichat
@ 2018-11-22  2:35       ` Matthew Wilcox
  2018-11-22  5:56         ` Nicolas Boichat
  2018-11-22  8:26         ` Christoph Hellwig
  2018-11-22  8:23       ` Christoph Hellwig
  2 siblings, 2 replies; 35+ messages in thread
From: Matthew Wilcox @ 2018-11-22  2:35 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christopher Lameter, Nicolas Boichat, Will Deacon, Joerg Roedel,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Michal Hocko, Mel Gorman, Levin Alexander,
	Huaisheng Ye, Mike Rapoport, linux-arm-kernel, iommu,
	linux-kernel, linux-mm, Yong Wu, Matthias Brugger, Tomasz Figa,
	yingjoe.chen

On Wed, Nov 21, 2018 at 10:26:26PM +0000, Robin Murphy wrote:
> These are IOMMU page tables, rather than CPU ones, so we're already well
> outside arch code - indeed the original motivation of io-pgtable was to be
> entirely independent of the p*d types and arch-specific MM code (this Armv7
> short-descriptor format is already "non-native" when used by drivers in an
> arm64 kernel).

There was quite a lot of explanation missing from this patch description!

> There are various efficiency reasons for using regular kernel memory instead
> of coherent DMA allocations - for the most part it works well, we just have
> the odd corner case like this one where the 32-bit format gets used on
> 64-bit systems such that the tables themselves still need to be allocated
> below 4GB (although the final output address can point at higher memory by
> virtue of the IOMMU in question not implementing permissions and repurposing
> some of those PTE fields as extra address bits).
> 
> TBH, if this DMA32 stuff is going to be contentious we could possibly just
> rip out the offending kmem_cache - it seemed like good practice for the
> use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> give the same 1KB alignment and chance of succeeding as the equivalent
> kmem_cache_alloc(), then we could quite easily make do with that instead.

I think you should look at using the page_frag allocator here.  You can
use whatever GFP_DMA flags you like.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-22  2:35       ` Matthew Wilcox
@ 2018-11-22  5:56         ` Nicolas Boichat
  2018-11-22  8:26         ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-22  5:56 UTC (permalink / raw)
  To: willy
  Cc: Robin Murphy, Christoph Lameter, Will Deacon, Joerg Roedel,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Michal Hocko, Mel Gorman, Levin Alexander,
	Huaisheng Ye, Mike Rapoport, linux-arm Mailing List, iommu, lkml,
	linux-mm, Yong Wu, Matthias Brugger, Tomasz Figa, yingjoe.chen

On Thu, Nov 22, 2018 at 10:36 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Nov 21, 2018 at 10:26:26PM +0000, Robin Murphy wrote:
> > These are IOMMU page tables, rather than CPU ones, so we're already well
> > outside arch code - indeed the original motivation of io-pgtable was to be
> > entirely independent of the p*d types and arch-specific MM code (this Armv7
> > short-descriptor format is already "non-native" when used by drivers in an
> > arm64 kernel).
>
> There was quite a lot of explanation missing from this patch description!

I totally agree ,-) I'm not familiar at all with either iommu or
mm/... Looks like the patchset triggered a helpful discussion, and I
understand the problem better now. I'll improve the description in the
next revision.

> > There are various efficiency reasons for using regular kernel memory instead
> > of coherent DMA allocations - for the most part it works well, we just have
> > the odd corner case like this one where the 32-bit format gets used on
> > 64-bit systems such that the tables themselves still need to be allocated
> > below 4GB (although the final output address can point at higher memory by
> > virtue of the IOMMU in question not implementing permissions and repurposing
> > some of those PTE fields as extra address bits).
> >
> > TBH, if this DMA32 stuff is going to be contentious we could possibly just
> > rip out the offending kmem_cache - it seemed like good practice for the
> > use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> > give the same 1KB alignment and chance of succeeding as the equivalent
> > kmem_cache_alloc(), then we could quite easily make do with that instead.
>
> I think you should look at using the page_frag allocator here.  You can
> use whatever GFP_DMA flags you like.

I'll try that.

Thanks!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-21 22:26     ` Robin Murphy
  2018-11-22  1:05       ` Nicolas Boichat
  2018-11-22  2:35       ` Matthew Wilcox
@ 2018-11-22  8:23       ` Christoph Hellwig
  2018-11-23  3:04         ` Nicolas Boichat
  2018-11-23 12:23         ` Vlastimil Babka
  2 siblings, 2 replies; 35+ messages in thread
From: Christoph Hellwig @ 2018-11-22  8:23 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Matthew Wilcox, Christopher Lameter, Levin Alexander,
	Mike Rapoport, Nicolas Boichat, Huaisheng Ye, Tomasz Figa,
	Will Deacon, linux-kernel, Pekka Enberg, linux-mm, iommu,
	Michal Hocko, linux-arm-kernel, David Rientjes, Matthias Brugger,
	yingjoe.chen, Joonsoo Kim, Mel Gorman, Andrew Morton,
	Vlastimil Babka

On Wed, Nov 21, 2018 at 10:26:26PM +0000, Robin Murphy wrote:
> TBH, if this DMA32 stuff is going to be contentious we could possibly just
> rip out the offending kmem_cache - it seemed like good practice for the
> use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> give the same 1KB alignment and chance of succeeding as the equivalent
> kmem_cache_alloc(), then we could quite easily make do with that instead.

Neither is the slab support for kmalloc, not do kmalloc allocations
have useful alignment apparently (at least if you use slub debug).

But I do agree with the sentiment of not wanting to spread GFP_DMA32
futher into the slab allocator.

I think you want a simple genalloc allocator for this rather special
use case.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-22  2:35       ` Matthew Wilcox
  2018-11-22  5:56         ` Nicolas Boichat
@ 2018-11-22  8:26         ` Christoph Hellwig
  2018-11-22 15:16           ` Matthew Wilcox
  1 sibling, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2018-11-22  8:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Robin Murphy, Michal Hocko, Will Deacon, Levin Alexander,
	linux-mm, Christopher Lameter, Nicolas Boichat, Huaisheng Ye,
	David Rientjes, yingjoe.chen, Vlastimil Babka, Tomasz Figa,
	Mike Rapoport, Matthias Brugger, Joonsoo Kim, linux-arm-kernel,
	linux-kernel, Pekka Enberg, iommu, Andrew Morton, Mel Gorman

On Wed, Nov 21, 2018 at 06:35:58PM -0800, Matthew Wilcox wrote:
> I think you should look at using the page_frag allocator here.  You can
> use whatever GFP_DMA flags you like.

So I actually tries to use page_frag to solve the XFS unaligned kmalloc
allocations problem, and I don't think it is the right hammer for this
nail (or any other nail outside of networking).

The problem with the page_frag allocator is that it never reuses
fragments returned to the page, but only only frees the page once all
fragments are freed.  This means that if you have some long(er) term
allocations you are effectively creating memory leaks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-22  8:26         ` Christoph Hellwig
@ 2018-11-22 15:16           ` Matthew Wilcox
  2018-11-22 15:19             ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2018-11-22 15:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Robin Murphy, Michal Hocko, Will Deacon, Levin Alexander,
	linux-mm, Christopher Lameter, Nicolas Boichat, Huaisheng Ye,
	David Rientjes, yingjoe.chen, Vlastimil Babka, Tomasz Figa,
	Mike Rapoport, Matthias Brugger, Joonsoo Kim, linux-arm-kernel,
	linux-kernel, Pekka Enberg, iommu, Andrew Morton, Mel Gorman

On Thu, Nov 22, 2018 at 12:26:02AM -0800, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 06:35:58PM -0800, Matthew Wilcox wrote:
> > I think you should look at using the page_frag allocator here.  You can
> > use whatever GFP_DMA flags you like.
> 
> So I actually tries to use page_frag to solve the XFS unaligned kmalloc
> allocations problem, and I don't think it is the right hammer for this
> nail (or any other nail outside of networking).
> 
> The problem with the page_frag allocator is that it never reuses
> fragments returned to the page, but only only frees the page once all
> fragments are freed.  This means that if you have some long(er) term
> allocations you are effectively creating memory leaks.

Yes, your allocations from the page_frag allocator have to have similar
lifetimes.  I thought that would be ideal for XFS though; as I understood
the problem, these were per-IO allocations, and IOs to the same filesystem
tend to take roughly the same amount of time.  Sure, in an error case,
some IOs will take a long time before timing out, but it should be OK
to have pages unavailable during that time in these rare situations.
What am I missing?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-22 15:16           ` Matthew Wilcox
@ 2018-11-22 15:19             ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2018-11-22 15:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Robin Murphy, Michal Hocko, Will Deacon,
	Levin Alexander, linux-mm, Christopher Lameter, Nicolas Boichat,
	Huaisheng Ye, David Rientjes, yingjoe.chen, Vlastimil Babka,
	Tomasz Figa, Mike Rapoport, Matthias Brugger, Joonsoo Kim,
	linux-arm-kernel, linux-kernel, Pekka Enberg, iommu,
	Andrew Morton, Mel Gorman

On Thu, Nov 22, 2018 at 07:16:32AM -0800, Matthew Wilcox wrote:
> Yes, your allocations from the page_frag allocator have to have similar
> lifetimes.  I thought that would be ideal for XFS though; as I understood
> the problem, these were per-IO allocations, and IOs to the same filesystem
> tend to take roughly the same amount of time.  Sure, in an error case,
> some IOs will take a long time before timing out, but it should be OK
> to have pages unavailable during that time in these rare situations.
> What am I missing?

No, thee are allocations for meatada buffers, which can stay around
for a long time.  Worse still that depends on usage, so one buffer
allocated from ma page might basically stay around forever, while
another one might get reclaimed very quickly.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-22  8:23       ` Christoph Hellwig
@ 2018-11-23  3:04         ` Nicolas Boichat
  2018-11-23  5:37           ` Nicolas Boichat
  2018-11-23 12:23         ` Vlastimil Babka
  1 sibling, 1 reply; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-23  3:04 UTC (permalink / raw)
  To: hch
  Cc: Robin Murphy, willy, Christoph Lameter, Levin Alexander,
	Mike Rapoport, Huaisheng Ye, Tomasz Figa, Will Deacon, lkml,
	Pekka Enberg, linux-mm, iommu, Michal Hocko,
	linux-arm Mailing List, David Rientjes, Matthias Brugger,
	yingjoe.chen, Joonsoo Kim, Mel Gorman, Andrew Morton,
	Vlastimil Babka

On Thu, Nov 22, 2018 at 4:23 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Nov 21, 2018 at 10:26:26PM +0000, Robin Murphy wrote:
> > TBH, if this DMA32 stuff is going to be contentious we could possibly just
> > rip out the offending kmem_cache - it seemed like good practice for the
> > use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> > give the same 1KB alignment and chance of succeeding as the equivalent
> > kmem_cache_alloc(), then we could quite easily make do with that instead.
>
> Neither is the slab support for kmalloc, not do kmalloc allocations
> have useful alignment apparently (at least if you use slub debug).
>
> But I do agree with the sentiment of not wanting to spread GFP_DMA32
> futher into the slab allocator.
>
> I think you want a simple genalloc allocator for this rather special
> use case.

So I had a look at genalloc, we'd need to add pre-allocated memory
using gen_pool_add [1]. There can be up to 4096 L2 page tables, so we
may need to pre-allocate 4MB of memory (1KB per L2 page table). We
could add chunks on demand, but then it'd be difficult to free them up
(genalloc does not have a "gen_pool_remove" call). So basically if the
full 4MB end up being requested, we'd be stuck with that until the
iommu domain is freed (on the arm64 Mediatek platforms I looked at,
there is only one iommu domain, and it never gets freed).

page_frag would at least have a chance to reclaim those pages (if I
understand Christoph's statement correctly)

Robin: Do you have some ideas of the lifetime/usage of L2 tables? If
they are usually few of them, or if they don't get reclaimed easily,
some on demand genalloc allocation would be ok (or even 4MB allocation
on init, if we're willing to take that hit). If they get allocated and
freed together, maybe page_frag is a better option?

Thanks,

[1] https://www.kernel.org/doc/html/v4.19/core-api/genalloc.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-23  3:04         ` Nicolas Boichat
@ 2018-11-23  5:37           ` Nicolas Boichat
  0 siblings, 0 replies; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-23  5:37 UTC (permalink / raw)
  To: hch
  Cc: Robin Murphy, willy, Christoph Lameter, Levin Alexander,
	Mike Rapoport, Huaisheng Ye, Tomasz Figa, Will Deacon, lkml,
	Pekka Enberg, linux-mm, iommu, Michal Hocko,
	linux-arm Mailing List, David Rientjes, Matthias Brugger,
	yingjoe.chen, Joonsoo Kim, Mel Gorman, Andrew Morton,
	Vlastimil Babka

On Fri, Nov 23, 2018 at 11:04 AM Nicolas Boichat <drinkcat@chromium.org> wrote:
>
> On Thu, Nov 22, 2018 at 4:23 PM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Wed, Nov 21, 2018 at 10:26:26PM +0000, Robin Murphy wrote:
> > > TBH, if this DMA32 stuff is going to be contentious we could possibly just
> > > rip out the offending kmem_cache - it seemed like good practice for the
> > > use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> > > give the same 1KB alignment and chance of succeeding as the equivalent
> > > kmem_cache_alloc(), then we could quite easily make do with that instead.
> >
> > Neither is the slab support for kmalloc, not do kmalloc allocations
> > have useful alignment apparently (at least if you use slub debug).
> >
> > But I do agree with the sentiment of not wanting to spread GFP_DMA32
> > futher into the slab allocator.
> >
> > I think you want a simple genalloc allocator for this rather special
> > use case.
>
> So I had a look at genalloc, we'd need to add pre-allocated memory
> using gen_pool_add [1]. There can be up to 4096 L2 page tables, so we
> may need to pre-allocate 4MB of memory (1KB per L2 page table). We
> could add chunks on demand, but then it'd be difficult to free them up
> (genalloc does not have a "gen_pool_remove" call). So basically if the
> full 4MB end up being requested, we'd be stuck with that until the
> iommu domain is freed (on the arm64 Mediatek platforms I looked at,
> there is only one iommu domain, and it never gets freed).

I tried out genalloc with pre-allocated 4MB, and that seems to work
fine. Allocating in chunks would require genalloc changes as
gen_pool_add calls kmalloc with just GFP_KERNEL [2], and we are in
atomic context in __arm_v7s_alloc_table...

[2] https://elixir.bootlin.com/linux/latest/source/lib/genalloc.c#L190

> page_frag would at least have a chance to reclaim those pages (if I
> understand Christoph's statement correctly)
>
> Robin: Do you have some ideas of the lifetime/usage of L2 tables? If
> they are usually few of them, or if they don't get reclaimed easily,
> some on demand genalloc allocation would be ok (or even 4MB allocation
> on init, if we're willing to take that hit). If they get allocated and
> freed together, maybe page_frag is a better option?
>
> Thanks,
>
> [1] https://www.kernel.org/doc/html/v4.19/core-api/genalloc.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
  2018-11-22  1:20       ` Nicolas Boichat
@ 2018-11-23 12:15         ` Vlastimil Babka
  0 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2018-11-23 12:15 UTC (permalink / raw)
  To: Nicolas Boichat, mhocko
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Mel Gorman, Levin Alexander, Huaisheng Ye, Mike Rapoport,
	linux-arm Mailing List, iommu, lkml, linux-mm, Yong Wu,
	Matthias Brugger, Tomasz Figa, yingjoe.chen

On 11/22/18 2:20 AM, Nicolas Boichat wrote:
> On Thu, Nov 22, 2018 at 2:02 AM Michal Hocko <mhocko@kernel.org> wrote:
>>
>> On Wed 21-11-18 16:46:38, Will Deacon wrote:
>>> On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
>>>
>>> It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
>>> architectures, since then we wouldn't need this #ifdeffery afaict.
>>
>> But GFP_DMA32 should map to GFP_KERNEL on 32b, no? Or what exactly is
>> going on in here?
> 
> GFP_DMA32 will fail due to check_slab_flags (aka GFP_SLAB_BUG_MASK
> before patch 1/3 of this series)... But yes, it may be neater if there
> was transparent remapping of GFP_DMA32/SLAB_CACHE_DMA32 to
> GFP_DMA/SLAB_CACHE_DMA on 32-bit arch...

I don't know about ARM, but AFAIK on x86 DMA means within first 4MB of
physical memory, and DMA32 means within first 4GB. It doesn't matter if
the CPU is running in 32bit or 64bit mode. But, when it runs 32bit, the
kernel can direct map less than 4GB anyway, which means it doesn't need
the extra DMA32 zone, i.e. GFP_KERNEL can only get you memory that's
also acceptable for GFP_DMA32.
But, DMA is still DMA, i.e. first 4MB. Remapping GFP_DMA32 to GFP_DMA on
x86 wouldn't work, as the GFP_DMA32 allocations would then only use
those 4MB and exhaust it very fast.

>> --
>> Michal Hocko
>> SUSE Labs


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-22  8:23       ` Christoph Hellwig
  2018-11-23  3:04         ` Nicolas Boichat
@ 2018-11-23 12:23         ` Vlastimil Babka
  2018-11-23 12:30           ` Michal Hocko
  2018-11-26  8:02           ` Christoph Hellwig
  1 sibling, 2 replies; 35+ messages in thread
From: Vlastimil Babka @ 2018-11-23 12:23 UTC (permalink / raw)
  To: Christoph Hellwig, Robin Murphy
  Cc: Matthew Wilcox, Christopher Lameter, Levin Alexander,
	Mike Rapoport, Nicolas Boichat, Huaisheng Ye, Tomasz Figa,
	Will Deacon, linux-kernel, Pekka Enberg, linux-mm, iommu,
	Michal Hocko, linux-arm-kernel, David Rientjes, Matthias Brugger,
	yingjoe.chen, Joonsoo Kim, Mel Gorman, Andrew Morton

On 11/22/18 9:23 AM, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 10:26:26PM +0000, Robin Murphy wrote:
>> TBH, if this DMA32 stuff is going to be contentious we could possibly just
>> rip out the offending kmem_cache - it seemed like good practice for the
>> use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
>> give the same 1KB alignment and chance of succeeding as the equivalent
>> kmem_cache_alloc(), then we could quite easily make do with that instead.
> 
> Neither is the slab support for kmalloc, not do kmalloc allocations
> have useful alignment apparently (at least if you use slub debug).

Is this also true for caches created by kmem_cache_create(), that
debugging options can result in not respecting the alignment passed to
kmem_cache_create()? That would be rather bad, IMHO.

> But I do agree with the sentiment of not wanting to spread GFP_DMA32
> futher into the slab allocator.

I don't see a problem with GFP_DMA32 for custom caches. Generic
kmalloc() would be worse, since it would have to create a new array of
kmalloc caches. But that's already ruled out due to the alignment.

> I think you want a simple genalloc allocator for this rather special
> use case.

I would prefer if slab could support it, as it doesn't have to
preallocate. OTOH if the allocations are GFP_ATOMIC as suggested later
in the thread, and need to always succeed, then preallocation could be
better, and thus maybe genalloc.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-23 12:23         ` Vlastimil Babka
@ 2018-11-23 12:30           ` Michal Hocko
  2018-11-26  8:02           ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2018-11-23 12:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Hellwig, Robin Murphy, Matthew Wilcox,
	Christopher Lameter, Levin Alexander, Mike Rapoport,
	Nicolas Boichat, Huaisheng Ye, Tomasz Figa, Will Deacon,
	linux-kernel, Pekka Enberg, linux-mm, iommu, linux-arm-kernel,
	David Rientjes, Matthias Brugger, yingjoe.chen, Joonsoo Kim,
	Mel Gorman, Andrew Morton

On Fri 23-11-18 13:23:41, Vlastimil Babka wrote:
> On 11/22/18 9:23 AM, Christoph Hellwig wrote:
[...]
> > But I do agree with the sentiment of not wanting to spread GFP_DMA32
> > futher into the slab allocator.
> 
> I don't see a problem with GFP_DMA32 for custom caches. Generic
> kmalloc() would be worse, since it would have to create a new array of
> kmalloc caches. But that's already ruled out due to the alignment.

Yes that makes quite a lot of sense to me. We do not really need a
generic support. Just make sure that if somebody creates a GFP_DMA32
restricted cache then allow allocating restricted memory from that.

Is there any fundamental reason that this wouldn't be possible?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-23 12:23         ` Vlastimil Babka
  2018-11-23 12:30           ` Michal Hocko
@ 2018-11-26  8:02           ` Christoph Hellwig
  2018-11-28  8:55             ` Nicolas Boichat
  1 sibling, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2018-11-26  8:02 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Hellwig, Robin Murphy, Matthew Wilcox,
	Christopher Lameter, Levin Alexander, Mike Rapoport,
	Nicolas Boichat, Huaisheng Ye, Tomasz Figa, Will Deacon,
	linux-kernel, Pekka Enberg, linux-mm, iommu, Michal Hocko,
	linux-arm-kernel, David Rientjes, Matthias Brugger, yingjoe.chen,
	Joonsoo Kim, Mel Gorman, Andrew Morton

On Fri, Nov 23, 2018 at 01:23:41PM +0100, Vlastimil Babka wrote:
> Is this also true for caches created by kmem_cache_create(), that
> debugging options can result in not respecting the alignment passed to
> kmem_cache_create()? That would be rather bad, IMHO.

That's what I understood in the discussion.  If not it would make
our live simpler, but would need to be well document.

Christoph can probably explain the alignment choices in slub.

> 
> > But I do agree with the sentiment of not wanting to spread GFP_DMA32
> > futher into the slab allocator.
> 
> I don't see a problem with GFP_DMA32 for custom caches. Generic
> kmalloc() would be worse, since it would have to create a new array of
> kmalloc caches. But that's already ruled out due to the alignment.

True, purely slab probably isn't too bad.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-26  8:02           ` Christoph Hellwig
@ 2018-11-28  8:55             ` Nicolas Boichat
  0 siblings, 0 replies; 35+ messages in thread
From: Nicolas Boichat @ 2018-11-28  8:55 UTC (permalink / raw)
  To: hch
  Cc: Vlastimil Babka, Robin Murphy, willy, Christoph Lameter,
	Levin Alexander, Mike Rapoport, Huaisheng Ye, Tomasz Figa,
	Will Deacon, lkml, Pekka Enberg, linux-mm, iommu, Michal Hocko,
	linux-arm Mailing List, David Rientjes, Matthias Brugger,
	yingjoe.chen, Joonsoo Kim, Mel Gorman, Andrew Morton

On Mon, Nov 26, 2018 at 4:02 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Nov 23, 2018 at 01:23:41PM +0100, Vlastimil Babka wrote:
> > Is this also true for caches created by kmem_cache_create(), that
> > debugging options can result in not respecting the alignment passed to
> > kmem_cache_create()? That would be rather bad, IMHO.
>
> That's what I understood in the discussion.  If not it would make
> our live simpler, but would need to be well document.

From my experiment, adding `slub_debug` to command line does _not_
break the alignment of kmem_cache_alloc'ed objects.

We do see an increase in slab_size
(/sys/kernel/slab/io-pgtable_armv7s_l2/slab_size), from 1024 to 3072
(probably because slub needs to allocate space on each side for the
red zone/padding, while keeping the alignment?)

> Christoph can probably explain the alignment choices in slub.
>
> >
> > > But I do agree with the sentiment of not wanting to spread GFP_DMA32
> > > futher into the slab allocator.
> >
> > I don't see a problem with GFP_DMA32 for custom caches. Generic
> > kmalloc() would be worse, since it would have to create a new array of
> > kmalloc caches. But that's already ruled out due to the alignment.
>
> True, purely slab probably isn't too bad.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-11-11  9:03 [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables Nicolas Boichat
                   ` (3 preceding siblings ...)
  2018-11-21 18:20 ` [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables Christopher Lameter
@ 2018-12-04  9:37 ` Nicolas Boichat
  2018-12-04 14:35   ` Vlastimil Babka
  2018-12-04 16:28   ` Will Deacon
  4 siblings, 2 replies; 35+ messages in thread
From: Nicolas Boichat @ 2018-12-04  9:37 UTC (permalink / raw)
  To: Robin Murphy, Christoph Lameter, Vlastimil Babka, Michal Hocko,
	Matthias Brugger, hch, Matthew Wilcox
  Cc: Will Deacon, Joerg Roedel, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Mel Gorman, Levin Alexander,
	Huaisheng Ye, Mike Rapoport, linux-arm Mailing List, iommu, lkml,
	linux-mm, Yong Wu, Tomasz Figa, yingjoe.chen, Hsin-Yi Wang,
	Daniel Kurtz

On Sun, Nov 11, 2018 at 5:04 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
>
> This is a follow-up to the discussion in [1], to make sure that the page
> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> physical address space.
>
> [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html

Hi everyone,

Let's try to summarize here.

First, we confirmed that this is a regression, and IOMMU errors happen
on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13).
The issue most likely starts from ad67f5a6545f ("arm64: replace
ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number
of Mediatek platforms (and maybe others?).

We have a few options here:
1. This series [2], that adds support for GFP_DMA32 slab caches,
_without_ adding kmalloc caches (since there are no users of
kmalloc(..., GFP_DMA32)). I think I've addressed all the comments on
the 3 patches, and AFAICT this solution works fine.
2. genalloc. That works, but unless we preallocate 4MB for L2 tables
(which is wasteful as we usually only need a handful of L2 tables),
we'll need changes in the core (use GFP_ATOMIC) to allow allocating on
demand, and as it stands we'd have no way to shrink the allocation.
3. page_frag [3]. That works fine, and the code is quite simple. One
drawback is that fragments in partially freed pages cannot be reused
(from limited experiments, I see that IOMMU L2 tables are rarely
freed, so it's unlikely a whole page would get freed). But given the
low number of L2 tables, maybe we can live with that.

I think 2 is out. Any preference between 1 and 3? I think 1 makes
better use of the memory, so that'd be my preference. But I'm probably
missing something.

[2] https://patchwork.kernel.org/cover/10677529/, 3 patches
[3] https://patchwork.codeaurora.org/patch/671639/

Thanks,

Nicolas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-12-04  9:37 ` Nicolas Boichat
@ 2018-12-04 14:35   ` Vlastimil Babka
  2018-12-05  2:04     ` Nicolas Boichat
  2018-12-04 16:28   ` Will Deacon
  1 sibling, 1 reply; 35+ messages in thread
From: Vlastimil Babka @ 2018-12-04 14:35 UTC (permalink / raw)
  To: Nicolas Boichat, Robin Murphy, Christoph Lameter, Michal Hocko,
	Matthias Brugger, hch, Matthew Wilcox
  Cc: Will Deacon, Joerg Roedel, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Mel Gorman, Levin Alexander,
	Huaisheng Ye, Mike Rapoport, linux-arm Mailing List, iommu, lkml,
	linux-mm, Yong Wu, Tomasz Figa, yingjoe.chen, Hsin-Yi Wang,
	Daniel Kurtz

On 12/4/18 10:37 AM, Nicolas Boichat wrote:
> On Sun, Nov 11, 2018 at 5:04 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
>>
>> This is a follow-up to the discussion in [1], to make sure that the page
>> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
>> physical address space.
>>
>> [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
> 
> Hi everyone,
> 
> Let's try to summarize here.
> 
> First, we confirmed that this is a regression, and IOMMU errors happen
> on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13).
> The issue most likely starts from ad67f5a6545f ("arm64: replace
> ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number
> of Mediatek platforms (and maybe others?).
> 
> We have a few options here:
> 1. This series [2], that adds support for GFP_DMA32 slab caches,
> _without_ adding kmalloc caches (since there are no users of
> kmalloc(..., GFP_DMA32)). I think I've addressed all the comments on
> the 3 patches, and AFAICT this solution works fine.
> 2. genalloc. That works, but unless we preallocate 4MB for L2 tables
> (which is wasteful as we usually only need a handful of L2 tables),
> we'll need changes in the core (use GFP_ATOMIC) to allow allocating on
> demand, and as it stands we'd have no way to shrink the allocation.
> 3. page_frag [3]. That works fine, and the code is quite simple. One
> drawback is that fragments in partially freed pages cannot be reused
> (from limited experiments, I see that IOMMU L2 tables are rarely
> freed, so it's unlikely a whole page would get freed). But given the
> low number of L2 tables, maybe we can live with that.
> 
> I think 2 is out. Any preference between 1 and 3? I think 1 makes
> better use of the memory, so that'd be my preference. But I'm probably
> missing something.

I would prefer 1 as well. IIRC you already confirmed that alignment
requirements are not broken for custom kmem caches even in presence of
SLUB debug options (and I would say it's a bug to be fixed if they
weren't). I just asked (and didn't get a reply I think) about your
ability to handle the GFP_ATOMIC allocation failures. They should be
rare when only single page allocations are needed for the kmem cache.
But in case they are not an option, then preallocating would be needed,
thus probably option 2.

> [2] https://patchwork.kernel.org/cover/10677529/, 3 patches
> [3] https://patchwork.codeaurora.org/patch/671639/
> 
> Thanks,
> 
> Nicolas
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-12-04  9:37 ` Nicolas Boichat
  2018-12-04 14:35   ` Vlastimil Babka
@ 2018-12-04 16:28   ` Will Deacon
  1 sibling, 0 replies; 35+ messages in thread
From: Will Deacon @ 2018-12-04 16:28 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Robin Murphy, Christoph Lameter, Vlastimil Babka, Michal Hocko,
	Matthias Brugger, hch, Matthew Wilcox, Joerg Roedel,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Mel Gorman, Levin Alexander, Huaisheng Ye, Mike Rapoport,
	linux-arm Mailing List, iommu, lkml, linux-mm, Yong Wu,
	Tomasz Figa, yingjoe.chen, Hsin-Yi Wang, Daniel Kurtz

On Tue, Dec 04, 2018 at 05:37:13PM +0800, Nicolas Boichat wrote:
> On Sun, Nov 11, 2018 at 5:04 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
> >
> > This is a follow-up to the discussion in [1], to make sure that the page
> > tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> > physical address space.
> >
> > [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
> 
> Hi everyone,
> 
> Let's try to summarize here.
> 
> First, we confirmed that this is a regression, and IOMMU errors happen
> on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13).
> The issue most likely starts from ad67f5a6545f ("arm64: replace
> ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number
> of Mediatek platforms (and maybe others?).
> 
> We have a few options here:
> 1. This series [2], that adds support for GFP_DMA32 slab caches,
> _without_ adding kmalloc caches (since there are no users of
> kmalloc(..., GFP_DMA32)). I think I've addressed all the comments on
> the 3 patches, and AFAICT this solution works fine.
> 2. genalloc. That works, but unless we preallocate 4MB for L2 tables
> (which is wasteful as we usually only need a handful of L2 tables),
> we'll need changes in the core (use GFP_ATOMIC) to allow allocating on
> demand, and as it stands we'd have no way to shrink the allocation.
> 3. page_frag [3]. That works fine, and the code is quite simple. One
> drawback is that fragments in partially freed pages cannot be reused
> (from limited experiments, I see that IOMMU L2 tables are rarely
> freed, so it's unlikely a whole page would get freed). But given the
> low number of L2 tables, maybe we can live with that.
> 
> I think 2 is out. Any preference between 1 and 3? I think 1 makes
> better use of the memory, so that'd be my preference. But I'm probably
> missing something.

FWIW, I'm open to any solution at this point, since I'd like to see this
regression fixed. (1) does sound better longer-term, but (3) looks pretty
much ready to do afaict.

Will

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-12-04 14:35   ` Vlastimil Babka
@ 2018-12-05  2:04     ` Nicolas Boichat
  2018-12-05  5:51       ` Nicolas Boichat
  2018-12-05 14:41       ` Will Deacon
  0 siblings, 2 replies; 35+ messages in thread
From: Nicolas Boichat @ 2018-12-05  2:04 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Robin Murphy, Christoph Lameter, Michal Hocko, Matthias Brugger,
	hch, Matthew Wilcox, Will Deacon, Joerg Roedel, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Mel Gorman,
	Levin Alexander, Huaisheng Ye, Mike Rapoport,
	linux-arm Mailing List, iommu, lkml, linux-mm, Yong Wu,
	Tomasz Figa, yingjoe.chen, Hsin-Yi Wang, Daniel Kurtz

On Tue, Dec 4, 2018 at 10:35 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 12/4/18 10:37 AM, Nicolas Boichat wrote:
> > On Sun, Nov 11, 2018 at 5:04 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
> >>
> >> This is a follow-up to the discussion in [1], to make sure that the page
> >> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> >> physical address space.
> >>
> >> [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
> >
> > Hi everyone,
> >
> > Let's try to summarize here.
> >
> > First, we confirmed that this is a regression, and IOMMU errors happen
> > on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13).
> > The issue most likely starts from ad67f5a6545f ("arm64: replace
> > ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number
> > of Mediatek platforms (and maybe others?).
> >
> > We have a few options here:
> > 1. This series [2], that adds support for GFP_DMA32 slab caches,
> > _without_ adding kmalloc caches (since there are no users of
> > kmalloc(..., GFP_DMA32)). I think I've addressed all the comments on
> > the 3 patches, and AFAICT this solution works fine.
> > 2. genalloc. That works, but unless we preallocate 4MB for L2 tables
> > (which is wasteful as we usually only need a handful of L2 tables),
> > we'll need changes in the core (use GFP_ATOMIC) to allow allocating on
> > demand, and as it stands we'd have no way to shrink the allocation.
> > 3. page_frag [3]. That works fine, and the code is quite simple. One
> > drawback is that fragments in partially freed pages cannot be reused
> > (from limited experiments, I see that IOMMU L2 tables are rarely
> > freed, so it's unlikely a whole page would get freed). But given the
> > low number of L2 tables, maybe we can live with that.
> >
> > I think 2 is out. Any preference between 1 and 3? I think 1 makes
> > better use of the memory, so that'd be my preference. But I'm probably
> > missing something.
>
> I would prefer 1 as well. IIRC you already confirmed that alignment
> requirements are not broken for custom kmem caches even in presence of
> SLUB debug options (and I would say it's a bug to be fixed if they
> weren't).

> I just asked (and didn't get a reply I think) about your
> ability to handle the GFP_ATOMIC allocation failures. They should be
> rare when only single page allocations are needed for the kmem cache.
> But in case they are not an option, then preallocating would be needed,
> thus probably option 2.

Oh, sorry, I missed your question.

I don't have a full answer, but:
- The allocations themselves are rare (I count a few 10s of L2 tables
at most on my system, I assume we rarely have >100), and yes, we only
need a single page, so the failures should be exceptional.
- My change is probably not making anything worse: I assume that even
with the current approach using GFP_DMA slab caches on older kernels,
failures could potentially happen. I don't think we've seen those. If
we are really concerned about this, maybe we'd need to modify
mtk_iommu_map to not hold a spinlock (if that's possible), so we don't
need to use GFP_ATOMIC. I suggest we just keep an eye on such issues,
and address them if they show up (we can even revisit genalloc at that
stage).

Anyway, I'll clean up patches for 1 (mostly commit message changes
based on the comments in the threads) and resend.

Thanks,

> > [2] https://patchwork.kernel.org/cover/10677529/, 3 patches
> > [3] https://patchwork.codeaurora.org/patch/671639/
> >
> > Thanks,
> >
> > Nicolas
> >
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-12-05  2:04     ` Nicolas Boichat
@ 2018-12-05  5:51       ` Nicolas Boichat
  2018-12-05 14:41       ` Will Deacon
  1 sibling, 0 replies; 35+ messages in thread
From: Nicolas Boichat @ 2018-12-05  5:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Robin Murphy, Christoph Lameter, Michal Hocko, Matthias Brugger,
	hch, Matthew Wilcox, Will Deacon, Joerg Roedel, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Mel Gorman,
	Levin Alexander, Huaisheng Ye, Mike Rapoport,
	linux-arm Mailing List, iommu, lkml, linux-mm, Yong Wu,
	Tomasz Figa, yingjoe.chen, Hsin-Yi Wang, Daniel Kurtz

On Wed, Dec 5, 2018 at 10:04 AM Nicolas Boichat <drinkcat@chromium.org> wrote:
>
> On Tue, Dec 4, 2018 at 10:35 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 12/4/18 10:37 AM, Nicolas Boichat wrote:
> > > On Sun, Nov 11, 2018 at 5:04 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
> > >>
> > >> This is a follow-up to the discussion in [1], to make sure that the page
> > >> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> > >> physical address space.
> > >>
> > >> [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
> > >
> > > Hi everyone,
> > >
> > > Let's try to summarize here.
> > >
> > > First, we confirmed that this is a regression, and IOMMU errors happen
> > > on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13).
> > > The issue most likely starts from ad67f5a6545f ("arm64: replace
> > > ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number
> > > of Mediatek platforms (and maybe others?).
> > >
> > > We have a few options here:
> > > 1. This series [2], that adds support for GFP_DMA32 slab caches,
> > > _without_ adding kmalloc caches (since there are no users of
> > > kmalloc(..., GFP_DMA32)). I think I've addressed all the comments on
> > > the 3 patches, and AFAICT this solution works fine.
> > > 2. genalloc. That works, but unless we preallocate 4MB for L2 tables
> > > (which is wasteful as we usually only need a handful of L2 tables),
> > > we'll need changes in the core (use GFP_ATOMIC) to allow allocating on
> > > demand, and as it stands we'd have no way to shrink the allocation.
> > > 3. page_frag [3]. That works fine, and the code is quite simple. One
> > > drawback is that fragments in partially freed pages cannot be reused
> > > (from limited experiments, I see that IOMMU L2 tables are rarely
> > > freed, so it's unlikely a whole page would get freed). But given the
> > > low number of L2 tables, maybe we can live with that.
> > >
> > > I think 2 is out. Any preference between 1 and 3? I think 1 makes
> > > better use of the memory, so that'd be my preference. But I'm probably
> > > missing something.
> >
> > I would prefer 1 as well. IIRC you already confirmed that alignment
> > requirements are not broken for custom kmem caches even in presence of
> > SLUB debug options (and I would say it's a bug to be fixed if they
> > weren't).
>
> > I just asked (and didn't get a reply I think) about your
> > ability to handle the GFP_ATOMIC allocation failures. They should be
> > rare when only single page allocations are needed for the kmem cache.
> > But in case they are not an option, then preallocating would be needed,
> > thus probably option 2.
>
> Oh, sorry, I missed your question.
>
> I don't have a full answer, but:
> - The allocations themselves are rare (I count a few 10s of L2 tables
> at most on my system, I assume we rarely have >100), and yes, we only
> need a single page, so the failures should be exceptional.
> - My change is probably not making anything worse: I assume that even
> with the current approach using GFP_DMA slab caches on older kernels,
> failures could potentially happen. I don't think we've seen those. If
> we are really concerned about this, maybe we'd need to modify
> mtk_iommu_map to not hold a spinlock (if that's possible), so we don't
> need to use GFP_ATOMIC. I suggest we just keep an eye on such issues,
> and address them if they show up (we can even revisit genalloc at that
> stage).
>
> Anyway, I'll clean up patches for 1 (mostly commit message changes
> based on the comments in the threads) and resend.

Done here: https://patchwork.kernel.org/cover/10713019/ .

> Thanks,
>
> > > [2] https://patchwork.kernel.org/cover/10677529/, 3 patches
> > > [3] https://patchwork.codeaurora.org/patch/671639/
> > >
> > > Thanks,
> > >
> > > Nicolas
> > >
> >

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
  2018-12-05  2:04     ` Nicolas Boichat
  2018-12-05  5:51       ` Nicolas Boichat
@ 2018-12-05 14:41       ` Will Deacon
  1 sibling, 0 replies; 35+ messages in thread
From: Will Deacon @ 2018-12-05 14:41 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Vlastimil Babka, Robin Murphy, Christoph Lameter, Michal Hocko,
	Matthias Brugger, hch, Matthew Wilcox, Joerg Roedel,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Mel Gorman, Levin Alexander, Huaisheng Ye, Mike Rapoport,
	linux-arm Mailing List, iommu, lkml, linux-mm, Yong Wu,
	Tomasz Figa, yingjoe.chen, Hsin-Yi Wang, Daniel Kurtz

On Wed, Dec 05, 2018 at 10:04:00AM +0800, Nicolas Boichat wrote:
> On Tue, Dec 4, 2018 at 10:35 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 12/4/18 10:37 AM, Nicolas Boichat wrote:
> > > On Sun, Nov 11, 2018 at 5:04 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
> > >>
> > >> This is a follow-up to the discussion in [1], to make sure that the page
> > >> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> > >> physical address space.
> > >>
> > >> [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
> > >
> > > Hi everyone,
> > >
> > > Let's try to summarize here.
> > >
> > > First, we confirmed that this is a regression, and IOMMU errors happen
> > > on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13).
> > > The issue most likely starts from ad67f5a6545f ("arm64: replace
> > > ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number
> > > of Mediatek platforms (and maybe others?).
> > >
> > > We have a few options here:
> > > 1. This series [2], that adds support for GFP_DMA32 slab caches,
> > > _without_ adding kmalloc caches (since there are no users of
> > > kmalloc(..., GFP_DMA32)). I think I've addressed all the comments on
> > > the 3 patches, and AFAICT this solution works fine.
> > > 2. genalloc. That works, but unless we preallocate 4MB for L2 tables
> > > (which is wasteful as we usually only need a handful of L2 tables),
> > > we'll need changes in the core (use GFP_ATOMIC) to allow allocating on
> > > demand, and as it stands we'd have no way to shrink the allocation.
> > > 3. page_frag [3]. That works fine, and the code is quite simple. One
> > > drawback is that fragments in partially freed pages cannot be reused
> > > (from limited experiments, I see that IOMMU L2 tables are rarely
> > > freed, so it's unlikely a whole page would get freed). But given the
> > > low number of L2 tables, maybe we can live with that.
> > >
> > > I think 2 is out. Any preference between 1 and 3? I think 1 makes
> > > better use of the memory, so that'd be my preference. But I'm probably
> > > missing something.
> >
> > I would prefer 1 as well. IIRC you already confirmed that alignment
> > requirements are not broken for custom kmem caches even in presence of
> > SLUB debug options (and I would say it's a bug to be fixed if they
> > weren't).
> 
> > I just asked (and didn't get a reply I think) about your
> > ability to handle the GFP_ATOMIC allocation failures. They should be
> > rare when only single page allocations are needed for the kmem cache.
> > But in case they are not an option, then preallocating would be needed,
> > thus probably option 2.
> 
> Oh, sorry, I missed your question.
> 
> I don't have a full answer, but:
> - The allocations themselves are rare (I count a few 10s of L2 tables
> at most on my system, I assume we rarely have >100), and yes, we only
> need a single page, so the failures should be exceptional.
> - My change is probably not making anything worse: I assume that even
> with the current approach using GFP_DMA slab caches on older kernels,
> failures could potentially happen. I don't think we've seen those. If
> we are really concerned about this, maybe we'd need to modify
> mtk_iommu_map to not hold a spinlock (if that's possible), so we don't
> need to use GFP_ATOMIC. I suggest we just keep an eye on such issues,
> and address them if they show up (we can even revisit genalloc at that
> stage).

I think the spinlock is the least of our worries: the map/unmap routines
can be called in irq context and may need to allocate second-level tables.

Will

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2018-12-05 14:41 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-11  9:03 [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables Nicolas Boichat
2018-11-11  9:03 ` [PATCH v2 1/3] mm: slab/slub: Add check_slab_flags function to check for valid flags Nicolas Boichat
2018-11-11  9:03 ` [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32 Nicolas Boichat
2018-11-21 18:32   ` Christopher Lameter
2018-11-22  0:52     ` Nicolas Boichat
2018-11-11  9:03 ` [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging Nicolas Boichat
2018-11-21 16:46   ` Will Deacon
2018-11-21 17:38     ` Christopher Lameter
2018-11-21 17:43       ` Robin Murphy
2018-11-21 18:18         ` Christopher Lameter
2018-11-21 18:02     ` Michal Hocko
2018-11-22  1:20       ` Nicolas Boichat
2018-11-23 12:15         ` Vlastimil Babka
2018-11-21 18:20 ` [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables Christopher Lameter
2018-11-21 21:38   ` Matthew Wilcox
2018-11-21 22:26     ` Robin Murphy
2018-11-22  1:05       ` Nicolas Boichat
2018-11-22  2:35       ` Matthew Wilcox
2018-11-22  5:56         ` Nicolas Boichat
2018-11-22  8:26         ` Christoph Hellwig
2018-11-22 15:16           ` Matthew Wilcox
2018-11-22 15:19             ` Christoph Hellwig
2018-11-22  8:23       ` Christoph Hellwig
2018-11-23  3:04         ` Nicolas Boichat
2018-11-23  5:37           ` Nicolas Boichat
2018-11-23 12:23         ` Vlastimil Babka
2018-11-23 12:30           ` Michal Hocko
2018-11-26  8:02           ` Christoph Hellwig
2018-11-28  8:55             ` Nicolas Boichat
2018-12-04  9:37 ` Nicolas Boichat
2018-12-04 14:35   ` Vlastimil Babka
2018-12-05  2:04     ` Nicolas Boichat
2018-12-05  5:51       ` Nicolas Boichat
2018-12-05 14:41       ` Will Deacon
2018-12-04 16:28   ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).