* [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
@ 2021-12-07  3:07 ` Baoquan He
  0 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-07  3:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly, kexec,
	Baoquan He

***Problem observed:
On x86_64, when a crash is triggered and the system enters the kdump
kernel, a page allocation failure is always seen.

 ---------------------------------
 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0 
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ......
  __alloc_pages+0x24d/0x2c0
  ......
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ------------------------------------

***Root cause:
The current kernel assumes that the DMA zone must have managed pages and
tries to request pages from it whenever CONFIG_ZONE_DMA is enabled. This
is not always true. E.g. in the kdump kernel on x86_64, only the low 1M
is present and it is locked down at a very early stage of boot, so the
low 1M is never added to the buddy allocator and the DMA zone ends up
with no managed pages. Any page allocation from the DMA zone will then
fail.

***Investigation:
This failure has been happening since the commits below were merged into
Linus's tree:
  1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
  23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
  f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
  7c321eb2b843 x86/kdump: Remove the backup region handling
  6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified

Before them, on x86_64 the low 640K area was reused by the kdump kernel.
So in the kdump kernel, the content of the low 640K area was copied into
a backup region for dumping before jumping into kdump. Then, except for
the firmware-reserved regions in [0, 640K], the remaining area was added
to the buddy allocator and became available managed pages of the DMA
zone.

However, with the above commits applied, in the kdump kernel on x86_64
the low 1M is reserved by memblock but not released to the buddy
allocator, so any later page allocation requested from the DMA zone will
fail.

This low 1M lockdown is needed because AMD SME encrypts memory, which
makes the old backup-region mechanism impossible when switching into the
kdump kernel. An Intel engineer also mentioned that TDX (Trust Domain
Extensions), which is under development in the kernel, needs the low 1M
locked down as well. So we can't simply revert the above commits to fix
the page allocation failure from the DMA zone, as some have suggested.

***Solution:
Currently, only the DMA atomic pool and dma-kmalloc initialize
themselves with GFP_DMA page allocations during bootup. So only
initialize them when the DMA zone has managed pages available, otherwise
just skip the initialization. From testing and code inspection, this
causes no harm. In the kdump kernel on x86_64, the page allocation
failure disappears.
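
Below is a minimal sketch of the idea, using the has_managed_dma()
helper that patch 3/5 adds (the call site is simplified from patch 4/5
for illustration):

	/* Only create the GFP_DMA atomic pool when ZONE_DMA has buddy-managed pages. */
	if (has_managed_dma())
		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
						GFP_KERNEL | GFP_DMA);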

***Further thinking
On x86_64, [0, 16M] always goes into ZONE_DMA and (16M, 4G] into
ZONE_DMA32 by default. The DMA zone covering the low 16M exists to take
care of antique ISA devices. In fact, a 64-bit system rarely needs
ZONE_DMA (the low 16M) to support the almost extinct ISA devices.
However, some components treat DMA as a generic concept; e.g. for
dma-kmalloc, the slab allocator sets it up for any later DMA-related
buffer allocation, not just ISA DMA.

On arm64, even though both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are
enabled, ZONE_DMA covers the low 4G area and ZONE_DMA32 is left empty,
except on specific platforms (e.g. the 30-bit DMA limit on Raspberry
Pi 4), where ZONE_DMA covers the first 1G and ZONE_DMA32 covers the rest
of the 32-bit addressable memory.

I am wondering if we can also make the sizes of ZONE_DMA and ZONE_DMA32
dynamically adjusted, just as arm64 does. On x86_64, we could make
ZONE_DMA cover the 32-bit addressable memory and leave ZONE_DMA32 empty
by default. Once ISA_DMA_API is enabled, we would go back to having
ZONE_DMA cover the low 16M area and ZONE_DMA32 the rest of the 32-bit
addressable memory. (I am not familiar with ISA_DMA_API; does it require
24-bit addressable memory when enabled?)

Change history:

v2 post:
https://lore.kernel.org/all/20210810094835.13402-1-bhe@redhat.com/T/#u

v1 post:
https://lore.kernel.org/all/20210624052010.5676-1-bhe@redhat.com/T/#u

v2->v2 RESEND:
 John pinged to push a repost of this patchset. So fix one typo in the
 subject of patch 3/5, and fix a build error caused by a mixed
 declaration in patch 5/5. Both were found by John in his testing.

v1->v2:
 Change to checking whether a managed DMA zone exists. If the DMA zone
 has managed pages, go further and request pages from the DMA zone for
 initialization. Otherwise, just skip initializing the things that need
 pages from the DMA zone.

Baoquan He (5):
  docs: kernel-parameters: Update to reflect the current default size of
    atomic pool
  dma-pool: allow user to disable atomic pool
  mm_zone: add function to check if managed dma zone exists
  dma/pool: create dma atomic pool only if dma zone has managed pages
  mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

 .../admin-guide/kernel-parameters.txt         |  5 ++++-
 include/linux/mmzone.h                        | 21 +++++++++++++++++++
 kernel/dma/pool.c                             | 11 ++++++----
 mm/page_alloc.c                               | 11 ++++++++++
 mm/slab_common.c                              |  9 ++++++++
 5 files changed, 52 insertions(+), 5 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH RESEND v2 1/5] docs: kernel-parameters: Update to reflect the current default size of atomic pool
  2021-12-07  3:07 ` Baoquan He
@ 2021-12-07  3:07   ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-07  3:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly, kexec,
	Baoquan He

Since commit 1d659236fb43 ("dma-pool: scale the default DMA coherent
pool size with memory capacity"), the default size of the atomic pool is
scaled with the system memory capacity. So update the documentation in
kernel-parameters.txt accordingly.
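
For reference, the sizing logic in kernel/dma/pool.c (it is also visible
as context in the hunk of patch 2/5) is roughly:

	/* 128 KiB of pool per 1 GiB of RAM, clamped to [128 KiB, MAX_ORDER_NR_PAGES pages] */
	unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
	pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
	atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);

So, for example, a machine with 4 GiB of RAM gets a 512 KiB pool by
default.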

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d4..ec4d25e854a8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -664,7 +664,9 @@
 
 	coherent_pool=nn[KMG]	[ARM,KNL]
 			Sets the size of memory pool for coherent, atomic dma
-			allocations, by default set to 256K.
+			allocations. Otherwise the default size will be scaled
+			with memory capacity, while clamped between 128K and
+			1 << (PAGE_SHIFT + MAX_ORDER-1).
 
 	com20020=	[HW,NET] ARCnet - COM20020 chipset
 			Format:
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH RESEND v2 2/5] dma-pool: allow user to disable atomic pool
  2021-12-07  3:07 ` Baoquan He
@ 2021-12-07  3:07   ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-07  3:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly, kexec,
	Baoquan He

In the current code, the three atomic memory pools
atomic_pool_kernel|dma|dma32 are always created, even if
'coherent_pool=0' is specified on the kernel command line. In fact, the
atomic pools are only necessary when CONFIG_DMA_DIRECT_REMAP=y or
mem_encrypt_active() is true, which is the case on only a few
architectures.

So change the code to allow the user to disable the atomic pools by
specifying 'coherent_pool=0'.

Meanwhile, update the relevant documentation in kernel-parameters.txt.
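
For example, booting with:

	coherent_pool=0

skips creating all three pools, while an explicit size such as
'coherent_pool=256K' still sets a fixed pool size as before.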

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 3 ++-
 kernel/dma/pool.c                               | 7 +++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ec4d25e854a8..d7015309614b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -664,7 +664,8 @@
 
 	coherent_pool=nn[KMG]	[ARM,KNL]
 			Sets the size of memory pool for coherent, atomic dma
-			allocations. Otherwise the default size will be scaled
+			allocations. A value of 0 disables the three atomic
+			memory pool. Otherwise the default size will be scaled
 			with memory capacity, while clamped between 128K and
 			1 << (PAGE_SHIFT + MAX_ORDER-1).
 
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5f84e6cdb78e..5a85804b5beb 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -21,7 +21,7 @@ static struct gen_pool *atomic_pool_kernel __ro_after_init;
 static unsigned long pool_size_kernel;
 
 /* Size can be defined by the coherent_pool command line */
-static size_t atomic_pool_size;
+static unsigned long atomic_pool_size = -1;
 
 /* Dynamic background expansion when the atomic pool is near capacity */
 static struct work_struct atomic_pool_work;
@@ -188,11 +188,14 @@ static int __init dma_atomic_pool_init(void)
 {
 	int ret = 0;
 
+	if (!atomic_pool_size)
+		return 0;
+
 	/*
 	 * If coherent_pool was not used on the command line, default the pool
 	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
 	 */
-	if (!atomic_pool_size) {
+	if (atomic_pool_size == -1) {
 		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
 		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
 		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH RESEND v2 3/5] mm_zone: add function to check if managed dma zone exists
  2021-12-07  3:07 ` Baoquan He
@ 2021-12-07  3:07   ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-07  3:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly, kexec,
	Baoquan He

Some places in the current kernel assume that the DMA zone must have
managed pages if CONFIG_ZONE_DMA is enabled. This is not always true.
E.g. in the kdump kernel on x86_64, only the low 1M is present and it is
locked down at a very early stage of boot, so there are no managed pages
at all in the DMA zone. In that case any page allocation from the DMA
zone will fail.

Add the function has_managed_dma() and the relevant helpers to check
whether a DMA zone with managed pages exists. It will be used in later
patches.
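
A simplified sketch of how the later patches use it (the real call sites
are in patches 4/5 and 5/5):

	/* Skip any GFP_DMA setup when ZONE_DMA has no buddy-managed pages. */
	if (!has_managed_dma())
		return;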

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 include/linux/mmzone.h | 21 +++++++++++++++++++++
 mm/page_alloc.c        | 11 +++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..82d23e13e0e5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -998,6 +998,18 @@ static inline bool zone_is_zone_device(struct zone *zone)
 }
 #endif
 
+#ifdef CONFIG_ZONE_DMA
+static inline bool zone_is_dma(struct zone *zone)
+{
+	return zone_idx(zone) == ZONE_DMA;
+}
+#else
+static inline bool zone_is_dma(struct zone *zone)
+{
+	return false;
+}
+#endif
+
 /*
  * Returns true if a zone has pages managed by the buddy allocator.
  * All the reclaim decisions have to use this function rather than
@@ -1046,6 +1058,7 @@ static inline int is_highmem_idx(enum zone_type idx)
 #endif
 }
 
+bool has_managed_dma(void);
 /**
  * is_highmem - helper function to quickly check if a struct zone is a
  *              highmem zone or not.  This is an attempt to keep references
@@ -1131,6 +1144,14 @@ extern struct zone *next_zone(struct zone *zone);
 			; /* do nothing */		\
 		else
 
+#define for_each_managed_zone(zone)		        \
+	for (zone = (first_online_pgdat())->node_zones; \
+	     zone;					\
+	     zone = next_zone(zone))			\
+		if (!managed_zone(zone))		\
+			; /* do nothing */		\
+		else
+
 static inline struct zone *zonelist_zone(struct zoneref *zoneref)
 {
 	return zoneref->zone;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..ac0ea42a4e5f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9459,4 +9459,15 @@ bool take_page_off_buddy(struct page *page)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret;
 }
+
+bool has_managed_dma(void)
+{
+	struct zone *zone;
+
+	for_each_managed_zone(zone) {
+		if (zone_is_dma(zone))
+			return true;
+	}
+	return false;
+}
 #endif
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH RESEND v2 4/5] dma/pool: create dma atomic pool only if dma zone has managed pages
  2021-12-07  3:07 ` Baoquan He
@ 2021-12-07  3:07   ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-07  3:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly, kexec,
	Baoquan He, iommu

Currently, three DMA atomic pools are initialized as long as the
relevant kernel code is built in. In the kdump kernel on x86_64,
however, creating atomic_pool_dma fails because there are no managed
pages in the DMA zone. In that case, the DMA zone only contains the low
1M of memory, which is locked down by the memblock allocator, so no
pages are added to the buddy allocator for the DMA zone. See commit
f1d4d47c5851 ("x86/setup: Always reserve the first 1M of RAM").

The kdump kernel on x86_64 therefore always prints the failure message
below:

 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc3+ #1
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ? _raw_spin_unlock_irq+0x24/0x40
  ? __alloc_pages_direct_compact+0x90/0x1b0
  __alloc_pages_slowpath.constprop.0+0xf29/0xf50
  ? __cond_resched+0x16/0x50
  ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
  __alloc_pages+0x24d/0x2c0
  ? __dma_atomic_pool_init+0x93/0x93
  alloc_page_interleave+0x13/0xb0
  atomic_pool_expand+0x118/0x210
  ? __dma_atomic_pool_init+0x93/0x93
  __dma_atomic_pool_init+0x45/0x93
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ......
 DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
 DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations

Check whether the DMA zone has managed pages and only create
atomic_pool_dma if it does; otherwise just skip it.
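
With this change the pool selection also degrades gracefully; a
simplified view of the resulting dma_guess_pool() logic (see the second
hunk below):

	if (atomic_pool_dma && (gfp & GFP_DMA))
		return atomic_pool_dma;
	/* Fall back to the kernel pool when no DMA pool was created. */
	return atomic_pool_kernel;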

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux-foundation.org
---
 kernel/dma/pool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5a85804b5beb..00df3edd6c5d 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
 						    GFP_KERNEL);
 	if (!atomic_pool_kernel)
 		ret = -ENOMEM;
-	if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+	if (has_managed_dma()) {
 		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
 						GFP_KERNEL | GFP_DMA);
 		if (!atomic_pool_dma)
@@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
 	if (prev == NULL) {
 		if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
 			return atomic_pool_dma32;
-		if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+		if (atomic_pool_dma && (gfp & GFP_DMA))
 			return atomic_pool_dma;
 		return atomic_pool_kernel;
 	}
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH RESEND v2 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone
  2021-12-07  3:07 ` Baoquan He
@ 2021-12-07  3:07   ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-07  3:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly, kexec,
	Baoquan He

The dma-kmalloc caches are created as long as CONFIG_ZONE_DMA is
enabled. However, allocating from them fails if the DMA zone has no
managed pages. The failure can be seen in the kdump kernel on x86_64 as
below:

 kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0                          
 CPU: 0 PID: 36 Comm: kworker/u2:2 Not tainted 5.16.0-rc3+ #6
 Hardware name: Dell Inc. PowerEdge R815/06JC9T, BIOS 3.2.2 09/15/2014
 Workqueue: events_unbound async_run_entry_fn
 Call Trace:
  dump_stack_lvl+0x57/0x72
  warn_alloc.cold+0x72/0xd6
  __alloc_pages_slowpath.constprop.0+0xf56/0xf70
  __alloc_pages+0x23b/0x2b0
  allocate_slab+0x406/0x630
  ___slab_alloc+0x4b1/0x7e0
  ? sr_probe+0x200/0x600
  ? lock_acquire+0xc4/0x2e0
  ? fs_reclaim_acquire+0x4d/0xe0
  ? lock_is_held_type+0xa7/0x120
  ? sr_probe+0x200/0x600
  ? __slab_alloc+0x67/0x90
  __slab_alloc+0x67/0x90
  ? sr_probe+0x200/0x600
  ? sr_probe+0x200/0x600
  kmem_cache_alloc_trace+0x259/0x270
  sr_probe+0x200/0x600
  ......
  bus_probe_device+0x9f/0xb0
  device_add+0x3d2/0x970
  ......
  __scsi_add_device+0xea/0x100
  ata_scsi_scan_host+0x97/0x1d0
  async_run_entry_fn+0x30/0x130
  process_one_work+0x2b0/0x5c0
  worker_thread+0x55/0x3c0
  ? process_one_work+0x5c0/0x5c0
  kthread+0x149/0x170
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x22/0x30
 Mem-Info:
 ......

The above failure happened when calling kmalloc() to allocate a buffer
with GFP_DMA. It requests a slab page from the DMA zone while there are
no managed pages there:
 sr_probe()
 --> get_capabilities()
     --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);

The DMA zone should be checked for managed pages before trying to create
the dma-kmalloc caches.
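
The approach taken below: when the DMA zone has no managed pages, the
KMALLOC_DMA caches simply alias the normal caches, so a request like the
kmalloc(512, GFP_KERNEL | GFP_DMA) in sr_probe() is served from a normal
zone instead of failing:

	if (!managed_dma) {
		/* No usable ZONE_DMA pages: reuse the normal cache for KMALLOC_DMA. */
		kmalloc_caches[KMALLOC_DMA][i] = kmalloc_caches[KMALLOC_NORMAL][i];
		continue;
	}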

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab_common.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index e5d080a93009..ae4ef0f8903a 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -878,6 +878,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 {
 	int i;
 	enum kmalloc_cache_type type;
+#ifdef CONFIG_ZONE_DMA
+	bool managed_dma;
+#endif
 
 	/*
 	 * Including KMALLOC_CGROUP if CONFIG_MEMCG_KMEM defined
@@ -905,10 +908,16 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 	slab_state = UP;
 
 #ifdef CONFIG_ZONE_DMA
+	managed_dma = has_managed_dma();
+
 	for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
 		struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];
 
 		if (s) {
+			if (!managed_dma) {
+				kmalloc_caches[KMALLOC_DMA][i] = kmalloc_caches[KMALLOC_NORMAL][i];
+				continue;
+			}
 			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
 				kmalloc_info[i].name[KMALLOC_DMA],
 				kmalloc_info[i].size,
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-07  3:07 ` Baoquan He
@ 2021-12-07  3:16   ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-07  3:16 UTC (permalink / raw)
  To: linux-kernel, tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly, kexec,
	rppt

Sorry, I forgot to add the x86 and x86/mm maintainers.

On 12/07/21 at 11:07am, Baoquan He wrote:
> ***Problem observed:
> On x86_64, when crash is triggered and entering into kdump kernel, page
> allocation failure can always be seen.
> 
>  ---------------------------------
>  DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
>  swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>  CPU: 0 PID: 1 Comm: swapper/0 
>  Call Trace:
>   dump_stack+0x7f/0xa1
>   warn_alloc.cold+0x72/0xd6
>   ......
>   __alloc_pages+0x24d/0x2c0
>   ......
>   dma_atomic_pool_init+0xdb/0x176
>   do_one_initcall+0x67/0x320
>   ? rcu_read_lock_sched_held+0x3f/0x80
>   kernel_init_freeable+0x290/0x2dc
>   ? rest_init+0x24f/0x24f
>   kernel_init+0xa/0x111
>   ret_from_fork+0x22/0x30
>  Mem-Info:
>  ------------------------------------
> 
> ***Root cause:
> In the current kernel, it assumes that DMA zone must have managed pages
> and try to request pages if CONFIG_ZONE_DMA is enabled. While this is not
> always true. E.g in kdump kernel of x86_64, only low 1M is presented and
> locked down at very early stage of boot, so that this low 1M won't be
> added into buddy allocator to become managed pages of DMA zone. This
> exception will always cause page allocation failure if page is requested
> from DMA zone.
> 
> ***Investigation:
> This failure happens since below commit merged into linus's tree.
>   1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
>   23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
>   f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
>   7c321eb2b843 x86/kdump: Remove the backup region handling
>   6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified
> 
> Before them, on x86_64, the low 640K area will be reused by kdump kernel.
> So in kdump kernel, the content of low 640K area is copied into a backup
> region for dumping before jumping into kdump. Then except of those firmware
> reserved region in [0, 640K], the left area will be added into buddy
> allocator to become available managed pages of DMA zone.
> 
> However, after above commits applied, in kdump kernel of x86_64, the low
> 1M is reserved by memblock, but not released to buddy allocator. So any
> later page allocation requested from DMA zone will fail. 
> 
> This low 1M lock down is needed because AMD SME encrypts memory making
> the old backup region mechanims impossible when switching into kdump
> kernel. And Intel engineer mentioned their TDX (Trusted domain extensions)
> which is under development in kernel also needs lock down the low 1M.
> So we can't simply revert above commits to fix the page allocation
> failure from DMA zone as someone suggested.
> 
> ***Solution:
> Currently, only DMA atomic pool and dma-kmalloc will initialize and
> request page allocation with GFP_DMA during bootup. So only initialize
> them when DMA zone has available managed pages, otherwise just skip the
> initialization. From testing and code, this doesn't matter. In kdump
> kernel of x86_64, the page allocation failure disappear.
> 
> ***Further thinking
> On x86_64, it consistently takes [0, 16M] into ZONE_DMA, and (16M, 4G]
> into ZONE_DMA32 by default. The zone DMA covering low 16M is used to
> take care of antique ISA devices. In fact, on 64bit system, it rarely
> need ZONE_DMA (which is low 16M) to support almost extinct ISA devices. 
> However, some components treat DMA as a generic concept, e.g
> kmalloc-dma, slab allocator initializes it for later any DMA related
> buffer allocation, but not limited to ISA DMA. 
> 
> On arm64, even though both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 
> are enabled, it makes ZONE_DMA covers the low 4G area, and ZONE_DMA32
> empty. Unless on specific platforms (e.g. 30-bit on Raspberry Pi 4),
> then zone DMA covers the 1st 1G area, zone DMA32 covers the rest of
> the 32-bit addressable memory. 
> 
> I am wondering if we can also change the size of DMA and DMA32 ZONE as
> dynamically adjusted, just as arm64 is doing? On x86_64, we can make
> zone DMA covers the 32-bit addressable memory, and empty zone DMA32 by
> default. Once ISA_DMA_API is enabled, we go back to make zone DMA covers
> low 16M area, zone DMA32 covers the rest of 32-bit addressable memory.
> (I am not familiar with ISA_DMA_API, will it require 24-bit addressable
> memory when enabled?)
> 
> Change history:
> 
> v2 post:
> https://lore.kernel.org/all/20210810094835.13402-1-bhe@redhat.com/T/#u
> 
> v1 post:
> https://lore.kernel.org/all/20210624052010.5676-1-bhe@redhat.com/T/#u
> 
> v2->v2 RESEND:
>  John pinged to push the repost of this patchset. So fix one typo of
>  suject of patch 3/5; Fix a building error caused by mix declaration in
>  patch 5/5. Both of them are found by John from his testing.
> 
> v1->v2:
>  Change to check if managed DMA zone exists. If DMA zone has managed
>  pages, go further to request page from DMA zone to initialize. Otherwise,
>  just skip to initialize stuffs which need pages from DMA zone.
> 
> Baoquan He (5):
>   docs: kernel-parameters: Update to reflect the current default size of
>     atomic pool
>   dma-pool: allow user to disable atomic pool
>   mm_zone: add function to check if managed dma zone exists
>   dma/pool: create dma atomic pool only if dma zone has managed pages
>   mm/slub: do not create dma-kmalloc if no managed pages in DMA zone
> 
>  .../admin-guide/kernel-parameters.txt         |  5 ++++-
>  include/linux/mmzone.h                        | 21 +++++++++++++++++++
>  kernel/dma/pool.c                             | 11 ++++++----
>  mm/page_alloc.c                               | 11 ++++++++++
>  mm/slab_common.c                              |  9 ++++++++
>  5 files changed, 52 insertions(+), 5 deletions(-)
> 
> -- 
> 2.17.2
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 1/5] docs: kernel-parameters: Update to reflect the current default size of atomic pool
  2021-12-07  3:07   ` Baoquan He
@ 2021-12-07  3:53     ` John Donnelly
  -1 siblings, 0 replies; 64+ messages in thread
From: John Donnelly @ 2021-12-07  3:53 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, kexec

On 12/6/21 9:07 PM, Baoquan He wrote:
> Since commit 1d659236fb43 ("dma-pool: scale the default DMA coherent pool
> size with memory capacity"), the default size of the atomic pool has been
> changed to scale with the system memory capacity. So update the
> documentation in kernel-parameters.txt accordingly.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
  Reviewed-by: John Donnelly <john.p.donnelly@oracle.com>
  Tested-by:  John Donnelly <john.p.donnelly@oracle.com>


> ---
>   Documentation/admin-guide/kernel-parameters.txt | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 9725c546a0d4..ec4d25e854a8 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -664,7 +664,9 @@
>   
>   	coherent_pool=nn[KMG]	[ARM,KNL]
>   			Sets the size of memory pool for coherent, atomic dma
> -			allocations, by default set to 256K.
> +			allocations. Otherwise the default size will be scaled
> +			with memory capacity, while clamped between 128K and
> +			1 << (PAGE_SHIFT + MAX_ORDER-1).
>   
>   	com20020=	[HW,NET] ARCnet - COM20020 chipset
>   			Format:
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 2/5] dma-pool: allow user to disable atomic pool
  2021-12-07  3:07   ` Baoquan He
@ 2021-12-07  3:53     ` John Donnelly
  -1 siblings, 0 replies; 64+ messages in thread
From: John Donnelly @ 2021-12-07  3:53 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, kexec

On 12/6/21 9:07 PM, Baoquan He wrote:
> In the current code, three atomic memory pools are always created,
> atomic_pool_kernel|dma|dma32, even though 'coherent_pool=0' is
> specified on the kernel command line. In fact, the atomic pools are only
> necessary when CONFIG_DMA_DIRECT_REMAP=y or mem_encrypt_active=y,
> which are needed on only a few arches.
> 
> So change the code to allow the user to disable the atomic pools by
> specifying 'coherent_pool=0'.
> 
> Meanwhile, update the relevant documentation in kernel-parameters.txt.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>

  Reviewed-by: John Donnelly <john.p.donnelly@oracle.com>
  Tested-by:  John Donnelly <john.p.donnelly@oracle.com>
> ---
>   Documentation/admin-guide/kernel-parameters.txt | 3 ++-
>   kernel/dma/pool.c                               | 7 +++++--
>   2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index ec4d25e854a8..d7015309614b 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -664,7 +664,8 @@
>   
>   	coherent_pool=nn[KMG]	[ARM,KNL]
>   			Sets the size of memory pool for coherent, atomic dma
> -			allocations. Otherwise the default size will be scaled
> +			allocations. A value of 0 disables the three atomic
> +			memory pool. Otherwise the default size will be scaled
>   			with memory capacity, while clamped between 128K and
>   			1 << (PAGE_SHIFT + MAX_ORDER-1).
>   
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index 5f84e6cdb78e..5a85804b5beb 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -21,7 +21,7 @@ static struct gen_pool *atomic_pool_kernel __ro_after_init;
>   static unsigned long pool_size_kernel;
>   
>   /* Size can be defined by the coherent_pool command line */
> -static size_t atomic_pool_size;
> +static unsigned long atomic_pool_size = -1;
>   
>   /* Dynamic background expansion when the atomic pool is near capacity */
>   static struct work_struct atomic_pool_work;
> @@ -188,11 +188,14 @@ static int __init dma_atomic_pool_init(void)
>   {
>   	int ret = 0;
>   
> +	if (!atomic_pool_size)
> +		return 0;
> +
>   	/*
>   	 * If coherent_pool was not used on the command line, default the pool
>   	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
>   	 */
> -	if (!atomic_pool_size) {
> +	if (atomic_pool_size == -1) {
>   		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
>   		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
>   		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
> 
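
As a worked example of the default sizing above (assuming 4 KiB pages and the
default MAX_ORDER of 11, so MAX_ORDER_NR_PAGES = 1024 and the clamp ceiling
1 << (PAGE_SHIFT + MAX_ORDER - 1) is 4 MiB), a machine with roughly 4 GiB of
RAM ends up with:

	pages            = totalram_pages() / (SZ_1G / SZ_128K)
	                 = 1048576 / 8192                      = 128
	pages            = min(pages, MAX_ORDER_NR_PAGES)      = 128
	atomic_pool_size = max(pages << PAGE_SHIFT, SZ_128K)   = 512 KiB

i.e. about 128 KiB of atomic pool per 1 GiB of memory, matching the comment in
the hunk above, while booting with coherent_pool=0 now skips creating the
pools entirely.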


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 3/5] mm_zone: add function to check if managed dma zone exists
  2021-12-07  3:07   ` Baoquan He
@ 2021-12-07  3:53     ` John Donnelly
  -1 siblings, 0 replies; 64+ messages in thread
From: John Donnelly @ 2021-12-07  3:53 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, kexec

On 12/6/21 9:07 PM, Baoquan He wrote:
> Some places in the current kernel assume that the DMA zone must have
> managed pages if CONFIG_ZONE_DMA is enabled, while this is not always true.
> E.g. in the kdump kernel of x86_64, only the low 1M is present and locked
> down at a very early stage of boot, so there are no managed pages at all
> in the DMA zone. This exception will always cause a page allocation failure
> if a page is requested from the DMA zone.
> 
> Here add the function has_managed_dma() and the relevant helper functions
> to check if there is a DMA zone with managed pages. It will be used in
> later patches.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
  Reviewed-by: John Donnelly <john.p.donnelly@oracle.com>
  Tested-by:  John Donnelly <john.p.donnelly@oracle.com>
> ---
>   include/linux/mmzone.h | 21 +++++++++++++++++++++
>   mm/page_alloc.c        | 11 +++++++++++
>   2 files changed, 32 insertions(+)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 58e744b78c2c..82d23e13e0e5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -998,6 +998,18 @@ static inline bool zone_is_zone_device(struct zone *zone)
>   }
>   #endif
>   
> +#ifdef CONFIG_ZONE_DMA
> +static inline bool zone_is_dma(struct zone *zone)
> +{
> +	return zone_idx(zone) == ZONE_DMA;
> +}
> +#else
> +static inline bool zone_is_dma(struct zone *zone)
> +{
> +	return false;
> +}
> +#endif
> +
>   /*
>    * Returns true if a zone has pages managed by the buddy allocator.
>    * All the reclaim decisions have to use this function rather than
> @@ -1046,6 +1058,7 @@ static inline int is_highmem_idx(enum zone_type idx)
>   #endif
>   }
>   
> +bool has_managed_dma(void);
>   /**
>    * is_highmem - helper function to quickly check if a struct zone is a
>    *              highmem zone or not.  This is an attempt to keep references
> @@ -1131,6 +1144,14 @@ extern struct zone *next_zone(struct zone *zone);
>   			; /* do nothing */		\
>   		else
>   
> +#define for_each_managed_zone(zone)		        \
> +	for (zone = (first_online_pgdat())->node_zones; \
> +	     zone;					\
> +	     zone = next_zone(zone))			\
> +		if (!managed_zone(zone))		\
> +			; /* do nothing */		\
> +		else
> +
>   static inline struct zone *zonelist_zone(struct zoneref *zoneref)
>   {
>   	return zoneref->zone;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c5952749ad40..ac0ea42a4e5f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -9459,4 +9459,15 @@ bool take_page_off_buddy(struct page *page)
>   	spin_unlock_irqrestore(&zone->lock, flags);
>   	return ret;
>   }
> +
> +bool has_managed_dma(void)
> +{
> +	struct zone *zone;
> +
> +	for_each_managed_zone(zone) {
> +		if (zone_is_dma(zone))
> +			return true;
> +	}
> +	return false;
> +}
>   #endif
> 
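
A minimal sketch of the intended call pattern (not taken from the series; the
init function and page pointer below are made up for illustration, only
has_managed_dma() comes from this patch), in the spirit of what patches 4/5
and 5/5 do before touching ZONE_DMA:

	#include <linux/gfp.h>
	#include <linux/init.h>
	#include <linux/mmzone.h>

	static struct page *example_page;

	static int __init example_pool_init(void)
	{
		gfp_t gfp = GFP_KERNEL;

		/* Only ask for ZONE_DMA memory when it has managed pages. */
		if (has_managed_dma())
			gfp |= GFP_DMA;

		example_page = alloc_pages(gfp, 0);
		return example_page ? 0 : -ENOMEM;
	}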


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 4/5] dma/pool: create dma atomic pool only if dma zone has managed pages
  2021-12-07  3:07   ` Baoquan He
  (?)
@ 2021-12-07  3:54     ` John Donnelly
  -1 siblings, 0 replies; 64+ messages in thread
From: John Donnelly @ 2021-12-07  3:54 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, kexec, iommu

On 12/6/21 9:07 PM, Baoquan He wrote:
> Currently three dma atomic pools are initialized as long as the relevant
> kernel code is built in. But in the kdump kernel of x86_64 this is not
> right when trying to create atomic_pool_dma, because there are no managed
> pages in the DMA zone. In that case, the DMA zone only has the low 1M of
> memory, present but locked down by the memblock allocator, so no pages are
> added into the buddy allocator for the DMA zone. Please check commit
> f1d4d47c5851 ("x86/setup: Always reserve the first 1M of RAM").
> 
> Then in kdump kernel of x86_64, it always prints below failure message:
> 
>   DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
>   swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc3+ #1
>   Call Trace:
>    dump_stack+0x7f/0xa1
>    warn_alloc.cold+0x72/0xd6
>    ? _raw_spin_unlock_irq+0x24/0x40
>    ? __alloc_pages_direct_compact+0x90/0x1b0
>    __alloc_pages_slowpath.constprop.0+0xf29/0xf50
>    ? __cond_resched+0x16/0x50
>    ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
>    __alloc_pages+0x24d/0x2c0
>    ? __dma_atomic_pool_init+0x93/0x93
>    alloc_page_interleave+0x13/0xb0
>    atomic_pool_expand+0x118/0x210
>    ? __dma_atomic_pool_init+0x93/0x93
>    __dma_atomic_pool_init+0x45/0x93
>    dma_atomic_pool_init+0xdb/0x176
>    do_one_initcall+0x67/0x320
>    ? rcu_read_lock_sched_held+0x3f/0x80
>    kernel_init_freeable+0x290/0x2dc
>    ? rest_init+0x24f/0x24f
>    kernel_init+0xa/0x111
>    ret_from_fork+0x22/0x30
>   Mem-Info:
>   ......
>   DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
>   DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
> 
> Here, let's check if the DMA zone has managed pages and create
> atomic_pool_dma only if it does. Otherwise just skip it.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
  Reviewed-by: John Donnelly <john.p.donnelly@oracle.com>
  Tested-by:  John Donnelly <john.p.donnelly@oracle.com>


> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: iommu@lists.linux-foundation.org
> ---
>   kernel/dma/pool.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index 5a85804b5beb..00df3edd6c5d 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
>   						    GFP_KERNEL);
>   	if (!atomic_pool_kernel)
>   		ret = -ENOMEM;
> -	if (IS_ENABLED(CONFIG_ZONE_DMA)) {
> +	if (has_managed_dma()) {
>   		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
>   						GFP_KERNEL | GFP_DMA);
>   		if (!atomic_pool_dma)
> @@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
>   	if (prev == NULL) {
>   		if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
>   			return atomic_pool_dma32;
> -		if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> +		if (atomic_pool_dma && (gfp & GFP_DMA))
>   			return atomic_pool_dma;
>   		return atomic_pool_kernel;
>   	}
> 
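
The resulting first-lookup behaviour of dma_guess_pool(), sketched as a
comment (assuming the GFP_DMA32 handling stays as quoted above):

	/*
	 * gfp & GFP_DMA32 -> atomic_pool_dma32 (when CONFIG_ZONE_DMA32)
	 * gfp & GFP_DMA   -> atomic_pool_dma when it was created,
	 *                    otherwise fall through to atomic_pool_kernel
	 * otherwise       -> atomic_pool_kernel
	 */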


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone
  2021-12-07  3:07   ` Baoquan He
@ 2021-12-07  3:54     ` John Donnelly
  -1 siblings, 0 replies; 64+ messages in thread
From: John Donnelly @ 2021-12-07  3:54 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, kexec

On 12/6/21 9:07 PM, Baoquan He wrote:
> The dma-kmalloc caches will be created as long as CONFIG_ZONE_DMA is
> enabled. However, allocating from them will fail if the DMA zone has no
> managed pages. The failure can be seen in the kdump kernel of x86_64 as
> below:
> 
>   kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>   CPU: 0 PID: 36 Comm: kworker/u2:2 Not tainted 5.16.0-rc3+ #6
>   Hardware name: Dell Inc. PowerEdge R815/06JC9T, BIOS 3.2.2 09/15/2014
>   Workqueue: events_unbound async_run_entry_fn
>   Call Trace:
>    dump_stack_lvl+0x57/0x72
>    warn_alloc.cold+0x72/0xd6
>    __alloc_pages_slowpath.constprop.0+0xf56/0xf70
>    __alloc_pages+0x23b/0x2b0
>    allocate_slab+0x406/0x630
>    ___slab_alloc+0x4b1/0x7e0
>    ? sr_probe+0x200/0x600
>    ? lock_acquire+0xc4/0x2e0
>    ? fs_reclaim_acquire+0x4d/0xe0
>    ? lock_is_held_type+0xa7/0x120
>    ? sr_probe+0x200/0x600
>    ? __slab_alloc+0x67/0x90
>    __slab_alloc+0x67/0x90
>    ? sr_probe+0x200/0x600
>    ? sr_probe+0x200/0x600
>    kmem_cache_alloc_trace+0x259/0x270
>    sr_probe+0x200/0x600
>    ......
>    bus_probe_device+0x9f/0xb0
>    device_add+0x3d2/0x970
>    ......
>    __scsi_add_device+0xea/0x100
>    ata_scsi_scan_host+0x97/0x1d0
>    async_run_entry_fn+0x30/0x130
>    process_one_work+0x2b0/0x5c0
>    worker_thread+0x55/0x3c0
>    ? process_one_work+0x5c0/0x5c0
>    kthread+0x149/0x170
>    ? set_kthread_struct+0x40/0x40
>    ret_from_fork+0x22/0x30
>   Mem-Info:
>   ......
> 
> The above failure happened when calling kmalloc() to allocate a buffer
> with GFP_DMA. It requests a slab page from the DMA zone while there are
> no managed pages in it.
>   sr_probe()
>   --> get_capabilities()
>       --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
> 
> The DMA zone should be checked for managed pages before trying to create
> the dma-kmalloc caches.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
  Reviewed-by: John Donnelly <john.p.donnelly@oracle.com>
  Tested-by:  John Donnelly <john.p.donnelly@oracle.com>

> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> ---
>   mm/slab_common.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index e5d080a93009..ae4ef0f8903a 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -878,6 +878,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
>   {
>   	int i;
>   	enum kmalloc_cache_type type;
> +#ifdef CONFIG_ZONE_DMA
> +	bool managed_dma;
> +#endif
>   
>   	/*
>   	 * Including KMALLOC_CGROUP if CONFIG_MEMCG_KMEM defined
> @@ -905,10 +908,16 @@ void __init create_kmalloc_caches(slab_flags_t flags)
>   	slab_state = UP;
>   
>   #ifdef CONFIG_ZONE_DMA
> +	managed_dma = has_managed_dma();
> +
>   	for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
>   		struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];
>   
>   		if (s) {
> +			if (!managed_dma) {
> +				kmalloc_caches[KMALLOC_DMA][i] = kmalloc_caches[KMALLOC_NORMAL][i];
> +				continue;
> +			}
>   			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
>   				kmalloc_info[i].name[KMALLOC_DMA],
>   				kmalloc_info[i].size,
> 
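
To illustrate the intended effect on the caller quoted in the commit message
(a sketch, not part of the patch): with KMALLOC_DMA aliased to KMALLOC_NORMAL,
the request below is meant to be served from a normal kmalloc cache instead of
failing against a ZONE_DMA that has no managed pages:

	/* from sr_probe() -> get_capabilities() */
	buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);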


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-07  3:16   ` Baoquan He
@ 2021-12-07  4:03     ` John Donnelly
  -1 siblings, 0 replies; 64+ messages in thread
From: John Donnelly @ 2021-12-07  4:03 UTC (permalink / raw)
  To: Baoquan He, linux-kernel, tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, kexec, rppt

On 12/6/21 9:16 PM, Baoquan He wrote:
> Sorry, forgot adding x86 and x86/mm maintainers

Hi,

   These commits need to be applied to Linux 5.15.0 (LTS) too, since it has
the original regression:

  1d659236fb43 ("dma-pool: scale the default DMA coherent pool
size with memory capacity")

Maybe add "Fixes:" tags to the other commits?
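
For instance (purely illustrative; which of the commits listed in the cover
letter each patch should actually reference is for the author to decide),
trailers of the form:

  Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
  Cc: stable@vger.kernel.org # 5.15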


> 
> On 12/07/21 at 11:07am, Baoquan He wrote:
>> ***Problem observed:
>> On x86_64, when crash is triggered and entering into kdump kernel, page
>> allocation failure can always be seen.
>>
>>   ---------------------------------
>>   DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
>>   swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>   CPU: 0 PID: 1 Comm: swapper/0
>>   Call Trace:
>>    dump_stack+0x7f/0xa1
>>    warn_alloc.cold+0x72/0xd6
>>    ......
>>    __alloc_pages+0x24d/0x2c0
>>    ......
>>    dma_atomic_pool_init+0xdb/0x176
>>    do_one_initcall+0x67/0x320
>>    ? rcu_read_lock_sched_held+0x3f/0x80
>>    kernel_init_freeable+0x290/0x2dc
>>    ? rest_init+0x24f/0x24f
>>    kernel_init+0xa/0x111
>>    ret_from_fork+0x22/0x30
>>   Mem-Info:
>>   ------------------------------------
>>
>> ***Root cause:
>> In the current kernel, it assumes that DMA zone must have managed pages
>> and try to request pages if CONFIG_ZONE_DMA is enabled. While this is not
>> always true. E.g in kdump kernel of x86_64, only low 1M is presented and
>> locked down at very early stage of boot, so that this low 1M won't be
>> added into buddy allocator to become managed pages of DMA zone. This
>> exception will always cause page allocation failure if page is requested
>> from DMA zone.
>>
>> ***Investigation:
>> This failure happens since below commit merged into linus's tree.
>>    1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
>>    23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
>>    f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
>>    7c321eb2b843 x86/kdump: Remove the backup region handling
>>    6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified
>>
>> Before them, on x86_64, the low 640K area will be reused by kdump kernel.
>> So in kdump kernel, the content of low 640K area is copied into a backup
>> region for dumping before jumping into kdump. Then except of those firmware
>> reserved region in [0, 640K], the left area will be added into buddy
>> allocator to become available managed pages of DMA zone.
>>
>> However, after above commits applied, in kdump kernel of x86_64, the low
>> 1M is reserved by memblock, but not released to buddy allocator. So any
>> later page allocation requested from DMA zone will fail.
>>
>> This low 1M lock down is needed because AMD SME encrypts memory making
>> the old backup region mechanims impossible when switching into kdump
>> kernel. And Intel engineer mentioned their TDX (Trusted domain extensions)
>> which is under development in kernel also needs lock down the low 1M.
>> So we can't simply revert above commits to fix the page allocation
>> failure from DMA zone as someone suggested.
>>
>> ***Solution:
>> Currently, only DMA atomic pool and dma-kmalloc will initialize and
>> request page allocation with GFP_DMA during bootup. So only initialize
>> them when DMA zone has available managed pages, otherwise just skip the
>> initialization. From testing and code, this doesn't matter. In kdump
>> kernel of x86_64, the page allocation failure disappear.
>>
>> ***Further thinking
>> On x86_64, it consistently takes [0, 16M] into ZONE_DMA, and (16M, 4G]
>> into ZONE_DMA32 by default. The zone DMA covering low 16M is used to
>> take care of antique ISA devices. In fact, on 64bit system, it rarely
>> need ZONE_DMA (which is low 16M) to support almost extinct ISA devices.
>> However, some components treat DMA as a generic concept, e.g
>> kmalloc-dma, slab allocator initializes it for later any DMA related
>> buffer allocation, but not limited to ISA DMA.
>>
>> On arm64, even though both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32
>> are enabled, it makes ZONE_DMA covers the low 4G area, and ZONE_DMA32
>> empty. Unless on specific platforms (e.g. 30-bit on Raspberry Pi 4),
>> then zone DMA covers the 1st 1G area, zone DMA32 covers the rest of
>> the 32-bit addressable memory.
>>
>> I am wondering if we can also change the size of DMA and DMA32 ZONE as
>> dynamically adjusted, just as arm64 is doing? On x86_64, we can make
>> zone DMA covers the 32-bit addressable memory, and empty zone DMA32 by
>> default. Once ISA_DMA_API is enabled, we go back to make zone DMA covers
>> low 16M area, zone DMA32 covers the rest of 32-bit addressable memory.
>> (I am not familiar with ISA_DMA_API, will it require 24-bit addressable
>> memory when enabled?)
>>
>> Change history:
>>
>> v2 post:
>> https://lore.kernel.org/all/20210810094835.13402-1-bhe@redhat.com/T/#u
>>
>> v1 post:
>> https://lore.kernel.org/all/20210624052010.5676-1-bhe@redhat.com/T/#u
>>
>> v2->v2 RESEND:
>>   John pinged to push the repost of this patchset. So fix one typo of
>>   suject of patch 3/5; Fix a building error caused by mix declaration in
>>   patch 5/5. Both of them are found by John from his testing.
>>
>> v1->v2:
>>   Change to check if managed DMA zone exists. If DMA zone has managed
>>   pages, go further to request page from DMA zone to initialize. Otherwise,
>>   just skip to initialize stuffs which need pages from DMA zone.
>>
>> Baoquan He (5):
>>    docs: kernel-parameters: Update to reflect the current default size of
>>      atomic pool
>>    dma-pool: allow user to disable atomic pool
>>    mm_zone: add function to check if managed dma zone exists
>>    dma/pool: create dma atomic pool only if dma zone has managed pages
>>    mm/slub: do not create dma-kmalloc if no managed pages in DMA zone
>>
>>   .../admin-guide/kernel-parameters.txt         |  5 ++++-
>>   include/linux/mmzone.h                        | 21 +++++++++++++++++++
>>   kernel/dma/pool.c                             | 11 ++++++----
>>   mm/page_alloc.c                               | 11 ++++++++++
>>   mm/slab_common.c                              |  9 ++++++++
>>   5 files changed, 52 insertions(+), 5 deletions(-)
>>
>> -- 
>> 2.17.2
>>
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-07  3:07 ` Baoquan He
@ 2021-12-07  8:05   ` Christoph Lameter
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoph Lameter @ 2021-12-07  8:05 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, hch, robin.murphy, penberg,
	rientjes, iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly,
	kexec

On Tue, 7 Dec 2021, Baoquan He wrote:

> into ZONE_DMA32 by default. The zone DMA covering low 16M is used to
> take care of antique ISA devices. In fact, on 64bit system, it rarely
> need ZONE_DMA (which is low 16M) to support almost extinct ISA devices.
> However, some components treat DMA as a generic concept, e.g
> kmalloc-dma, slab allocator initializes it for later any DMA related
> buffer allocation, but not limited to ISA DMA.

The idea of the slab allocator DMA support is to have memory available
for devices that can only support a limited range of physical addresses.
These are only to be enabled for platforms that have such requirements.

The slab allocators guarantee that all kmalloc allocations are DMA-able
independent of specifying ZONE_DMA/ZONE_DMA32.

> On arm64, even though both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32
> are enabled, it makes ZONE_DMA covers the low 4G area, and ZONE_DMA32
> empty. Unless on specific platforms (e.g. 30-bit on Raspberry Pi 4),
> then zone DMA covers the 1st 1G area, zone DMA32 covers the rest of
> the 32-bit addressable memory.

ZONE_NORMAL should cover all memory. ARM does not need ZONE_DMA32.

> I am wondering if we can also change the size of DMA and DMA32 ZONE as
> dynamically adjusted, just as arm64 is doing? On x86_64, we can make
> zone DMA covers the 32-bit addressable memory, and empty zone DMA32 by
> default. Once ISA_DMA_API is enabled, we go back to make zone DMA covers
> low 16M area, zone DMA32 covers the rest of 32-bit addressable memory.
> (I am not familiar with ISA_DMA_API, will it require 24-bit addressable
> memory when enabled?)

The size of ZONE_DMA traditionally depends on the platform. On some
it is 16MB, on some 1G and on some 4GB. ZONE_DMA32 is always 4GB and should
only be used if ZONE_DMA has already been used.

ZONE_DMA is dynamic in the sense of being different on different
platforms.

Generally I guess it would be possible to use ZONE_DMA for generic tagging
of special memory that can be configured to have a dynamic size but that is
not what it was designed to do.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 3/5] mm_zone: add function to check if managed dma zone exists
  2021-12-07  3:07   ` Baoquan He
@ 2021-12-07 11:23     ` David Hildenbrand
  -1 siblings, 0 replies; 64+ messages in thread
From: David Hildenbrand @ 2021-12-07 11:23 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly, kexec

On 07.12.21 04:07, Baoquan He wrote:
> Some places in the current kernel assume that the DMA zone must have
> managed pages if CONFIG_ZONE_DMA is enabled, while this is not always true.
> E.g. in the kdump kernel of x86_64, only the low 1M is present and locked
> down at a very early stage of boot, so there are no managed pages at all
> in the DMA zone. This exception will always cause a page allocation failure
> if a page is requested from the DMA zone.
> 
> Here add the function has_managed_dma() and the relevant helper functions
> to check if there is a DMA zone with managed pages. It will be used in
> later patches.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  include/linux/mmzone.h | 21 +++++++++++++++++++++
>  mm/page_alloc.c        | 11 +++++++++++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 58e744b78c2c..82d23e13e0e5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -998,6 +998,18 @@ static inline bool zone_is_zone_device(struct zone *zone)
>  }
>  #endif
>  
> +#ifdef CONFIG_ZONE_DMA
> +static inline bool zone_is_dma(struct zone *zone)
> +{
> +	return zone_idx(zone) == ZONE_DMA;
> +}
> +#else
> +static inline bool zone_is_dma(struct zone *zone)
> +{
> +	return false;
> +}
> +#endif
> +
>  /*
>   * Returns true if a zone has pages managed by the buddy allocator.
>   * All the reclaim decisions have to use this function rather than
> @@ -1046,6 +1058,7 @@ static inline int is_highmem_idx(enum zone_type idx)
>  #endif
>  }
>  
> +bool has_managed_dma(void);
>  /**
>   * is_highmem - helper function to quickly check if a struct zone is a
>   *              highmem zone or not.  This is an attempt to keep references
> @@ -1131,6 +1144,14 @@ extern struct zone *next_zone(struct zone *zone);
>  			; /* do nothing */		\
>  		else
>  
> +#define for_each_managed_zone(zone)		        \
> +	for (zone = (first_online_pgdat())->node_zones; \
> +	     zone;					\
> +	     zone = next_zone(zone))			\
> +		if (!managed_zone(zone))		\
> +			; /* do nothing */		\
> +		else
> +
>  static inline struct zone *zonelist_zone(struct zoneref *zoneref)
>  {
>  	return zoneref->zone;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c5952749ad40..ac0ea42a4e5f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -9459,4 +9459,15 @@ bool take_page_off_buddy(struct page *page)
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  	return ret;
>  }
> +
> +bool has_managed_dma(void)
> +{
> +	struct zone *zone;
> +
> +	for_each_managed_zone(zone) {
> +		if (zone_is_dma(zone))
> +			return true;
> +	}
> +	return false;
> +}

Wouldn't it be "easier/faster" to just iterate online nodes and directly
obtain the ZONE_DMA, checking if there are managed pages?


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-07  4:03     ` John Donnelly
@ 2021-12-08  4:33       ` Andrew Morton
  -1 siblings, 0 replies; 64+ messages in thread
From: Andrew Morton @ 2021-12-08  4:33 UTC (permalink / raw)
  To: John Donnelly
  Cc: Baoquan He, linux-kernel, tglx, mingo, bp, dave.hansen, luto,
	peterz, linux-mm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, kexec, rppt

On Mon, 6 Dec 2021 22:03:59 -0600 John Donnelly <John.p.donnelly@oracle.com> wrote:

> On 12/6/21 9:16 PM, Baoquan He wrote:
> > Sorry, forgot adding x86 and x86/mm maintainers
> 
> Hi,
> 
>    These commits need applied to Linux-5.15.0 (LTS) too since it has the 
> original regression :
> 
>   1d659236fb43 ("dma-pool: scale the default DMA coherent pool
> size with memory capacity")
> 
> Maybe add "Fixes" to the other commits ?

And cc:stable, please.  "Fixes:" doesn't always mean "should be
backported".

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-08  4:33       ` Andrew Morton
@ 2021-12-08  4:56         ` John Donnelly
  -1 siblings, 0 replies; 64+ messages in thread
From: John Donnelly @ 2021-12-08  4:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Baoquan He, linux-kernel, tglx, mingo, bp, dave.hansen, luto,
	peterz, linux-mm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, kexec, rppt

On 12/7/21 22:33, Andrew Morton wrote:
> On Mon, 6 Dec 2021 22:03:59 -0600 John Donnelly <John.p.donnelly@oracle.com> wrote:
> 
>> On 12/6/21 9:16 PM, Baoquan He wrote:
>>> Sorry, forgot adding x86 and x86/mm maintainers
>>
>> Hi,
>>
>>     These commits need applied to Linux-5.15.0 (LTS) too since it has the
>> original regression :
>>
>>    1d659236fb43 ("dma-pool: scale the default DMA coherent pool
>> size with memory capacity")
>>
>> Maybe add "Fixes" to the other commits ?
> 
> And cc:stable, please.  "Fixes:" doesn't always mean "should be
> backported
>

Hi.


Does this mean we need a v3 version?



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-07  8:05   ` Christoph Lameter
@ 2021-12-09  8:05     ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-09  8:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-mm, akpm, hch, robin.murphy, penberg,
	rientjes, iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly,
	kexec

On 12/07/21 at 09:05am, Christoph Lameter wrote:
> On Tue, 7 Dec 2021, Baoquan He wrote:
> 
> > into ZONE_DMA32 by default. The zone DMA covering low 16M is used to
> > take care of antique ISA devices. In fact, on 64bit system, it rarely
> > need ZONE_DMA (which is low 16M) to support almost extinct ISA devices.
> > However, some components treat DMA as a generic concept, e.g
> > kmalloc-dma, slab allocator initializes it for later any DMA related
> > buffer allocation, but not limited to ISA DMA.

Thanks a lot for the review and for sharing your thoughts.
> 
> The idea of the slab allocator DMA support is to have memory available
> for devices that can only support a limited range of physical addresses.
> These are only to be enabled for platforms that have such requirements.
> 
> The slab allocators guarantee that all kmalloc allocations are DMA able
> indepent of specifying ZONE_DMA/ZONE_DMA32

Here do you mean we guarantee dma-kmalloc will be DMA-able independent of
specifying ZONE_DMA/DMA32, or that this holds for the whole slab/slub allocator?

Sorry for the late reply; I suddenly realized one test case was missed.
In my earlier test of this patchset, I only set crashkernel=256M on the
cmdline, which reserves 256M of memory under 4G. In the kdump kernel all
memory then belongs to zone DMA32, so a DMA buffer requested with GFP_DMA
ends up coming from zone DMA32, since zone NORMAL doesn't exist.

I tried crashkernel=256M,high yesterday; it reserves 256M above 4G and
another 256M under 4G, so zone NORMAL then has memory above 4G. With this
patchset applied, dma-kmalloc takes pages from the NORMAL zone, i.e. pages
above 4G. What disappointed me is that this patchset works in that case too.

So what confuses me is that the ata_scsi device driver requests a DMA
buffer with GFP_DMA, we feed it memory above 4G, and it can still succeed.
I added amd_iommu=off to the cmdline to disable the IOMMU. Furthermore, on
riscv and ia64, which only have zone DMA32 and no zone DMA, if an ata_scsi
device is present and requests a DMA buffer with GFP_DMA but gets memory
above 4G, isn't this wrong?

In my understanding, isn't the reasonable fallback sequence zone DMA first
for GFP_DMA, then zone DMA32, and finally zone NORMAL? At least on x86_64,
I believe device driver developers would prefer to see this, because most
of the time both zone DMA and zone DMA32 are used for DMA buffer allocation
if the IOMMU is not enabled. However, if memory requested with GFP_DMA
comes from zone NORMAL and the allocation succeeds, does that mean the
developer doesn't take the GFP_DMA flag seriously and is just trying to
get a buffer, as in the call chain below?

  --> sr_probe()
      -->get_capabilities()
         --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
         --> scsi_mode_sense()
             --> scsi_execute_req()
                 --> blk_rq_map_kern()
                     --> bio_copy_kern()
                         or
                     --> bio_map_kern()
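
To make the zone question concrete, below is a minimal sketch (my own
simplification, not the kernel's actual gfp_zone() implementation) of how
the gfp flags pick the highest zone the page allocator will try; from
there the buddy allocator only falls back to lower zones:

/*
 * Simplified sketch of the zone selection done by gfp_zone(); assumes
 * CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are both enabled, as on x86_64.
 */
static inline enum zone_type gfp_zone_sketch(gfp_t flags)
{
	if (flags & __GFP_DMA)
		return ZONE_DMA;	/* only ZONE_DMA is eligible */
	if (flags & __GFP_DMA32)
		return ZONE_DMA32;	/* may fall back to ZONE_DMA */
	return ZONE_NORMAL;		/* may fall back to DMA32, then DMA */
}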

> 
> > On arm64, even though both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32
> > are enabled, it makes ZONE_DMA covers the low 4G area, and ZONE_DMA32
> > empty. Unless on specific platforms (e.g. 30-bit on Raspberry Pi 4),
> > then zone DMA covers the 1st 1G area, zone DMA32 covers the rest of
> > the 32-bit addressable memory.
> 
> ZONE_NORMAL should cover all memory. ARM does not need ZONE_DMA32.

I grepped all arches which provide ZONE_DMA and/or ZONE_DMA32 and
summarized them below. Among the arches which have DMA32, only x86_64 and
mips (when not on platform SGI_IP22 or SGI_IP28) have a 16M ZONE_DMA;
obviously that ZONE_DMA exists because they carry the legacy burden of the
old ISA support. arm64 has ZONE_DMA cover the low 4G by default if ACPI/DT
doesn't report a smaller limit of DMA capability, while both riscv and
ia64 bypass ZONE_DMA and only use ZONE_DMA32 to cover the low 4G. As for
s390 and ppc64, both take the low 2G into ZONE_DMA and provide no
ZONE_DMA32.

=============================
ARCH which has DMA32
        ZONE_DMA       ZONE_DMA32
arm64   0~X            X~4G  (X is taken from ACPI or DT; otherwise it's 4G by default and DMA32 is empty)
ia64    None           0~4G
mips    0 or 0~16M     X~4G  (zone DMA is empty on SGI_IP22 or SGI_IP28, otherwise 16M by default like i386)
riscv   None           0~4G
x86_64  16M            16M~4G


=============================
ARCH which has no DMA32
        ZONE_DMA
alpha   0~16M or empty if IOMMU enabled
arm     0~X (X is reported by fdt, 4G by default)
m68k    0~total memory
microblaze 0~total low memory
powerpc 0~2G
s390    0~2G
sparc   0~total low memory
i386    0~16M



> 
> > I am wondering if we can also change the size of DMA and DMA32 ZONE as
> > dynamically adjusted, just as arm64 is doing? On x86_64, we can make
> > zone DMA covers the 32-bit addressable memory, and empty zone DMA32 by
> > default. Once ISA_DMA_API is enabled, we go back to make zone DMA covers
> > low 16M area, zone DMA32 covers the rest of 32-bit addressable memory.
> > (I am not familiar with ISA_DMA_API, will it require 24-bit addressable
> > memory when enabled?)
> 
> The size of ZONE_DMA is traditionally depending on the platform. On some
> it is 16MB, on some 1G and on some 4GB. ZONE32 is always 4GB and should
> only be used if ZONE_DMA has already been used.

As said above, ia64 and riscv don't have ZONE_DMA at all; they just
cover the low 4G with ZONE_DMA32 alone.

> 
> ZONE_DMA is dynamic in the sense of being different on different
> platforms.
> 
> Generally I guess it would be possible to use ZONE_DMA for generic tagging
> of special memory that can be configured to have a dynamic size but that is
> not what it was designed to do.
> 
Thanks again for sharing all of this. I am still a little confused about
the current ZONE_DMA and its usage, e.g. in slab, and may need to keep
exploring.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-09  8:05     ` Baoquan He
@ 2021-12-09 12:59       ` Christoph Lameter
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoph Lameter @ 2021-12-09 12:59 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, hch, robin.murphy, penberg,
	rientjes, iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly,
	kexec

On Thu, 9 Dec 2021, Baoquan He wrote:

> > The slab allocators guarantee that all kmalloc allocations are DMA able
> > indepent of specifying ZONE_DMA/ZONE_DMA32
>
> Here you mean we guarantee dma-kmalloc will be DMA able independent of
> specifying ZONE_DMA/DMA32, or the whole sla/ub allocator?

All memory obtained via kmalloc --independent of "dma-alloc", ZONE_DMA
etc.-- must be DMA-able.

> With my understanding, isn't the reasonable sequence zone DMA firstly if
> GFP_DMA, then zone DMA32, finaly zone NORMAL. At least, on x86_64, I
> believe device driver developer prefer to see this because most of time,
> zone DMA and zone DMA32 are both used for dma buffer allocation, if
> IOMMU is not enabled. However, memory got from zone NORMAL when required
> with GFP_DMA, and it succeeds, does it mean that the developer doesn't
> take the GFP_DMA flag seriously, just try to get buffer for allocation?

ZONE_NORMAL is also used for DMA allocations. ZONE_DMA and ZONE_DMA32 are
only used if the physical range of memory supported by a device does not
include all of normal memory.
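
As a hedged illustration (my own sketch, not code from this series, with a
purely made-up probe function): such an address-limited device would
normally declare its limit through the DMA API rather than through
GFP_DMA, e.g.:

#include <linux/dma-mapping.h>
#include <linux/pci.h>

/* Illustrative only: the device can address nothing above the low 4G. */
static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	dma_addr_t dma_handle;
	void *buf;
	int ret;

	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
	if (ret)
		return ret;

	/* The DMA layer now hands back memory the device can actually reach. */
	buf = dma_alloc_coherent(&pdev->dev, PAGE_SIZE, &dma_handle, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	/* ... use buf for device DMA ... */

	dma_free_coherent(&pdev->dev, PAGE_SIZE, buf, dma_handle);
	return 0;
}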

> > The size of ZONE_DMA is traditionally depending on the platform. On some
> > it is 16MB, on some 1G and on some 4GB. ZONE32 is always 4GB and should
> > only be used if ZONE_DMA has already been used.
>
> As said at above, ia64 and riscv don't have ZONE_DMA at all, they just
> cover low 4G with ZONE_DMA32 alone.

If you do not have devices that are crap and cannot address the full
memory, then you don't need these special zones.

Sorry this subject has caused confusion multiple times over the years and
there are still arches that are not implementing this in a consistent way.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 3/5] mm_zone: add function to check if managed dma zone exists
  2021-12-07 11:23     ` David Hildenbrand
@ 2021-12-09 13:02       ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-09 13:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, akpm, hch, robin.murphy, cl, penberg,
	rientjes, iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly,
	kexec

On 12/07/21 at 12:23pm, David Hildenbrand wrote:
> On 07.12.21 04:07, Baoquan He wrote:
> > In some places of the current kernel, it assumes that dma zone must have
> > managed pages if CONFIG_ZONE_DMA is enabled. While this is not always true.
> > E.g in kdump kernel of x86_64, only low 1M is presented and locked down
> > at very early stage of boot, so that there's no managed pages at all in
> > DMA zone. This exception will always cause page allocation failure if page
> > is requested from DMA zone.
> > 
> > Here add function has_managed_dma() and the relevant helper functions to
> > check if there's DMA zone with managed pages. It will be used in later
> > patches.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  include/linux/mmzone.h | 21 +++++++++++++++++++++
> >  mm/page_alloc.c        | 11 +++++++++++
> >  2 files changed, 32 insertions(+)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 58e744b78c2c..82d23e13e0e5 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -998,6 +998,18 @@ static inline bool zone_is_zone_device(struct zone *zone)
> >  }
> >  #endif
> >  
> > +#ifdef CONFIG_ZONE_DMA
> > +static inline bool zone_is_dma(struct zone *zone)
> > +{
> > +	return zone_idx(zone) == ZONE_DMA;
> > +}
> > +#else
> > +static inline bool zone_is_dma(struct zone *zone)
> > +{
> > +	return false;
> > +}
> > +#endif
> > +
> >  /*
> >   * Returns true if a zone has pages managed by the buddy allocator.
> >   * All the reclaim decisions have to use this function rather than
> > @@ -1046,6 +1058,7 @@ static inline int is_highmem_idx(enum zone_type idx)
> >  #endif
> >  }
> >  
> > +bool has_managed_dma(void);
> >  /**
> >   * is_highmem - helper function to quickly check if a struct zone is a
> >   *              highmem zone or not.  This is an attempt to keep references
> > @@ -1131,6 +1144,14 @@ extern struct zone *next_zone(struct zone *zone);
> >  			; /* do nothing */		\
> >  		else
> >  
> > +#define for_each_managed_zone(zone)		        \
> > +	for (zone = (first_online_pgdat())->node_zones; \
> > +	     zone;					\
> > +	     zone = next_zone(zone))			\
> > +		if (!managed_zone(zone))		\
> > +			; /* do nothing */		\
> > +		else
> > +
> >  static inline struct zone *zonelist_zone(struct zoneref *zoneref)
> >  {
> >  	return zoneref->zone;
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index c5952749ad40..ac0ea42a4e5f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -9459,4 +9459,15 @@ bool take_page_off_buddy(struct page *page)
> >  	spin_unlock_irqrestore(&zone->lock, flags);
> >  	return ret;
> >  }
> > +
> > +bool has_managed_dma(void)
> > +{
> > +	struct zone *zone;
> > +
> > +	for_each_managed_zone(zone) {
> > +		if (zone_is_dma(zone))
> > +			return true;
> > +	}
> > +	return false;
> > +}
> 
> Wouldn't it be "easier/faster" to just iterate online nodes and directly
> obtain the ZONE_DMA, checking if there are managed pages?

Thanks, Dave.

Please check for_each_managed_zone(); it iterates over the online nodes
and each of their managed zones.

Is the below what you are suggesting? The only difference is that I
introduced for_each_managed_zone(), which can be reused later if needed.
Not sure if I got your suggestion correctly.

bool has_managed_dma(void)
{
        struct pglist_data *pgdat;
        enum zone_type i;

        for_each_online_pgdat(pgdat) {
                for (i = 0; i < MAX_NR_ZONES - 1; i++) {
                        struct zone *zone = &pgdat->node_zones[i];

                        /* Only report a DMA zone that has managed pages. */
                        if (managed_zone(zone) && zone_is_dma(zone))
                                return true;
                }
        }
        return false;
}


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 3/5] mm_zone: add function to check if managed dma zone exists
  2021-12-09 13:02       ` Baoquan He
@ 2021-12-09 13:10         ` David Hildenbrand
  -1 siblings, 0 replies; 64+ messages in thread
From: David Hildenbrand @ 2021-12-09 13:10 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, hch, robin.murphy, cl, penberg,
	rientjes, iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly,
	kexec

On 09.12.21 14:02, Baoquan He wrote:
> On 12/07/21 at 12:23pm, David Hildenbrand wrote:
>> On 07.12.21 04:07, Baoquan He wrote:
>>> In some places of the current kernel, it assumes that dma zone must have
>>> managed pages if CONFIG_ZONE_DMA is enabled. While this is not always true.
>>> E.g in kdump kernel of x86_64, only low 1M is presented and locked down
>>> at very early stage of boot, so that there's no managed pages at all in
>>> DMA zone. This exception will always cause page allocation failure if page
>>> is requested from DMA zone.
>>>
>>> Here add function has_managed_dma() and the relevant helper functions to
>>> check if there's DMA zone with managed pages. It will be used in later
>>> patches.
>>>
>>> Signed-off-by: Baoquan He <bhe@redhat.com>
>>> ---
>>>  include/linux/mmzone.h | 21 +++++++++++++++++++++
>>>  mm/page_alloc.c        | 11 +++++++++++
>>>  2 files changed, 32 insertions(+)
>>>
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index 58e744b78c2c..82d23e13e0e5 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -998,6 +998,18 @@ static inline bool zone_is_zone_device(struct zone *zone)
>>>  }
>>>  #endif
>>>  
>>> +#ifdef CONFIG_ZONE_DMA
>>> +static inline bool zone_is_dma(struct zone *zone)
>>> +{
>>> +	return zone_idx(zone) == ZONE_DMA;
>>> +}
>>> +#else
>>> +static inline bool zone_is_dma(struct zone *zone)
>>> +{
>>> +	return false;
>>> +}
>>> +#endif
>>> +
>>>  /*
>>>   * Returns true if a zone has pages managed by the buddy allocator.
>>>   * All the reclaim decisions have to use this function rather than
>>> @@ -1046,6 +1058,7 @@ static inline int is_highmem_idx(enum zone_type idx)
>>>  #endif
>>>  }
>>>  
>>> +bool has_managed_dma(void);
>>>  /**
>>>   * is_highmem - helper function to quickly check if a struct zone is a
>>>   *              highmem zone or not.  This is an attempt to keep references
>>> @@ -1131,6 +1144,14 @@ extern struct zone *next_zone(struct zone *zone);
>>>  			; /* do nothing */		\
>>>  		else
>>>  
>>> +#define for_each_managed_zone(zone)		        \
>>> +	for (zone = (first_online_pgdat())->node_zones; \
>>> +	     zone;					\
>>> +	     zone = next_zone(zone))			\
>>> +		if (!managed_zone(zone))		\
>>> +			; /* do nothing */		\
>>> +		else
>>> +
>>>  static inline struct zone *zonelist_zone(struct zoneref *zoneref)
>>>  {
>>>  	return zoneref->zone;
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index c5952749ad40..ac0ea42a4e5f 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -9459,4 +9459,15 @@ bool take_page_off_buddy(struct page *page)
>>>  	spin_unlock_irqrestore(&zone->lock, flags);
>>>  	return ret;
>>>  }
>>> +
>>> +bool has_managed_dma(void)
>>> +{
>>> +	struct zone *zone;
>>> +
>>> +	for_each_managed_zone(zone) {
>>> +		if (zone_is_dma(zone))
>>> +			return true;
>>> +	}
>>> +	return false;
>>> +}
>>
>> Wouldn't it be "easier/faster" to just iterate online nodes and directly
>> obtain the ZONE_DMA, checking if there are managed pages?
> 
> Thanks, Dave.
> 
> Please check for_each_managed_zone(), it is iterating online nodes and
> it's each managed zone. 
> 
> Is below what you are suggesting? The only difference is I introduced
> for_each_managed_zone() which can be reused later if needed. Not sure if
> I got your suggestion correctly.
> 
> bool has_managed_dma(void)
> {
>         struct pglist_data *pgdat;
>         enum zone_type i;
> 
>         for_each_online_pgdat(pgdat) {
>                 for (i = 0; i < MAX_NR_ZONES - 1; i++) {
>                         struct zone *zone = &pgdat->node_zones[i];
> 
>                         /* Only report a DMA zone that has managed pages. */
>                         if (managed_zone(zone) && zone_is_dma(zone))
>                                 return true;
>                 }
>         }
>         return false;
> }


Even simpler, no need to iterate over zones at all, only over nodes:

#ifdef CONFIG_ZONE_DMA
bool has_managed_dma(void)
{
	struct pglist_data *pgdat;

	for_each_online_pgdat(pgdat) {
		struct zone *zone = &pgdat->node_zones[ZONE_DMA];

		if (managed_zone(zone))
			return true;
	}
	return false;
}
#endif /* CONFIG_ZONE_DMA */

Without CONFIG_ZONE_DMA, simply provide a dummy in the header that
returns false.
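
(Rough sketch of a caller, just to illustrate the intended check -- the
real wiring is in patches 4 and 5 of this series, and the function name
below is made up:)

static void __init example_gfp_dma_setup(void)
{
	/*
	 * Skip creating GFP_DMA pools/caches when ZONE_DMA ended up with
	 * no managed pages, as in the x86_64 kdump kernel case.
	 */
	if (!has_managed_dma()) {
		pr_info("ZONE_DMA has no managed pages, skipping GFP_DMA setup\n");
		return;
	}

	/* ... create the DMA atomic pool / dma-kmalloc caches as before ... */
}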

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 3/5] mm_zone: add function to check if managed dma zone exists
  2021-12-09 13:10         ` David Hildenbrand
@ 2021-12-09 13:23           ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-09 13:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, akpm, hch, robin.murphy, cl, penberg,
	rientjes, iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly,
	kexec

On 12/09/21 at 02:10pm, David Hildenbrand wrote:
......
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index c5952749ad40..ac0ea42a4e5f 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -9459,4 +9459,15 @@ bool take_page_off_buddy(struct page *page)
> >>>  	spin_unlock_irqrestore(&zone->lock, flags);
> >>>  	return ret;
> >>>  }
> >>> +
> >>> +bool has_managed_dma(void)
> >>> +{
> >>> +	struct zone *zone;
> >>> +
> >>> +	for_each_managed_zone(zone) {
> >>> +		if (zone_is_dma(zone))
> >>> +			return true;
> >>> +	}
> >>> +	return false;
> >>> +}
> >>
> >> Wouldn't it be "easier/faster" to just iterate online nodes and directly
> >> obtain the ZONE_DMA, checking if there are managed pages?
> > 
> > Thanks, Dave.
> > 
> > Please check for_each_managed_zone(), it is iterating online nodes and
> > it's each managed zone. 
> > 
> > Is below what you are suggesting? The only difference is I introduced
> > for_each_managed_zone() which can be reused later if needed. Not sure if
> > I got your suggestion correctly.
> > 
> > bool has_managed_dma(void)
> > {
> >         struct pglist_data *pgdat;
> >         enum zone_type i;
> > 
> >         for_each_online_pgdat(pgdat) {
> >                 for (i = 0; i < MAX_NR_ZONES - 1; i++) {
> >                         struct zone *zone = &pgdat->node_zones[i];
> > 
> >                         /* Only report a DMA zone that has managed pages. */
> >                         if (managed_zone(zone) && zone_is_dma(zone))
> >                                 return true;
> >                 }
> >         }
> >         return false;
> > }
> 
> 
> Even simpler, no need to iterate over zones at all, only over nodes:
> 
> #ifdef CONFIG_ZONE_DMA
> bool has_managed_dma(void)
> {
> 	struct pglist_data *pgdat;
> 
> 	for_each_online_pgdat(pgdat) {
> 		struct zone *zone = &pgdat->node_zones[ZONE_DMA];
> 
> 		if (managed_zone(zone))
> 			return true;
> 	}
> 	return false;
> }
> #endif /* CONFIG_ZONE_DMA */
> 
> Without CONFIG_ZONE_DMA, simply provide a dummy in the header that
> returns false.

Yeah, it only iterates as many times as there are nodes. I will take this
in v3. Thanks, David.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-07  4:03     ` John Donnelly
@ 2021-12-13  3:54       ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-13  3:54 UTC (permalink / raw)
  To: John Donnelly
  Cc: linux-kernel, tglx, mingo, bp, dave.hansen, luto, peterz,
	linux-mm, akpm, hch, robin.murphy, cl, penberg, rientjes,
	iamjoonsoo.kim, vbabka, m.szyprowski, kexec, rppt

On 12/06/21 at 10:03pm, John Donnelly wrote:
> On 12/6/21 9:16 PM, Baoquan He wrote:
> > Sorry, forgot adding x86 and x86/mm maintainers
> 
> Hi,
> 
>   These commits need applied to Linux-5.15.0 (LTS) too since it has the
> original regression :
> 
>  1d659236fb43 ("dma-pool: scale the default DMA coherent pool
> size with memory capacity")

Yeah, Fixes and stable tags need to be added. Thanks for pointing that out.

As I said in the cover letter, this issue didn't occur until the below
commits were applied. So I will add 'Fixes: 6f599d84231f ("x86/kdump: Always
reserve the low 1M when the crashkernel option is specified")' to patches
4 and 5. Patches 1 and 2 are cleanups/improvements, not related to this issue.

  1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
  23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
  f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
  7c321eb2b843 x86/kdump: Remove the backup region handling
  6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified
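
Concretely, patches 4 and 5 would then carry trailers like these (just a
sketch of the tag format):

  Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
  Cc: stable@vger.kernel.org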

> 
> Maybe add "Fixes" to the other commits ?
> 
> 
> > 
> > On 12/07/21 at 11:07am, Baoquan He wrote:
> > > ***Problem observed:
> > > On x86_64, when crash is triggered and entering into kdump kernel, page
> > > allocation failure can always be seen.
> > > 
> > >   ---------------------------------
> > >   DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
> > >   swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
> > >   CPU: 0 PID: 1 Comm: swapper/0
> > >   Call Trace:
> > >    dump_stack+0x7f/0xa1
> > >    warn_alloc.cold+0x72/0xd6
> > >    ......
> > >    __alloc_pages+0x24d/0x2c0
> > >    ......
> > >    dma_atomic_pool_init+0xdb/0x176
> > >    do_one_initcall+0x67/0x320
> > >    ? rcu_read_lock_sched_held+0x3f/0x80
> > >    kernel_init_freeable+0x290/0x2dc
> > >    ? rest_init+0x24f/0x24f
> > >    kernel_init+0xa/0x111
> > >    ret_from_fork+0x22/0x30
> > >   Mem-Info:
> > >   ------------------------------------
> > > 
> > > ***Root cause:
> > > In the current kernel, it assumes that DMA zone must have managed pages
> > > and try to request pages if CONFIG_ZONE_DMA is enabled. While this is not
> > > always true. E.g in kdump kernel of x86_64, only low 1M is presented and
> > > locked down at very early stage of boot, so that this low 1M won't be
> > > added into buddy allocator to become managed pages of DMA zone. This
> > > exception will always cause page allocation failure if page is requested
> > > from DMA zone.
> > > 
> > > ***Investigation:
> > > This failure happens since below commit merged into linus's tree.
> > >    1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
> > >    23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
> > >    f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
> > >    7c321eb2b843 x86/kdump: Remove the backup region handling
> > >    6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified
> > > 
> > > Before them, on x86_64, the low 640K area will be reused by kdump kernel.
> > > So in kdump kernel, the content of low 640K area is copied into a backup
> > > region for dumping before jumping into kdump. Then except of those firmware
> > > reserved region in [0, 640K], the left area will be added into buddy
> > > allocator to become available managed pages of DMA zone.
> > > 
> > > However, after above commits applied, in kdump kernel of x86_64, the low
> > > 1M is reserved by memblock, but not released to buddy allocator. So any
> > > later page allocation requested from DMA zone will fail.
> > > 
> > > This low 1M lock down is needed because AMD SME encrypts memory making
> > > the old backup region mechanims impossible when switching into kdump
> > > kernel. And Intel engineer mentioned their TDX (Trusted domain extensions)
> > > which is under development in kernel also needs lock down the low 1M.
> > > So we can't simply revert above commits to fix the page allocation
> > > failure from DMA zone as someone suggested.
> > > 
> > > ***Solution:
> > > Currently, only DMA atomic pool and dma-kmalloc will initialize and
> > > request page allocation with GFP_DMA during bootup. So only initialize
> > > them when DMA zone has available managed pages, otherwise just skip the
> > > initialization. From testing and code, this doesn't matter. In kdump
> > > kernel of x86_64, the page allocation failure disappear.
> > > 
> > > ***Further thinking
> > > On x86_64, it consistently takes [0, 16M] into ZONE_DMA, and (16M, 4G]
> > > into ZONE_DMA32 by default. The zone DMA covering low 16M is used to
> > > take care of antique ISA devices. In fact, on 64bit system, it rarely
> > > need ZONE_DMA (which is low 16M) to support almost extinct ISA devices.
> > > However, some components treat DMA as a generic concept, e.g
> > > kmalloc-dma, slab allocator initializes it for later any DMA related
> > > buffer allocation, but not limited to ISA DMA.
> > > 
> > > On arm64, even though both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32
> > > are enabled, it makes ZONE_DMA covers the low 4G area, and ZONE_DMA32
> > > empty. Unless on specific platforms (e.g. 30-bit on Raspberry Pi 4),
> > > then zone DMA covers the 1st 1G area, zone DMA32 covers the rest of
> > > the 32-bit addressable memory.
> > > 
> > > I am wondering if we can also change the size of DMA and DMA32 ZONE as
> > > dynamically adjusted, just as arm64 is doing? On x86_64, we can make
> > > zone DMA covers the 32-bit addressable memory, and empty zone DMA32 by
> > > default. Once ISA_DMA_API is enabled, we go back to make zone DMA covers
> > > low 16M area, zone DMA32 covers the rest of 32-bit addressable memory.
> > > (I am not familiar with ISA_DMA_API, will it require 24-bit addressable
> > > memory when enabled?)
> > > 
> > > Change history:
> > > 
> > > v2 post:
> > > https://lore.kernel.org/all/20210810094835.13402-1-bhe@redhat.com/T/#u
> > > 
> > > v1 post:
> > > https://lore.kernel.org/all/20210624052010.5676-1-bhe@redhat.com/T/#u
> > > 
> > > v2->v2 RESEND:
> > >   John pinged to push the repost of this patchset. So fix one typo of
> > >   suject of patch 3/5; Fix a building error caused by mix declaration in
> > >   patch 5/5. Both of them are found by John from his testing.
> > > 
> > > v1->v2:
> > >   Change to check if managed DMA zone exists. If DMA zone has managed
> > >   pages, go further to request page from DMA zone to initialize. Otherwise,
> > >   just skip to initialize stuffs which need pages from DMA zone.
> > > 
> > > Baoquan He (5):
> > >    docs: kernel-parameters: Update to reflect the current default size of
> > >      atomic pool
> > >    dma-pool: allow user to disable atomic pool
> > >    mm_zone: add function to check if managed dma zone exists
> > >    dma/pool: create dma atomic pool only if dma zone has managed pages
> > >    mm/slub: do not create dma-kmalloc if no managed pages in DMA zone
> > > 
> > >   .../admin-guide/kernel-parameters.txt         |  5 ++++-
> > >   include/linux/mmzone.h                        | 21 +++++++++++++++++++
> > >   kernel/dma/pool.c                             | 11 ++++++----
> > >   mm/page_alloc.c                               | 11 ++++++++++
> > >   mm/slab_common.c                              |  9 ++++++++
> > >   5 files changed, 52 insertions(+), 5 deletions(-)
> > > 
> > > -- 
> > > 2.17.2
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-09 12:59       ` Christoph Lameter
@ 2021-12-13  7:39         ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-13  7:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-mm, akpm, hch, robin.murphy, penberg,
	rientjes, iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly,
	kexec

On 12/09/21 at 01:59pm, Christoph Lameter wrote:
> On Thu, 9 Dec 2021, Baoquan He wrote:
> 
> > > The slab allocators guarantee that all kmalloc allocations are DMA able
> > > indepent of specifying ZONE_DMA/ZONE_DMA32
> >
> > Here you mean we guarantee dma-kmalloc will be DMA able independent of
> > specifying ZONE_DMA/DMA32, or the whole sla/ub allocator?
> 
> All memory obtained via kmalloc --independent of "dma-alloc", ZONE_DMA
> etc-- must be dmaable.

This has a prerequisite, as you said below: it only holds if devices can
address the full memory, right?


> 
> > With my understanding, isn't the reasonable sequence zone DMA firstly if
> > GFP_DMA, then zone DMA32, finaly zone NORMAL. At least, on x86_64, I
> > believe device driver developer prefer to see this because most of time,
> > zone DMA and zone DMA32 are both used for dma buffer allocation, if
> > IOMMU is not enabled. However, memory got from zone NORMAL when required
> > with GFP_DMA, and it succeeds, does it mean that the developer doesn't
> > take the GFP_DMA flag seriously, just try to get buffer for allocation?
> 
> ZONE_NORMAL is also used for DMA allocations. ZONE_DMA and ZONE_DMA32 are
> only used if the physical range of memory supported by a device does not
> include all of normal memory.

If devices can address the full memory, ZONE_NORMAL can also be used for DMA
allocations. (This covers systems where an IOMMU is provided.)

If a device has an addressing limit, e.g. a 24-bit or 32-bit DMA mask,
ZONE_DMA and ZONE_DMA32 are needed.
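
For illustration only (my sketch, not something from this thread; 'pdev' is
just a hypothetical PCI device): a driver usually declares its addressing
limit through the DMA API, and the core falls back to ZONE_DMA/DMA32 pages
only when needed:

  /* Try the full 64-bit mask first, fall back to 32-bit for devices
   * that cannot address all of memory. */
  if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)) &&
      dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)))
          return -EIO;    /* no usable DMA addressing */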

> 
> > > The size of ZONE_DMA is traditionally depending on the platform. On some
> > > it is 16MB, on some 1G and on some 4GB. ZONE32 is always 4GB and should
> > > only be used if ZONE_DMA has already been used.
> >
> > As said at above, ia64 and riscv don't have ZONE_DMA at all, they just
> > cover low 4G with ZONE_DMA32 alone.
> 
> If you do not have devices that are crap and cannot address the full
> memory then you dont need these special zones.

I am not a DMA expert, but to my understanding, on x86_64 and arm64 we
have PCIe devices whose DMA mask is 32-bit, meaning they can only address
ZONE_DMA32. Supporting the full memory range might be too expensive for
devices, e.g. on these two arches the supported memory can be deployed at
petabyte-scale addresses.

> 
> Sorry this subject has caused confusion multiple times over the years and
> there are still arches that are not implementing this in a consistent way.

Seems so.

And by the way, when I read the slub code I noticed a strange phenomenon
that I haven't figured out. When creating a cache with kmem_cache_create(),
the zone flags SLAB_CACHE_DMA and SLAB_CACHE_DMA32 can be specified;
allocflags stores them and they are used when allocating a new slab.
Meanwhile, we can also specify gfpflags at allocation time, but GFP_DMA32
is not allowed there because of GFP_SLAB_BUG_MASK. I traced back through
very old git history but didn't find out why GFP_DMA32 can't be specified
during kmem_cache_alloc().

We can completely rely on cache->allocflags to mark the zone we will
request pages from, but we can also specify gfpflags in kmem_cache_alloc()
to change the zone, yet GFP_DMA32 is prohibited. Here I can only see that
kmalloc() might be the reason, since kmalloc_large() has no created cache,
so there is no ->allocflags to use.
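
To make the asymmetry concrete, a rough and untested sketch (the cache name
and sizes here are made up, loosely modelled on the io-pgtable user of
SLAB_CACHE_DMA32):

  /* A cache created with SLAB_CACHE_DMA32 records GFP_DMA32 in
   * cache->allocflags, so its slabs are allocated from ZONE_DMA32. */
  struct kmem_cache *l2_cache;
  void *tbl, *buf;

  l2_cache = kmem_cache_create("example_l2_cache", 1024, 1024,
                               SLAB_CACHE_DMA32, NULL);
  tbl = kmem_cache_alloc(l2_cache, GFP_KERNEL);

  /* Passing the zone flag directly at allocation time is rejected: it
   * trips the GFP_SLAB_BUG_MASK check, which warns and strips the flag. */
  buf = kmalloc(512, GFP_KERNEL | __GFP_DMA32);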

Is this expected? What can we do to clarify or improve this, at least
for code readability?

I am going to post v3 and will discard the 'Further thinking' part of the
cover letter according to your comment. Please help point out if anything
needs to be done or has been missed.

Thanks a lot.

Baoquan


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 2/5] dma-pool: allow user to disable atomic pool
  2021-12-07  3:07   ` Baoquan He
@ 2021-12-13  7:44     ` Christoph Hellwig
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoph Hellwig @ 2021-12-13  7:44 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, hch, robin.murphy, cl, penberg,
	rientjes, iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly,
	kexec

On Tue, Dec 07, 2021 at 11:07:47AM +0800, Baoquan He wrote:
> In the current code, three atomic memory pools are always created,
> atomic_pool_kernel|dma|dma32, even though 'coherent_pool=0' is
> specified in kernel command line. In fact, atomic pool is only
> necessary when CONFIG_DMA_DIRECT_REMAP=y or mem_encrypt_active=y
> which are needed on few ARCHes.

And only these select the atomic pool, so it won't get created otherwise.
What problem are you trying to solve?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-07  8:05   ` Christoph Lameter
@ 2021-12-13  7:47     ` Christoph Hellwig
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoph Hellwig @ 2021-12-13  7:47 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Baoquan He, linux-kernel, linux-mm, akpm, hch, robin.murphy,
	penberg, rientjes, iamjoonsoo.kim, vbabka, m.szyprowski,
	John.p.donnelly, kexec

On Tue, Dec 07, 2021 at 09:05:26AM +0100, Christoph Lameter wrote:
> On Tue, 7 Dec 2021, Baoquan He wrote:
> 
> > into ZONE_DMA32 by default. The zone DMA covering low 16M is used to
> > take care of antique ISA devices. In fact, on 64bit system, it rarely
> > need ZONE_DMA (which is low 16M) to support almost extinct ISA devices.
> > However, some components treat DMA as a generic concept, e.g
> > kmalloc-dma, slab allocator initializes it for later any DMA related
> > buffer allocation, but not limited to ISA DMA.
> 
> The idea of the slab allocator DMA support is to have memory available
> for devices that can only support a limited range of physical addresses.
> These are only to be enabled for platforms that have such requirements.
> 
> The slab allocators guarantee that all kmalloc allocations are DMA able
> indepent of specifying ZONE_DMA/ZONE_DMA32

Yes.  And we never supported slab for ZONE_DMA32 and should work on
getting rid of it for ZONE_DMA as well.  The only thing that guarantees
device addressability is the DMA API.  The DMA API needs ZONE_DMA/DMA32
to back its page allocations, but supporting this in slab is a bad idea
only explained by historic reasons from before we had a DMA API.
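
(For illustration, a minimal sketch; 'dev' and 'size' are placeholders, not
taken from this thread.  With the DMA API the driver does not pick a zone
itself, it asks for memory the device can reach:)

  dma_addr_t dma_handle;
  void *cpu_addr;

  cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
  if (!cpu_addr)
          return -ENOMEM;
  /* ... hand dma_handle to the device ... */
  dma_free_coherent(dev, size, cpu_addr, dma_handle);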

> > On arm64, even though both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32
> > are enabled, it makes ZONE_DMA covers the low 4G area, and ZONE_DMA32
> > empty. Unless on specific platforms (e.g. 30-bit on Raspberry Pi 4),
> > then zone DMA covers the 1st 1G area, zone DMA32 covers the rest of
> > the 32-bit addressable memory.
> 
> ZONE_NORMAL should cover all memory. ARM does not need ZONE_DMA32.

arm32 not, arm64 does.  And the Pi 4 is an arm64 device.

> > I am wondering if we can also change the size of DMA and DMA32 ZONE as
> > dynamically adjusted, just as arm64 is doing? On x86_64, we can make
> > zone DMA covers the 32-bit addressable memory, and empty zone DMA32 by
> > default. Once ISA_DMA_API is enabled, we go back to make zone DMA covers
> > low 16M area, zone DMA32 covers the rest of 32-bit addressable memory.
> > (I am not familiar with ISA_DMA_API, will it require 24-bit addressable
> > memory when enabled?)
> 
> The size of ZONE_DMA is traditionally depending on the platform. On some
> it is 16MB, on some 1G and on some 4GB. ZONE32 is always 4GB and should
> only be used if ZONE_DMA has already been used.

ZONE32 should be (and generally is) used whenever there is a zone covering
the 32-bit CPU physical address limit.

> 
> ZONE_DMA is dynamic in the sense of being different on different
> platforms.

Agreed.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-13  7:39         ` Baoquan He
@ 2021-12-13  7:49           ` Christoph Hellwig
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoph Hellwig @ 2021-12-13  7:49 UTC (permalink / raw)
  To: Baoquan He
  Cc: Christoph Lameter, linux-kernel, linux-mm, akpm, hch,
	robin.murphy, penberg, rientjes, iamjoonsoo.kim, vbabka,
	m.szyprowski, John.p.donnelly, kexec

On Mon, Dec 13, 2021 at 03:39:25PM +0800, Baoquan He wrote:
> > > As said at above, ia64 and riscv don't have ZONE_DMA at all, they just
> > > cover low 4G with ZONE_DMA32 alone.
> > 
> > If you do not have devices that are crap and cannot address the full
> > memory then you dont need these special zones.
> 
> I am not a DMA expert, with my understanding, on x86_64 and arm64, we
> have PCIe devices which dma mask is 32bit

Yes, way too many, and they keep getting newly introduced as well.  Also
weirdo masks like 40, 44 or 48 bits.

> , means they can only address
> ZONE_DMA32.

Yes and no.  Offsets between CPU physical and device addresses make this
complicated, even ignoring IOMMUs.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 2/5] dma-pool: allow user to disable atomic pool
  2021-12-13  7:44     ` Christoph Hellwig
@ 2021-12-13  8:16       ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-13  8:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-mm, akpm, robin.murphy, cl, penberg,
	rientjes, iamjoonsoo.kim, vbabka, m.szyprowski, John.p.donnelly,
	kexec

On 12/13/21 at 08:44am, Christoph Hellwig wrote:
> On Tue, Dec 07, 2021 at 11:07:47AM +0800, Baoquan He wrote:
> > In the current code, three atomic memory pools are always created,
> > atomic_pool_kernel|dma|dma32, even though 'coherent_pool=0' is
> > specified in kernel command line. In fact, atomic pool is only
> > necessary when CONFIG_DMA_DIRECT_REMAP=y or mem_encrypt_active=y
> > which are needed on few ARCHes.
> 
> And only these select the atomic pool, so it won't get created otherwise.
> What problem are you trying to solve?

This tries to make "coherent_pool=0" behave as expected. As you can see,
'coherent_pool=0' currently behaves as if 'coherent_pool' were not specified
at all, which is not consistent with other similar kernel parameters, e.g. cma=.
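
The reason, roughly (paraphrasing kernel/dma/pool.c from memory, so treat
this as a sketch rather than the exact code): coherent_pool= only sets
atomic_pool_size, and a zero size is later treated as "use the scaled
default":

  static int __init early_coherent_pool(char *p)
  {
          atomic_pool_size = memparse(p, &p);
          return 0;
  }
  early_param("coherent_pool", early_coherent_pool);

  /* later, in dma_atomic_pool_init(): 128KiB per 1GiB of memory,
   * minimum 128KiB, so a zero size falls through to the default */
  if (!atomic_pool_size) {
          unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);

          pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
          atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
  }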

At the beginning, I planned to add a knob to allow the user to disable one
or all of the atomic pools, but later changed my mind. However, I still think
this patch makes sense to fix the slightly bizarre behaviour of specifying
'coherent_pool=0' and still getting the atomic pools created.

I can drop it if you think it's unnecessary.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-07  3:16   ` Baoquan He
@ 2021-12-13 13:25     ` Borislav Petkov
  -1 siblings, 0 replies; 64+ messages in thread
From: Borislav Petkov @ 2021-12-13 13:25 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, tglx, mingo, dave.hansen, luto, peterz, linux-mm,
	akpm, hch, robin.murphy, cl, penberg, rientjes, iamjoonsoo.kim,
	vbabka, m.szyprowski, John.p.donnelly, kexec, rppt

On Tue, Dec 07, 2021 at 11:16:31AM +0800, Baoquan He wrote:
> > This low 1M lock down is needed because AMD SME encrypts memory making
> > the old backup region mechanims impossible when switching into kdump
> > kernel. And Intel engineer mentioned their TDX (Trusted domain extensions)
> > which is under development in kernel also needs lock down the low 1M.
> > So we can't simply revert above commits to fix the page allocation
> > failure from DMA zone as someone suggested.

Did you read

  f1d4d47c5851 ("x86/setup: Always reserve the first 1M of RAM")

carefully for a more generically important reason as to why the first 1M
should not be used?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-13 13:25     ` Borislav Petkov
@ 2021-12-13 14:03       ` Baoquan He
  -1 siblings, 0 replies; 64+ messages in thread
From: Baoquan He @ 2021-12-13 14:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, tglx, mingo, dave.hansen, luto, peterz, linux-mm,
	akpm, hch, robin.murphy, cl, penberg, rientjes, iamjoonsoo.kim,
	vbabka, m.szyprowski, John.p.donnelly, kexec, rppt

On 12/13/21 at 02:25pm, Borislav Petkov wrote:
> On Tue, Dec 07, 2021 at 11:16:31AM +0800, Baoquan He wrote:
> > > This low 1M lock down is needed because AMD SME encrypts memory making
> > > the old backup region mechanims impossible when switching into kdump
> > > kernel. And Intel engineer mentioned their TDX (Trusted domain extensions)
> > > which is under development in kernel also needs lock down the low 1M.
> > > So we can't simply revert above commits to fix the page allocation
> > > failure from DMA zone as someone suggested.
> 
> Did you read
> 
>   f1d4d47c5851 ("x86/setup: Always reserve the first 1M of RAM")
> 
> carefully for a more generically important reason as to why the first 1M
> should not be used?

Apparently I didn't. I slacked off and just grabbed things stored in my
brain. That is the right justification and I missed it. Thanks for pointing
it out.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
  2021-12-09 12:59       ` Christoph Lameter
@ 2021-12-13 14:21         ` Hyeonggon Yoo
  -1 siblings, 0 replies; 64+ messages in thread
From: Hyeonggon Yoo @ 2021-12-13 14:21 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Baoquan He, linux-kernel, linux-mm, akpm, hch, robin.murphy,
	penberg, rientjes, iamjoonsoo.kim, vbabka, m.szyprowski,
	John.p.donnelly, kexec

On Thu, Dec 09, 2021 at 01:59:58PM +0100, Christoph Lameter wrote:
> On Thu, 9 Dec 2021, Baoquan He wrote:
> 
> > > The slab allocators guarantee that all kmalloc allocations are DMA able
> > > indepent of specifying ZONE_DMA/ZONE_DMA32
> >
> > Here you mean we guarantee dma-kmalloc will be DMA able independent of
> > specifying ZONE_DMA/DMA32, or the whole sla/ub allocator?
> 
> All memory obtained via kmalloc --independent of "dma-alloc", ZONE_DMA
> etc-- must be dmaable.
> 
> > With my understanding, isn't the reasonable sequence zone DMA firstly if
> > GFP_DMA, then zone DMA32, finaly zone NORMAL. At least, on x86_64, I
> > believe device driver developer prefer to see this because most of time,
> > zone DMA and zone DMA32 are both used for dma buffer allocation, if
> > IOMMU is not enabled. However, memory got from zone NORMAL when required
> > with GFP_DMA, and it succeeds, does it mean that the developer doesn't
> > take the GFP_DMA flag seriously, just try to get buffer for allocation?
> 
> ZONE_NORMAL is also used for DMA allocations. ZONE_DMA and ZONE_DMA32 are
> only used if the physical range of memory supported by a device does not
> include all of normal memory.
> 
> > > The size of ZONE_DMA is traditionally depending on the platform. On some
> > > it is 16MB, on some 1G and on some 4GB. ZONE32 is always 4GB and should
> > > only be used if ZONE_DMA has already been used.
> >
> > As said at above, ia64 and riscv don't have ZONE_DMA at all, they just
> > cover low 4G with ZONE_DMA32 alone.
> 
> If you do not have devices that are crap and cannot address the full
> memory then you dont need these special zones.
> 
> Sorry this subject has caused confusion multiple times over the years and
> there are still arches that are not implementing this in a consistent way.

Hello Baoquan and Christoph.

I'm the confused one here too. :)

So the point is that ZONE_NORMAL is also DMA-able if the device can access
normal memory (which is not the case for ISA devices, ancient PCI devices,
etc.).

Then if I understand right, I think the patch 5/5 (mm/slub: Avoid ...) should
instead remove the GFP_DMA flag from the allocation in sr_probe() ->
get_capabilities(), rather than copying the normal kmalloc caches to the dma
kmalloc caches (if the device does not have a limitation in its address
space).
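
If I read drivers/scsi/sr.c correctly, that would be something like the
following (untested, quoted from memory, so the exact size and context may
differ):

  -       buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
  +       buffer = kmalloc(512, GFP_KERNEL);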

Please let me know if I got it wrong :)

Thanks,
Hyeonggon.

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2021-12-13 14:21 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-07  3:07 [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages Baoquan He
2021-12-07  3:07 ` Baoquan He
2021-12-07  3:07 ` [PATCH RESEND v2 1/5] docs: kernel-parameters: Update to reflect the current default size of atomic pool Baoquan He
2021-12-07  3:07   ` Baoquan He
2021-12-07  3:53   ` John Donnelly
2021-12-07  3:53     ` John Donnelly
2021-12-07  3:07 ` [PATCH RESEND v2 2/5] dma-pool: allow user to disable " Baoquan He
2021-12-07  3:07   ` Baoquan He
2021-12-07  3:53   ` John Donnelly
2021-12-07  3:53     ` John Donnelly
2021-12-13  7:44   ` Christoph Hellwig
2021-12-13  7:44     ` Christoph Hellwig
2021-12-13  8:16     ` Baoquan He
2021-12-13  8:16       ` Baoquan He
2021-12-07  3:07 ` [PATCH RESEND v2 3/5] mm_zone: add function to check if managed dma zone exists Baoquan He
2021-12-07  3:07   ` Baoquan He
2021-12-07  3:53   ` John Donnelly
2021-12-07  3:53     ` John Donnelly
2021-12-07 11:23   ` David Hildenbrand
2021-12-07 11:23     ` David Hildenbrand
2021-12-09 13:02     ` Baoquan He
2021-12-09 13:02       ` Baoquan He
2021-12-09 13:10       ` David Hildenbrand
2021-12-09 13:10         ` David Hildenbrand
2021-12-09 13:23         ` Baoquan He
2021-12-09 13:23           ` Baoquan He
2021-12-07  3:07 ` [PATCH RESEND v2 4/5] dma/pool: create dma atomic pool only if dma zone has managed pages Baoquan He
2021-12-07  3:07   ` Baoquan He
2021-12-07  3:07   ` Baoquan He
2021-12-07  3:54   ` John Donnelly
2021-12-07  3:54     ` John Donnelly
2021-12-07  3:54     ` John Donnelly
2021-12-07  3:07 ` [PATCH RESEND v2 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone Baoquan He
2021-12-07  3:07   ` Baoquan He
2021-12-07  3:54   ` John Donnelly
2021-12-07  3:54     ` John Donnelly
2021-12-07  3:16 ` [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages Baoquan He
2021-12-07  3:16   ` Baoquan He
2021-12-07  4:03   ` John Donnelly
2021-12-07  4:03     ` John Donnelly
2021-12-08  4:33     ` Andrew Morton
2021-12-08  4:33       ` Andrew Morton
2021-12-08  4:56       ` John Donnelly
2021-12-08  4:56         ` John Donnelly
2021-12-13  3:54     ` Baoquan He
2021-12-13  3:54       ` Baoquan He
2021-12-13 13:25   ` Borislav Petkov
2021-12-13 13:25     ` Borislav Petkov
2021-12-13 14:03     ` Baoquan He
2021-12-13 14:03       ` Baoquan He
2021-12-07  8:05 ` Christoph Lameter
2021-12-07  8:05   ` Christoph Lameter
2021-12-09  8:05   ` Baoquan He
2021-12-09  8:05     ` Baoquan He
2021-12-09 12:59     ` Christoph Lameter
2021-12-09 12:59       ` Christoph Lameter
2021-12-13  7:39       ` Baoquan He
2021-12-13  7:39         ` Baoquan He
2021-12-13  7:49         ` Christoph Hellwig
2021-12-13  7:49           ` Christoph Hellwig
2021-12-13 14:21       ` Hyeonggon Yoo
2021-12-13 14:21         ` Hyeonggon Yoo
2021-12-13  7:47   ` Christoph Hellwig
2021-12-13  7:47     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.