* [PATCH v4 0/3] Handle warning of allocation failure on DMA zone w/o managed pages
@ 2021-12-23  9:44 ` Baoquan He
  0 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2021-12-23  9:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hch, cl, John.p.donnelly, kexec, bhe, 42.hyeyoo,
	penberg, rientjes, iamjoonsoo.kim, vbabka, David.Laight, david,
	x86, bp

***Problem observed:
On x86_64, when a crash is triggered and the kdump kernel boots, a page
allocation failure is always seen.

 ---------------------------------
 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ......
  __alloc_pages+0x24d/0x2c0
  ......
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ------------------------------------

***Root cause:
The current kernel assumes that the DMA zone must have managed pages and
tries to request pages from it whenever CONFIG_ZONE_DMA is enabled. This
is not always true. E.g. in the kdump kernel on x86_64, only the low 1M is
present in the DMA zone, and it is locked down at a very early stage of
boot, so this low 1M is never added to the buddy allocator to become
managed pages of the DMA zone. In that case, any page allocation requested
from the DMA zone always fails.
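
To make the distinction between present and managed pages concrete, here is
a minimal userspace sketch. It is not kernel code: struct model_zone, struct
model_node and model_has_managed_dma() are hypothetical stand-ins invented
for illustration, and the page counts used with it are made up. It only
shows the idea that a zone can span present memory and still have nothing
the buddy allocator can hand out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: hypothetical stand-ins, not kernel structures. */
struct model_zone {
	unsigned long present_pages;	/* pages physically present in the zone */
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

struct model_node {
	struct model_zone dma;		/* stands in for the node's DMA zone */
};

/* True if at least one node has a DMA zone the allocator can serve from. */
bool model_has_managed_dma(const struct model_node *nodes, size_t nr_nodes)
{
	assert(nodes != NULL || nr_nodes == 0);

	for (size_t i = 0; i < nr_nodes; i++) {
		/* present_pages is deliberately ignored: only managed pages count */
		if (nodes[i].dma.managed_pages != 0)
			return true;
	}
	return false;
}
```

With a single node whose DMA zone has present pages but zero managed pages
(the kdump case described above), the check returns false, so a caller
would know in advance that a GFP_DMA request cannot be satisfied.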

***Investigation:
This failure has happened since the commits below were merged into Linus's tree.
  1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
  23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
  f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
  7c321eb2b843 x86/kdump: Remove the backup region handling
  6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified

Before them, on x86_64, the low 640K area was reused by the kdump kernel.
So the content of the low 640K area was copied into a backup region for
dumping before jumping into the kdump kernel. Then, except for the
firmware-reserved regions in [0, 640K], the remaining area was added to the
buddy allocator to become available managed pages of the DMA zone.

However, after the above commits were applied, in the kdump kernel on
x86_64 the low 1M is reserved by memblock but not released to the buddy
allocator. So any later page allocation requested from the DMA zone fails.

Initially, the low 1M needed to be locked down when crashkernel is
reserved, because AMD SME encrypts memory, making the old backup region
mechanism impossible when switching into the kdump kernel.

Later, it was also observed that there are BIOSes corrupting memory
under 1M. To solve this, in commit f1d4d47c5851, the entire region of
low 1M is always reserved after the real mode trampoline is allocated.

Besides, an Intel engineer recently mentioned that TDX (Trust Domain
Extensions), which is under development in the kernel, also needs to lock
down the low 1M. So we can't simply revert the above commits to fix the
page allocation failure from the DMA zone, as someone suggested.

***Solution:
Currently, only the DMA atomic pool and dma-kmalloc initialize and
request page allocation with GFP_DMA during bootup.

So only initialize the DMA atomic pool when the DMA zone has available
managed pages; otherwise just skip the initialization.
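
The effect of skipping the pool can be sketched in userspace as follows.
This is a simplified model under stated assumptions, not the code in
kernel/dma/pool.c: the MODEL_* names, struct model_pools and both functions
are hypothetical, the flag values are illustrative, and the DMA32 pool is
assumed to always exist. It shows only the first pool choice for a request,
where the test is "does the pool exist" rather than "is the config option
enabled".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's GFP bits differ. */
#define MODEL_GFP_DMA	0x1u
#define MODEL_GFP_DMA32	0x2u

enum model_pool {
	MODEL_POOL_NONE,	/* pool was never created */
	MODEL_POOL_KERNEL,
	MODEL_POOL_DMA,
	MODEL_POOL_DMA32,
};

/* Hypothetical stand-in for the three atomic pool pointers. */
struct model_pools {
	enum model_pool kernel;
	enum model_pool dma;
	enum model_pool dma32;
};

/* Create the GFP_DMA pool only when the DMA zone has managed pages. */
void model_pools_init(struct model_pools *p, bool has_managed_dma)
{
	assert(p != NULL);
	p->kernel = MODEL_POOL_KERNEL;
	p->dma = has_managed_dma ? MODEL_POOL_DMA : MODEL_POOL_NONE;
	p->dma32 = MODEL_POOL_DMA32;	/* assumption: always available here */
}

/* First pool to try: check that the pool exists, not just the config. */
enum model_pool model_first_pool(const struct model_pools *p, unsigned int gfp)
{
	assert(p != NULL);
	if (p->dma32 != MODEL_POOL_NONE && (gfp & MODEL_GFP_DMA32))
		return p->dma32;
	if (p->dma != MODEL_POOL_NONE && (gfp & MODEL_GFP_DMA))
		return p->dma;
	return p->kernel;
}
```

In this model, a GFP_DMA request made after the DMA pool was skipped falls
through to the kernel pool instead of dereferencing a pool that was never
created.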

For dma-kmalloc(), for the time being, let's mute the allocation failure
warning when pages are requested from a DMA zone that has no managed
pages. Meanwhile, change code to use the dma_alloc_xx/dma_map_xx APIs
instead of kmalloc(GFP_DMA), or drop GFP_DMA from kmalloc() calls where it
is not necessary. Christoph is posting patches to fix those under
drivers/scsi/. Eventually, we can remove the need for dma-kmalloc(), as
people suggested.
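
The muting rule described above can be written out as a small decision
function. This is a userspace sketch only: model_should_warn() and the
MODEL_GFP_* flag values are hypothetical, and the rate limiter is reduced
to a boolean input. It is meant to show which combinations stay quiet, not
to reproduce warn_alloc().

```c
#include <stdbool.h>

/* Illustrative flag values; the kernel's __GFP_* bits differ. */
#define MODEL_GFP_DMA		0x1u
#define MODEL_GFP_NOWARN	0x2u

/*
 * Should an allocation-failure warning be printed?
 * ratelimit_ok models the rate limiter allowing one more message.
 */
bool model_should_warn(unsigned int gfp, bool ratelimit_ok, bool has_managed_dma)
{
	if (gfp & MODEL_GFP_NOWARN)
		return false;	/* caller asked for silence */
	if (!ratelimit_ok)
		return false;	/* too many warnings recently */
	/* A DMA request cannot succeed without managed DMA pages: stay quiet. */
	if ((gfp & MODEL_GFP_DMA) && !has_managed_dma)
		return false;
	return true;
}
```

Note that only the warning is suppressed in this scheme; the allocation
itself still fails, which is why converting callers away from
kmalloc(GFP_DMA) remains the longer-term fix.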

Changelog:
v3->v4:
 - Split the old v3 into two separate patchsets. The first two clean
   up/improvement patches in v3 have been sent out in an independent
   patchset. The fix patches are adapted and sent in this patchset.
 - Do not change dma-kmalloc(); instead, mute the warning of allocation
   failure if it's requesting pages from a DMA zone which has no managed
   pages.

v2-Resend -> v3:
 - Re-implement has_managed_dma() according to David's suggestion.
 - Add Fixes tag and cc stable.

v2->v2 RESEND:
 - John pinged to push the repost of this patchset. So fix one typo in the
   subject of patch 3/5; fix a build error caused by mixed declarations in
   patch 5/5. Both of them were found by John during his testing.
 - Rewrite cover letter to add more information.

v1->v2:
 Change to check if a managed DMA zone exists. If the DMA zone has managed
 pages, go further and request pages from the DMA zone to initialize.
 Otherwise, just skip initializing the stuff which needs pages from the
 DMA zone.

v3:
https://lore.kernel.org/all/20211213122712.23805-1-bhe@redhat.com/T/#u

V2 RESEND post:
https://lore.kernel.org/all/20211207030750.30824-1-bhe@redhat.com/T/#u

v2 post:
https://lore.kernel.org/all/20210810094835.13402-1-bhe@redhat.com/T/#u

v1 post:
https://lore.kernel.org/all/20210624052010.5676-1-bhe@redhat.com/T/#u



Baoquan He (3):
  mm_zone: add function to check if managed dma zone exists
  dma/pool: create dma atomic pool only if dma zone has managed pages
  mm/page_alloc.c: do not warn allocation failure on zone DMA if no
    managed pages

 include/linux/mmzone.h |  9 +++++++++
 kernel/dma/pool.c      |  4 ++--
 mm/page_alloc.c        | 18 +++++++++++++++++-
 3 files changed, 28 insertions(+), 3 deletions(-)

-- 
2.26.3


* [PATCH v4 1/3] mm_zone: add function to check if managed dma zone exists
  2021-12-23  9:44 ` Baoquan He
@ 2021-12-23  9:44   ` Baoquan He
  -1 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2021-12-23  9:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hch, cl, John.p.donnelly, kexec, bhe, 42.hyeyoo,
	penberg, rientjes, iamjoonsoo.kim, vbabka, David.Laight, david,
	x86, bp

Some places in the current kernel assume that the DMA zone must have
managed pages if CONFIG_ZONE_DMA is enabled. This is not always true.
E.g. in the kdump kernel on x86_64, only the low 1M is present and it is
locked down at a very early stage of boot, so there are no managed pages
at all in the DMA zone. In that case, any page allocation requested from
the DMA zone always fails.

Add the function has_managed_dma() and the relevant helper functions to
check whether there is a DMA zone with managed pages. It will be used in
later patches.

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: stable@vger.kernel.org
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mmzone.h |  9 +++++++++
 mm/page_alloc.c        | 15 +++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..6e1b726e9adf 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1046,6 +1046,15 @@ static inline int is_highmem_idx(enum zone_type idx)
 #endif
 }
 
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void);
+#else
+static inline bool has_managed_dma(void)
+{
+	return false;
+}
+#endif
+
 /**
  * is_highmem - helper function to quickly check if a struct zone is a
  *              highmem zone or not.  This is an attempt to keep references
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..7c7a0b5de2ff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9460,3 +9460,18 @@ bool take_page_off_buddy(struct page *page)
 	return ret;
 }
 #endif
+
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void)
+{
+	struct pglist_data *pgdat;
+
+	for_each_online_pgdat(pgdat) {
+		struct zone *zone = &pgdat->node_zones[ZONE_DMA];
+
+		if (managed_zone(zone))
+			return true;
+	}
+	return false;
+}
+#endif /* CONFIG_ZONE_DMA */
-- 
2.26.3


* [PATCH v4 2/3] dma/pool: create dma atomic pool only if dma zone has managed pages
  2021-12-23  9:44 ` Baoquan He
@ 2021-12-23  9:44   ` Baoquan He
  -1 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2021-12-23  9:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hch, cl, John.p.donnelly, kexec, bhe, 42.hyeyoo,
	penberg, rientjes, iamjoonsoo.kim, vbabka, David.Laight, david,
	x86, bp

Currently, three DMA atomic pools are initialized as long as the relevant
kernel code is built in. In the kdump kernel on x86_64, this is not right
for atomic_pool_dma, because there are no managed pages in the DMA zone.
In that case, the DMA zone only has the low 1M of memory present, and it
is locked down by the memblock allocator, so no pages are added to the
buddy allocator for the DMA zone. Please check commit f1d4d47c5851
("x86/setup: Always reserve the first 1M of RAM").

As a result, the kdump kernel on x86_64 always prints the failure message
below:

 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
 Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ? _raw_spin_unlock_irq+0x24/0x40
  ? __alloc_pages_direct_compact+0x90/0x1b0
  __alloc_pages_slowpath.constprop.0+0xf29/0xf50
  ? __cond_resched+0x16/0x50
  ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
  __alloc_pages+0x24d/0x2c0
  ? __dma_atomic_pool_init+0x93/0x93
  alloc_page_interleave+0x13/0xb0
  atomic_pool_expand+0x118/0x210
  ? __dma_atomic_pool_init+0x93/0x93
  __dma_atomic_pool_init+0x45/0x93
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ......
 DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
 DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations

Here, let's check whether the DMA zone has managed pages and create
atomic_pool_dma only if it does. Otherwise just skip it.

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: stable@vger.kernel.org
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux-foundation.org
---
 kernel/dma/pool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5a85804b5beb..00df3edd6c5d 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
 						    GFP_KERNEL);
 	if (!atomic_pool_kernel)
 		ret = -ENOMEM;
-	if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+	if (has_managed_dma()) {
 		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
 						GFP_KERNEL | GFP_DMA);
 		if (!atomic_pool_dma)
@@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
 	if (prev == NULL) {
 		if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
 			return atomic_pool_dma32;
-		if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+		if (atomic_pool_dma && (gfp & GFP_DMA))
 			return atomic_pool_dma;
 		return atomic_pool_kernel;
 	}
-- 
2.26.3


* [PATCH v4 3/3] mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
  2021-12-23  9:44 ` Baoquan He
@ 2021-12-23  9:44   ` Baoquan He
  -1 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2021-12-23  9:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hch, cl, John.p.donnelly, kexec, bhe, 42.hyeyoo,
	penberg, rientjes, iamjoonsoo.kim, vbabka, David.Laight, david,
	x86, bp

In the kdump kernel on x86_64, a page allocation failure is observed:

 kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
 Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
 Workqueue: events_unbound async_run_entry_fn
 Call Trace:
  <TASK>
  dump_stack_lvl+0x48/0x5e
  warn_alloc.cold+0x72/0xd6
  __alloc_pages_slowpath.constprop.0+0xc69/0xcd0
  __alloc_pages+0x1df/0x210
  new_slab+0x389/0x4d0
  ___slab_alloc+0x58f/0x770
  __slab_alloc.constprop.0+0x4a/0x80
  kmem_cache_alloc_trace+0x24b/0x2c0
  sr_probe+0x1db/0x620
  ......
  device_add+0x405/0x920
  ......
  __scsi_add_device+0xe5/0x100
  ata_scsi_scan_host+0x97/0x1d0
  async_run_entry_fn+0x30/0x130
  process_one_work+0x1e8/0x3c0
  worker_thread+0x50/0x3b0
  ? rescuer_thread+0x350/0x350
  kthread+0x16b/0x190
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x22/0x30
  </TASK>
 Mem-Info:
 ......

The above failure happened when calling kmalloc() to allocate a buffer
with GFP_DMA. It requests a slab page from the DMA zone, which has no
managed pages at all:
 sr_probe()
 --> get_capabilities()
     --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);

This is because, in the current kernel, the dma-kmalloc caches are created
as long as CONFIG_ZONE_DMA is enabled. However, the kdump kernel on x86_64
has had no managed pages in the DMA zone since commit 6f599d84231f
("x86/kdump: Always reserve the low 1M when the crashkernel option is
specified"). The failure can always be reproduced.

For now, let's mute the allocation failure warning when pages are
requested from the DMA zone while it has no managed pages.

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: stable@vger.kernel.org
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7c7a0b5de2ff..843bc8e5550a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	va_list args;
 	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
 
-	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
+	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
+		((gfp_mask & __GFP_DMA) && !has_managed_dma()))
 		return;
 
 	va_start(args, fmt);
-- 
2.26.3


* Re: [PATCH v4 2/3] dma/pool: create dma atomic pool only if dma zone has managed pages
  2021-12-23  9:44   ` Baoquan He
@ 2021-12-23 10:21     ` Christoph Hellwig
  -1 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2021-12-23 10:21 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, hch, cl, John.p.donnelly, kexec,
	42.hyeyoo, penberg, rientjes, iamjoonsoo.kim, vbabka,
	David.Laight, david, x86, bp

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v4 1/3] mm_zone: add function to check if managed dma zone exists
  2021-12-23  9:44   ` Baoquan He
@ 2021-12-23 15:00     ` john.p.donnelly
  -1 siblings, 0 replies; 26+ messages in thread
From: john.p.donnelly @ 2021-12-23 15:00 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, cl, kexec, 42.hyeyoo, penberg, rientjes,
	iamjoonsoo.kim, vbabka, David.Laight, david, x86, bp,
	John Donnelly

On 12/23/21 3:44 AM, Baoquan He wrote:
> Some places in the current kernel assume that the DMA zone must have
> managed pages if CONFIG_ZONE_DMA is enabled. This is not always true.
> E.g. in the kdump kernel on x86_64, only the low 1M is present, and it is
> locked down at a very early stage of boot, so there are no managed pages
> at all in the DMA zone. In that case, any page allocation request to the
> DMA zone will always fail.
> 
> Add the function has_managed_dma() and the relevant helper functions to
> check whether there is a DMA zone with managed pages. It will be used in
> later patches.
> 
> Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
> Cc: stable@vger.kernel.org
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: John Donnelly  <john.p.donnelly@oracle.com>

> ---
>   include/linux/mmzone.h |  9 +++++++++
>   mm/page_alloc.c        | 15 +++++++++++++++
>   2 files changed, 24 insertions(+)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 58e744b78c2c..6e1b726e9adf 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1046,6 +1046,15 @@ static inline int is_highmem_idx(enum zone_type idx)
>   #endif
>   }
>   
> +#ifdef CONFIG_ZONE_DMA
> +bool has_managed_dma(void);
> +#else
> +static inline bool has_managed_dma(void)
> +{
> +	return false;
> +}
> +#endif
> +
>   /**
>    * is_highmem - helper function to quickly check if a struct zone is a
>    *              highmem zone or not.  This is an attempt to keep references
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c5952749ad40..7c7a0b5de2ff 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -9460,3 +9460,18 @@ bool take_page_off_buddy(struct page *page)
>   	return ret;
>   }
>   #endif
> +
> +#ifdef CONFIG_ZONE_DMA
> +bool has_managed_dma(void)
> +{
> +	struct pglist_data *pgdat;
> +
> +	for_each_online_pgdat(pgdat) {
> +		struct zone *zone = &pgdat->node_zones[ZONE_DMA];
> +
> +		if (managed_zone(zone))
> +			return true;
> +	}
> +	return false;
> +}
> +#endif /* CONFIG_ZONE_DMA */
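
A minimal userspace sketch of the check quoted above, in case it helps when reasoning about the kdump case. The struct layouts, the explicit node array, and the page counts are simplified assumptions for illustration only; the kernel walks online nodes with for_each_online_pgdat() and uses the real struct zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's zone and node types. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, MAX_NR_ZONES };

struct zone {
	unsigned long present_pages;	/* pages physically inside the zone */
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

struct pglist_data {
	struct zone node_zones[MAX_NR_ZONES];
};

/* Same idea as managed_zone(): does the buddy allocator own any page? */
bool managed_zone(const struct zone *zone)
{
	return zone->managed_pages != 0;
}

/* Model of has_managed_dma(): true if any node has a populated DMA zone. */
bool has_managed_dma(const struct pglist_data *nodes, size_t nr_nodes)
{
	for (size_t i = 0; i < nr_nodes; i++) {
		if (managed_zone(&nodes[i].node_zones[ZONE_DMA]))
			return true;
	}
	return false;
}

/* kdump-like node: the DMA zone is present but fully reserved. */
void demo_kdump_node(void)
{
	struct pglist_data node = { 0 };

	node.node_zones[ZONE_DMA].present_pages = 256;	/* illustrative value */
	node.node_zones[ZONE_NORMAL].managed_pages = 65536;
	assert(!has_managed_dma(&node, 1));
}
```

The point of the model is that present pages alone do not make the zone usable: only managed pages count, which is why the helper reports false in the kdump-like case.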


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 2/3] dma/pool: create dma atomic pool only if dma zone has managed pages
  2021-12-23  9:44   ` Baoquan He
@ 2021-12-23 15:01     ` john.p.donnelly
  -1 siblings, 0 replies; 26+ messages in thread
From: john.p.donnelly @ 2021-12-23 15:01 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, cl, kexec, 42.hyeyoo, penberg, rientjes,
	iamjoonsoo.kim, vbabka, David.Laight, david, x86, bp,
	John Donnelly

On 12/23/21 3:44 AM, Baoquan He wrote:
> Currently, three DMA atomic pools are initialized as long as the relevant
> kernel code is built in. In the kdump kernel on x86_64, this is wrong for
> atomic_pool_dma, because there are no managed pages in the DMA zone. In
> that case, the DMA zone only has the low 1M of memory present, and it is
> locked down by the memblock allocator, so no pages are added to the buddy
> allocator for the DMA zone. Please check commit f1d4d47c5851 ("x86/setup:
> Always reserve the first 1M of RAM").
> 
> As a result, the kdump kernel on x86_64 always prints the failure message below:
> 
>   DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
>   swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
>   Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
>   Call Trace:
>    dump_stack+0x7f/0xa1
>    warn_alloc.cold+0x72/0xd6
>    ? _raw_spin_unlock_irq+0x24/0x40
>    ? __alloc_pages_direct_compact+0x90/0x1b0
>    __alloc_pages_slowpath.constprop.0+0xf29/0xf50
>    ? __cond_resched+0x16/0x50
>    ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
>    __alloc_pages+0x24d/0x2c0
>    ? __dma_atomic_pool_init+0x93/0x93
>    alloc_page_interleave+0x13/0xb0
>    atomic_pool_expand+0x118/0x210
>    ? __dma_atomic_pool_init+0x93/0x93
>    __dma_atomic_pool_init+0x45/0x93
>    dma_atomic_pool_init+0xdb/0x176
>    do_one_initcall+0x67/0x320
>    ? rcu_read_lock_sched_held+0x3f/0x80
>    kernel_init_freeable+0x290/0x2dc
>    ? rest_init+0x24f/0x24f
>    kernel_init+0xa/0x111
>    ret_from_fork+0x22/0x30
>   Mem-Info:
>   ......
>   DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
>   DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
> 
> Check whether the DMA zone has managed pages, and create atomic_pool_dma
> only if it does. Otherwise, skip it.
> 
> Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
> Cc: stable@vger.kernel.org
> Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: John Donnelly  <john.p.donnelly@oracle.com>

> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: iommu@lists.linux-foundation.org
> ---
>   kernel/dma/pool.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index 5a85804b5beb..00df3edd6c5d 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
>   						    GFP_KERNEL);
>   	if (!atomic_pool_kernel)
>   		ret = -ENOMEM;
> -	if (IS_ENABLED(CONFIG_ZONE_DMA)) {
> +	if (has_managed_dma()) {
>   		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
>   						GFP_KERNEL | GFP_DMA);
>   		if (!atomic_pool_dma)
> @@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
>   	if (prev == NULL) {
>   		if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
>   			return atomic_pool_dma32;
> -		if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> +		if (atomic_pool_dma && (gfp & GFP_DMA))
>   			return atomic_pool_dma;
>   		return atomic_pool_kernel;
>   	}
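
A rough userspace model of the two hunks above, to show the effect on pool selection. Everything here is an assumption made for illustration: the MODEL_GFP_* bits, struct pool, struct pools, pools_init() and guess_first_pool() are hypothetical names, the DMA32 pool is assumed to always exist, and only the first-guess branch (prev == NULL) of dma_guess_pool() is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real GFP_* values live in kernel headers. */
#define MODEL_GFP_DMA	0x1u
#define MODEL_GFP_DMA32	0x2u

struct pool { const char *name; };

static struct pool kernel_pool = { "kernel" };
static struct pool dma32_pool  = { "dma32" };
static struct pool dma_pool    = { "dma" };

struct pools {
	struct pool *atomic_pool_kernel;
	struct pool *atomic_pool_dma32;
	struct pool *atomic_pool_dma;	/* stays NULL without managed DMA pages */
};

/* Model of dma_atomic_pool_init(): create the DMA pool only when usable. */
struct pools pools_init(int has_managed_dma)
{
	struct pools p = { &kernel_pool, &dma32_pool, NULL };

	if (has_managed_dma)
		p.atomic_pool_dma = &dma_pool;
	return p;
}

/* Model of the first guess in dma_guess_pool(): test the pool pointer. */
struct pool *guess_first_pool(const struct pools *p, unsigned int gfp)
{
	if (p->atomic_pool_dma32 && (gfp & MODEL_GFP_DMA32))
		return p->atomic_pool_dma32;
	if (p->atomic_pool_dma && (gfp & MODEL_GFP_DMA))
		return p->atomic_pool_dma;
	return p->atomic_pool_kernel;
}

/* With no DMA pool, a GFP_DMA request falls back to the kernel pool. */
void demo_pool_selection(void)
{
	struct pools kdump = pools_init(0);
	struct pools normal = pools_init(1);

	assert(guess_first_pool(&kdump, MODEL_GFP_DMA) == &kernel_pool);
	assert(guess_first_pool(&normal, MODEL_GFP_DMA) == &dma_pool);
}
```

Testing the pointer instead of the config option is what keeps the lookup from returning a pool that was never created.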


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 3/3] mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
  2021-12-23  9:44   ` Baoquan He
@ 2021-12-23 15:01     ` john.p.donnelly
  -1 siblings, 0 replies; 26+ messages in thread
From: john.p.donnelly @ 2021-12-23 15:01 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, cl, kexec, 42.hyeyoo, penberg, rientjes,
	iamjoonsoo.kim, vbabka, David.Laight, david, x86, bp,
	John Donnelly

On 12/23/21 3:44 AM, Baoquan He wrote:
> In the kdump kernel on x86_64, a page allocation failure is observed:
> 
>   kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>   CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
>   Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
>   Workqueue: events_unbound async_run_entry_fn
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x48/0x5e
>    warn_alloc.cold+0x72/0xd6
>    __alloc_pages_slowpath.constprop.0+0xc69/0xcd0
>    __alloc_pages+0x1df/0x210
>    new_slab+0x389/0x4d0
>    ___slab_alloc+0x58f/0x770
>    __slab_alloc.constprop.0+0x4a/0x80
>    kmem_cache_alloc_trace+0x24b/0x2c0
>    sr_probe+0x1db/0x620
>    ......
>    device_add+0x405/0x920
>    ......
>    __scsi_add_device+0xe5/0x100
>    ata_scsi_scan_host+0x97/0x1d0
>    async_run_entry_fn+0x30/0x130
>    process_one_work+0x1e8/0x3c0
>    worker_thread+0x50/0x3b0
>    ? rescuer_thread+0x350/0x350
>    kthread+0x16b/0x190
>    ? set_kthread_struct+0x40/0x40
>    ret_from_fork+0x22/0x30
>    </TASK>
>   Mem-Info:
>   ......
> 
> The above failure happened when calling kmalloc() to allocate a buffer with
> GFP_DMA. It requests a slab page from the DMA zone, which has no managed
> pages at all.
>   sr_probe()
>   --> get_capabilities()
>       --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
> 
> This happens because, in the current kernel, the dma-kmalloc caches are
> created as long as CONFIG_ZONE_DMA is enabled. However, the kdump kernel on
> x86_64 has had no managed pages in the DMA zone since commit 6f599d84231f
> ("x86/kdump: Always reserve the low 1M when the crashkernel option is
> specified"). The failure can always be reproduced.
> 
> For now, mute the allocation failure warning when pages are requested
> from the DMA zone and it has no managed pages.
> 
> Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
> Cc: stable@vger.kernel.org
> Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: John Donnelly  <john.p.donnelly@oracle.com>


> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> ---
>   mm/page_alloc.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7c7a0b5de2ff..843bc8e5550a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
>   	va_list args;
>   	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
>   
> -	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
> +	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
> +		(gfp_mask & __GFP_DMA) && !has_managed_dma())
>   		return;
>   
>   	va_start(args, fmt);
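
A small sketch of the early-return condition in the hunk above, written as a pure predicate. In C, && binds tighter than ||, so the condition as posted groups the same way as the explicitly parenthesized form below. The MODEL_GFP_* bits and suppress_warning() are hypothetical names, and the rate-limit result is passed in as a plain boolean here, whereas the kernel calls __ratelimit(), which has side effects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for __GFP_DMA and __GFP_NOWARN. */
#define MODEL_GFP_DMA		0x1u
#define MODEL_GFP_NOWARN	0x2u

/*
 * Model of the early-return test in warn_alloc(): returns true when the
 * warning should be skipped. ratelimit_ok models a true result from
 * __ratelimit(); managed_dma models has_managed_dma().
 */
bool suppress_warning(unsigned int gfp_mask, bool ratelimit_ok, bool managed_dma)
{
	return (gfp_mask & MODEL_GFP_NOWARN) ||
	       !ratelimit_ok ||
	       ((gfp_mask & MODEL_GFP_DMA) && !managed_dma);
}

/* A DMA request is muted only when the DMA zone has no managed pages. */
void demo_suppress_warning(void)
{
	assert(suppress_warning(MODEL_GFP_DMA, true, false));
	assert(!suppress_warning(MODEL_GFP_DMA, true, true));
}
```

Non-DMA requests are unaffected by the new clause: they are still governed only by the no-warn flag and the rate limit.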


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 3/3] mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
  2021-12-23  9:44   ` Baoquan He
@ 2021-12-25  5:53     ` Hyeonggon Yoo
  -1 siblings, 0 replies; 26+ messages in thread
From: Hyeonggon Yoo @ 2021-12-25  5:53 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, hch, cl, John.p.donnelly, kexec,
	penberg, rientjes, iamjoonsoo.kim, vbabka, David.Laight, david,
	x86, bp

On Thu, Dec 23, 2021 at 05:44:35PM +0800, Baoquan He wrote:
> In the kdump kernel on x86_64, a page allocation failure is observed:
> 
>  kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>  CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
>  Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
>  Workqueue: events_unbound async_run_entry_fn
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x48/0x5e
>   warn_alloc.cold+0x72/0xd6
>   __alloc_pages_slowpath.constprop.0+0xc69/0xcd0
>   __alloc_pages+0x1df/0x210
>   new_slab+0x389/0x4d0
>   ___slab_alloc+0x58f/0x770
>   __slab_alloc.constprop.0+0x4a/0x80
>   kmem_cache_alloc_trace+0x24b/0x2c0
>   sr_probe+0x1db/0x620
>   ......
>   device_add+0x405/0x920
>   ......
>   __scsi_add_device+0xe5/0x100
>   ata_scsi_scan_host+0x97/0x1d0
>   async_run_entry_fn+0x30/0x130
>   process_one_work+0x1e8/0x3c0
>   worker_thread+0x50/0x3b0
>   ? rescuer_thread+0x350/0x350
>   kthread+0x16b/0x190
>   ? set_kthread_struct+0x40/0x40
>   ret_from_fork+0x22/0x30
>   </TASK>
>  Mem-Info:
>  ......
> 
> The above failure happened when calling kmalloc() to allocate a buffer with
> GFP_DMA. It requests a slab page from the DMA zone, which has no managed
> pages at all.
>  sr_probe()
>  --> get_capabilities()
>      --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
> 
> This happens because, in the current kernel, the dma-kmalloc caches are
> created as long as CONFIG_ZONE_DMA is enabled. However, the kdump kernel on
> x86_64 has had no managed pages in the DMA zone since commit 6f599d84231f
> ("x86/kdump: Always reserve the low 1M when the crashkernel option is
> specified"). The failure can always be reproduced.
> 
> For now, mute the allocation failure warning when pages are requested
> from the DMA zone and it has no managed pages.
> 
> Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
> Cc: stable@vger.kernel.org
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/page_alloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7c7a0b5de2ff..843bc8e5550a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
>  	va_list args;
>  	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
>  
> -	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
> +	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
> +		(gfp_mask & __GFP_DMA) && !has_managed_dma())
>  		return;
>

Warning when the DMA zone never has any pages is unnecessary,
and it confuses users.

The patch looks good.
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Also, there are some drivers that allocate memory with GFP_DMA
even though that flag is unnecessary. We need to clean them up later.

Baoquan, are you planning to do it soon?
I would like to help with that.

Merry Christmas,
Hyeonggon

>  	va_start(args, fmt);
> -- 
> 2.26.3
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 3/3] mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
  2021-12-25  5:53     ` Hyeonggon Yoo
@ 2021-12-27  8:32       ` Baoquan He
  -1 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2021-12-27  8:32 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: linux-kernel, linux-mm, akpm, hch, cl, John.p.donnelly, kexec,
	penberg, rientjes, iamjoonsoo.kim, vbabka, David.Laight, david,
	x86, bp

On 12/25/21 at 05:53am, Hyeonggon Yoo wrote:
> On Thu, Dec 23, 2021 at 05:44:35PM +0800, Baoquan He wrote:
...... 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 7c7a0b5de2ff..843bc8e5550a 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
> >  	va_list args;
> >  	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
> >  
> > -	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
> > +	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
> > +		(gfp_mask & __GFP_DMA) && !has_managed_dma())
> >  		return;
> >
> 
> > Warning when the DMA zone never has any pages is unnecessary,
> > and it confuses users.
> 
> The patch looks good.
> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 
> > Also, there are some drivers that allocate memory with GFP_DMA
> > even though that flag is unnecessary. We need to clean them up later.

Thanks for reviewing and for the great suggestions.

> 
> > Baoquan, are you planning to do it soon?
> > I would like to help with that.

Yes, I have a plan and have already done a small part of it. I talked to
Christoph about my thoughts. I plan to collect all kmalloc(GFP_DMA) call
sites and post an RFC mail, CCing the mailing list and the related
maintainers. Anyone who is interested, or who knows one or several of the
call sites well, can help.

Christoph has now handled everything under drivers/scsi and posted patches
to fix those call sites. I have gone through those places and found the
call sites below, where we can simply drop GFP_DMA from the kmalloc() call
since it is not necessary. I even found one place using kmalloc(GFP_DMA32).

(HEAD -> master) vxge: don't use GFP_DMA
mtd: rawnand: marvell: don't use GFP_DMA
HID: intel-ish-hid: remove wrong GFP_DMA32 flag
ps3disk: don't use GFP_DMA
atm: iphase: don't use GFP_DMA

Next, I will send an RFC mail listing the suspect call sites. We can
track them and help where needed. I suggest changing them by:
1) using dma_alloc_xx, or dma_map_xx after kmalloc()
2) using alloc_pages(GFP_DMA) instead

When we fix them, we should all post patches with a subject of the form
'xxxx: don't use GFP_DMA'. Christoph has posted patches with similar
subjects, so we can search by subject to find all related patches for
later backporting.

I will add you to CC when I send it, possibly tomorrow. Any suggestions or thoughts?

Thanks
Baoquan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 3/3] mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
  2021-12-27  8:32       ` Baoquan He
@ 2021-12-28  5:06         ` Hyeonggon Yoo
  -1 siblings, 0 replies; 26+ messages in thread
From: Hyeonggon Yoo @ 2021-12-28  5:06 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, hch, cl, John.p.donnelly, kexec,
	penberg, rientjes, iamjoonsoo.kim, vbabka, David.Laight, david,
	x86, bp

On Mon, Dec 27, 2021 at 04:32:53PM +0800, Baoquan He wrote:
> On 12/25/21 at 05:53am, Hyeonggon Yoo wrote:
> > On Thu, Dec 23, 2021 at 05:44:35PM +0800, Baoquan He wrote:
> ...... 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 7c7a0b5de2ff..843bc8e5550a 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
> > >  	va_list args;
> > >  	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
> > >  
> > > -	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
> > > +	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
> > > +		(gfp_mask & __GFP_DMA) && !has_managed_dma())
> > >  		return;
> > >
> > 
> > Warning when there's always no page in DMA zone is unnecessary 
> > and it confuses user.
> > 
> > The patch looks good.
> > Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > 
> > > And there are some drivers that allocate memory with GFP_DMA
> > > even if that flag is unnecessary. We need to do cleanup later.
> 
> Thanks for reviewing and giving out some awesome suggestions.
> 

You're welcome. Impressed to see you keep following the issue.

> > 
> > Baoquan Are you planning to do it soon?
> > I want to help that.
> 
> Yes, I had the plan and have done a little part. I talked to Christoph
> about my thought. I planned to collect all kmalloc(GFP_DMA) callsite and
> post a RFC mail, CC mailing list and maintainers related. Anyone
> interested or know one or several callsites well can help.
>

Good to hear that.
I want to help by reviewing and discussing your patches.

> Now, Christoph has handled everything under drivers/scsi and posted patches to
> fix them.

Oh, didn't know he was already doing that work.

> I have gone through those places and found the callsites below
> where we can remove GFP_DMA directly when calling kmalloc(), since it is not
> necessary.

Note that some of them might have a 24-bit addressing limitation.
We need to ask the maintainer or read the device specification to confirm
that GFP_DMA is unnecessary.
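
To make the 24-bit concern concrete, here is a minimal userspace sketch,
not kernel code. The mask macro uses the same arithmetic as the kernel's
DMA_BIT_MASK(); the sketch_ names and the helper itself are hypothetical
and only illustrate the check a device with a narrow address bus imposes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(n): the low n bits set. */
#define SKETCH_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical helper: true if every byte of the buffer
 * [addr, addr + size) is addressable through 'mask'.
 */
bool sketch_dma_capable(uint64_t addr, uint64_t size, uint64_t mask)
{
	uint64_t last;

	if (size == 0)
		return false;
	last = size - 1;
	/* Reject ranges that would wrap around the 64-bit address space. */
	if (addr > UINT64_MAX - last)
		return false;
	/* The last byte of the buffer must sit at or below the mask. */
	return addr + last <= mask;
}
```

A 24-bit mask covers the first 16 MiB, so a buffer that starts at
0x01000000 fails the check for such a device while a 32-bit device
accepts it. That is why GFP_DMA can only be dropped once the device's
real limit is known.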

> And even found one place kmalloc(GFP_DMA32).

kmalloc(GFP_DMA32) is wrong because we do not create DMA32 kmalloc caches.

> (HEAD -> master) vxge: don't use GFP_DMA
> mtd: rawnand: marvell: don't use GFP_DMA
> HID: intel-ish-hid: remove wrong GFP_DMA32 flag
> ps3disk: don't use GFP_DMA
> atm: iphase: don't use GFP_DMA

> Next, I will send a RFC mail to contain those suspect callsites. We can
> track them and can help if needed. Suggest to change them with:
> 1) using dma_alloc_xx , or dma_map_xx after kmalloc()
> 2) using alloc_pages(GFP_DMA) instead

Well, if the buffer is not performance-sensitive, we can just
allocate with kmalloc(GFP_KERNEL) so that the DMA API can use a proper bounce
buffer.

If the buffer is on a fast path, I think we should convert it to
use dma_alloc_pages() to get a proper buffer.

Note that most devices already call dma_map_xx directly or indirectly
(think about the block layer, for example) unless they use the deprecated
virt_to_bus() or friends.

But if the device does not use the DMA API at all, we have few choices.
Maybe convert them to use alloc_pages(GFP_DMA/GFP_DMA32), I guess?
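
The three cases above can be written down as a small decision helper.
This is only a userspace sketch of the rule of thumb, with hypothetical
sketch_ names; it calls no kernel API and does not replace checking each
callsite with its maintainer.

```c
#include <stdbool.h>

/* Hypothetical labels for the three conversions discussed above. */
enum sketch_strategy {
	SKETCH_KMALLOC_THEN_DMA_MAP,  /* kmalloc(GFP_KERNEL), then dma_map_xx */
	SKETCH_DMA_ALLOC_PAGES,       /* dma_alloc_pages() for fast-path buffers */
	SKETCH_ALLOC_PAGES_ZONE_FLAG, /* alloc_pages(GFP_DMA/GFP_DMA32) */
};

/* What we need to know about a callsite to pick a conversion. */
struct sketch_callsite {
	bool uses_dma_api; /* driver maps buffers through the DMA API */
	bool fast_path;    /* bouncing would cost too much here */
};

enum sketch_strategy sketch_pick_strategy(const struct sketch_callsite *cs)
{
	/* Without the DMA API nothing can bounce, so the zone flag must stay. */
	if (!cs->uses_dma_api)
		return SKETCH_ALLOC_PAGES_ZONE_FLAG;
	/* Fast path: get an addressable buffer up front. */
	if (cs->fast_path)
		return SKETCH_DMA_ALLOC_PAGES;
	/* Slow path: let the DMA API bounce when it has to. */
	return SKETCH_KMALLOC_THEN_DMA_MAP;
}
```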

> When we post fixes, let's all use a subject of the form
> 'xxxx: don't use GFP_DMA'. Christoph has posted patches with a similar
> subject, so we can search by subject to find all related patches for later
> backporting.
> 
> I will add you to CC when sending, possibly tomorrow. Any suggestions or thoughts?
>
> Thanks
> Baoquan
> 

Thank you!
Hyeonggon

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 2/3] dma/pool: create dma atomic pool only if dma zone has managed pages
  2021-12-23  9:44   ` Baoquan He
@ 2022-01-03  9:34     ` David Hildenbrand
  -1 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand @ 2022-01-03  9:34 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, cl, John.p.donnelly, kexec, 42.hyeyoo,
	penberg, rientjes, iamjoonsoo.kim, vbabka, David.Laight, x86, bp

On 23.12.21 10:44, Baoquan He wrote:
> Currently three dma atomic pools are initialized as long as the relevant
> kernel codes are built in. While in kdump kernel of x86_64, this is not
> right when trying to create atomic_pool_dma, because there's no managed
> pages in DMA zone. In the case, DMA zone only has low 1M memory presented
> and locked down by memblock allocator. So no pages are added into buddy
> of DMA zone. Please check commit f1d4d47c5851 ("x86/setup: Always reserve
> the first 1M of RAM").
> 
> Then in kdump kernel of x86_64, it always prints below failure message:
> 
>  DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
>  swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
>  Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
>  Call Trace:
>   dump_stack+0x7f/0xa1
>   warn_alloc.cold+0x72/0xd6
>   ? _raw_spin_unlock_irq+0x24/0x40
>   ? __alloc_pages_direct_compact+0x90/0x1b0
>   __alloc_pages_slowpath.constprop.0+0xf29/0xf50
>   ? __cond_resched+0x16/0x50
>   ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
>   __alloc_pages+0x24d/0x2c0
>   ? __dma_atomic_pool_init+0x93/0x93
>   alloc_page_interleave+0x13/0xb0
>   atomic_pool_expand+0x118/0x210
>   ? __dma_atomic_pool_init+0x93/0x93
>   __dma_atomic_pool_init+0x45/0x93
>   dma_atomic_pool_init+0xdb/0x176
>   do_one_initcall+0x67/0x320
>   ? rcu_read_lock_sched_held+0x3f/0x80
>   kernel_init_freeable+0x290/0x2dc
>   ? rest_init+0x24f/0x24f
>   kernel_init+0xa/0x111
>   ret_from_fork+0x22/0x30
>  Mem-Info:
>  ......
>  DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
>  DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
> 
> Here, let's check if DMA zone has managed pages, then create atomic_pool_dma
> if yes. Otherwise just skip it.
> 
> Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
> Cc: stable@vger.kernel.org
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: iommu@lists.linux-foundation.org
> ---
>  kernel/dma/pool.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index 5a85804b5beb..00df3edd6c5d 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
>  						    GFP_KERNEL);
>  	if (!atomic_pool_kernel)
>  		ret = -ENOMEM;
> -	if (IS_ENABLED(CONFIG_ZONE_DMA)) {
> +	if (has_managed_dma()) {
>  		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
>  						GFP_KERNEL | GFP_DMA);
>  		if (!atomic_pool_dma)
> @@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
>  	if (prev == NULL) {
>  		if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
>  			return atomic_pool_dma32;
> -		if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> +		if (atomic_pool_dma && (gfp & GFP_DMA))
>  			return atomic_pool_dma;
>  		return atomic_pool_kernel;
>  	}

I thought for a second that we might have to tweak
atomic_pool_work_fn(), but atomic_pool_resize() handles it properly already.
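
A userspace sketch of the first-choice pool selection after this patch,
modelling only the prev == NULL branch shown in the hunk above. The
sketch_ types and flag values are hypothetical stand-ins for struct
gen_pool and the GFP bits; the point is that a NULL atomic_pool_dma makes
GFP_DMA requests fall through to the kernel pool.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the real GFP bits differ. */
#define SKETCH_GFP_DMA   0x1u
#define SKETCH_GFP_DMA32 0x4u

/* Stand-in for struct gen_pool. */
struct sketch_pool {
	const char *name;
};

/* Stand-in for the three file-scope pool pointers in kernel/dma/pool.c. */
struct sketch_pools {
	struct sketch_pool *kernel;
	struct sketch_pool *dma32;
	struct sketch_pool *dma;   /* NULL when the DMA zone has no managed pages */
	bool zone_dma32_enabled;   /* models IS_ENABLED(CONFIG_ZONE_DMA32) */
};

/* Mirrors the prev == NULL branch of dma_guess_pool() after the patch. */
struct sketch_pool *sketch_first_pool(const struct sketch_pools *p,
				      unsigned int gfp)
{
	if (p->zone_dma32_enabled && (gfp & SKETCH_GFP_DMA32))
		return p->dma32;
	/* Test the pool pointer itself, not the config option. */
	if (p->dma && (gfp & SKETCH_GFP_DMA))
		return p->dma;
	return p->kernel;
}
```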

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] Handle warning of allocation failure on DMA zone w/o managed pages
  2021-12-23  9:44 ` Baoquan He
@ 2022-01-12 16:25   ` john.p.donnelly
  -1 siblings, 0 replies; 26+ messages in thread
From: john.p.donnelly @ 2022-01-12 16:25 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, hch, cl, kexec, 42.hyeyoo, penberg, rientjes,
	iamjoonsoo.kim, vbabka, David.Laight, david, x86, bp

On 12/23/21 3:44 AM, Baoquan He wrote:
> **Problem observed:
> On x86_64, when crash is triggered and entering into kdump kernel, page
> allocation failure can always be seen.
> 
>   ---------------------------------
>   DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
>   swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>   CPU: 0 PID: 1 Comm: swapper/0
>   Call Trace:
>    dump_stack+0x7f/0xa1
>    warn_alloc.cold+0x72/0xd6
>    ......
>    __alloc_pages+0x24d/0x2c0
>    ......
>    dma_atomic_pool_init+0xdb/0x176
>    do_one_initcall+0x67/0x320
>    ? rcu_read_lock_sched_held+0x3f/0x80
>    kernel_init_freeable+0x290/0x2dc
>    ? rest_init+0x24f/0x24f
>    kernel_init+0xa/0x111
>    ret_from_fork+0x22/0x30
>   Mem-Info:
>   ------------------------------------
> 
> ***Root cause:
> In the current kernel, it assumes that DMA zone must have managed pages
> and try to request pages if CONFIG_ZONE_DMA is enabled. While this is not
> always true. E.g in kdump kernel of x86_64, only low 1M is presented and
> locked down at very early stage of boot, so that this low 1M won't be
> added into buddy allocator to become managed pages of DMA zone. This
> exception will always cause page allocation failure if page is requested
> from DMA zone.
> 
> ***Investigation:
> This failure happens since below commit merged into linus's tree.
>    1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
>    23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
>    f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
>    7c321eb2b843 x86/kdump: Remove the backup region handling
>    6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified
> 
> Before them, on x86_64, the low 640K area will be reused by kdump kernel.
> So in kdump kernel, the content of low 640K area is copied into a backup
> region for dumping before jumping into kdump. Then except of those firmware
> reserved region in [0, 640K], the left area will be added into buddy
> allocator to become available managed pages of DMA zone.
> 
> However, after above commits applied, in kdump kernel of x86_64, the low
> 1M is reserved by memblock, but not released to buddy allocator. So any
> later page allocation requested from DMA zone will fail.
> 
> Initially, if crashkernel is reserved, the low 1M needs to be locked
> down because AMD SME encrypts memory, making the old backup region
> mechanism impossible when switching into the kdump kernel.
> 
> Later, it was also observed that there are BIOSes corrupting memory
> under 1M. To solve this, in commit f1d4d47c5851, the entire region of
> low 1M is always reserved after the real mode trampoline is allocated.
> 
> Besides, recently, Intel engineer mentioned their TDX (Trusted domain
> extensions) which is under development in kernel also needs to lock down
> the low 1M. So we can't simply revert above commits to fix the page allocation
> failure from DMA zone as someone suggested.
> 
> ***Solution:
> Currently, only DMA atomic pool and dma-kmalloc will initialize and
> request page allocation with GFP_DMA during bootup.
> 
> So only initialize the DMA atomic pool when the DMA zone has available
> managed pages; otherwise just skip the initialization.
> 
> For dma-kmalloc(), for the time being, let's mute the warning of
> allocation failure when pages are requested from a DMA zone that has no
> managed pages. Meanwhile, change code to use the dma_alloc_xx/dma_map_xx
> API to replace kmalloc(GFP_DMA), or do not use GFP_DMA when calling
> kmalloc() if it is not necessary. Christoph is posting patches to fix
> those under drivers/scsi/. Finally, we can remove the need for
> dma-kmalloc() as people suggested.
> 
> Changelog:
> v3->v4:
>   - Split the old v3 into two separate patchsets. The first two cleanup/
>     improvement patches in v3 have been sent out in an independent
>     patchset. The fix patches are adapted and sent in this patchset.
>   - Do not change dma-kmalloc(), mute the warning of allocation failure
>     instead if it's requesting page from DMA zone which has no managed
>     pages.
> 
> v2-Resend -> v3:
>   - Re-implement has_managed_dma() according to David's suggestion.
>   - Add Fixes tag and cc stable.
> 
> v2->v2 RESEND:
>   - John pinged to push the repost of this patchset. So fix one typo in the
>     subject of patch 3/5; fix a build error caused by a mixed declaration in
>     patch 5/5. Both of them were found by John during his testing.
>   - Rewrite cover letter to add more information.
> 
> v1->v2:
>   Change to check if managed DMA zone exists. If DMA zone has managed
>   pages, go further to request page from DMA zone to initialize. Otherwise,
>   just skip to initialize stuffs which need pages from DMA zone.
> 
> v3:
> https://lore.kernel.org/all/20211213122712.23805-1-bhe@redhat.com/T/#u
> 
> V2 RESEND post:
> https://lore.kernel.org/all/20211207030750.30824-1-bhe@redhat.com/T/#u
> 
> v2 post:
> https://lore.kernel.org/all/20210810094835.13402-1-bhe@redhat.com/T/#u
> 
> v1 post:
> https://lore.kernel.org/all/20210624052010.5676-1-bhe@redhat.com/T/#u
> 
> 
> 
> Baoquan He (3):
>    mm_zone: add function to check if managed dma zone exists
>    dma/pool: create dma atomic pool only if dma zone has managed pages
>    mm/page_alloc.c: do not warn allocation failure on zone DMA if no
>      managed pages
> 
>   include/linux/mmzone.h |  9 +++++++++
>   kernel/dma/pool.c      |  4 ++--
>   mm/page_alloc.c        | 18 +++++++++++++++++-
>   3 files changed, 28 insertions(+), 3 deletions(-)
> 


Tested-by: John Donnelly

I also no longer see GFP malloc failures when the CD-ROM is enumerated
after the kdump kernel is started.

Tested on 5.15.13.
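
For reference, the logic exercised by this test can be modelled in
userspace roughly as follows. This is a sketch under assumptions: the
sketch_ names, the flag values and the per-node page counter are
hypothetical stand-ins for has_managed_dma(), the GFP bits and the zone's
managed page count. It is not the code from the patches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the real GFP bits differ. */
#define SKETCH_GFP_NOWARN 0x1u
#define SKETCH_GFP_DMA    0x2u

/* Stand-in for a NUMA node: only the DMA zone's managed page count. */
struct sketch_node {
	unsigned long dma_managed_pages;
};

/* Model of has_managed_dma(): does any node have managed DMA pages? */
bool sketch_has_managed_dma(const struct sketch_node *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].dma_managed_pages > 0)
			return true;
	}
	return false;
}

/* Model of the early return in warn_alloc() from patch 3/3. */
bool sketch_should_warn(unsigned int gfp, bool ratelimit_ok,
			bool has_managed_dma)
{
	if ((gfp & SKETCH_GFP_NOWARN) || !ratelimit_ok ||
	    ((gfp & SKETCH_GFP_DMA) && !has_managed_dma))
		return false;
	return true;
}
```

In the kdump case every node reports zero managed DMA pages, so a GFP_DMA
failure stays silent, while failures from other zones still warn.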






^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-01-12 16:26 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
2021-12-23  9:44 [PATCH v4 0/3] Handle warning of allocation failure on DMA zone w/o managed pages Baoquan He
2021-12-23  9:44 ` Baoquan He
2021-12-23  9:44 ` [PATCH v4 1/3] mm_zone: add function to check if managed dma zone exists Baoquan He
2021-12-23  9:44   ` Baoquan He
2021-12-23 15:00   ` john.p.donnelly
2021-12-23 15:00     ` john.p.donnelly
2021-12-23  9:44 ` [PATCH v4 2/3] dma/pool: create dma atomic pool only if dma zone has managed pages Baoquan He
2021-12-23  9:44   ` Baoquan He
2021-12-23 10:21   ` Christoph Hellwig
2021-12-23 10:21     ` Christoph Hellwig
2021-12-23 15:01   ` john.p.donnelly
2021-12-23 15:01     ` john.p.donnelly
2022-01-03  9:34   ` David Hildenbrand
2022-01-03  9:34     ` David Hildenbrand
2021-12-23  9:44 ` [PATCH v4 3/3] mm/page_alloc.c: do not warn allocation failure on zone DMA if no " Baoquan He
2021-12-23  9:44   ` Baoquan He
2021-12-23 15:01   ` john.p.donnelly
2021-12-23 15:01     ` john.p.donnelly
2021-12-25  5:53   ` Hyeonggon Yoo
2021-12-25  5:53     ` Hyeonggon Yoo
2021-12-27  8:32     ` Baoquan He
2021-12-27  8:32       ` Baoquan He
2021-12-28  5:06       ` Hyeonggon Yoo
2021-12-28  5:06         ` Hyeonggon Yoo
2022-01-12 16:25 ` [PATCH v4 0/3] Handle warning of allocation failure on DMA zone w/o " john.p.donnelly
2022-01-12 16:25   ` john.p.donnelly