linux-kernel.vger.kernel.org archive mirror
* [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
@ 2020-05-21  9:38 Chen Zhou
  2020-05-21  9:38 ` [PATCH v8 1/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c Chen Zhou
                   ` (6 more replies)
  0 siblings, 7 replies; 34+ messages in thread
From: Chen Zhou @ 2020-05-21  9:38 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, dyoung, bhe, robh+dt
  Cc: arnd, John.p.donnelly, pkushwaha, horms, guohanjun, chenzhou10,
	linux-arm-kernel, devicetree, linux-doc, linux-kernel, kexec

This patch series enables reserving crashkernel memory above 4G on arm64.

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel memory below 4G, which fails
when there is not enough low memory.
2. Currently, crashkernel=Y@X can be used to reserve crashkernel memory above
4G. In this case, if swiotlb or DMA buffers are required, the crash dump
kernel fails to boot because there is no low memory available for allocation.

To solve these issues, introduce crashkernel=X,low to reserve low memory of
a specified size.
crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=Y,low is also specified, first reserve low memory of the
specified size for crash dump kernel devices, and then reserve memory
above 4G.

When crashkernel is reserved above 4G, that is, when crashkernel=X,low is
also specified, the kernel should reserve low memory of the specified size
for crash dump kernel devices. So there may be two crash kernel regions: one
below 4G and the other above 4G.
To distinguish the low region from the high region without affecting
kexec-tools, rename the low region as "Crash kernel (low)", and add the DT
property "linux,low-memory-range" to the crash dump kernel's dtb to pass the
low region.
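
As an illustration of the naming scheme above, here is a minimal userspace
sketch of how a reader of /proc/iomem could tell the two regions apart by
exact name. The function name, the enum, and the sample addresses used with
it are hypothetical and are not taken from kexec-tools; only the two resource
names come from this series.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification used only in this sketch. */
enum crash_region { REGION_NONE, REGION_MAIN, REGION_LOW };

/*
 * Parse one /proc/iomem style line ("<start>-<end> : <name>") and report
 * whether it names the main or the low crash kernel region. The range is
 * written to *start and *end only when the line matches.
 */
enum crash_region classify_iomem_line(const char *line,
				      uint64_t *start, uint64_t *end)
{
	unsigned long long s, e;
	int off = 0;
	const char *name;
	size_t n;

	/* %n is stored only if the " : " separator was matched as well. */
	if (sscanf(line, " %llx-%llx : %n", &s, &e, &off) < 2 || off == 0)
		return REGION_NONE;

	name = line + off;
	n = strcspn(name, "\n");	/* length of the name without newline */

	if (n == strlen("Crash kernel (low)") &&
	    strncmp(name, "Crash kernel (low)", n) == 0) {
		*start = s;
		*end = e;
		return REGION_LOW;
	}
	if (n == strlen("Crash kernel") &&
	    strncmp(name, "Crash kernel", n) == 0) {
		*start = s;
		*end = e;
		return REGION_MAIN;
	}
	return REGION_NONE;
}
```

Because the comparison is on the whole name, a tool that only looks for
"Crash kernel" in this way does not pick up the low region by accident.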

In addition, kexec-tools needs to be modified:
arm64: kdump: add another DT property to crash dump kernel's dtb (see [1])

Previous versions and discussions can be found at the links [v1] to [v7] below.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt,
as suggested by Arnd.
- Add Tested-by from John and pk

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is also specified, first reserve low memory of the
specified size for crash dump kernel devices, and then reserve memory above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass it to the crash dump kernel via the DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() that I added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions. In the two-region case, we cap
the memory range to [min(regs[*].start), max(regs[*].end)] and then remove
the memory range in the middle.
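
The two-region capping described for v2 can be sketched in plain C as
follows. This is only an illustration under assumptions: struct mem_range
and cap_two_ranges() are hypothetical names and do not correspond to the
memblock API; the sketch just computes the covering range and the hole that
would then be removed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type for illustration; not a kernel structure. */
struct mem_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Given two usable regions, compute the single range to cap memory to,
 * [min(start), max(end)), and the hole between the regions that has to
 * be removed afterwards. Returns false when the regions touch or overlap,
 * in which case there is no hole and *hole is left untouched.
 */
bool cap_two_ranges(struct mem_range a, struct mem_range b,
		    struct mem_range *cap, struct mem_range *hole)
{
	struct mem_range lo = a.start <= b.start ? a : b;
	struct mem_range hi = a.start <= b.start ? b : a;

	cap->start = lo.start;
	cap->end = hi.end > lo.end ? hi.end : lo.end;

	if (hi.start <= lo.end)
		return false;

	hole->start = lo.end;
	hole->end = hi.start;
	return true;
}
```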

[1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411

Chen Zhou (5):
  x86: kdump: move reserve_crashkernel_low() into crash_core.c
  arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  arm64: kdump: add memory for devices by DT property, low-memory-range
  kdump: update Documentation about crashkernel on arm64
  dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump

 Documentation/admin-guide/kdump/kdump.rst     | 13 ++-
 .../admin-guide/kernel-parameters.txt         | 12 ++-
 Documentation/devicetree/bindings/chosen.txt  | 25 ++++++
 arch/arm64/kernel/setup.c                     |  8 +-
 arch/arm64/mm/init.c                          | 61 ++++++++++++-
 arch/x86/kernel/setup.c                       | 66 ++------------
 include/linux/crash_core.h                    |  3 +
 include/linux/kexec.h                         |  2 -
 kernel/crash_core.c                           | 85 +++++++++++++++++++
 kernel/kexec_core.c                           | 17 ----
 10 files changed, 208 insertions(+), 84 deletions(-)

-- 
2.20.1



* [PATCH v8 1/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2020-05-21  9:38 [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
@ 2020-05-21  9:38 ` Chen Zhou
  2020-05-26  0:56   ` Baoquan He
  2020-05-21  9:38 ` [PATCH v8 2/5] arm64: kdump: reserve crashkenel above 4G for crash dump kernel Chen Zhou
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Chen Zhou @ 2020-05-21  9:38 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, dyoung, bhe, robh+dt
  Cc: arnd, John.p.donnelly, pkushwaha, horms, guohanjun, chenzhou10,
	linux-arm-kernel, devicetree, linux-doc, linux-kernel, kexec

In preparation for supporting reserve_crashkernel_low() on arm64 as
x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
Also change the x86 CRASH_ALIGN to 2M.

Note that on arm64 we reserve low memory if and only if crashkernel=X,low
is specified. Unlike x86_64, low memory is not reserved automatically.
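
The difference between the two architectures can be summarised by the
following standalone sketch of the size selection. It mirrors the logic of
reserve_crashkernel_low() in this patch, but crash_low_size(), enum arch and
the parameters are hypothetical names introduced only for illustration; the
swiotlb size is passed in rather than queried.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M (1ULL << 20)

/* Hypothetical selector standing in for CONFIG_X86_64 / CONFIG_ARM64. */
enum arch { ARCH_X86_64, ARCH_ARM64 };

/*
 * Decide how much low memory to reserve.
 *   low_given: whether crashkernel=<size>,low was on the command line
 *   low_arg:   the size given with ",low" (ignored if !low_given)
 * Returns 0 and stores the size in *low_size (0 means "reserve nothing"),
 * or -EINVAL on arm64 when ",low" was not given.
 */
int crash_low_size(enum arch arch, bool low_given, uint64_t low_arg,
		   uint64_t swiotlb_size, uint64_t *low_size)
{
	if (!low_given) {
		uint64_t want;

		if (arch == ARCH_ARM64)
			return -EINVAL;	/* no automatic low reservation */

		/* x86_64 default: swiotlb + 8M, but at least 256M */
		want = swiotlb_size + 8 * SZ_1M;
		*low_size = want > 256 * SZ_1M ? want : 256 * SZ_1M;
		return 0;
	}

	/* crashkernel=0,low disables the low reservation */
	*low_size = low_arg;
	return 0;
}
```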

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Tested-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
---
 arch/x86/kernel/setup.c    | 66 ++++-------------------------
 include/linux/crash_core.h |  3 ++
 include/linux/kexec.h      |  2 -
 kernel/crash_core.c        | 85 ++++++++++++++++++++++++++++++++++++++
 kernel/kexec_core.c        | 17 --------
 5 files changed, 96 insertions(+), 77 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 4b3fa6cd3106..de75fec73d47 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -395,8 +395,8 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN		SZ_16M
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_2M
 
 /*
  * Keep the crash kernel below this limit.
@@ -419,59 +419,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_HIGH_MAX	SZ_64T
 #endif
 
-static int __init reserve_crashkernel_low(void)
-{
-#ifdef CONFIG_X86_64
-	unsigned long long base, low_base = 0, low_size = 0;
-	unsigned long total_low_mem;
-	int ret;
-
-	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
-
-	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
-	if (ret) {
-		/*
-		 * two parts from kernel/dma/swiotlb.c:
-		 * -swiotlb size: user-specified with swiotlb= or default.
-		 *
-		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
-		 * to 8M for other buffers that may need to stay low too. Also
-		 * make sure we allocate enough extra low memory so that we
-		 * don't run out of DMA buffers for 32-bit devices.
-		 */
-		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
-	} else {
-		/* passed with crashkernel=0,low ? */
-		if (!low_size)
-			return 0;
-	}
-
-	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
-	if (!low_base) {
-		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
-		       (unsigned long)(low_size >> 20));
-		return -ENOMEM;
-	}
-
-	ret = memblock_reserve(low_base, low_size);
-	if (ret) {
-		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
-		return ret;
-	}
-
-	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
-		(unsigned long)(low_size >> 20),
-		(unsigned long)(low_base >> 20),
-		(unsigned long)(total_low_mem >> 20));
-
-	crashk_low_res.start = low_base;
-	crashk_low_res.end   = low_base + low_size - 1;
-	insert_resource(&iomem_resource, &crashk_low_res);
-#endif
-	return 0;
-}
-
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_size, crash_base, total_mem;
@@ -535,9 +482,12 @@ static void __init reserve_crashkernel(void)
 		return;
 	}
 
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
-		memblock_free(crash_base, crash_size);
-		return;
+	if (crash_base >= (1ULL << 32)) {
+		if (reserve_crashkernel_low()) {
+			memblock_free(crash_base, crash_size);
+			return;
+		}
+		insert_resource(&iomem_resource, &crashk_low_res);
 	}
 
 	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 525510a9f965..4df8c0bff03e 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -63,6 +63,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 extern unsigned char *vmcoreinfo_data;
 extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
+extern struct resource crashk_res;
+extern struct resource crashk_low_res;
 
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len);
@@ -74,5 +76,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
+int __init reserve_crashkernel_low(void);
 
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 1776eb2e43a4..5d5d9635b18d 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -330,8 +330,6 @@ extern int kexec_load_disabled;
 
 /* Location of a reserved region to hold the crash kernel.
  */
-extern struct resource crashk_res;
-extern struct resource crashk_low_res;
 extern note_buf_t __percpu *crash_notes;
 
 /* flag to track if kexec reboot is in progress */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 9f1557b98468..a7580d291c37 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,6 +7,8 @@
 #include <linux/crash_core.h>
 #include <linux/utsname.h>
 #include <linux/vmalloc.h>
+#include <linux/memblock.h>
+#include <linux/swiotlb.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -19,6 +21,22 @@ u32 *vmcoreinfo_note;
 /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
 static unsigned char *vmcoreinfo_data_safecopy;
 
+/* Location of the reserved area for the crash kernel */
+struct resource crashk_res = {
+	.name  = "Crash kernel",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+	.desc  = IORES_DESC_CRASH_KERNEL
+};
+struct resource crashk_low_res = {
+	.name  = "Crash kernel",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+	.desc  = IORES_DESC_CRASH_KERNEL
+};
+
 /*
  * parsing the "crashkernel" commandline
  *
@@ -292,6 +310,73 @@ int __init parse_crashkernel_low(char *cmdline,
 				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
 }
 
+#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
+#define CRASH_ALIGN		SZ_2M
+#endif
+
+int __init reserve_crashkernel_low(void)
+{
+#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
+	unsigned long long base, low_base = 0, low_size = 0;
+	unsigned long total_low_mem;
+	int ret;
+
+	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
+
+	/* crashkernel=Y,low */
+	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size,
+			&base);
+	if (ret) {
+#ifdef CONFIG_X86_64
+		/*
+		 * two parts from kernel/dma/swiotlb.c:
+		 * -swiotlb size: user-specified with swiotlb= or default.
+		 *
+		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
+		 * to 8M for other buffers that may need to stay low too. Also
+		 * make sure we allocate enough extra low memory so that we
+		 * don't run out of DMA buffers for 32-bit devices.
+		 */
+		low_size = max(swiotlb_size_or_default() + (8UL << 20),
+				256UL << 20);
+#else
+		/*
+		 * in arm64, reserve low memory if and only if crashkernel=X,low
+		 * specified.
+		 */
+		return -EINVAL;
+#endif
+	} else {
+		/* passed with crashkernel=0,low ? */
+		if (!low_size)
+			return 0;
+	}
+
+	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
+	if (!low_base) {
+		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
+		       (unsigned long)(low_size >> 20));
+		return -ENOMEM;
+	}
+
+	ret = memblock_reserve(low_base, low_size);
+	if (ret) {
+		pr_err("%s: Error reserving crashkernel low memblock.\n",
+				__func__);
+		return ret;
+	}
+
+	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
+		(unsigned long)(low_size >> 20),
+		(unsigned long)(low_base >> 20),
+		(unsigned long)(total_low_mem >> 20));
+
+	crashk_low_res.start = low_base;
+	crashk_low_res.end   = low_base + low_size - 1;
+#endif
+	return 0;
+}
+
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len)
 {
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c19c0dad1ebe..db66bbabfff3 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
 /* Flag to indicate we are going to kexec a new kernel */
 bool kexec_in_progress = false;
 
-
-/* Location of the reserved area for the crash kernel */
-struct resource crashk_res = {
-	.name  = "Crash kernel",
-	.start = 0,
-	.end   = 0,
-	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
-	.desc  = IORES_DESC_CRASH_KERNEL
-};
-struct resource crashk_low_res = {
-	.name  = "Crash kernel",
-	.start = 0,
-	.end   = 0,
-	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
-	.desc  = IORES_DESC_CRASH_KERNEL
-};
-
 int kexec_should_crash(struct task_struct *p)
 {
 	/*
-- 
2.20.1



* [PATCH v8 2/5] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2020-05-21  9:38 [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
  2020-05-21  9:38 ` [PATCH v8 1/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c Chen Zhou
@ 2020-05-21  9:38 ` Chen Zhou
  2020-05-26  0:59   ` Baoquan He
  2020-05-21  9:38 ` [PATCH v8 3/5] arm64: kdump: add memory for devices by DT property, low-memory-range Chen Zhou
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Chen Zhou @ 2020-05-21  9:38 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, dyoung, bhe, robh+dt
  Cc: arnd, John.p.donnelly, pkushwaha, horms, guohanjun, chenzhou10,
	linux-arm-kernel, devicetree, linux-doc, linux-kernel, kexec

crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=Y,low is also specified, first reserve low memory of the
specified size for crash dump kernel devices, and then reserve memory
above 4G.
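
Put differently, a successful low reservation is what lifts the upper bound
of the main search from the 32-bit DMA limit to all accessible memory. A
minimal sketch of that decision, with hypothetical names
(crash_search_limit() and SEARCH_NO_LIMIT are not kernel symbols; the
sentinel merely stands in for the role MEMBLOCK_ALLOC_ACCESSIBLE plays in
the patch):

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "no explicit upper bound" in this sketch. */
#define SEARCH_NO_LIMIT UINT64_MAX

/*
 * low_ret:     return value of the low reservation step (0 on success)
 * low_res_end: end of the low region, 0 if nothing was reserved
 * dma32_limit: upper bound used when there is no low region
 */
uint64_t crash_search_limit(int low_ret, uint64_t low_res_end,
			    uint64_t dma32_limit)
{
	if (low_ret == 0 && low_res_end != 0)
		return SEARCH_NO_LIMIT;
	return dma32_limit;
}
```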

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Tested-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
---
 arch/arm64/kernel/setup.c |  8 +++++++-
 arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 3fd2c11c09fc..a8487e4d3e5a 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
 		    kernel_data.end <= res->end)
 			request_resource(res, &kernel_data);
 #ifdef CONFIG_KEXEC_CORE
-		/* Userspace will find "Crash kernel" region in /proc/iomem. */
+		/*
+		 * Userspace will find "Crash kernel" region in /proc/iomem.
+		 * Note: the low region is renamed as Crash kernel (low).
+		 */
+		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+				crashk_low_res.end <= res->end)
+			request_resource(res, &crashk_low_res);
 		if (crashk_res.end && crashk_res.start >= res->start &&
 		    crashk_res.end <= res->end)
 			request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index e42727e3568e..71498acf0cd8 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -81,6 +81,7 @@ static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
 	int ret;
+	phys_addr_t crash_max = arm64_dma32_phys_limit;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
 				&crash_size, &crash_base);
@@ -88,12 +89,38 @@ static void __init reserve_crashkernel(void)
 	if (ret || !crash_size)
 		return;
 
+	ret = reserve_crashkernel_low();
+	if (!ret && crashk_low_res.end) {
+		/*
+		 * If crashkernel=X,low specified, there may be two regions,
+		 * we need to make some changes as follows:
+		 *
+		 * 1. rename the low region as "Crash kernel (low)"
+		 * To distinguish it from the high region without affecting
+		 * existing kexec-tools, rename the low region as
+		 * "Crash kernel (low)".
+		 *
+		 * 2. change the upper bound for crash memory
+		 * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
+		 *
+		 * 3. mark the low region as "nomap"
+		 * The low region is intended to be used for crash dump kernel
+		 * devices, just mark the low region as "nomap" simply.
+		 */
+		const char *rename = "Crash kernel (low)";
+
+		crashk_low_res.name = rename;
+		crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
+		memblock_mark_nomap(crashk_low_res.start,
+				    resource_size(&crashk_low_res));
+	}
+
 	crash_size = PAGE_ALIGN(crash_size);
 
 	if (crash_base == 0) {
 		/* Current arm64 boot protocol requires 2MB alignment */
-		crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
-				crash_size, SZ_2M);
+		crash_base = memblock_find_in_range(0, crash_max, crash_size,
+				SZ_2M);
 		if (crash_base == 0) {
 			pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 				crash_size);
-- 
2.20.1



* [PATCH v8 3/5] arm64: kdump: add memory for devices by DT property, low-memory-range
  2020-05-21  9:38 [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
  2020-05-21  9:38 ` [PATCH v8 1/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c Chen Zhou
  2020-05-21  9:38 ` [PATCH v8 2/5] arm64: kdump: reserve crashkenel above 4G for crash dump kernel Chen Zhou
@ 2020-05-21  9:38 ` Chen Zhou
  2020-05-21  9:38 ` [PATCH v8 4/5] kdump: update Documentation about crashkernel on arm64 Chen Zhou
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 34+ messages in thread
From: Chen Zhou @ 2020-05-21  9:38 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, dyoung, bhe, robh+dt
  Cc: arnd, John.p.donnelly, pkushwaha, horms, guohanjun, chenzhou10,
	linux-arm-kernel, devicetree, linux-doc, linux-kernel, kexec

To reserve crashkernel above 4G, we can use the parameters
"crashkernel=X crashkernel=Y,low". In this case, low memory of the
specified size is reserved for crash dump kernel devices and is never
mapped by the first kernel. This memory range is advertised to the crash
dump kernel via a DT property under /chosen,
	linux,low-memory-range=<BASE SIZE>

The crash dump kernel reads this property at boot time and calls
memblock_add() after memblock_cap_memory_range() has been called.
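
For readers unfamiliar with the <BASE SIZE> encoding, the following
userspace sketch decodes such a property from raw bytes. It assumes the
property is a sequence of 32-bit big-endian cells, with the number of cells
per value given by the root node's #address-cells and #size-cells. The
function names are hypothetical; this is not the kernel's
dt_mem_next_cell() or of_get_flat_dt_prop(), and the length is checked in
bytes here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read `cells` big-endian 32-bit cells starting at *p and advance *p.
 * Only 1 or 2 cells fit in the 64-bit result; more would overflow.
 */
uint64_t next_cells(int cells, const uint8_t **p)
{
	uint64_t v = 0;

	while (cells-- > 0) {
		uint32_t cell = ((uint32_t)(*p)[0] << 24) |
				((uint32_t)(*p)[1] << 16) |
				((uint32_t)(*p)[2] << 8) |
				(uint32_t)(*p)[3];

		v = (v << 32) | cell;
		*p += 4;
	}
	return v;
}

/*
 * Decode a <BASE SIZE> property. len_bytes is the property length in
 * bytes. Returns false if the property is missing or too short.
 */
bool parse_low_memory_range(const uint8_t *prop, size_t len_bytes,
			    int addr_cells, int size_cells,
			    uint64_t *base, uint64_t *size)
{
	if (!prop || len_bytes < (size_t)(addr_cells + size_cells) * 4)
		return false;

	*base = next_cells(addr_cells, &prop);
	*size = next_cells(size_cells, &prop);
	return true;
}
```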

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Tested-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
---
 arch/arm64/mm/init.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 71498acf0cd8..fcc3abee7003 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -323,6 +323,26 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+static int __init early_init_dt_scan_lowmem(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	struct memblock_region *lowmem = data;
+	const __be32 *reg;
+	int len;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,low-memory-range", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	lowmem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	lowmem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
 static int __init early_init_dt_scan_usablemem(unsigned long node,
 		const char *uname, int depth, void *data)
 {
@@ -353,13 +373,21 @@ static void __init fdt_enforce_memory_region(void)
 
 	if (reg.size)
 		memblock_cap_memory_range(reg.base, reg.size);
+
+	of_scan_flat_dt(early_init_dt_scan_lowmem, &reg);
+
+	if (reg.size)
+		memblock_add(reg.base, reg.size);
 }
 
 void __init arm64_memblock_init(void)
 {
 	const s64 linear_region_size = BIT(vabits_actual - 1);
 
-	/* Handle linux,usable-memory-range property */
+	/*
+	 * Handle linux,usable-memory-range and linux,low-memory-range
+	 * properties.
+	 */
 	fdt_enforce_memory_region();
 
 	/* Remove memory above our supported physical address size */
-- 
2.20.1



* [PATCH v8 4/5] kdump: update Documentation about crashkernel on arm64
  2020-05-21  9:38 [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
                   ` (2 preceding siblings ...)
  2020-05-21  9:38 ` [PATCH v8 3/5] arm64: kdump: add memory for devices by DT property, low-memory-range Chen Zhou
@ 2020-05-21  9:38 ` Chen Zhou
  2020-05-21  9:38 ` [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump Chen Zhou
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 34+ messages in thread
From: Chen Zhou @ 2020-05-21  9:38 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, dyoung, bhe, robh+dt
  Cc: arnd, John.p.donnelly, pkushwaha, horms, guohanjun, chenzhou10,
	linux-arm-kernel, devicetree, linux-doc, linux-kernel, kexec

Now that we support crashkernel=X[,low] on arm64, update the documentation.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Tested-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
---
 Documentation/admin-guide/kdump/kdump.rst       | 13 +++++++++++--
 Documentation/admin-guide/kernel-parameters.txt | 12 +++++++++++-
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index ac7e131d2935..e55173ec1666 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -299,7 +299,13 @@ Boot into System Kernel
    "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
    starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
 
-   On x86 and x86_64, use "crashkernel=64M@16M".
+   On x86 use "crashkernel=64M@16M".
+
+   On x86_64, use "crashkernel=Y[@X]" to select a region under 4G first, and
+   fall back to reserve region above 4G when '@offset' hasn't been specified.
+   We can also use "crashkernel=X,high" to select a region above 4G, which
+   also tries to allocate at least 256M below 4G automatically and
+   "crashkernel=Y,low" can be used to allocate specified size low memory.
 
    On ppc64, use "crashkernel=128M@32M".
 
@@ -316,8 +322,11 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   On arm64, use "crashkernel=Y[@X]". Note that the start address of
    the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+   If crashkernel=Z,low is also specified, first reserve low memory of the
+   specified size for crash dump kernel devices, and then reserve memory
+   above 4G.
 
 Load the Dump-capture Kernel
 ============================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7bc83f3d9bdf..97695783b817 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -722,6 +722,9 @@
 			[KNL, x86_64] select a region under 4G first, and
 			fall back to reserve region above 4G when '@offset'
 			hasn't been specified.
+			[KNL, arm64] If crashkernel=X,low is specified, first
+			reserve low memory of the specified size for crash dump
+			kernel devices, and then reserve memory above 4G.
 			See Documentation/admin-guide/kdump/kdump.rst for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -746,12 +749,19 @@
 			requires at least 64M+32K low memory, also enough extra
 			low memory is needed to make sure DMA buffers for 32-bit
 			devices won't run out. Kernel would try to allocate at
-			at least 256M below 4G automatically.
+			least 256M below 4G automatically.
 			This one let user to specify own low range under 4G
 			for second kernel instead.
 			0: to disable low allocation.
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.
+			[KNL, arm64] range under 4G.
+			This lets the user specify a low range under 4G for
+			the crash dump kernel.
+			Unlike x86_64, the kernel reserves a low memory region
+			of the specified size only when this parameter is given;
+			it does not try to allocate at least 256M below 4G
+			automatically.
 
 	cryptomgr.notests
 			[KNL] Disable crypto self-tests
-- 
2.20.1



* [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
  2020-05-21  9:38 [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
                   ` (3 preceding siblings ...)
  2020-05-21  9:38 ` [PATCH v8 4/5] kdump: update Documentation about crashkernel on arm64 Chen Zhou
@ 2020-05-21  9:38 ` Chen Zhou
  2020-05-21 13:29   ` Rob Herring
  2020-05-26  1:42 ` [PATCH v8 0/5] support reserving crashkernel above 4G on " Baoquan He
  2020-06-01 12:02 ` Prabhakar Kushwaha
  6 siblings, 1 reply; 34+ messages in thread
From: Chen Zhou @ 2020-05-21  9:38 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, dyoung, bhe, robh+dt
  Cc: arnd, John.p.donnelly, pkushwaha, horms, guohanjun, chenzhou10,
	linux-arm-kernel, devicetree, linux-doc, linux-kernel, kexec

Add documentation for the DT property used by arm64 kdump:
linux,low-memory-range.
"linux,low-memory-range" is another memory region, used for crash
dump kernel devices.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
---
 Documentation/devicetree/bindings/chosen.txt | 25 ++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 45e79172a646..bfe6fb6976e6 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -103,6 +103,31 @@ While this property does not represent a real hardware, the address
 and the size are expressed in #address-cells and #size-cells,
 respectively, of the root node.
 
+linux,low-memory-range
+----------------------
+This property (arm64 only) holds a base address and size, describing a
+limited region below 4G. Similar to "linux,usable-memory-range", it is
+another memory range which may be considered available for use by the
+kernel.
+
+e.g.
+
+/ {
+	chosen {
+		linux,low-memory-range = <0x0 0x70000000 0x0 0x10000000>;
+		linux,usable-memory-range = <0x202f 0xc0000000 0x0 0x40000000>;
+	};
+};
+
+The main use is for crash dump kernel devices when reserving crashkernel
+above 4G. In that case there may be two crash kernel regions, one below
+4G and the other above 4G. To distinguish the low region from the high
+region, use this property to pass the low region.
+
+While this property does not represent a real hardware, the address
+and the size are expressed in #address-cells and #size-cells,
+respectively, of the root node.
+
 linux,elfcorehdr
 ----------------
 
-- 
2.20.1



* Re: [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
  2020-05-21  9:38 ` [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump Chen Zhou
@ 2020-05-21 13:29   ` Rob Herring
  2020-05-22  3:24     ` chenzhou
  0 siblings, 1 reply; 34+ messages in thread
From: Rob Herring @ 2020-05-21 13:29 UTC (permalink / raw)
  To: Chen Zhou
  Cc: Thomas Gleixner, Ingo Molnar, Catalin Marinas, Will Deacon,
	dyoung, Baoquan He, Arnd Bergmann, John.p.donnelly, pkushwaha,
	Simon Horman, Hanjun Guo,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	devicetree, Linux Doc Mailing List, linux-kernel, kexec

On Thu, May 21, 2020 at 3:35 AM Chen Zhou <chenzhou10@huawei.com> wrote:
>
> Add documentation for DT property used by arm64 kdump:
> linux,low-memory-range.
> "linux,low-memory-range" is an another memory region used for crash
> dump kernel devices.
>
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> ---
>  Documentation/devicetree/bindings/chosen.txt | 25 ++++++++++++++++++++
>  1 file changed, 25 insertions(+)

chosen is now a schema documented here[1].

> diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> index 45e79172a646..bfe6fb6976e6 100644
> --- a/Documentation/devicetree/bindings/chosen.txt
> +++ b/Documentation/devicetree/bindings/chosen.txt
> @@ -103,6 +103,31 @@ While this property does not represent a real hardware, the address
>  and the size are expressed in #address-cells and #size-cells,
>  respectively, of the root node.
>
> +linux,low-memory-range
> +----------------------
> +This property (arm64 only) holds a base address and size, describing a
> +limited region below 4G. Similar to "linux,usable-memory-range", it is
> +an another memory range which may be considered available for use by the
> +kernel.

Why can't you just add a range to "linux,usable-memory-range"? It
shouldn't be hard to figure out which part is below 4G.

Rob

[1] https://github.com/devicetree-org/dt-schema/blob/master/schemas/chosen.yaml


* Re: [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
  2020-05-21 13:29   ` Rob Herring
@ 2020-05-22  3:24     ` chenzhou
  2020-05-26 21:18       ` Rob Herring
  0 siblings, 1 reply; 34+ messages in thread
From: chenzhou @ 2020-05-22  3:24 UTC (permalink / raw)
  To: Rob Herring
  Cc: Thomas Gleixner, Ingo Molnar, Catalin Marinas, Will Deacon,
	dyoung, Baoquan He, Arnd Bergmann, John.p.donnelly, pkushwaha,
	Simon Horman, Hanjun Guo,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	devicetree, Linux Doc Mailing List, linux-kernel, kexec

Hi Rob,

On 2020/5/21 21:29, Rob Herring wrote:
> On Thu, May 21, 2020 at 3:35 AM Chen Zhou <chenzhou10@huawei.com> wrote:
>> Add documentation for DT property used by arm64 kdump:
>> linux,low-memory-range.
>> "linux,low-memory-range" is an another memory region used for crash
>> dump kernel devices.
>>
>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>> ---
>>  Documentation/devicetree/bindings/chosen.txt | 25 ++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
> chosen is now a schema documented here[1].
OK, so I don't need to modify the doc in the kernel, and should just create a pull request on GitHub [1]?

>
>> diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
>> index 45e79172a646..bfe6fb6976e6 100644
>> --- a/Documentation/devicetree/bindings/chosen.txt
>> +++ b/Documentation/devicetree/bindings/chosen.txt
>> @@ -103,6 +103,31 @@ While this property does not represent a real hardware, the address
>>  and the size are expressed in #address-cells and #size-cells,
>>  respectively, of the root node.
>>
>> +linux,low-memory-range
>> +----------------------
>> +This property (arm64 only) holds a base address and size, describing a
>> +limited region below 4G. Similar to "linux,usable-memory-range", it is
>> +an another memory range which may be considered available for use by the
>> +kernel.
> Why can't you just add a range to "linux,usable-memory-range"? It
> shouldn't be hard to figure out which part is below 4G.
I did it this way in earlier versions, such as v5. After discussing it with James, I changed it to the current approach.

We think the existing behavior should stay unchanged, which keeps compatibility with existing
user-space and older kdump kernels.

The comments from James:
> linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>.
Won't this break if your kdump kernel doesn't know what the extra parameters are?
Or if it expects two ranges, but only gets one? These DT properties should be treated as
ABI between kernel versions, we can't really change it like this.

I think the 'low' region is an optional-extra, that is never mapped by the first kernel. I
think the simplest thing to do is to add a 'linux,low-memory-range' that we
memblock_add() after memblock_cap_memory_range() has been called.
If it's missing, or the new kernel doesn't know what it's for, everything keeps working.

Previous discussions:
https://lkml.org/lkml/2019/6/5/674
https://lkml.org/lkml/2019/6/13/229
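
The ordering James describes (cap to the usable range first, then add the
optional low range) can be sketched in userspace C. This is only a toy model
under stated assumptions: struct mem_range, struct mem_view, view_cap() and
view_add_low() are hypothetical names invented for illustration, and the real
kernel does this with memblock rather than a two-entry array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DT <base size> pair; not a kernel type. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Toy view of the memory the crash dump kernel may use: at most two ranges. */
struct mem_view {
	struct mem_range ranges[2];
	int count;
};

/* Step 1: restrict the view to the usable range (the "cap" step). */
void view_cap(struct mem_view *v, struct mem_range usable)
{
	v->ranges[0] = usable;
	v->count = 1;
}

/* Step 2: add the optional low range afterwards (the "add" step). */
void view_add_low(struct mem_view *v, const struct mem_range *low)
{
	if (low == NULL || low->size == 0)
		return;		/* property missing: the view is left unchanged */
	assert(v->count < 2);	/* the toy model holds two ranges at most */
	v->ranges[v->count++] = *low;
}
```

In this model a kernel that never calls view_add_low(), or that finds no low
property, ends up with exactly the view it had before, which is the
compatibility argument made above.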

Thanks,
Chen Zhou

>
> Rob
>
> [1] https://github.com/devicetree-org/dt-schema/blob/master/schemas/chosen.yaml
>
> .
>



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 1/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2020-05-21  9:38 ` [PATCH v8 1/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c Chen Zhou
@ 2020-05-26  0:56   ` Baoquan He
  0 siblings, 0 replies; 34+ messages in thread
From: Baoquan He @ 2020-05-26  0:56 UTC (permalink / raw)
  To: Chen Zhou
  Cc: tglx, mingo, catalin.marinas, will, dyoung, robh+dt,
	John.p.donnelly, arnd, devicetree, linux-doc, kexec,
	linux-kernel, horms, guohanjun, pkushwaha, linux-arm-kernel

On 05/21/20 at 05:38pm, Chen Zhou wrote:
> In preparation for supporting reserve_crashkernel_low in arm64 as
> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.



> BTW, move x86 CRASH_ALIGN to 2M.

What is the reason for this change?

> 
> Note, on arm64, we reserve low memory if and only if crashkernel=X,low
> is specified. Unlike x86_64, low memory is not reserved automatically.
> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> ---
>  arch/x86/kernel/setup.c    | 66 ++++-------------------------
>  include/linux/crash_core.h |  3 ++
>  include/linux/kexec.h      |  2 -
>  kernel/crash_core.c        | 85 ++++++++++++++++++++++++++++++++++++++
>  kernel/kexec_core.c        | 17 --------
>  5 files changed, 96 insertions(+), 77 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 4b3fa6cd3106..de75fec73d47 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -395,8 +395,8 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>  
>  #ifdef CONFIG_KEXEC_CORE
>  
> -/* 16M alignment for crash kernel regions */
> -#define CRASH_ALIGN		SZ_16M
> +/* 2M alignment for crash kernel regions */
> +#define CRASH_ALIGN		SZ_2M
>  
>  /*
>   * Keep the crash kernel below this limit.
> @@ -419,59 +419,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>  # define CRASH_ADDR_HIGH_MAX	SZ_64T
>  #endif
>  
> -static int __init reserve_crashkernel_low(void)
> -{
> -#ifdef CONFIG_X86_64
> -	unsigned long long base, low_base = 0, low_size = 0;
> -	unsigned long total_low_mem;
> -	int ret;
> -
> -	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> -
> -	/* crashkernel=Y,low */
> -	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
> -	if (ret) {
> -		/*
> -		 * two parts from kernel/dma/swiotlb.c:
> -		 * -swiotlb size: user-specified with swiotlb= or default.
> -		 *
> -		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> -		 * to 8M for other buffers that may need to stay low too. Also
> -		 * make sure we allocate enough extra low memory so that we
> -		 * don't run out of DMA buffers for 32-bit devices.
> -		 */
> -		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> -	} else {
> -		/* passed with crashkernel=0,low ? */
> -		if (!low_size)
> -			return 0;
> -	}
> -
> -	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> -	if (!low_base) {
> -		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> -		       (unsigned long)(low_size >> 20));
> -		return -ENOMEM;
> -	}
> -
> -	ret = memblock_reserve(low_base, low_size);
> -	if (ret) {
> -		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
> -		return ret;
> -	}
> -
> -	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
> -		(unsigned long)(low_size >> 20),
> -		(unsigned long)(low_base >> 20),
> -		(unsigned long)(total_low_mem >> 20));
> -
> -	crashk_low_res.start = low_base;
> -	crashk_low_res.end   = low_base + low_size - 1;
> -	insert_resource(&iomem_resource, &crashk_low_res);
> -#endif
> -	return 0;
> -}
> -
>  static void __init reserve_crashkernel(void)
>  {
>  	unsigned long long crash_size, crash_base, total_mem;
> @@ -535,9 +482,12 @@ static void __init reserve_crashkernel(void)
>  		return;
>  	}
>  
> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> -		memblock_free(crash_base, crash_size);
> -		return;
> +	if (crash_base >= (1ULL << 32)) {
> +		if (reserve_crashkernel_low()) {
> +			memblock_free(crash_base, crash_size);
> +			return;
> +		}
> +		insert_resource(&iomem_resource, &crashk_low_res);
>  	}
>  
>  	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index 525510a9f965..4df8c0bff03e 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -63,6 +63,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
>  extern unsigned char *vmcoreinfo_data;
>  extern size_t vmcoreinfo_size;
>  extern u32 *vmcoreinfo_note;
> +extern struct resource crashk_res;
> +extern struct resource crashk_low_res;
>  
>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>  			  void *data, size_t data_len);
> @@ -74,5 +76,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>  		unsigned long long *crash_size, unsigned long long *crash_base);
>  int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>  		unsigned long long *crash_size, unsigned long long *crash_base);
> +int __init reserve_crashkernel_low(void);
>  
>  #endif /* LINUX_CRASH_CORE_H */
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 1776eb2e43a4..5d5d9635b18d 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -330,8 +330,6 @@ extern int kexec_load_disabled;
>  
>  /* Location of a reserved region to hold the crash kernel.
>   */
> -extern struct resource crashk_res;
> -extern struct resource crashk_low_res;
>  extern note_buf_t __percpu *crash_notes;
>  
>  /* flag to track if kexec reboot is in progress */
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 9f1557b98468..a7580d291c37 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -7,6 +7,8 @@
>  #include <linux/crash_core.h>
>  #include <linux/utsname.h>
>  #include <linux/vmalloc.h>
> +#include <linux/memblock.h>
> +#include <linux/swiotlb.h>
>  
>  #include <asm/page.h>
>  #include <asm/sections.h>
> @@ -19,6 +21,22 @@ u32 *vmcoreinfo_note;
>  /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
>  static unsigned char *vmcoreinfo_data_safecopy;
>  
> +/* Location of the reserved area for the crash kernel */
> +struct resource crashk_res = {
> +	.name  = "Crash kernel",
> +	.start = 0,
> +	.end   = 0,
> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> +	.desc  = IORES_DESC_CRASH_KERNEL
> +};
> +struct resource crashk_low_res = {
> +	.name  = "Crash kernel",
> +	.start = 0,
> +	.end   = 0,
> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> +	.desc  = IORES_DESC_CRASH_KERNEL
> +};
> +
>  /*
>   * parsing the "crashkernel" commandline
>   *
> @@ -292,6 +310,73 @@ int __init parse_crashkernel_low(char *cmdline,
>  				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
>  }
>  
> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
> +#define CRASH_ALIGN		SZ_2M
> +#endif
> +
> +int __init reserve_crashkernel_low(void)
> +{
> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
> +	unsigned long long base, low_base = 0, low_size = 0;
> +	unsigned long total_low_mem;
> +	int ret;
> +
> +	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> +
> +	/* crashkernel=Y,low */
> +	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size,
> +			&base);
> +	if (ret) {
> +#ifdef CONFIG_X86_64
> +		/*
> +		 * two parts from lib/swiotlb.c:
> +		 * -swiotlb size: user-specified with swiotlb= or default.
> +		 *
> +		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> +		 * to 8M for other buffers that may need to stay low too. Also
> +		 * make sure we allocate enough extra low memory so that we
> +		 * don't run out of DMA buffers for 32-bit devices.
> +		 */
> +		low_size = max(swiotlb_size_or_default() + (8UL << 20),
> +				256UL << 20);
> +#else
> +		/*
> +		 * in arm64, reserve low memory if and only if crashkernel=X,low
> +		 * specified.
> +		 */
> +		return -EINVAL;
> +#endif
> +	} else {
> +		/* passed with crashkernel=0,low ? */
> +		if (!low_size)
> +			return 0;
> +	}
> +
> +	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> +	if (!low_base) {
> +		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> +		       (unsigned long)(low_size >> 20));
> +		return -ENOMEM;
> +	}
> +
> +	ret = memblock_reserve(low_base, low_size);
> +	if (ret) {
> +		pr_err("%s: Error reserving crashkernel low memblock.\n",
> +				__func__);
> +		return ret;
> +	}
> +
> +	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
> +		(unsigned long)(low_size >> 20),
> +		(unsigned long)(low_base >> 20),
> +		(unsigned long)(total_low_mem >> 20));
> +
> +	crashk_low_res.start = low_base;
> +	crashk_low_res.end   = low_base + low_size - 1;
> +#endif
> +	return 0;
> +}
> +
>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>  			  void *data, size_t data_len)
>  {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..db66bbabfff3 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
>  /* Flag to indicate we are going to kexec a new kernel */
>  bool kexec_in_progress = false;
>  
> -
> -/* Location of the reserved area for the crash kernel */
> -struct resource crashk_res = {
> -	.name  = "Crash kernel",
> -	.start = 0,
> -	.end   = 0,
> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> -	.desc  = IORES_DESC_CRASH_KERNEL
> -};
> -struct resource crashk_low_res = {
> -	.name  = "Crash kernel",
> -	.start = 0,
> -	.end   = 0,
> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> -	.desc  = IORES_DESC_CRASH_KERNEL
> -};
> -
>  int kexec_should_crash(struct task_struct *p)
>  {
>  	/*
> -- 
> 2.20.1
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 
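
For readers following the quoted patch, the default sizing of the low region
can be looked at in isolation. The following is a minimal userspace sketch,
not kernel code: default_low_size() and its is_x86_64 flag are hypothetical
names, and the swiotlb size is passed in as a parameter instead of coming
from swiotlb_size_or_default().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/*
 * Mirrors the branch taken when no crashkernel=Y,low was given.
 * Returns 0 and stores the default size, or -1 when no low region
 * should be reserved by default (the arm64 case in this patch).
 */
int default_low_size(bool is_x86_64, uint64_t swiotlb_bytes,
		     uint64_t *low_size)
{
	uint64_t want;

	assert(low_size != NULL);
	if (!is_x86_64)
		return -1;	/* arm64: reserve only if ,low is specified */

	/* swiotlb plus 8M of slack, but never less than 256M */
	want = swiotlb_bytes + MiB(8);
	*low_size = want > MiB(256) ? want : MiB(256);
	return 0;
}
```

With a 64M swiotlb the 256M floor applies; with a 512M swiotlb the result is
520M.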


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 2/5] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2020-05-21  9:38 ` [PATCH v8 2/5] arm64: kdump: reserve crashkenel above 4G for crash dump kernel Chen Zhou
@ 2020-05-26  0:59   ` Baoquan He
  0 siblings, 0 replies; 34+ messages in thread
From: Baoquan He @ 2020-05-26  0:59 UTC (permalink / raw)
  To: Chen Zhou
  Cc: tglx, mingo, catalin.marinas, will, dyoung, robh+dt,
	John.p.donnelly, arnd, devicetree, linux-doc, kexec,
	linux-kernel, horms, guohanjun, pkushwaha, linux-arm-kernel

On 05/21/20 at 05:38pm, Chen Zhou wrote:
> Crashkernel=X tries to reserve memory for the crash dump kernel under
> 4G. If crashkernel=X,low is specified simultaneously, first reserve the
> specified amount of low memory for crash dump kernel devices and then
> reserve memory above 4G.

I wonder why crashkernel=,high is not introduced on arm64 as well, for
consistency with x86_64, so that the behaviour is the same on all
architectures.

> 
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> ---
>  arch/arm64/kernel/setup.c |  8 +++++++-
>  arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
>  2 files changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 3fd2c11c09fc..a8487e4d3e5a 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
>  		    kernel_data.end <= res->end)
>  			request_resource(res, &kernel_data);
>  #ifdef CONFIG_KEXEC_CORE
> -		/* Userspace will find "Crash kernel" region in /proc/iomem. */
> +		/*
> +		 * Userspace will find "Crash kernel" region in /proc/iomem.
> +		 * Note: the low region is renamed as Crash kernel (low).
> +		 */
> +		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> +				crashk_low_res.end <= res->end)
> +			request_resource(res, &crashk_low_res);
>  		if (crashk_res.end && crashk_res.start >= res->start &&
>  		    crashk_res.end <= res->end)
>  			request_resource(res, &crashk_res);
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index e42727e3568e..71498acf0cd8 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -81,6 +81,7 @@ static void __init reserve_crashkernel(void)
>  {
>  	unsigned long long crash_base, crash_size;
>  	int ret;
> +	phys_addr_t crash_max = arm64_dma32_phys_limit;
>  
>  	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>  				&crash_size, &crash_base);
> @@ -88,12 +89,38 @@ static void __init reserve_crashkernel(void)
>  	if (ret || !crash_size)
>  		return;
>  
> +	ret = reserve_crashkernel_low();
> +	if (!ret && crashk_low_res.end) {
> +		/*
> +		 * If crashkernel=X,low specified, there may be two regions,
> +		 * we need to make some changes as follows:
> +		 *
> +		 * 1. rename the low region as "Crash kernel (low)"
> +		 * In order to distinct from the high region and make no effect
> +		 * to the use of existing kexec-tools, rename the low region as
> +		 * "Crash kernel (low)".
> +		 *
> +		 * 2. change the upper bound for crash memory
> +		 * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
> +		 *
> +		 * 3. mark the low region as "nomap"
> +		 * The low region is intended to be used for crash dump kernel
> +		 * devices, just mark the low region as "nomap" simply.
> +		 */
> +		const char *rename = "Crash kernel (low)";
> +
> +		crashk_low_res.name = rename;
> +		crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
> +		memblock_mark_nomap(crashk_low_res.start,
> +				    resource_size(&crashk_low_res));
> +	}
> +
>  	crash_size = PAGE_ALIGN(crash_size);
>  
>  	if (crash_base == 0) {
>  		/* Current arm64 boot protocol requires 2MB alignment */
> -		crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
> -				crash_size, SZ_2M);
> +		crash_base = memblock_find_in_range(0, crash_max, crash_size,
> +				SZ_2M);
>  		if (crash_base == 0) {
>  			pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>  				crash_size);
> -- 
> 2.20.1
> 
> 
> 
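
The way the quoted arm64 hunk picks the upper bound for the main crash
region can be sketched as a small pure function. This is a userspace
illustration under stated assumptions: crash_search_limit() and NO_LIMIT are
hypothetical names, with NO_LIMIT standing in for the patch's use of
MEMBLOCK_ALLOC_ACCESSIBLE, and the DMA32 limit is passed in as a plain number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "do not restrict the search to low memory". */
#define NO_LIMIT UINT64_MAX

/*
 * low_ret: return value of the low-region reservation (0 on success).
 * low_end: end address of the low region, 0 if none was reserved.
 * dma32_limit: the default upper bound for the crash kernel region.
 */
uint64_t crash_search_limit(int low_ret, uint64_t low_end,
			    uint64_t dma32_limit)
{
	assert(dma32_limit != 0);	/* the default bound must be a real address */

	/* A low region exists, so the main region may be placed above 4G. */
	if (low_ret == 0 && low_end != 0)
		return NO_LIMIT;

	/* No low region: keep the main region below the DMA32 limit. */
	return dma32_limit;
}
```

The point of the sketch is that the search is widened only when the low
reservation both succeeded and produced a region; crashkernel=0,low or a
failed reservation leaves the original bound in place.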


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-05-21  9:38 [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
                   ` (4 preceding siblings ...)
  2020-05-21  9:38 ` [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump Chen Zhou
@ 2020-05-26  1:42 ` Baoquan He
  2020-05-26  2:28   ` chenzhou
  2020-05-28 22:20   ` John Donnelly
  2020-06-01 12:02 ` Prabhakar Kushwaha
  6 siblings, 2 replies; 34+ messages in thread
From: Baoquan He @ 2020-05-26  1:42 UTC (permalink / raw)
  To: Chen Zhou
  Cc: tglx, mingo, catalin.marinas, will, dyoung, robh+dt,
	John.p.donnelly, arnd, devicetree, linux-doc, kexec,
	linux-kernel, horms, guohanjun, pkushwaha, linux-arm-kernel

On 05/21/20 at 05:38pm, Chen Zhou wrote:
> This patch series enables reserving crashkernel above 4G on arm64.
> 
> There are the following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> when there is not enough low memory.
> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G.
> In this case, if swiotlb or DMA buffers are required, the crash dump kernel
> will fail to boot because there is no low memory available for allocation.
> 
> To solve these issues, introduce crashkernel=X,low to reserve a specified
> amount of low memory.
> Crashkernel=X tries to reserve memory for the crash dump kernel under
> 4G. If crashkernel=Y,low is specified simultaneously, first reserve the
> specified amount of low memory for crash dump kernel devices and then
> reserve memory above 4G.
> 
> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
> is specified simultaneously, the kernel should reserve the specified amount
> of low memory for crash dump kernel devices. So there may be two crash
> kernel regions, one below 4G and the other above 4G.
> To distinguish it from the high region without affecting the use of
> kexec-tools, rename the low region as "Crash kernel (low)", and add DT property
> "linux,low-memory-range" to crash dump kernel's dtb to pass the low region.
> 
> Besides, we need to modify kexec-tools:
> arm64: kdump: add another DT property to crash dump kernel's dtb (see [1])
> 
> The previous changes and discussions can be retrieved from:
> 
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.

OK, so moving x86 CRASH_ALIGN to 2M was suggested by Dave. Since
CONFIG_PHYSICAL_ALIGN can be set anywhere from 2M to 16M on x86_64, 2M
seems fine. Either way, the commit log should state why it needs to
change.


arch/x86/Kconfig:
config PHYSICAL_ALIGN
        hex "Alignment value to which kernel should be aligned"
        default "0x200000"
        range 0x2000 0x1000000 if X86_32
        range 0x200000 0x1000000 if X86_64
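
To make the effect of the alignment change concrete, here is a small
userspace sketch of rounding a candidate base address up to each alignment.
It is an illustration only: align_up() is a hypothetical helper written for
this example, SZ_2M and SZ_16M are defined locally, and the sample address
is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M	0x200000ULL
#define SZ_16M	0x1000000ULL

/* Round addr up to the next multiple of align (align must be a power of two). */
uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

For the address 0x12345678, 2M alignment gives 0x12400000 while 16M alignment
gives 0x13000000. Any 16M-aligned base is also 2M-aligned, so the smaller
alignment only adds candidate positions and never rules out an old one.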

> - Update Documentation/devicetree/bindings/chosen.txt 
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
> - Add Tested-by from John and pk
> 
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
> 
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.

The removal of crashkernel=X,high needs to be explained too. Otherwise
people reading the commit have to work out why themselves. I didn't follow
the earlier versions, so I can't see why ,high can't be specified explicitly.

> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
> 
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> 
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
> 
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
> 
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
> 
> [1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> 
> Chen Zhou (5):
>   x86: kdump: move reserve_crashkernel_low() into crash_core.c
>   arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>   arm64: kdump: add memory for devices by DT property, low-memory-range
>   kdump: update Documentation about crashkernel on arm64
>   dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
> 
>  Documentation/admin-guide/kdump/kdump.rst     | 13 ++-
>  .../admin-guide/kernel-parameters.txt         | 12 ++-
>  Documentation/devicetree/bindings/chosen.txt  | 25 ++++++
>  arch/arm64/kernel/setup.c                     |  8 +-
>  arch/arm64/mm/init.c                          | 61 ++++++++++++-
>  arch/x86/kernel/setup.c                       | 66 ++------------
>  include/linux/crash_core.h                    |  3 +
>  include/linux/kexec.h                         |  2 -
>  kernel/crash_core.c                           | 85 +++++++++++++++++++
>  kernel/kexec_core.c                           | 17 ----
>  10 files changed, 208 insertions(+), 84 deletions(-)
> 
> -- 
> 2.20.1
> 
> 
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-05-26  1:42 ` [PATCH v8 0/5] support reserving crashkernel above 4G on " Baoquan He
@ 2020-05-26  2:28   ` chenzhou
  2020-05-28 22:20   ` John Donnelly
  1 sibling, 0 replies; 34+ messages in thread
From: chenzhou @ 2020-05-26  2:28 UTC (permalink / raw)
  To: Baoquan He
  Cc: tglx, mingo, catalin.marinas, will, dyoung, robh+dt,
	John.p.donnelly, arnd, devicetree, linux-doc, kexec,
	linux-kernel, horms, guohanjun, pkushwaha, linux-arm-kernel

Hi Baoquan,


Thanks for your suggestions.

You are right, these details should be explained in the commit log.


Thanks,

Chen Zhou


On 2020/5/26 9:42, Baoquan He wrote:
> On 05/21/20 at 05:38pm, Chen Zhou wrote:
>> This patch series enables reserving crashkernel above 4G on arm64.
>>
>> There are the following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>> when there is not enough low memory.
>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G.
>> In this case, if swiotlb or DMA buffers are required, the crash dump kernel
>> will fail to boot because there is no low memory available for allocation.
>>
>> To solve these issues, introduce crashkernel=X,low to reserve a specified
>> amount of low memory.
>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>> 4G. If crashkernel=Y,low is specified simultaneously, first reserve the
>> specified amount of low memory for crash dump kernel devices and then
>> reserve memory above 4G.
>>
>> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
>> is specified simultaneously, the kernel should reserve the specified amount
>> of low memory for crash dump kernel devices. So there may be two crash
>> kernel regions, one below 4G and the other above 4G.
>> To distinguish it from the high region without affecting the use of
>> kexec-tools, rename the low region as "Crash kernel (low)", and add DT property
>> "linux,low-memory-range" to crash dump kernel's dtb to pass the low region.
>>
>> Besides, we need to modify kexec-tools:
>> arm64: kdump: add another DT property to crash dump kernel's dtb (see [1])
>>
>> The previous changes and discussions can be retrieved from:
>>
>> Changes since [v7]
>> - Move x86 CRASH_ALIGN to 2M
>> As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
> OK, moving x86 CRASH_ALIGN to 2M is suggested by Dave. Because
> CONFIG_PHYSICAL_ALIGN can be selected from 2M to 16M. So 2M seems good.
> But, anyway, we should tell the reason why it need be changed in commit
> log.
>
>
> arch/x86/Kconfig:
> config PHYSICAL_ALIGN
>         hex "Alignment value to which kernel should be aligned"
>         default "0x200000"
>         range 0x2000 0x1000000 if X86_32
>         range 0x200000 0x1000000 if X86_64
>
>> - Update Documentation/devicetree/bindings/chosen.txt 
>> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
>> - Add Tested-by from John and pk
>>
>> Changes since [v6]
>> - Fix build errors reported by kbuild test robot.
>>
>> Changes since [v5]
>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>> - Delete crashkernel=X,high.
> And the crashkernel=X,high being deleted need be told too. Otherwise
> people reading the commit have to check why themselves. I didn't follow
> the old version, can't see why ,high can't be specified explicitly.
>
>> - Modify crashkernel=X,low.
>> If crashkernel=X,low is specified simultaneously, reserve spcified size low
>> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
>> pass to crash dump kernel by DT property "linux,low-memory-range".
>> - Update Documentation/admin-guide/kdump/kdump.rst.
>>
>> Changes since [v4]
>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>>
>> Changes since [v3]
>> - Add memblock_cap_memory_ranges back for multiple ranges.
>> - Fix some compiling warnings.
>>
>> Changes since [v2]
>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
>> patch.
>>
>> Changes since [v1]:
>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
>> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
>> in fdt_enforce_memory_region().
>> There are at most two crash kernel regions, for two crash kernel regions
>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
>> and then remove the memory range in the middle.
>>
>> [1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>> [v1]: https://lkml.org/lkml/2019/4/2/1174
>> [v2]: https://lkml.org/lkml/2019/4/9/86
>> [v3]: https://lkml.org/lkml/2019/4/9/306
>> [v4]: https://lkml.org/lkml/2019/4/15/273
>> [v5]: https://lkml.org/lkml/2019/5/6/1360
>> [v6]: https://lkml.org/lkml/2019/8/30/142
>> [v7]: https://lkml.org/lkml/2019/12/23/411
>>
>> Chen Zhou (5):
>>   x86: kdump: move reserve_crashkernel_low() into crash_core.c
>>   arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>>   arm64: kdump: add memory for devices by DT property, low-memory-range
>>   kdump: update Documentation about crashkernel on arm64
>>   dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
>>
>>  Documentation/admin-guide/kdump/kdump.rst     | 13 ++-
>>  .../admin-guide/kernel-parameters.txt         | 12 ++-
>>  Documentation/devicetree/bindings/chosen.txt  | 25 ++++++
>>  arch/arm64/kernel/setup.c                     |  8 +-
>>  arch/arm64/mm/init.c                          | 61 ++++++++++++-
>>  arch/x86/kernel/setup.c                       | 66 ++------------
>>  include/linux/crash_core.h                    |  3 +
>>  include/linux/kexec.h                         |  2 -
>>  kernel/crash_core.c                           | 85 +++++++++++++++++++
>>  kernel/kexec_core.c                           | 17 ----
>>  10 files changed, 208 insertions(+), 84 deletions(-)
>>
>> -- 
>> 2.20.1
>>
>>
>>
>
> .
>



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
  2020-05-22  3:24     ` chenzhou
@ 2020-05-26 21:18       ` Rob Herring
  2020-05-29 16:11         ` James Morse
  0 siblings, 1 reply; 34+ messages in thread
From: Rob Herring @ 2020-05-26 21:18 UTC (permalink / raw)
  To: chenzhou, James Morse
  Cc: Thomas Gleixner, Ingo Molnar, Catalin Marinas, Will Deacon,
	dyoung, Baoquan He, Arnd Bergmann, John.p.donnelly, pkushwaha,
	Simon Horman, Hanjun Guo,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	devicetree, Linux Doc Mailing List, linux-kernel, kexec

On Fri, May 22, 2020 at 11:24:11AM +0800, chenzhou wrote:
> Hi Rob,

+James M (It's nice to Cc folks if you mention/quote them)


> On 2020/5/21 21:29, Rob Herring wrote:
> > On Thu, May 21, 2020 at 3:35 AM Chen Zhou <chenzhou10@huawei.com> wrote:
> >> Add documentation for DT property used by arm64 kdump:
> >> linux,low-memory-range.
> >> "linux,low-memory-range" is an another memory region used for crash
> >> dump kernel devices.
> >>
> >> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> >> ---
> >>  Documentation/devicetree/bindings/chosen.txt | 25 ++++++++++++++++++++
> >>  1 file changed, 25 insertions(+)
> > chosen is now a schema documented here[1].
> Ok, that is, i don't need to modify the doc in kernel, just create a pull request in github [1]?
> 
> >
> >> diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> >> index 45e79172a646..bfe6fb6976e6 100644
> >> --- a/Documentation/devicetree/bindings/chosen.txt
> >> +++ b/Documentation/devicetree/bindings/chosen.txt
> >> @@ -103,6 +103,31 @@ While this property does not represent a real hardware, the address
> >>  and the size are expressed in #address-cells and #size-cells,
> >>  respectively, of the root node.
> >>
> >> +linux,low-memory-range
> >> +----------------------
> >> +This property (arm64 only) holds a base address and size, describing a
> >> +limited region below 4G. Similar to "linux,usable-memory-range", it is
> >> +an another memory range which may be considered available for use by the
> >> +kernel.
> > Why can't you just add a range to "linux,usable-memory-range"? It
> > shouldn't be hard to figure out which part is below 4G.
> I did like this in my previous version, such as v5. After discussed with James, i modified it to the current way.
> 
> We think the existing behavior should be unchanged, which helps with keeping compatibility with existing
> user-space and older kdump kernels.
> 
> The comments from James:
> > linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>.
> Won't this break if your kdump kernel doesn't know what the extra parameters are?
> Or if it expects two ranges, but only gets one? These DT properties should be treated as
> ABI between kernel versions, we can't really change it like this.
> 
> I think the 'low' region is an optional-extra, that is never mapped by the first kernel. I
> think the simplest thing to do is to add an 'linux,low-memory-range' that we
> memblock_add() after memblock_cap_memory_range() has been called.
> If its missing, or the new kernel doesn't know what its for, everything keeps working.


I don't think there's a compatibility issue here though. The current 
kernel doesn't care if the property is longer than 1 base+size. It only 
checks if the size is less than 1 base+size. And yes, we can rely on 
that implementation detail. It's only an ABI if an existing user 
notices.

Now, if the low memory is listed first, then an older kdump kernel 
would get a different memory range. If that's a problem, then define 
that low memory goes last. 
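
To make the length check described above concrete, here is a minimal
sketch. It is not the actual kernel code: struct mem_range and
read_first_range() are hypothetical names, and the sketch assumes each
base and size has already been combined into a single 64-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one <base size> pair. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Model of the behaviour described above: the property is rejected
 * only when it is shorter than one base+size pair. Anything after
 * the first pair is simply never read.
 * Returns 0 on success, -1 if the property is missing or too short.
 */
static int read_first_range(const uint64_t *cells, size_t ncells,
			    struct mem_range *out)
{
	if (cells == NULL || out == NULL || ncells < 2)
		return -1;

	out->base = cells[0];
	out->size = cells[1];
	return 0;
}
```

With this model, a property carrying two pairs yields the same first
range as a property carrying one, which is why the order of the entries
matters for an older kdump kernel.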

Rob

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-05-26  1:42 ` [PATCH v8 0/5] support reserving crashkernel above 4G on " Baoquan He
  2020-05-26  2:28   ` chenzhou
@ 2020-05-28 22:20   ` John Donnelly
  2020-05-29  8:05     ` Will Deacon
  1 sibling, 1 reply; 34+ messages in thread
From: John Donnelly @ 2020-05-28 22:20 UTC (permalink / raw)
  To: Baoquan He, Chen Zhou
  Cc: tglx, mingo, catalin.marinas, will, dyoung, robh+dt, arnd,
	devicetree, linux-doc, kexec, linux-kernel, horms, guohanjun,
	pkushwaha, linux-arm-kernel


On 5/25/20 8:42 PM, Baoquan He wrote:
> On 05/21/20 at 05:38pm, Chen Zhou wrote:
>> This patch series enable reserving crashkernel above 4G in arm64.
>>
>> There are following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>> when there is no enough low memory.
>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>> will boot failure because there is no low memory available for allocation.
>>
>> To solve these issues, introduce crashkernel=X,low to reserve specified
>> size low memory.
>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>> 4G. If crashkernel=Y,low is specified simultaneously, reserve spcified
>> size low memory for crash kdump kernel devices firstly and then reserve
>> memory above 4G.
>>
>> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
>> is specified simultaneously, kernel should reserve specified size low memory
>> for crash dump kernel devices. So there may be two crash kernel regions, one
>> is below 4G, the other is above 4G.
>> In order to distinct from the high region and make no effect to the use of
>> kexec-tools, rename the low region as "Crash kernel (low)", and add DT property
>> "linux,low-memory-range" to crash dump kernel's dtb to pass the low region.
>>
>> Besides, we need to modify kexec-tools:
>> arm64: kdump: add another DT property to crash dump kernel's dtb(see [1])
>>
>> The previous changes and discussions can be retrieved from:
>>
>> Changes since [v7]
>> - Move x86 CRASH_ALIGN to 2M
>> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> OK, moving x86 CRASH_ALIGN to 2M is suggested by Dave. Because
> CONFIG_PHYSICAL_ALIGN can be selected from 2M to 16M. So 2M seems good.
> But, anyway, we should tell the reason why it need be changed in commit
> log.
>
>
> arch/x86/Kconfig:
> config PHYSICAL_ALIGN
>          hex "Alignment value to which kernel should be aligned"
>          default "0x200000"
>          range 0x2000 0x1000000 if X86_32
>          range 0x200000 0x1000000 if X86_64
>
>> - Update Documentation/devicetree/bindings/chosen.txt
>> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
>> - Add Tested-by from Jhon and pk
>>
>> Changes since [v6]
>> - Fix build errors reported by kbuild test robot.
>>
>> Changes since [v5]
>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>> - Delete crashkernel=X,high.
> And the crashkernel=X,high being deleted need be told too. Otherwise
> people reading the commit have to check why themselves. I didn't follow
> the old version, can't see why ,high can't be specified explicitly.
>
>> - Modify crashkernel=X,low.
>> If crashkernel=X,low is specified simultaneously, reserve spcified size low
>> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
>> pass to crash dump kernel by DT property "linux,low-memory-range".
>> - Update Documentation/admin-guide/kdump/kdump.rst.
>>
>> Changes since [v4]
>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>>
>> Changes since [v3]
>> - Add memblock_cap_memory_ranges back for multiple ranges.
>> - Fix some compiling warnings.
>>
>> Changes since [v2]
>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
>> patch.
>>
>> Changes since [v1]:
>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
>> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
>> in fdt_enforce_memory_region().
>> There are at most two crash kernel regions, for two crash kernel regions
>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
>> and then remove the memory range in the middle.
>>
>> [1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>> [v1]: https://lkml.org/lkml/2019/4/2/1174
>> [v2]: https://lkml.org/lkml/2019/4/9/86
>> [v3]: https://lkml.org/lkml/2019/4/9/306
>> [v4]: https://lkml.org/lkml/2019/4/15/273
>> [v5]: https://lkml.org/lkml/2019/5/6/1360
>> [v6]: https://lkml.org/lkml/2019/8/30/142
>> [v7]: https://lkml.org/lkml/2019/12/23/411
>>
>> Chen Zhou (5):
>>    x86: kdump: move reserve_crashkernel_low() into crash_core.c
>>    arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>>    arm64: kdump: add memory for devices by DT property, low-memory-range
>>    kdump: update Documentation about crashkernel on arm64
>>    dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
>>
>>   Documentation/admin-guide/kdump/kdump.rst     | 13 ++-
>>   .../admin-guide/kernel-parameters.txt         | 12 ++-
>>   Documentation/devicetree/bindings/chosen.txt  | 25 ++++++
>>   arch/arm64/kernel/setup.c                     |  8 +-
>>   arch/arm64/mm/init.c                          | 61 ++++++++++++-
>>   arch/x86/kernel/setup.c                       | 66 ++------------
>>   include/linux/crash_core.h                    |  3 +
>>   include/linux/kexec.h                         |  2 -
>>   kernel/crash_core.c                           | 85 +++++++++++++++++++
>>   kernel/kexec_core.c                           | 17 ----
>>   10 files changed, 208 insertions(+), 84 deletions(-)
>>
>> -- 
>> 2.20.1
>>
>>
>> _______________________________________________
>> kexec mailing list
>> kexec@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec
>>



Hi,



This proposal to improve vmcore creation on Arm has been going on for
almost a year now.

Who is the final maintainer that needs to approve and accept these?

What are the lingering issues that remain, so we can get these
accepted into an upstream commit?


Thank you.

John.




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-05-28 22:20   ` John Donnelly
@ 2020-05-29  8:05     ` Will Deacon
  0 siblings, 0 replies; 34+ messages in thread
From: Will Deacon @ 2020-05-29  8:05 UTC (permalink / raw)
  To: John Donnelly
  Cc: Baoquan He, Chen Zhou, tglx, mingo, catalin.marinas, dyoung,
	robh+dt, arnd, devicetree, linux-doc, kexec, linux-kernel, horms,
	guohanjun, pkushwaha, linux-arm-kernel, james.morse

[+James Morse]

On Thu, May 28, 2020 at 05:20:34PM -0500, John Donnelly wrote:
> On 5/25/20 8:42 PM, Baoquan He wrote:
> > On 05/21/20 at 05:38pm, Chen Zhou wrote:
> > > This patch series enable reserving crashkernel above 4G in arm64.

[...]

> > > Chen Zhou (5):
> > >    x86: kdump: move reserve_crashkernel_low() into crash_core.c
> > >    arm64: kdump: reserve crashkenel above 4G for crash dump kernel
> > >    arm64: kdump: add memory for devices by DT property, low-memory-range
> > >    kdump: update Documentation about crashkernel on arm64
> > >    dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
> > > 
> > >   Documentation/admin-guide/kdump/kdump.rst     | 13 ++-
> > >   .../admin-guide/kernel-parameters.txt         | 12 ++-
> > >   Documentation/devicetree/bindings/chosen.txt  | 25 ++++++
> > >   arch/arm64/kernel/setup.c                     |  8 +-
> > >   arch/arm64/mm/init.c                          | 61 ++++++++++++-
> > >   arch/x86/kernel/setup.c                       | 66 ++------------
> > >   include/linux/crash_core.h                    |  3 +
> > >   include/linux/kexec.h                         |  2 -
> > >   kernel/crash_core.c                           | 85 +++++++++++++++++++
> > >   kernel/kexec_core.c                           | 17 ----
> > >   10 files changed, 208 insertions(+), 84 deletions(-)
> > > 
> This proposal to improve vmcore creation on Arm has been going on for
> almost a year now.
> 
> Who is the final maintainer that needs to approve and accept these?
> 
> What are the lingering issues that remain, so we can get these accepted
> into an upstream commit?

The arm64 bits need an Ack from James Morse, but he's not on CC despite
offering feedback on earlier versions.

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
  2020-05-26 21:18       ` Rob Herring
@ 2020-05-29 16:11         ` James Morse
  2020-06-20  3:54           ` chenzhou
  0 siblings, 1 reply; 34+ messages in thread
From: James Morse @ 2020-05-29 16:11 UTC (permalink / raw)
  To: Rob Herring, chenzhou
  Cc: Thomas Gleixner, Ingo Molnar, Catalin Marinas, Will Deacon,
	dyoung, Baoquan He, Arnd Bergmann, John.p.donnelly, pkushwaha,
	Simon Horman, Hanjun Guo,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	devicetree, Linux Doc Mailing List, linux-kernel, kexec

Hi guys,

On 26/05/2020 22:18, Rob Herring wrote:
> On Fri, May 22, 2020 at 11:24:11AM +0800, chenzhou wrote:
>> On 2020/5/21 21:29, Rob Herring wrote:
>>> On Thu, May 21, 2020 at 3:35 AM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>> Add documentation for DT property used by arm64 kdump:
>>>> linux,low-memory-range.
>>>> "linux,low-memory-range" is an another memory region used for crash
>>>> dump kernel devices.

>>>> diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
>>>> index 45e79172a646..bfe6fb6976e6 100644
>>>> --- a/Documentation/devicetree/bindings/chosen.txt
>>>> +++ b/Documentation/devicetree/bindings/chosen.txt

>>>> +linux,low-memory-range
>>>> +----------------------
>>>> +This property (arm64 only) holds a base address and size, describing a
>>>> +limited region below 4G. Similar to "linux,usable-memory-range", it is
>>>> +an another memory range which may be considered available for use by the
>>>> +kernel.

>>> Why can't you just add a range to "linux,usable-memory-range"? It
>>> shouldn't be hard to figure out which part is below 4G.

>> The comments from James:
>> Won't this break if your kdump kernel doesn't know what the extra parameters are?
>> Or if it expects two ranges, but only gets one? These DT properties should be treated as
>> ABI between kernel versions, we can't really change it like this.
>>
>> I think the 'low' region is an optional-extra, that is never mapped by the first kernel. I
>> think the simplest thing to do is to add an 'linux,low-memory-range' that we
>> memblock_add() after memblock_cap_memory_range() has been called.
>> If its missing, or the new kernel doesn't know what its for, everything keeps working.
> 
> 
> I don't think there's a compatibility issue here though. The current 
> kernel doesn't care if the property is longer than 1 base+size. It only 
> checks if the size is less than 1 base+size.

Aha! I missed that.


> And yes, we can rely on 
> that implementation detail. It's only an ABI if an existing user 
> notices.
> 
> Now, if the low memory is listed first, then an older kdump kernel 
> would get a different memory range. If that's a problem, then define 
> that low memory goes last. 

The first entry would need to be the 'crashkernel' range where the kdump kernel is
placed, otherwise an older kernel won't boot. The rest can be optional extras, as long as
we are tolerant of them being missing...
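
A minimal sketch of that layout, under stated assumptions: the names
struct usable_mem, parse_usable_ranges() and MAX_EXTRA_RANGES are
hypothetical, the cap of four extra ranges is arbitrary, and each base
and size is assumed to be a single 64-bit value. It is an illustration
of the idea, not code from this series.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTRA_RANGES 4	/* arbitrary limit for this sketch */

struct mem_range {
	uint64_t base;
	uint64_t size;
};

struct usable_mem {
	struct mem_range crashkernel;		  /* always the first pair */
	struct mem_range extra[MAX_EXTRA_RANGES]; /* optional, e.g. low memory */
	size_t nr_extra;
};

/*
 * The first <base size> pair is mandatory and is taken as the range
 * the kdump kernel is placed in. Any further complete pairs are
 * optional extras; their absence is not an error, and a trailing
 * incomplete pair is ignored.
 * Returns 0 on success, -1 if even the first pair is missing.
 */
static int parse_usable_ranges(const uint64_t *cells, size_t ncells,
			       struct usable_mem *out)
{
	size_t i;

	if (cells == NULL || out == NULL || ncells < 2)
		return -1;

	out->crashkernel.base = cells[0];
	out->crashkernel.size = cells[1];
	out->nr_extra = 0;

	for (i = 2; i + 1 < ncells && out->nr_extra < MAX_EXTRA_RANGES; i += 2) {
		out->extra[out->nr_extra].base = cells[i];
		out->extra[out->nr_extra].size = cells[i + 1];
		out->nr_extra++;
	}
	return 0;
}
```

A kernel that only knows about one range still gets the crashkernel
region from the first pair, while a newer kernel can pick up the extras
when they are present.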

I'll try and look at the rest of this series on Monday,


Thanks,

James

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-05-21  9:38 [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
                   ` (5 preceding siblings ...)
  2020-05-26  1:42 ` [PATCH v8 0/5] support reserving crashkernel above 4G on " Baoquan He
@ 2020-06-01 12:02 ` Prabhakar Kushwaha
  2020-06-01 19:30   ` John Donnelly
  6 siblings, 1 reply; 34+ messages in thread
From: Prabhakar Kushwaha @ 2020-06-01 12:02 UTC (permalink / raw)
  To: Chen Zhou
  Cc: Thomas Gleixner, mingo, Catalin Marinas, Will Deacon, dyoung,
	bhe, robh+dt, John Donnelly, arnd, devicetree,
	Linux Doc Mailing List, kexec mailing list,
	Linux Kernel Mailing List, horms, guohanjun, Prabhakar Kushwaha,
	linux-arm-kernel

Hi Chen,

On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>
> This patch series enable reserving crashkernel above 4G in arm64.
>
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> when there is no enough low memory.
> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
> in this case, if swiotlb or DMA buffers are required, crash dump kernel
> will boot failure because there is no low memory available for allocation.
>
> To solve these issues, introduce crashkernel=X,low to reserve specified
> size low memory.
> Crashkernel=X tries to reserve memory for the crash dump kernel under
> 4G. If crashkernel=Y,low is specified simultaneously, reserve spcified
> size low memory for crash kdump kernel devices firstly and then reserve
> memory above 4G.
>
> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
> is specified simultaneously, kernel should reserve specified size low memory
> for crash dump kernel devices. So there may be two crash kernel regions, one
> is below 4G, the other is above 4G.
> In order to distinct from the high region and make no effect to the use of
> kexec-tools, rename the low region as "Crash kernel (low)", and add DT property
> "linux,low-memory-range" to crash dump kernel's dtb to pass the low region.
>
> Besides, we need to modify kexec-tools:
> arm64: kdump: add another DT property to crash dump kernel's dtb(see [1])
>
> The previous changes and discussions can be retrieved from:
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
> - Add Tested-by from Jhon and pk
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
>
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
>
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
>
> [1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
>
> Chen Zhou (5):
>   x86: kdump: move reserve_crashkernel_low() into crash_core.c
>   arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>   arm64: kdump: add memory for devices by DT property, low-memory-range
>   kdump: update Documentation about crashkernel on arm64
>   dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
>

We are getting a "warn_alloc" warning [1] during boot of the kdump kernel
when the primary kernel is booted with the bootargs in [2].
This error is observed on the ThunderX2 ARM64 platform.

It is observed with the latest upstream tag (v5.7-rc3) with this patch set
and https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
It is also observed **without** this patch set; see
https://www.spinics.net/lists/arm-kernel/msg806882.html

This issue occurs whenever crashkernel memory is reserved above 0xc000_0000.
More details were discussed earlier in
https://www.spinics.net/lists/arm-kernel/msg806882.html without any
solution.

This patch set is expected to solve a similar kind of issue,
i.e. low memory is targeted only at DMA and swiotlb, so the observation
above should be considered and fixed.

--pk

[1]
[   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
[   30.367696] NET: Registered protocol family 16
[   30.369973] swapper/0: page allocation failure: order:6,
mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
[   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
[   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
[   30.369984] Call trace:
[   30.369989]  dump_backtrace+0x0/0x1f8
[   30.369991]  show_stack+0x20/0x30
[   30.369997]  dump_stack+0xc0/0x10c
[   30.370001]  warn_alloc+0x10c/0x178
[   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
[   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
[   30.370008]  alloc_page_interleave+0x24/0x98
[   30.370011]  alloc_pages_current+0xe4/0x108
[   30.370017]  dma_atomic_pool_init+0x44/0x1a4
[   30.370020]  do_one_initcall+0x54/0x228
[   30.370027]  kernel_init_freeable+0x228/0x2cc
[   30.370031]  kernel_init+0x1c/0x110
[   30.370034]  ret_from_fork+0x10/0x18
[   30.370036] Mem-Info:
[   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
[   30.370064]  active_file:0 inactive_file:0 isolated_file:0
[   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
[   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
[   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
[   30.370064]  free:1537719 free_pcp:219 free_cma:0
[   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:128kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   30.370084] lowmem_reserve[]: 0 250 6063 6063
[   30.370090] Node 0 DMA32 free:256000kB min:408kB low:664kB
high:920kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:269700kB managed:256000kB mlocked:0kB kernel_stack:0kB
pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   30.370094] lowmem_reserve[]: 0 0 5813 5813
[   30.370100] Node 0 Normal free:5894876kB min:9552kB low:15504kB
high:21456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:8388608kB managed:5953112kB mlocked:0kB kernel_stack:21672kB
pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB free_cma:0kB
[   30.370104] lowmem_reserve[]: 0 0 0 0
[   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) = 256000kB
[   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB (UE) 3*32kB
(UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME) 3*1024kB (ME)
3*2048kB (UME) 1436*4096kB (M) = 5893600kB
[   30.370129] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[   30.370130] 0 total pagecache pages
[   30.370132] 0 pages in swap cache
[   30.370134] Swap cache stats: add 0, delete 0, find 0/0
[   30.370135] Free swap  = 0kB
[   30.370136] Total swap = 0kB
[   30.370137] 2164609 pages RAM
[   30.370139] 0 pages HighMem/MovableOnly
[   30.370140] 612331 pages reserved
[   30.370141] 0 pages hwpoisoned
[   30.370143] DMA: failed to allocate 256 KiB pool for atomic
coherent allocation

[2]
root@localhost$ dmesg | grep crash
[    0.000000] Reserving 250MB of low memory at 3724MB for crashkernel
(System low RAM: 2029MB)
[    0.000000] crashkernel reserved: 0x0000000e00000000 -
0x0000001000000000 (8192 MB)
[    0.000000] Kernel command line:
BOOT_IMAGE=(hd11,gpt2)/vmlinuz-5.7.0-rc3+
root=UUID=e5c34f86-6727-4668-81f9-f41433555df6 ro crashkernel=250M,low
crashkernel=8G nowatchdog console=ttyAMA0 default_hugepagesz=1G
hugepagesz=1G hugepages=2
[   44.019393]     crashkernel=8G

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-01 12:02 ` Prabhakar Kushwaha
@ 2020-06-01 19:30   ` John Donnelly
  2020-06-01 21:02     ` Bhupesh Sharma
  0 siblings, 1 reply; 34+ messages in thread
From: John Donnelly @ 2020-06-01 19:30 UTC (permalink / raw)
  To: Prabhakar Kushwaha, Chen Zhou
  Cc: Thomas Gleixner, mingo, Catalin Marinas, Will Deacon, dyoung,
	bhe, robh+dt, arnd, devicetree, Linux Doc Mailing List,
	kexec mailing list, Linux Kernel Mailing List, horms, guohanjun,
	Prabhakar Kushwaha, linux-arm-kernel, James Morse

Hi,


On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
> Hi Chen,
>
> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>> This patch series enable reserving crashkernel above 4G in arm64.
>>
>> There are following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>> when there is no enough low memory.
>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>> will boot failure because there is no low memory available for allocation.
>>
>> To solve these issues, introduce crashkernel=X,low to reserve specified
>> size low memory.
>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>> 4G. If crashkernel=Y,low is specified simultaneously, reserve spcified
>> size low memory for crash kdump kernel devices firstly and then reserve
>> memory above 4G.
>>
>> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
>> is specified simultaneously, kernel should reserve specified size low memory
>> for crash dump kernel devices. So there may be two crash kernel regions, one
>> is below 4G, the other is above 4G.
>> In order to distinct from the high region and make no effect to the use of
>> kexec-tools, rename the low region as "Crash kernel (low)", and add DT property
>> "linux,low-memory-range" to crash dump kernel's dtb to pass the low region.
>>
>> Besides, we need to modify kexec-tools:
>> arm64: kdump: add another DT property to crash dump kernel's dtb(see [1])
>>
>> The previous changes and discussions can be retrieved from:
>>
>> Changes since [v7]
>> - Move x86 CRASH_ALIGN to 2M
>> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
>> - Update Documentation/devicetree/bindings/chosen.txt
>> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
>> - Add Tested-by from Jhon and pk
>>
>> Changes since [v6]
>> - Fix build errors reported by kbuild test robot.
>>
>> Changes since [v5]
>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>> - Delete crashkernel=X,high.
>> - Modify crashkernel=X,low.
>> If crashkernel=X,low is specified simultaneously, reserve spcified size low
>> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
>> pass to crash dump kernel by DT property "linux,low-memory-range".
>> - Update Documentation/admin-guide/kdump/kdump.rst.
>>
>> Changes since [v4]
>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>>
>> Changes since [v3]
>> - Add memblock_cap_memory_ranges back for multiple ranges.
>> - Fix some compiling warnings.
>>
>> Changes since [v2]
>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
>> patch.
>>
>> Changes since [v1]:
>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
>> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
>> in fdt_enforce_memory_region().
>> There are at most two crash kernel regions, for two crash kernel regions
>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
>> and then remove the memory range in the middle.
>>
>> [1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>> [v1]: https://lkml.org/lkml/2019/4/2/1174
>> [v2]: https://lkml.org/lkml/2019/4/9/86
>> [v3]: https://lkml.org/lkml/2019/4/9/306
>> [v4]: https://lkml.org/lkml/2019/4/15/273
>> [v5]: https://lkml.org/lkml/2019/5/6/1360
>> [v6]: https://lkml.org/lkml/2019/8/30/142
>> [v7]: https://lkml.org/lkml/2019/12/23/411
>>
>> Chen Zhou (5):
>>    x86: kdump: move reserve_crashkernel_low() into crash_core.c
>>    arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>>    arm64: kdump: add memory for devices by DT property, low-memory-range
>>    kdump: update Documentation about crashkernel on arm64
>>    dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
>>
> We are getting "warn_alloc" [1] warning during boot of kdump kernel
> with bootargs as [2] of primary kernel.
> This error observed on ThunderX2  ARM64 platform.
>
> It is observed with latest upstream tag (v5.7-rc3) with this patch set
>   and https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> Also **without** this patch-set
> "https://www.spinics.net/lists/arm-kernel/msg806882.html"
>
> This issue comes whenever crashkernel memory is reserved after 0xc000_0000.
> More details discussed earlier in
> https://www.spinics.net/lists/arm-kernel/msg806882.html without any
> solution
>
> This patch-set is expected to solve similar kind of issue.
> i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
> observation should be considered/fixed. .
>
> --pk
>
> [1]
> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> [   30.367696] NET: Registered protocol family 16
> [   30.369973] swapper/0: page allocation failure: order:6,
> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> [   30.369984] Call trace:
> [   30.369989]  dump_backtrace+0x0/0x1f8
> [   30.369991]  show_stack+0x20/0x30
> [   30.369997]  dump_stack+0xc0/0x10c
> [   30.370001]  warn_alloc+0x10c/0x178
> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
> [   30.370008]  alloc_page_interleave+0x24/0x98
> [   30.370011]  alloc_pages_current+0xe4/0x108
> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
> [   30.370020]  do_one_initcall+0x54/0x228
> [   30.370027]  kernel_init_freeable+0x228/0x2cc
> [   30.370031]  kernel_init+0x1c/0x110
> [   30.370034]  ret_from_fork+0x10/0x18
> [   30.370036] Mem-Info:
> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
> [   30.370064]  active_file:0 inactive_file:0 isolated_file:0
> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
> unstable:0kB all_unreclaimable? no
> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
> unstable:0kB all_unreclaimable? no
> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB
> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
> [   30.370090] Node 0 DMA32 free:256000kB min:408kB low:664kB
> high:920kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> present:269700kB managed:256000kB mlocked:0kB kernel_stack:0kB
> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
> [   30.370100] Node 0 Normal free:5894876kB min:9552kB low:15504kB
> high:21456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> present:8388608kB managed:5953112kB mlocked:0kB kernel_stack:21672kB
> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB free_cma:0kB
> [   30.370104] lowmem_reserve[]: 0 0 0 0
> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) = 256000kB
> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB (UE) 3*32kB
> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME) 3*1024kB (ME)
> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_size=1048576kB
> [   30.370130] 0 total pagecache pages
> [   30.370132] 0 pages in swap cache
> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
> [   30.370135] Free swap  = 0kB
> [   30.370136] Total swap = 0kB
> [   30.370137] 2164609 pages RAM
> [   30.370139] 0 pages HighMem/MovableOnly
> [   30.370140] 612331 pages reserved
> [   30.370141] 0 pages hwpoisoned
> [   30.370143] DMA: failed to allocate 256 KiB pool for atomic
> coherent allocation


During my testing I saw the same error, and Chen's solution corrected it.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-01 19:30   ` John Donnelly
@ 2020-06-01 21:02     ` Bhupesh Sharma
  2020-06-01 21:59       ` John Donnelly
  0 siblings, 1 reply; 34+ messages in thread
From: Bhupesh Sharma @ 2020-06-01 21:02 UTC (permalink / raw)
  To: John Donnelly
  Cc: Prabhakar Kushwaha, Chen Zhou, Simon Horman, Devicetree List,
	Baoquan He, Will Deacon, Linux Doc Mailing List, Catalin Marinas,
	kexec mailing list, Linux Kernel Mailing List, Rob Herring,
	Ingo Molnar, Arnd Bergmann, guohanjun, James Morse,
	Thomas Gleixner, Prabhakar Kushwaha, RuiRui Yang,
	linux-arm-kernel

Hi John,

On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <John.P.donnelly@oracle.com> wrote:
>
> Hi,
>
>
> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
> > Hi Chen,
> >
> > On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
> >> [...]
> >>
> > We are getting "warn_alloc" [1] warning during boot of kdump kernel
> > with bootargs as [2] of primary kernel.
> > This error observed on ThunderX2  ARM64 platform.
> >
> > It is observed with latest upstream tag (v5.7-rc3) with this patch set
> >   and https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> > Also **without** this patch-set
> > "https://www.spinics.net/lists/arm-kernel/msg806882.html"
> >
> > This issue comes whenever crashkernel memory is reserved after 0xc000_0000.
> > More details discussed earlier in
> > https://www.spinics.net/lists/arm-kernel/msg806882.html without any
> > solution
> >
> > This patch-set is expected to solve similar kind of issue.
> > i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
> > observation should be considered/fixed. .
> >
> > --pk
> >
> > [1]
> > [...]
>
>
> During my testing I saw the same error and Chen's  solution corrected it .

Which combination are you using on your side? I am using Prabhakar's
suggested environment and can reproduce the issue with or without
Chen's "crashkernel above 4G" patchset.

I am also using a ThunderX2 platform with the latest makedumpfile code
and kexec-tools (with the suggested patch
<https://lists.infradead.org/pipermail/kexec/2020-May/025128.html>).
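
For anyone comparing setups, one quick way to see where the reservation
landed is to look at the "Crash kernel" entries in /proc/iomem of the
primary kernel. The following is only a minimal sketch: the helper names
and the sample addresses are made up for illustration, the 0xc000_0000
boundary is the figure from Prabhakar's report, and "Crash kernel (low)"
is the region name proposed in the cover letter of this series.

```python
import re

FOUR_G = 1 << 32


def crash_regions(iomem_text):
    """Return [(name, start, end)] for every 'Crash kernel...' line."""
    regions = []
    for line in iomem_text.splitlines():
        m = re.match(r"\s*([0-9a-f]+)-([0-9a-f]+) : (Crash kernel.*)$", line)
        if m:
            regions.append((m.group(3), int(m.group(1), 16), int(m.group(2), 16)))
    return regions


def classify(start, end, boundary=0xC000_0000):
    """Place an inclusive [start, end] range relative to 'boundary' and 4G."""
    if end < boundary:
        return "below boundary"
    if start >= FOUR_G:
        return "above 4G"
    if start >= boundary and end < FOUR_G:
        return "between boundary and 4G"
    return "straddles a limit"


# Hypothetical /proc/iomem excerpt; the addresses are invented for the example.
SAMPLE = """\
80000000-feffffff : System RAM
  e0000000-efffffff : Crash kernel (low)
880000000-bffffffff : System RAM
  900000000-97fffffff : Crash kernel
"""

for name, start, end in crash_regions(SAMPLE):
    print(f"{name}: {start:#x}-{end:#x} -> {classify(start, end)}")
```

On a real system the text would come from reading /proc/iomem as root,
since unprivileged reads typically show zeroed addresses.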

Thanks,
Bhupesh


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-01 21:02     ` Bhupesh Sharma
@ 2020-06-01 21:59       ` John Donnelly
  2020-06-02  5:38         ` Prabhakar Kushwaha
  0 siblings, 1 reply; 34+ messages in thread
From: John Donnelly @ 2020-06-01 21:59 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Prabhakar Kushwaha, Chen Zhou, Simon Horman, Devicetree List,
	Baoquan He, Will Deacon, Linux Doc Mailing List, Catalin Marinas,
	kexec mailing list, Linux Kernel Mailing List, Rob Herring,
	Ingo Molnar, Arnd Bergmann, guohanjun, James Morse,
	Thomas Gleixner, Prabhakar Kushwaha, RuiRui Yang,
	linux-arm-kernel

Hi. See below!

> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> 
> Hi John,
> 
> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <John.P.donnelly@oracle.com> wrote:
>> 
>> Hi,
>> 
>> 
>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>> Hi Chen,
>>> 
>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>> [...]
>>>> 
>>> We are getting "warn_alloc" [1] warning during boot of kdump kernel
>>> with bootargs as [2] of primary kernel.
>>> This error observed on ThunderX2  ARM64 platform.
>>> 
>>> It is observed with latest upstream tag (v5.7-rc3) with this patch set
>>>  and https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>> Also **without** this patch-set
>>> "https://www.spinics.net/lists/arm-kernel/msg806882.html"
>>> 
>>> This issue comes whenever crashkernel memory is reserved after 0xc000_0000.
>>> More details discussed earlier in
>>> https://www.spinics.net/lists/arm-kernel/msg806882.html without any
>>> solution
>>> 
>>> This patch-set is expected to solve similar kind of issue.
>>> i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
>>> observation should be considered/fixed. .
>>> 
>>> --pk
>>> 
>>> [1]
>>> [...]
>> 
>> 
>> During my testing I saw the same error and Chen's  solution corrected it .
> 
> Which combination you are using on your side? I am using Prabhakar's
> suggested environment and can reproduce the issue
> with or without Chen's crashkernel support above 4G patchset.
> 
> I am also using a ThunderX2 platform with latest makedumpfile code and
> kexec-tools (with the suggested patch
> <https://lists.infradead.org/pipermail/kexec/2020-May/025128.html>).
> 
> Thanks,
> Bhupesh


I did this work 5 months ago and have since moved on to other
activities. My DMA failures were related to PCI devices that could not
be enumerated because low DMA space was not available when the
crashkernel was moved above 4G; I don’t recall the exact platform.



This failure:

>>>  DMA: failed to allocate 256 KiB pool for atomic
>>> coherent allocation

is due to commit:

 3618082c ("arm64: use both ZONE_DMA and ZONE_DMA32")

With the introduction of ZONE_DMA to support the Raspberry Pi 4 DMA
region below 1G, the crashkernel is placed higher up, in the
ZONE_DMA32 region below 4G. Since the crash kernel does not have
access to the ZONE_DMA region, it prints a call trace during bootup.
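
As a rough cross-check against the log quoted earlier in the thread:
order:6 with 4 kB pages is 64 pages, i.e. 256 KiB, which matches the
size of the atomic pool that could not be allocated. The sketch below
redoes that arithmetic and sums the per-zone free lists. The helper
names are made up for illustration and are not kernel APIs; the three
free-list lines are copied from the quoted log with the mail wrapping
undone.

```python
import re

PAGE_KB = 4  # smallest block size in the quoted free lists ("0*4kB")

FREE_LISTS = """\
Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) = 256000kB
Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB (UE) 3*32kB (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME) 3*1024kB (ME) 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
"""


def order_to_kb(order, page_kb=PAGE_KB):
    """Size in kB of one buddy block of the given order."""
    return page_kb << order


def parse_free_lists(text):
    """Map zone name -> {block size in kB: number of free blocks}."""
    zones = {}
    for line in text.splitlines():
        m = re.match(r"Node \d+ (\w+): (.*) = \d+kB$", line)
        if m:
            zones[m.group(1)] = {
                int(size): int(count)
                for count, size in re.findall(r"(\d+)\*(\d+)kB", m.group(2))
            }
    return zones


def can_satisfy(free_blocks, order):
    """True if the zone holds at least one free block of 'order' or larger."""
    need_kb = order_to_kb(order)
    return any(n > 0 and size >= need_kb for size, n in free_blocks.items())


zones = parse_free_lists(FREE_LISTS)
for name, blocks in zones.items():
    total_kb = sum(size * n for size, n in blocks.items())
    print(f"{name}: {total_kb} kB free, order-6 block available: "
          f"{can_satisfy(blocks, 6)}")
```

The DMA zone has no free blocks at all (the log also shows it with
managed:0kB), and a GFP_DMA request is not served from DMA32 or Normal,
however much memory is free there.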

The failure shows up when this config option is enabled:

CONFIG_ZONE_DMA=y

Turning off ZONE_DMA fixes the issue, and Raspberry Pi 4 will use the
device tree to specify memory below 1G.

I would like to see Chen’s feature added, perhaps as EXPERIMENTAL, so
we can get some configuration testing done on it. It handles the case
of a DMA zone in low memory while the crash kernel is above 4GB. This
has been going on for a year now.
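
To make the intended combination concrete, here is a minimal sketch of
how the two options from the cover letter fit together. It is not the
kernel's parser: the function names are hypothetical, the sizes in the
example command line are invented, and only the plain SIZE, SIZE@OFFSET
and SIZE,low forms are handled (the range syntax and ",high" are left
out on purpose).

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert '256M' or '0x100000000' style values to bytes."""
    m = re.fullmatch(r"(0[xX][0-9a-fA-F]+|\d+)([KMGkmg]?)", text)
    if not m:
        raise ValueError(f"unsupported size: {text!r}")
    digits = m.group(1)
    base = 16 if digits[:2].lower() == "0x" else 10
    return int(digits, base) * UNITS[m.group(2).upper()]


def parse_crashkernel_args(cmdline):
    """Return {'main': (size, offset or None) or None, 'low': size or None}."""
    result = {"main": None, "low": None}
    for word in cmdline.split():
        if not word.startswith("crashkernel="):
            continue
        value = word[len("crashkernel="):]
        if value.endswith(",low"):
            result["low"] = parse_size(value[:-len(",low")])
        else:
            size, _, offset = value.partition("@")
            result["main"] = (parse_size(size),
                              parse_size(offset) if offset else None)
    return result


def reservation_plan(opts):
    """Placement as described in the cover letter; other combinations
    (for example an explicit offset together with ',low') are not modelled."""
    if opts["main"] is None:
        return []
    size, offset = opts["main"]
    if offset is not None:
        return [(f"at {offset:#x}", size)]
    if opts["low"] is not None:
        # Low memory for devices is reserved first, then the rest above 4G.
        return [("below 4G", opts["low"]), ("above 4G", size)]
    return [("below 4G", size)]


opts = parse_crashkernel_args("root=/dev/sda1 crashkernel=2G crashkernel=256M,low")
for where, size in reservation_plan(opts):
    print(f"{size >> 20} MiB {where}")
```

With both options given, the low region is what the crash kernel would
use for swiotlb and DMA buffers, while the kernel image itself sits in
the region above 4G.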


Thank you,

John.


(Note: I am not on all of the kernel dlist emails, so this won’t be
seen by everyone; someone may have to bounce it.)








^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-01 21:59       ` John Donnelly
@ 2020-06-02  5:38         ` Prabhakar Kushwaha
  2020-06-02 14:41           ` John Donnelly
  0 siblings, 1 reply; 34+ messages in thread
From: Prabhakar Kushwaha @ 2020-06-02  5:38 UTC (permalink / raw)
  To: John Donnelly
  Cc: Bhupesh Sharma, Chen Zhou, Simon Horman, Devicetree List,
	Baoquan He, Will Deacon, Linux Doc Mailing List, Catalin Marinas,
	kexec mailing list, Linux Kernel Mailing List, Rob Herring,
	Ingo Molnar, Arnd Bergmann, guohanjun, James Morse,
	Thomas Gleixner, Prabhakar Kushwaha, RuiRui Yang,
	linux-arm-kernel

On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <john.p.donnelly@oracle.com> wrote:
>
> Hi .  See below !
>
> > On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> >
> > Hi John,
> >
> > On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <John.P.donnelly@oracle.com> wrote:
> >>
> >> Hi,
> >>
> >>
> >> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
> >>> Hi Chen,
> >>>
> >>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
> >>>> This patch series enable reserving crashkernel above 4G in arm64.
> >>>>
> >>>> There are following issues in arm64 kdump:
> >>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> >>>> when there is no enough low memory.
> >>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
> >>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
> >>>> will boot failure because there is no low memory available for allocation.
> >>>>
> >>>> To solve these issues, introduce crashkernel=X,low to reserve specified
> >>>> size low memory.
> >>>> Crashkernel=X tries to reserve memory for the crash dump kernel under
> >>>> 4G. If crashkernel=Y,low is specified simultaneously, reserve spcified
> >>>> size low memory for crash kdump kernel devices firstly and then reserve
> >>>> memory above 4G.
> >>>>
> >>>> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
> >>>> is specified simultaneously, kernel should reserve specified size low memory
> >>>> for crash dump kernel devices. So there may be two crash kernel regions, one
> >>>> is below 4G, the other is above 4G.
> >>>> In order to distinct from the high region and make no effect to the use of
> >>>> kexec-tools, rename the low region as "Crash kernel (low)", and add DT property
> >>>> "linux,low-memory-range" to crash dump kernel's dtb to pass the low region.
> >>>>
> >>>> Besides, we need to modify kexec-tools:
> >>>> arm64: kdump: add another DT property to crash dump kernel's dtb(see [1])
> >>>>
> >>>> The previous changes and discussions can be retrieved from:
> >>>>
> >>>> Changes since [v7]
> >>>> - Move x86 CRASH_ALIGN to 2M
> >>>> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> >>>> - Update Documentation/devicetree/bindings/chosen.txt
> >>>> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
> >>>> - Add Tested-by from Jhon and pk
> >>>>
> >>>> Changes since [v6]
> >>>> - Fix build errors reported by kbuild test robot.
> >>>>
> >>>> Changes since [v5]
> >>>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> >>>> - Delete crashkernel=X,high.
> >>>> - Modify crashkernel=X,low.
> >>>> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> >>>> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> >>>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> >>>> pass to crash dump kernel by DT property "linux,low-memory-range".
> >>>> - Update Documentation/admin-guide/kdump/kdump.rst.
> >>>>
> >>>> Changes since [v4]
> >>>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> >>>>
> >>>> Changes since [v3]
> >>>> - Add memblock_cap_memory_ranges back for multiple ranges.
> >>>> - Fix some compiling warnings.
> >>>>
> >>>> Changes since [v2]
> >>>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> >>>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> >>>> patch.
> >>>>
> >>>> Changes since [v1]:
> >>>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> >>>> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> >>>> in fdt_enforce_memory_region().
> >>>> There are at most two crash kernel regions, for two crash kernel regions
> >>>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> >>>> and then remove the memory range in the middle.
> >>>>
> >>>> [1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> >>>> [v1]: https://lkml.org/lkml/2019/4/2/1174
> >>>> [v2]: https://lkml.org/lkml/2019/4/9/86
> >>>> [v3]: https://lkml.org/lkml/2019/4/9/306
> >>>> [v4]: https://lkml.org/lkml/2019/4/15/273
> >>>> [v5]: https://lkml.org/lkml/2019/5/6/1360
> >>>> [v6]: https://lkml.org/lkml/2019/8/30/142
> >>>> [v7]: https://lkml.org/lkml/2019/12/23/411
> >>>>
> >>>> Chen Zhou (5):
> >>>>   x86: kdump: move reserve_crashkernel_low() into crash_core.c
> >>>>   arm64: kdump: reserve crashkenel above 4G for crash dump kernel
> >>>>   arm64: kdump: add memory for devices by DT property, low-memory-range
> >>>>   kdump: update Documentation about crashkernel on arm64
> >>>>   dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
> >>>>
> >>> We are getting "warn_alloc" [1] warning during boot of kdump kernel
> >>> with bootargs as [2] of primary kernel.
> >>> This error observed on ThunderX2  ARM64 platform.
> >>>
> >>> It is observed with latest upstream tag (v5.7-rc3) with this patch set
> >>>  and https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> >>> Also **without** this patch-set
> >>> "https://www.spinics.net/lists/arm-kernel/msg806882.html"
> >>>
> >>> This issue comes whenever crashkernel memory is reserved after 0xc000_0000.
> >>> More details discussed earlier in
> >>> https://www.spinics.net/lists/arm-kernel/msg806882.html  without any
> >>> solution
> >>>
> >>> This patch-set is expected to solve similar kind of issue.
> >>> i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
> >>> observation should be considered/fixed. .
> >>>
> >>> --pk
> >>>
> >>> [1]
> >>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
> >>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> >>> [   30.367696] NET: Registered protocol family 16
> >>> [   30.369973] swapper/0: page allocation failure: order:6,
> >>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
> >>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
> >>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
> >>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> >>> [   30.369984] Call trace:
> >>> [   30.369989]  dump_backtrace+0x0/0x1f8
> >>> [   30.369991]  show_stack+0x20/0x30
> >>> [   30.369997]  dump_stack+0xc0/0x10c
> >>> [   30.370001]  warn_alloc+0x10c/0x178
> >>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
> >>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
> >>> [   30.370008]  alloc_page_interleave+0x24/0x98
> >>> [   30.370011]  alloc_pages_current+0xe4/0x108
> >>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
> >>> [   30.370020]  do_one_initcall+0x54/0x228
> >>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
> >>> [   30.370031]  kernel_init+0x1c/0x110
> >>> [   30.370034]  ret_from_fork+0x10/0x18
> >>> [   30.370036] Mem-Info:
> >>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
> >>> [   30.370064]  active_file:0 inactive_file:0 isolated_file:0
> >>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
> >>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
> >>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
> >>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
> >>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
> >>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
> >>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> >>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
> >>> unstable:0kB all_unreclaimable? no
> >>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
> >>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
> >>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> >>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
> >>> unstable:0kB all_unreclaimable? no
> >>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
> >>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB
> >>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
> >>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB low:664kB
> >>> high:920kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >>> present:269700kB managed:256000kB mlocked:0kB kernel_stack:0kB
> >>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
> >>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB low:15504kB
> >>> high:21456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >>> present:8388608kB managed:5953112kB mlocked:0kB kernel_stack:21672kB
> >>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB free_cma:0kB
> >>> [   30.370104] lowmem_reserve[]: 0 0 0 0
> >>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
> >>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> >>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
> >>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) = 256000kB
> >>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB (UE) 3*32kB
> >>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME) 3*1024kB (ME)
> >>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
> >>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
> >>> hugepages_surp=0 hugepages_size=1048576kB
> >>> [   30.370130] 0 total pagecache pages
> >>> [   30.370132] 0 pages in swap cache
> >>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
> >>> [   30.370135] Free swap  = 0kB
> >>> [   30.370136] Total swap = 0kB
> >>> [   30.370137] 2164609 pages RAM
> >>> [   30.370139] 0 pages HighMem/MovableOnly
> >>> [   30.370140] 612331 pages reserved
> >>> [   30.370141] 0 pages hwpoisoned
> >>> [   30.370143] DMA: failed to allocate 256 KiB pool for atomic
> >>> coherent allocation
> >>
> >>
> >> During my testing I saw the same error and Chen's  solution corrected it .
> >
> > Which combination you are using on your side? I am using Prabhakar's
> > suggested environment and can reproduce the issue
> > with or without Chen's crashkernel support above 4G patchset.
> >
> > I am also using a ThunderX2 platform with latest makedumpfile code and
> > kexec-tools (with the suggested patch
> > <https://lists.infradead.org/pipermail/kexec/2020-May/025128.html>).
> >
> > Thanks,
> > Bhupesh
>
>
> I did this activity 5 months ago and I have moved on to other activities. My DMA failures were related to PCI devices that could not be enumerated because  low-DMA space was not  available  when crashkernel was moved above 4G; I don’t recall the exact platform.
>
>
>
> For this failure ,
>
> >>>  DMA: failed to allocate 256 KiB pool for atomic
> >>> coherent allocation
>
>
> Is due to :
>
>
>  3618082c
>  ("arm64 use both ZONE_DMA and ZONE_DMA32")
>
> With the introduction of ZONE_DMA to support the Raspberry DMA
> region below 1G, the crashkernel is placed in the upper 4G
> ZONE_DMA_32 region. Since the crashkernel does not have access
> to the ZONE_DMA region, it prints out call trace during bootup.
>
> It is due to having this CONFIG item  ON  :
>
>
> CONFIG_ZONE_DMA=y
>
> Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
> use the device tree to specify memory below 1G.
>
>

Disabling ZONE_DMA is only a temporary workaround; we need a proper solution.

> I would like to see Chen’s feature added , perhaps as EXPERIMENTAL,  so we can get some configuration testing done on it.   It corrects having a DMA zone in low memory while crash-kernel is above 4GB.  This has been going on for a year now.

I would also like this patch-set to be merged into Linux as early as possible.

The issue I reported happens with or without this patch-set.

This patch-set could address it, because it uses low memory only for DMA
and swiotlb.
We could restrict the low crashkernel region to the required range, as below:

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 7f9e5a6dc48c..bd67b90d35bd 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
                        return 0;
        }

-       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
+       low_base = memblock_find_in_range(0, 0xc0000000, low_size, CRASH_ALIGN);
        if (!low_base) {
                pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
                       (unsigned long)(low_size >> 20));


A similar change can be considered for the scenario without this patch-set.
However, it shrinks the memory available to the crashkernel and so
increases the probability that the crashkernel reservation fails.
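
To illustrate the trade-off, here is a minimal userspace sketch of a
top-down range search. It is only a simplified model, not the kernel's
memblock code: struct range, find_in_range() and the free-range layout in
the note below are assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one free memory range, [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Search the free ranges from highest to lowest for `size` bytes aligned
 * to `align`, restricted to the window [lo, hi).
 * Assumes `avail` is sorted by ascending address and `align` is a power
 * of two. Returns the base address, or 0 when nothing fits (the same
 * convention the `!low_base` check in the diff relies on).
 */
uint64_t find_in_range(const struct range *avail, size_t n,
		       uint64_t lo, uint64_t hi,
		       uint64_t size, uint64_t align)
{
	for (size_t i = n; i-- > 0; ) {
		/* Clamp this free range to the search window. */
		uint64_t s = avail[i].start > lo ? avail[i].start : lo;
		uint64_t e = avail[i].end < hi ? avail[i].end : hi;

		if (e <= s || e - s < size)
			continue;

		/* Highest aligned base that still ends at or below e. */
		uint64_t base = (e - size) & ~(align - 1);
		if (base >= s)
			return base;
	}
	return 0;
}
```

With free ranges [0x80000000, 0x88000000) and [0xc0000000, 0x100000000),
a 256MB request aligned to 2MB lands at 0xf0000000 when the upper limit is
1ULL << 32, but fails when the limit is 0xc0000000, because the remaining
low range is only 128MB. A 64MB request still fits below the cap, at
0x84000000.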

--pk

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-02  5:38         ` Prabhakar Kushwaha
@ 2020-06-02 14:41           ` John Donnelly
  2020-06-03 11:47             ` Prabhakar Kushwaha
  0 siblings, 1 reply; 34+ messages in thread
From: John Donnelly @ 2020-06-02 14:41 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: Bhupesh Sharma, Chen Zhou, Simon Horman, Devicetree List,
	Baoquan He, Will Deacon, Linux Doc Mailing List, Catalin Marinas,
	kexec mailing list, Linux Kernel Mailing List, Rob Herring,
	Ingo Molnar, Arnd Bergmann, guohanjun, James Morse,
	Thomas Gleixner, Prabhakar Kushwaha, RuiRui Yang,
	linux-arm-kernel



> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <prabhakar.pkin@gmail.com> wrote:
> 
> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <john.p.donnelly@oracle.com> wrote:
>> 
>> Hi .  See below !
>> 
>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
>>> 
>>> Hi John,
>>> 
>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <John.P.donnelly@oracle.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> 
>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>> Hi Chen,
>>>>> 
>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>>> This patch series enable reserving crashkernel above 4G in arm64.
>>>>>> 
>>>>>> There are following issues in arm64 kdump:
>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>>>>> when there is no enough low memory.
>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>>>>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>>>>>> will boot failure because there is no low memory available for allocation.
>>>>>> 
>>>>>> To solve these issues, introduce crashkernel=X,low to reserve specified
>>>>>> size low memory.
>>>>>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>>>>>> 4G. If crashkernel=Y,low is specified simultaneously, reserve spcified
>>>>>> size low memory for crash kdump kernel devices firstly and then reserve
>>>>>> memory above 4G.
>>>>>> 
>>>>>> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
>>>>>> is specified simultaneously, kernel should reserve specified size low memory
>>>>>> for crash dump kernel devices. So there may be two crash kernel regions, one
>>>>>> is below 4G, the other is above 4G.
>>>>>> In order to distinct from the high region and make no effect to the use of
>>>>>> kexec-tools, rename the low region as "Crash kernel (low)", and add DT property
>>>>>> "linux,low-memory-range" to crash dump kernel's dtb to pass the low region.
>>>>>> 
>>>>>> Besides, we need to modify kexec-tools:
>>>>>> arm64: kdump: add another DT property to crash dump kernel's dtb(see [1])
>>>>>> 
>>>>>> The previous changes and discussions can be retrieved from:
>>>>>> 
>>>>>> Changes since [v7]
>>>>>> - Move x86 CRASH_ALIGN to 2M
>>>>>> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
>>>>>> - Update Documentation/devicetree/bindings/chosen.txt
>>>>>> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
>>>>>> - Add Tested-by from Jhon and pk
>>>>>> 
>>>>>> Changes since [v6]
>>>>>> - Fix build errors reported by kbuild test robot.
>>>>>> 
>>>>>> Changes since [v5]
>>>>>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>>> - Delete crashkernel=X,high.
>>>>>> - Modify crashkernel=X,low.
>>>>>> If crashkernel=X,low is specified simultaneously, reserve spcified size low
>>>>>> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
>>>>>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
>>>>>> pass to crash dump kernel by DT property "linux,low-memory-range".
>>>>>> - Update Documentation/admin-guide/kdump/kdump.rst.
>>>>>> 
>>>>>> Changes since [v4]
>>>>>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>>>>>> 
>>>>>> Changes since [v3]
>>>>>> - Add memblock_cap_memory_ranges back for multiple ranges.
>>>>>> - Fix some compiling warnings.
>>>>>> 
>>>>>> Changes since [v2]
>>>>>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
>>>>>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
>>>>>> patch.
>>>>>> 
>>>>>> Changes since [v1]:
>>>>>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
>>>>>> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
>>>>>> in fdt_enforce_memory_region().
>>>>>> There are at most two crash kernel regions, for two crash kernel regions
>>>>>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
>>>>>> and then remove the memory range in the middle.
>>>>>> 
>>>>>> [1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>>>>> [v1]: https://lkml.org/lkml/2019/4/2/1174
>>>>>> [v2]: https://lkml.org/lkml/2019/4/9/86
>>>>>> [v3]: https://lkml.org/lkml/2019/4/9/306
>>>>>> [v4]: https://lkml.org/lkml/2019/4/15/273
>>>>>> [v5]: https://lkml.org/lkml/2019/5/6/1360
>>>>>> [v6]: https://lkml.org/lkml/2019/8/30/142
>>>>>> [v7]: https://lkml.org/lkml/2019/12/23/411
>>>>>> 
>>>>>> Chen Zhou (5):
>>>>>>  x86: kdump: move reserve_crashkernel_low() into crash_core.c
>>>>>>  arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>>>>>>  arm64: kdump: add memory for devices by DT property, low-memory-range
>>>>>>  kdump: update Documentation about crashkernel on arm64
>>>>>>  dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
>>>>>> 
>>>>> We are getting "warn_alloc" [1] warning during boot of kdump kernel
>>>>> with bootargs as [2] of primary kernel.
>>>>> This error observed on ThunderX2  ARM64 platform.
>>>>> 
>>>>> It is observed with latest upstream tag (v5.7-rc3) with this patch set
>>>>> and https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>>>> Also **without** this patch-set
>>>>> "https://www.spinics.net/lists/arm-kernel/msg806882.html"
>>>>> 
>>>>> This issue comes whenever crashkernel memory is reserved after 0xc000_0000.
>>>>> More details discussed earlier in
>>>>> https://www.spinics.net/lists/arm-kernel/msg806882.html  without any
>>>>> solution
>>>>> 
>>>>> This patch-set is expected to solve similar kind of issue.
>>>>> i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
>>>>> observation should be considered/fixed. .
>>>>> 
>>>>> --pk
>>>>> 
>>>>> [1]
>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>> [   30.367696] NET: Registered protocol family 16
>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
>>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>> [   30.369984] Call trace:
>>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
>>>>> [   30.369991]  show_stack+0x20/0x30
>>>>> [   30.369997]  dump_stack+0xc0/0x10c
>>>>> [   30.370001]  warn_alloc+0x10c/0x178
>>>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
>>>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
>>>>> [   30.370008]  alloc_page_interleave+0x24/0x98
>>>>> [   30.370011]  alloc_pages_current+0xe4/0x108
>>>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
>>>>> [   30.370020]  do_one_initcall+0x54/0x228
>>>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
>>>>> [   30.370031]  kernel_init+0x1c/0x110
>>>>> [   30.370034]  ret_from_fork+0x10/0x18
>>>>> [   30.370036] Mem-Info:
>>>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
>>>>> [   30.370064]  active_file:0 inactive_file:0 isolated_file:0
>>>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
>>>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
>>>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
>>>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
>>>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
>>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>>>>> unstable:0kB all_unreclaimable? no
>>>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
>>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>>>>> unstable:0kB all_unreclaimable? no
>>>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
>>>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB
>>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
>>>>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB low:664kB
>>>>> high:920kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>> present:269700kB managed:256000kB mlocked:0kB kernel_stack:0kB
>>>>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
>>>>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB low:15504kB
>>>>> high:21456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>> present:8388608kB managed:5953112kB mlocked:0kB kernel_stack:21672kB
>>>>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB free_cma:0kB
>>>>> [   30.370104] lowmem_reserve[]: 0 0 0 0
>>>>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
>>>>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>>>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
>>>>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) = 256000kB
>>>>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB (UE) 3*32kB
>>>>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME) 3*1024kB (ME)
>>>>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
>>>>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
>>>>> hugepages_surp=0 hugepages_size=1048576kB
>>>>> [   30.370130] 0 total pagecache pages
>>>>> [   30.370132] 0 pages in swap cache
>>>>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
>>>>> [   30.370135] Free swap  = 0kB
>>>>> [   30.370136] Total swap = 0kB
>>>>> [   30.370137] 2164609 pages RAM
>>>>> [   30.370139] 0 pages HighMem/MovableOnly
>>>>> [   30.370140] 612331 pages reserved
>>>>> [   30.370141] 0 pages hwpoisoned
>>>>> [   30.370143] DMA: failed to allocate 256 KiB pool for atomic
>>>>> coherent allocation
>>>> 
>>>> 
>>>> During my testing I saw the same error and Chen's  solution corrected it .
>>> 
>>> Which combination you are using on your side? I am using Prabhakar's
>>> suggested environment and can reproduce the issue
>>> with or without Chen's crashkernel support above 4G patchset.
>>> 
>>> I am also using a ThunderX2 platform with latest makedumpfile code and
>>> kexec-tools (with the suggested patch
>>> <https://lists.infradead.org/pipermail/kexec/2020-May/025128.html>).
>>> 
>>> Thanks,
>>> Bhupesh
>> 
>> 
>> I did this activity 5 months ago and I have moved on to other activities. My DMA failures were related to PCI devices that could not be enumerated because  low-DMA space was not  available when crashkernel was moved above 4G; I don’t recall the exact platform.
>> 
>> 
>> 
>> For this failure ,
>> 
>>>>> DMA: failed to allocate 256 KiB pool for atomic
>>>>> coherent allocation
>> 
>> 
>> Is due to :
>> 
>> 
>> 3618082c
>> ("arm64 use both ZONE_DMA and ZONE_DMA32")
>> 
>> With the introduction of ZONE_DMA to support the Raspberry DMA
>> region below 1G, the crashkernel is placed in the upper 4G
>> ZONE_DMA_32 region. Since the crashkernel does not have access
>> to the ZONE_DMA region, it prints out call trace during bootup.
>> 
>> It is due to having this CONFIG item  ON  :
>> 
>> 
>> CONFIG_ZONE_DMA=y
>> 
>> Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
>> use the device tree to specify memory below 1G.
>> 
>> 
> 
> Disabling ZONE_DMA is temporary solution.  We may need proper solution


Perhaps the Raspberry Pi platform configuration dependencies need to be separated from “server class” Arm equipment? Or auto-configured at boot? Consult an expert ;-)



> 
>> I would like to see Chen’s feature added , perhaps as EXPERIMENTAL,  so we can get some configuration testing done on it.   It corrects having a DMA zone in low memory while crash-kernel is above 4GB.  This has been going on for a year now.
> 
> I will also like this patch to be added in Linux as early as possible.
> 
> Issue mentioned by me happens with or without this patch.
> 
> This patch-set can consider fixing because it uses low memory for DMA
> & swiotlb only.
> We can consider restricting crashkernel within the required range like below
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 7f9e5a6dc48c..bd67b90d35bd 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
>                        return 0;
>        }
> 
> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> +       low_base = memblock_find_in_range(0,0xc0000000, low_size, CRASH_ALIGN);
>        if (!low_base) {
>                pr_err("Cannot reserve %ldMB crashkernel low memory,
> please try smaller size.\n",
>                       (unsigned long)(low_size >> 20));
> 
> 

    I suspect 0xc0000000 would need to be a CONFIG item rather than hard-coded.
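
    As a rough sketch of what I mean, assuming a build-time knob: the name
    CRASH_LOW_SEARCH_LIMIT and the helper crash_low_search_end() are
    hypothetical, not existing Kconfig symbols or kernel functions, and the
    default simply reuses the value from the diff above.

```c
#include <stdint.h>

/*
 * Hypothetical build-time limit. In the kernel this would presumably be
 * generated from a Kconfig entry instead of being defined here.
 */
#ifndef CRASH_LOW_SEARCH_LIMIT
#define CRASH_LOW_SEARCH_LIMIT 0xc0000000ULL
#endif

/*
 * Upper bound for the low crashkernel search: the smaller of the
 * platform's DMA-addressable limit and the configured cap.
 */
uint64_t crash_low_search_end(uint64_t dma_limit)
{
	return dma_limit < CRASH_LOW_SEARCH_LIMIT ? dma_limit
						  : CRASH_LOW_SEARCH_LIMIT;
}
```

    On a platform that can address the full 32 bits this yields 0xc0000000;
    on one limited to 1G of DMA it yields 0x40000000, so the cap never
    widens the window beyond what the hardware can reach.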
    
    

> Similar change can be considered for scenario "without" this patch.
> But it will decrease memory availability for crashkernel.
> Hence increase the failure probability of crashkernel reservation.
> 
> --pk


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-02 14:41           ` John Donnelly
@ 2020-06-03 11:47             ` Prabhakar Kushwaha
  2020-06-03 13:20               ` chenzhou
  0 siblings, 1 reply; 34+ messages in thread
From: Prabhakar Kushwaha @ 2020-06-03 11:47 UTC (permalink / raw)
  To: John Donnelly
  Cc: Bhupesh Sharma, Chen Zhou, Simon Horman, Devicetree List,
	Baoquan He, Will Deacon, Linux Doc Mailing List, Catalin Marinas,
	kexec mailing list, Linux Kernel Mailing List, Rob Herring,
	Ingo Molnar, Arnd Bergmann, guohanjun, James Morse,
	Thomas Gleixner, Prabhakar Kushwaha, RuiRui Yang,
	linux-arm-kernel

Hi Chen,

On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <john.p.donnelly@oracle.com> wrote:
>
>
>
> > On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <prabhakar.pkin@gmail.com> wrote:
> >
> > On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <john.p.donnelly@oracle.com> wrote:
> >>
> >> Hi .  See below !
> >>
> >>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> >>>
> >>> Hi John,
> >>>
> >>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <John.P.donnelly@oracle.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>>
> >>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
> >>>>> Hi Chen,
> >>>>>
> >>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
> >>>>>> This patch series enable reserving crashkernel above 4G in arm64.
> >>>>>>
> >>>>>> There are following issues in arm64 kdump:
> >>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> >>>>>> when there is no enough low memory.
> >>>>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
> >>>>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
> >>>>>> will boot failure because there is no low memory available for allocation.
> >>>>>>
> >>>>>> To solve these issues, introduce crashkernel=X,low to reserve specified
> >>>>>> size low memory.
> >>>>>> Crashkernel=X tries to reserve memory for the crash dump kernel under
> >>>>>> 4G. If crashkernel=Y,low is specified simultaneously, reserve spcified
> >>>>>> size low memory for crash kdump kernel devices firstly and then reserve
> >>>>>> memory above 4G.
> >>>>>>
> >>>>>> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
> >>>>>> is specified simultaneously, kernel should reserve specified size low memory
> >>>>>> for crash dump kernel devices. So there may be two crash kernel regions, one
> >>>>>> is below 4G, the other is above 4G.
> >>>>>> In order to distinct from the high region and make no effect to the use of
> >>>>>> kexec-tools, rename the low region as "Crash kernel (low)", and add DT property
> >>>>>> "linux,low-memory-range" to crash dump kernel's dtb to pass the low region.
> >>>>>>
> >>>>>> Besides, we need to modify kexec-tools:
> >>>>>> arm64: kdump: add another DT property to crash dump kernel's dtb(see [1])
> >>>>>>
> >>>>>> The previous changes and discussions can be retrieved from:
> >>>>>>
> >>>>>> Changes since [v7]
> >>>>>> - Move x86 CRASH_ALIGN to 2M
> >>>>>> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> >>>>>> - Update Documentation/devicetree/bindings/chosen.txt
> >>>>>> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
> >>>>>> - Add Tested-by from Jhon and pk
> >>>>>>
> >>>>>> Changes since [v6]
> >>>>>> - Fix build errors reported by kbuild test robot.
> >>>>>>
> >>>>>> Changes since [v5]
> >>>>>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> >>>>>> - Delete crashkernel=X,high.
> >>>>>> - Modify crashkernel=X,low.
> >>>>>> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> >>>>>> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> >>>>>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> >>>>>> pass to crash dump kernel by DT property "linux,low-memory-range".
> >>>>>> - Update Documentation/admin-guide/kdump/kdump.rst.
> >>>>>>
> >>>>>> Changes since [v4]
> >>>>>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> >>>>>>
> >>>>>> Changes since [v3]
> >>>>>> - Add memblock_cap_memory_ranges back for multiple ranges.
> >>>>>> - Fix some compiling warnings.
> >>>>>>
> >>>>>> Changes since [v2]
> >>>>>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> >>>>>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> >>>>>> patch.
> >>>>>>
> >>>>>> Changes since [v1]:
> >>>>>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> >>>>>> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> >>>>>> in fdt_enforce_memory_region().
> >>>>>> There are at most two crash kernel regions, for two crash kernel regions
> >>>>>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> >>>>>> and then remove the memory range in the middle.
> >>>>>>
> >>>>>> [1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> >>>>>> [v1]: https://lkml.org/lkml/2019/4/2/1174
> >>>>>> [v2]: https://lkml.org/lkml/2019/4/9/86
> >>>>>> [v3]: https://lkml.org/lkml/2019/4/9/306
> >>>>>> [v4]: https://lkml.org/lkml/2019/4/15/273
> >>>>>> [v5]: https://lkml.org/lkml/2019/5/6/1360
> >>>>>> [v6]: https://lkml.org/lkml/2019/8/30/142
> >>>>>> [v7]: https://lkml.org/lkml/2019/12/23/411
> >>>>>>
> >>>>>> Chen Zhou (5):
> >>>>>>  x86: kdump: move reserve_crashkernel_low() into crash_core.c
> >>>>>>  arm64: kdump: reserve crashkenel above 4G for crash dump kernel
> >>>>>>  arm64: kdump: add memory for devices by DT property, low-memory-range
> >>>>>>  kdump: update Documentation about crashkernel on arm64
> >>>>>>  dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
> >>>>>>
> >>>>> We are getting "warn_alloc" [1] warning during boot of kdump kernel
> >>>>> with bootargs as [2] of primary kernel.
> >>>>> This error observed on ThunderX2  ARM64 platform.
> >>>>>
> >>>>> It is observed with latest upstream tag (v5.7-rc3) with this patch set
> >>>>> and https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> >>>>> Also **without** this patch-set
> >>>>> "https://www.spinics.net/lists/arm-kernel/msg806882.html"
> >>>>>
> >>>>> This issue comes whenever crashkernel memory is reserved after 0xc000_0000.
> >>>>> More details discussed earlier in
> >>>>> https://www.spinics.net/lists/arm-kernel/msg806882.html  without any
> >>>>> solution
> >>>>>
> >>>>> This patch-set is expected to solve similar kind of issue.
> >>>>> i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
> >>>>> observation should be considered/fixed. .
> >>>>>
> >>>>> --pk
> >>>>>
> >>>>> [1]
> >>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
> >>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> >>>>> [   30.367696] NET: Registered protocol family 16
> >>>>> [   30.369973] swapper/0: page allocation failure: order:6,
> >>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
> >>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
> >>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
> >>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> >>>>> [   30.369984] Call trace:
> >>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
> >>>>> [   30.369991]  show_stack+0x20/0x30
> >>>>> [   30.369997]  dump_stack+0xc0/0x10c
> >>>>> [   30.370001]  warn_alloc+0x10c/0x178
> >>>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
> >>>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
> >>>>> [   30.370008]  alloc_page_interleave+0x24/0x98
> >>>>> [   30.370011]  alloc_pages_current+0xe4/0x108
> >>>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
> >>>>> [   30.370020]  do_one_initcall+0x54/0x228
> >>>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
> >>>>> [   30.370031]  kernel_init+0x1c/0x110
> >>>>> [   30.370034]  ret_from_fork+0x10/0x18
> >>>>> [   30.370036] Mem-Info:
> >>>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
> >>>>> [   30.370064]  active_file:0 inactive_file:0 isolated_file:0
> >>>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
> >>>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
> >>>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
> >>>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
> >>>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
> >>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
> >>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> >>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
> >>>>> unstable:0kB all_unreclaimable? no
> >>>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
> >>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
> >>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> >>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
> >>>>> unstable:0kB all_unreclaimable? no
> >>>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
> >>>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >>>>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB
> >>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>>>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
> >>>>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB low:664kB
> >>>>> high:920kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >>>>> present:269700kB managed:256000kB mlocked:0kB kernel_stack:0kB
> >>>>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>>>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
> >>>>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB low:15504kB
> >>>>> high:21456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >>>>> present:8388608kB managed:5953112kB mlocked:0kB kernel_stack:21672kB
> >>>>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB free_cma:0kB
> >>>>> [   30.370104] lowmem_reserve[]: 0 0 0 0
> >>>>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
> >>>>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> >>>>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
> >>>>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) = 256000kB
> >>>>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB (UE) 3*32kB
> >>>>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME) 3*1024kB (ME)
> >>>>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
> >>>>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
> >>>>> hugepages_surp=0 hugepages_size=1048576kB
> >>>>> [   30.370130] 0 total pagecache pages
> >>>>> [   30.370132] 0 pages in swap cache
> >>>>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
> >>>>> [   30.370135] Free swap  = 0kB
> >>>>> [   30.370136] Total swap = 0kB
> >>>>> [   30.370137] 2164609 pages RAM
> >>>>> [   30.370139] 0 pages HighMem/MovableOnly
> >>>>> [   30.370140] 612331 pages reserved
> >>>>> [   30.370141] 0 pages hwpoisoned
> >>>>> [   30.370143] DMA: failed to allocate 256 KiB pool for atomic
> >>>>> coherent allocation
> >>>>
> >>>>
> >>>> During my testing I saw the same error and Chen's  solution corrected it .
> >>>
> >>> Which combination you are using on your side? I am using Prabhakar's
> >>> suggested environment and can reproduce the issue
> >>> with or without Chen's crashkernel support above 4G patchset.
> >>>
> >>> I am also using a ThunderX2 platform with latest makedumpfile code and
> >>> kexec-tools (with the suggested patch
> >>> <https://lists.infradead.org/pipermail/kexec/2020-May/025128.html>).
> >>>
> >>> Thanks,
> >>> Bhupesh
> >>
> >>
> >> I did this activity 5 months ago and I have moved on to other activities. My DMA failures were related to PCI devices that could not be enumerated because  low-DMA space was not  available when crashkernel was moved above 4G; I don’t recall the exact platform.
> >>
> >>
> >>
> >> For this failure ,
> >>
> >>>>> DMA: failed to allocate 256 KiB pool for atomic
> >>>>> coherent allocation
> >>
> >>
> >> Is due to :
> >>
> >>
> >> 3618082c
> >> ("arm64 use both ZONE_DMA and ZONE_DMA32")
> >>
> >> With the introduction of ZONE_DMA to support the Raspberry DMA
> >> region below 1G, the crashkernel is placed in the upper 4G
> >> ZONE_DMA_32 region. Since the crashkernel does not have access
> >> to the ZONE_DMA region, it prints out call trace during bootup.
> >>
> >> It is due to having this CONFIG item  ON  :
> >>
> >>
> >> CONFIG_ZONE_DMA=y
> >>
> >> Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
> >> use the device tree to specify memory below 1G.
> >>
> >>
> >
> > Disabling ZONE_DMA is temporary solution.  We may need proper solution
>
>
> Perhaps the Raspberry platform configuration dependencies need separated  from “server class” Arm  equipment ?  Or auto-configured on boot ?  Consult an expert ;-)
>
>
>
> >
> >> I would like to see Chen’s feature added , perhaps as EXPERIMENTAL,  so we can get some configuration testing done on it.   It corrects having a DMA zone in low memory while crash-kernel is above 4GB.  This has been going on for a year now.
> >
> > I will also like this patch to be added in Linux as early as possible.
> >
> > Issue mentioned by me happens with or without this patch.
> >
> > This patch-set can consider fixing because it uses low memory for DMA
> > & swiotlb only.
> > We can consider restricting crashkernel within the required range like below
> >
> > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > index 7f9e5a6dc48c..bd67b90d35bd 100644
> > --- a/kernel/crash_core.c
> > +++ b/kernel/crash_core.c
> > @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
> >                        return 0;
> >        }
> >
> > -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> > +       low_base = memblock_find_in_range(0,0xc0000000, low_size, CRASH_ALIGN);
> >        if (!low_base) {
> >                pr_err("Cannot reserve %ldMB crashkernel low memory,
> > please try smaller size.\n",
> >                       (unsigned long)(low_size >> 20));
> >
> >
>
>     I suspect  0xc0000000  would need to be a CONFIG item  and not hard-coded.
>

If you consider this a valid change, can you please incorporate it as
part of your patch-set?

--pk.


* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-03 11:47             ` Prabhakar Kushwaha
@ 2020-06-03 13:20               ` chenzhou
  2020-06-03 15:30                 ` John Donnelly
  0 siblings, 1 reply; 34+ messages in thread
From: chenzhou @ 2020-06-03 13:20 UTC (permalink / raw)
  To: Prabhakar Kushwaha, John Donnelly
  Cc: Bhupesh Sharma, Simon Horman, Devicetree List, Baoquan He,
	Will Deacon, Linux Doc Mailing List, Catalin Marinas,
	kexec mailing list, Linux Kernel Mailing List, Rob Herring,
	Ingo Molnar, Arnd Bergmann, guohanjun, James Morse,
	Thomas Gleixner, Prabhakar Kushwaha, RuiRui Yang,
	linux-arm-kernel

Hi,


On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
> Hi Chen,
>
> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <john.p.donnelly@oracle.com> wrote:
>>
>>
>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <prabhakar.pkin@gmail.com> wrote:
>>>
>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <john.p.donnelly@oracle.com> wrote:
>>>> Hi .  See below !
>>>>
>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
>>>>>
>>>>> Hi John,
>>>>>
>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <John.P.donnelly@oracle.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>>>> Hi Chen,
>>>>>>>
>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>>>>> This patch series enable reserving crashkernel above 4G in arm64.
>>>>>>>>
>>>>>>>> There are following issues in arm64 kdump:
>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>>>>>>> when there is no enough low memory.
>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>>>>>>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>>>>>>>> will boot failure because there is no low memory available for allocation.
>>>>>>>>
>>>>>>> We are getting "warn_alloc" [1] warning during boot of kdump kernel
>>>>>>> with bootargs as [2] of primary kernel.
>>>>>>> This error observed on ThunderX2  ARM64 platform.
>>>>>>>
>>>>>>> It is observed with latest upstream tag (v5.7-rc3) with this patch set
>>>>>>> and https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>>>>>> Also **without** this patch-set
>>>>>>> "https://www.spinics.net/lists/arm-kernel/msg806882.html"
>>>>>>>
>>>>>>> This issue comes whenever crashkernel memory is reserved after 0xc000_0000.
>>>>>>> More details discussed earlier in
>>>>>>> https://www.spinics.net/lists/arm-kernel/msg806882.html  without any
>>>>>>> solution
>>>>>>>
>>>>>>> This patch-set is expected to solve similar kind of issue.
>>>>>>> i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
>>>>>>> observation should be considered/fixed. .
>>>>>>>
>>>>>>> --pk
>>>>>>>
>>>>>>> [1]
>>>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>> [   30.367696] NET: Registered protocol family 16
>>>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
>>>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
>>>>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>> [   30.369984] Call trace:
>>>>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
>>>>>>> [   30.369991]  show_stack+0x20/0x30
>>>>>>> [   30.369997]  dump_stack+0xc0/0x10c
>>>>>>> [   30.370001]  warn_alloc+0x10c/0x178
>>>>>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
>>>>>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
>>>>>>> [   30.370008]  alloc_page_interleave+0x24/0x98
>>>>>>> [   30.370011]  alloc_pages_current+0xe4/0x108
>>>>>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
>>>>>>> [   30.370020]  do_one_initcall+0x54/0x228
>>>>>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
>>>>>>> [   30.370031]  kernel_init+0x1c/0x110
>>>>>>> [   30.370034]  ret_from_fork+0x10/0x18
>>>>>>> [   30.370036] Mem-Info:
>>>>>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
>>>>>>> [   30.370064]  active_file:0 inactive_file:0 isolated_file:0
>>>>>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
>>>>>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
>>>>>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
>>>>>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
>>>>>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
>>>>>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>>>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB
>>>>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
>>>>>>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB low:664kB
>>>>>>> high:920kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>>>> present:269700kB managed:256000kB mlocked:0kB kernel_stack:0kB
>>>>>>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
>>>>>>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB low:15504kB
>>>>>>> high:21456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>>>> present:8388608kB managed:5953112kB mlocked:0kB kernel_stack:21672kB
>>>>>>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB free_cma:0kB
>>>>>>> [   30.370104] lowmem_reserve[]: 0 0 0 0
>>>>>>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
>>>>>>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>>>>>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
>>>>>>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) = 256000kB
>>>>>>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB (UE) 3*32kB
>>>>>>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME) 3*1024kB (ME)
>>>>>>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
>>>>>>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
>>>>>>> hugepages_surp=0 hugepages_size=1048576kB
>>>>>>> [   30.370130] 0 total pagecache pages
>>>>>>> [   30.370132] 0 pages in swap cache
>>>>>>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>> [   30.370135] Free swap  = 0kB
>>>>>>> [   30.370136] Total swap = 0kB
>>>>>>> [   30.370137] 2164609 pages RAM
>>>>>>> [   30.370139] 0 pages HighMem/MovableOnly
>>>>>>> [   30.370140] 612331 pages reserved
>>>>>>> [   30.370141] 0 pages hwpoisoned
>>>>>>> [   30.370143] DMA: failed to allocate 256 KiB pool for atomic
>>>>>>> coherent allocation
>>>>>>
>>>>>> During my testing I saw the same error and Chen's  solution corrected it .
>>>>> Which combination you are using on your side? I am using Prabhakar's
>>>>> suggested environment and can reproduce the issue
>>>>> with or without Chen's crashkernel support above 4G patchset.
>>>>>
>>>>> I am also using a ThunderX2 platform with latest makedumpfile code and
>>>>> kexec-tools (with the suggested patch
>>>>> <https://lists.infradead.org/pipermail/kexec/2020-May/025128.html>).
>>>>>
>>>>> Thanks,
>>>>> Bhupesh
>>>>
>>>> I did this activity 5 months ago and I have moved on to other activities. My DMA failures were related to PCI devices that could not be enumerated because  low-DMA space was not  available when crashkernel was moved above 4G; I don’t recall the exact platform.
>>>>
>>>>
>>>>
>>>> For this failure ,
>>>>
>>>>>>> DMA: failed to allocate 256 KiB pool for atomic
>>>>>>> coherent allocation
>>>>
>>>> Is due to :
>>>>
>>>>
>>>> 3618082c
>>>> ("arm64 use both ZONE_DMA and ZONE_DMA32")
>>>>
>>>> With the introduction of ZONE_DMA to support the Raspberry DMA
>>>> region below 1G, the crashkernel is placed in the upper 4G
>>>> ZONE_DMA_32 region. Since the crashkernel does not have access
>>>> to the ZONE_DMA region, it prints out call trace during bootup.
>>>>
>>>> It is due to having this CONFIG item  ON  :
>>>>
>>>>
>>>> CONFIG_ZONE_DMA=y
>>>>
>>>> Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
>>>> use the device tree to specify memory below 1G.
>>>>
>>>>
>>> Disabling ZONE_DMA is temporary solution.  We may need proper solution
>>
>> Perhaps the Raspberry platform configuration dependencies need separated  from “server class” Arm  equipment ?  Or auto-configured on boot ?  Consult an expert ;-)
>>
>>
>>
>>>> I would like to see Chen’s feature added , perhaps as EXPERIMENTAL,  so we can get some configuration testing done on it.   It corrects having a DMA zone in low memory while crash-kernel is above 4GB.  This has been going on for a year now.
>>> I will also like this patch to be added in Linux as early as possible.
>>>
>>> Issue mentioned by me happens with or without this patch.
>>>
>>> This patch-set can consider fixing because it uses low memory for DMA
>>> & swiotlb only.
>>> We can consider restricting crashkernel within the required range like below
>>>
>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>> index 7f9e5a6dc48c..bd67b90d35bd 100644
>>> --- a/kernel/crash_core.c
>>> +++ b/kernel/crash_core.c
>>> @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
>>>                        return 0;
>>>        }
>>>
>>> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>>> +       low_base = memblock_find_in_range(0,0xc0000000, low_size, CRASH_ALIGN);
>>>        if (!low_base) {
>>>                pr_err("Cannot reserve %ldMB crashkernel low memory,
>>> please try smaller size.\n",
>>>                       (unsigned long)(low_size >> 20));
>>>
>>>
>>     I suspect  0xc0000000  would need to be a CONFIG item  and not hard-coded.
>>
> if you consider this as valid change,  can you please incorporate as
> part of your patch-set.

After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"), the 0-4G memory is split
into DMA [mem 0x0000000000000000-0x000000003fffffff] and DMA32 [mem 0x0000000040000000-0x00000000ffffffff] on arm64.

From the above discussion, on your platform the low crashkernel region falls in the DMA32 region, but your environment needs to access
the DMA region, hence the call trace.

I have a question: why did you choose 0xc0000000 here?

Besides, this is common code, so we also need to consider x86.

Thanks,
Chen Zhou

>
> --pk.
>
> .
>




* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-03 13:20               ` chenzhou
@ 2020-06-03 15:30                 ` John Donnelly
  2020-06-03 19:47                   ` Bhupesh Sharma
  0 siblings, 1 reply; 34+ messages in thread
From: John Donnelly @ 2020-06-03 15:30 UTC (permalink / raw)
  To: chenzhou
  Cc: Prabhakar Kushwaha, Devicetree List, Arnd Bergmann, Baoquan He,
	Linux Doc Mailing List, Catalin Marinas, Bhupesh Sharma,
	RuiRui Yang, kexec mailing list, Linux Kernel Mailing List,
	Rob Herring, Simon Horman, James Morse, guohanjun,
	Thomas Gleixner, Prabhakar Kushwaha, Will Deacon, Ingo Molnar,
	linux-arm-kernel, nsaenzjulienne



> On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@huawei.com> wrote:
> 
> Hi,
> 
> 
> On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
>> Hi Chen,
>> 
>> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <john.p.donnelly@oracle.com> wrote:
>>> 
>>> 
>>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <prabhakar.pkin@gmail.com> wrote:
>>>> 
>>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <john.p.donnelly@oracle.com> wrote:
>>>>> Hi .  See below !
>>>>> 
>>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
>>>>>> 
>>>>>> Hi John,
>>>>>> 
>>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <John.P.donnelly@oracle.com> wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> 
>>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>>>>> Hi Chen,
>>>>>>>> 
>>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>>>>>> This patch series enable reserving crashkernel above 4G in arm64.
>>>>>>>>> 
>>>>>>>>> There are following issues in arm64 kdump:
>>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>>>>>>>> when there is no enough low memory.
>>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>>>>>>>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>>>>>>>>> will boot failure because there is no low memory available for allocation.
>>>>>>>>> 
>>>>>>>> We are getting "warn_alloc" [1] warning during boot of kdump kernel
>>>>>>>> with bootargs as [2] of primary kernel.
>>>>>>>> This error observed on ThunderX2  ARM64 platform.
>>>>>>>> 
>>>>>>>> It is observed with latest upstream tag (v5.7-rc3) with this patch set
>>>>>>>> and https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>>>>>>> Also **without** this patch-set
>>>>>>>> "https://www.spinics.net/lists/arm-kernel/msg806882.html"
>>>>>>>> 
>>>>>>>> This issue comes whenever crashkernel memory is reserved after 0xc000_0000.
>>>>>>>> More details discussed earlier in
>>>>>>>> https://www.spinics.net/lists/arm-kernel/msg806882.html  without any
>>>>>>>> solution
>>>>>>>> 
>>>>>>>> This patch-set is expected to solve similar kind of issue.
>>>>>>>> i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
>>>>>>>> observation should be considered/fixed. .
>>>>>>>> 
>>>>>>>> --pk
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>> [   30.367696] NET: Registered protocol family 16
>>>>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
>>>>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
>>>>>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>> [   30.369984] Call trace:
>>>>>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
>>>>>>>> [   30.369991]  show_stack+0x20/0x30
>>>>>>>> [   30.369997]  dump_stack+0xc0/0x10c
>>>>>>>> [   30.370001]  warn_alloc+0x10c/0x178
>>>>>>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
>>>>>>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
>>>>>>>> [   30.370008]  alloc_page_interleave+0x24/0x98
>>>>>>>> [   30.370011]  alloc_pages_current+0xe4/0x108
>>>>>>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
>>>>>>>> [   30.370020]  do_one_initcall+0x54/0x228
>>>>>>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
>>>>>>>> [   30.370031]  kernel_init+0x1c/0x110
>>>>>>>> [   30.370034]  ret_from_fork+0x10/0x18
>>>>>>>> [   30.370036] Mem-Info:
>>>>>>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
>>>>>>>> [   30.370064]  active_file:0 inactive_file:0 isolated_file:0
>>>>>>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
>>>>>>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
>>>>>>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
>>>>>>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
>>>>>>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
>>>>>>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>>>>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB
>>>>>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
>>>>>>>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB low:664kB
>>>>>>>> high:920kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>>>>> present:269700kB managed:256000kB mlocked:0kB kernel_stack:0kB
>>>>>>>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
>>>>>>>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB low:15504kB
>>>>>>>> high:21456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>>>>> present:8388608kB managed:5953112kB mlocked:0kB kernel_stack:21672kB
>>>>>>>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB free_cma:0kB
>>>>>>>> [   30.370104] lowmem_reserve[]: 0 0 0 0
>>>>>>>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
>>>>>>>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>>>>>>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
>>>>>>>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) = 256000kB
>>>>>>>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB (UE) 3*32kB
>>>>>>>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME) 3*1024kB (ME)
>>>>>>>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
>>>>>>>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
>>>>>>>> hugepages_surp=0 hugepages_size=1048576kB
>>>>>>>> [   30.370130] 0 total pagecache pages
>>>>>>>> [   30.370132] 0 pages in swap cache
>>>>>>>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>>> [   30.370135] Free swap  = 0kB
>>>>>>>> [   30.370136] Total swap = 0kB
>>>>>>>> [   30.370137] 2164609 pages RAM
>>>>>>>> [   30.370139] 0 pages HighMem/MovableOnly
>>>>>>>> [   30.370140] 612331 pages reserved
>>>>>>>> [   30.370141] 0 pages hwpoisoned
>>>>>>>> [   30.370143] DMA: failed to allocate 256 KiB pool for atomic
>>>>>>>> coherent allocation
>>>>>>> 
>>>>>>> During my testing I saw the same error and Chen's  solution corrected it .
>>>>>> Which combination you are using on your side? I am using Prabhakar's
>>>>>> suggested environment and can reproduce the issue
>>>>>> with or without Chen's crashkernel support above 4G patchset.
>>>>>> 
>>>>>> I am also using a ThunderX2 platform with latest makedumpfile code and
>>>>>> kexec-tools (with the suggested patch
>>>>>> <https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!J6lUig58-Gw6TKZnEEYzEeSU36T-1SqlB1kImU00xtX_lss5Tx-JbUmLE9TJC3foXBLg$ >).
>>>>>> 
>>>>>> Thanks,
>>>>>> Bhupesh
>>>>> 
>>>>> I did this activity 5 months ago and I have moved on to other activities. My DMA failures were related to PCI devices that could not be enumerated because  low-DMA space was not  available when crashkernel was moved above 4G; I don’t recall the exact platform.
>>>>> 
>>>>> 
>>>>> 
>>>>> For this failure ,
>>>>> 
>>>>>>>> DMA: failed to allocate 256 KiB pool for atomic
>>>>>>>> coherent allocation
>>>>> 
>>>>> Is due to :
>>>>> 
>>>>> 
>>>>> 3618082c
>>>>> ("arm64 use both ZONE_DMA and ZONE_DMA32")
>>>>> 
>>>>> With the introduction of ZONE_DMA to support the Raspberry DMA
>>>>> region below 1G, the crashkernel is placed in the upper 4G
>>>>> ZONE_DMA_32 region. Since the crashkernel does not have access
>>>>> to the ZONE_DMA region, it prints out call trace during bootup.
>>>>> 
>>>>> It is due to having this CONFIG item  ON  :
>>>>> 
>>>>> 
>>>>> CONFIG_ZONE_DMA=y
>>>>> 
>>>>> Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
>>>>> use the device tree to specify memory below 1G.
>>>>> 
>>>>> 
>>>> Disabling ZONE_DMA is temporary solution.  We may need proper solution
>>> 
>>> Perhaps the Raspberry platform configuration dependencies need separated  from “server class” Arm  equipment ?  Or auto-configured on boot ?  Consult an expert ;-)
>>> 
>>> 
>>> 
>>>>> I would like to see Chen’s feature added , perhaps as EXPERIMENTAL,  so we can get some configuration testing done on it.   It corrects having a DMA zone in low memory while crash-kernel is above 4GB.  This has been going on for a year now.
>>>> I will also like this patch to be added in Linux as early as possible.
>>>> 
>>>> Issue mentioned by me happens with or without this patch.
>>>> 
>>>> This patch-set can consider fixing because it uses low memory for DMA
>>>> & swiotlb only.
>>>> We can consider restricting crashkernel within the required range like below
>>>> 
>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>> index 7f9e5a6dc48c..bd67b90d35bd 100644
>>>> --- a/kernel/crash_core.c
>>>> +++ b/kernel/crash_core.c
>>>> @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
>>>>                       return 0;
>>>>       }
>>>> 
>>>> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>>>> +       low_base = memblock_find_in_range(0,0xc0000000, low_size, CRASH_ALIGN);
>>>>       if (!low_base) {
>>>>               pr_err("Cannot reserve %ldMB crashkernel low memory,
>>>> please try smaller size.\n",
>>>>                      (unsigned long)(low_size >> 20));
>>>> 
>>>> 
>>>    I suspect  0xc0000000  would need to be a CONFIG item  and not hard-coded.
>>> 
>> if you consider this as valid change,  can you please incorporate as
>> part of your patch-set.
> 
> After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"),the 0-4G memory is splited
> to DMA [mem 0x0000000000000000-0x000000003fffffff] and DMA32 [mem 0x0000000040000000-0x00000000ffffffff] on arm64.
> 
> From the above discussion, on your platform, the low crashkernel fall in DMA32 region, but your environment needs to access DMA
> region, so there is the call trace.
> 
> I have a question, why do you choose 0xc0000000 here?
> 
> Besides, this is common code, we also need to consider about x86.
> 

 + nsaenzjulienne@suse.de

  Exactly. This is why it needs to be a CONFIG option for Raspberry, or a device tree option.

  We could revert 1a8e1cef7 since it broke Arm kdump too.


> 
> Thanks,
> Chen Zhou
> 


 
 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-03 15:30                 ` John Donnelly
@ 2020-06-03 19:47                   ` Bhupesh Sharma
  2020-06-04  7:14                     ` Will Deacon
  2020-06-04 17:01                     ` Nicolas Saenz Julienne
  0 siblings, 2 replies; 34+ messages in thread
From: Bhupesh Sharma @ 2020-06-03 19:47 UTC (permalink / raw)
  To: John Donnelly
  Cc: chenzhou, Simon Horman, Devicetree List, Arnd Bergmann,
	Baoquan He, Linux Doc Mailing List, Catalin Marinas, guohanjun,
	kexec mailing list, Linux Kernel Mailing List, Will Deacon,
	Rob Herring, James Morse, nsaenzjulienne, Prabhakar Kushwaha,
	Thomas Gleixner, Prabhakar Kushwaha, RuiRui Yang, Ingo Molnar,
	linux-arm-kernel

Hi All,

On Wed, Jun 3, 2020 at 9:03 PM John Donnelly <john.p.donnelly@oracle.com> wrote:
>
>
>
> > On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@huawei.com> wrote:
> >
> > Hi,
> >
> >
> > On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
> >> Hi Chen,
> >>
> >> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <john.p.donnelly@oracle.com> wrote:
> >>>
> >>>
> >>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <prabhakar.pkin@gmail.com> wrote:
> >>>>
> >>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <john.p.donnelly@oracle.com> wrote:
> >>>>> Hi .  See below !
> >>>>>
> >>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> >>>>>>
> >>>>>> Hi John,
> >>>>>>
> >>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <John.P.donnelly@oracle.com> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>>
> >>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
> >>>>>>>> Hi Chen,
> >>>>>>>>
> >>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
> >>>>>>>>> This patch series enable reserving crashkernel above 4G in arm64.
> >>>>>>>>>
> >>>>>>>>> There are following issues in arm64 kdump:
> >>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> >>>>>>>>> when there is no enough low memory.
> >>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
> >>>>>>>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
> >>>>>>>>> will boot failure because there is no low memory available for allocation.
> >>>>>>>>>
> >>>>>>>> We are getting "warn_alloc" [1] warning during boot of kdump kernel
> >>>>>>>> with bootargs as [2] of primary kernel.
> >>>>>>>> This error observed on ThunderX2  ARM64 platform.
> >>>>>>>>
> >>>>>>>> It is observed with latest upstream tag (v5.7-rc3) with this patch set
> >>>>>>>> and https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbiIAAlzu$
> >>>>>>>> Also **without** this patch-set
> >>>>>>>> "https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$"
> >>>>>>>>
> >>>>>>>> This issue comes whenever crashkernel memory is reserved after 0xc000_0000.
> >>>>>>>> More details discussed earlier in
> >>>>>>>> https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$  without any
> >>>>>>>> solution
> >>>>>>>>
> >>>>>>>> This patch-set is expected to solve similar kind of issue.
> >>>>>>>> i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
> >>>>>>>> observation should be considered/fixed. .
> >>>>>>>>
> >>>>>>>> --pk
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
> >>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> >>>>>>>> [   30.367696] NET: Registered protocol family 16
> >>>>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
> >>>>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
> >>>>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
> >>>>>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
> >>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> >>>>>>>> [   30.369984] Call trace:
> >>>>>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
> >>>>>>>> [   30.369991]  show_stack+0x20/0x30
> >>>>>>>> [   30.369997]  dump_stack+0xc0/0x10c
> >>>>>>>> [   30.370001]  warn_alloc+0x10c/0x178
> >>>>>>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
> >>>>>>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
> >>>>>>>> [   30.370008]  alloc_page_interleave+0x24/0x98
> >>>>>>>> [   30.370011]  alloc_pages_current+0xe4/0x108
> >>>>>>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
> >>>>>>>> [   30.370020]  do_one_initcall+0x54/0x228
> >>>>>>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
> >>>>>>>> [   30.370031]  kernel_init+0x1c/0x110
> >>>>>>>> [   30.370034]  ret_from_fork+0x10/0x18
> >>>>>>>> [   30.370036] Mem-Info:
> >>>>>>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
> >>>>>>>> [   30.370064]  active_file:0 inactive_file:0 isolated_file:0
> >>>>>>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
> >>>>>>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
> >>>>>>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
> >>>>>>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
> >>>>>>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
> >>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
> >>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> >>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
> >>>>>>>> unstable:0kB all_unreclaimable? no
> >>>>>>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
> >>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
> >>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> >>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
> >>>>>>>> unstable:0kB all_unreclaimable? no
> >>>>>>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
> >>>>>>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >>>>>>>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB
> >>>>>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>>>>>>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
> >>>>>>>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB low:664kB
> >>>>>>>> high:920kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >>>>>>>> present:269700kB managed:256000kB mlocked:0kB kernel_stack:0kB
> >>>>>>>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>>>>>>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
> >>>>>>>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB low:15504kB
> >>>>>>>> high:21456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >>>>>>>> present:8388608kB managed:5953112kB mlocked:0kB kernel_stack:21672kB
> >>>>>>>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB free_cma:0kB
> >>>>>>>> [   30.370104] lowmem_reserve[]: 0 0 0 0
> >>>>>>>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
> >>>>>>>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> >>>>>>>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
> >>>>>>>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) = 256000kB
> >>>>>>>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB (UE) 3*32kB
> >>>>>>>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME) 3*1024kB (ME)
> >>>>>>>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
> >>>>>>>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
> >>>>>>>> hugepages_surp=0 hugepages_size=1048576kB
> >>>>>>>> [   30.370130] 0 total pagecache pages
> >>>>>>>> [   30.370132] 0 pages in swap cache
> >>>>>>>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
> >>>>>>>> [   30.370135] Free swap  = 0kB
> >>>>>>>> [   30.370136] Total swap = 0kB
> >>>>>>>> [   30.370137] 2164609 pages RAM
> >>>>>>>> [   30.370139] 0 pages HighMem/MovableOnly
> >>>>>>>> [   30.370140] 612331 pages reserved
> >>>>>>>> [   30.370141] 0 pages hwpoisoned
> >>>>>>>> [   30.370143] DMA: failed to allocate 256 KiB pool for atomic
> >>>>>>>> coherent allocation
> >>>>>>>
> >>>>>>> During my testing I saw the same error and Chen's  solution corrected it .
> >>>>>> Which combination you are using on your side? I am using Prabhakar's
> >>>>>> suggested environment and can reproduce the issue
> >>>>>> with or without Chen's crashkernel support above 4G patchset.
> >>>>>>
> >>>>>> I am also using a ThunderX2 platform with latest makedumpfile code and
> >>>>>> kexec-tools (with the suggested patch
> >>>>>> <https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!J6lUig58-Gw6TKZnEEYzEeSU36T-1SqlB1kImU00xtX_lss5Tx-JbUmLE9TJC3foXBLg$ >).
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Bhupesh
> >>>>>
> >>>>> I did this activity 5 months ago and I have moved on to other activities. My DMA failures were related to PCI devices that could not be enumerated because  low-DMA space was not  available when crashkernel was moved above 4G; I don’t recall the exact platform.
> >>>>>
> >>>>>
> >>>>>
> >>>>> For this failure ,
> >>>>>
> >>>>>>>> DMA: failed to allocate 256 KiB pool for atomic
> >>>>>>>> coherent allocation
> >>>>>
> >>>>> Is due to :
> >>>>>
> >>>>>
> >>>>> 3618082c
> >>>>> ("arm64 use both ZONE_DMA and ZONE_DMA32")
> >>>>>
> >>>>> With the introduction of ZONE_DMA to support the Raspberry DMA
> >>>>> region below 1G, the crashkernel is placed in the upper 4G
> >>>>> ZONE_DMA_32 region. Since the crashkernel does not have access
> >>>>> to the ZONE_DMA region, it prints out call trace during bootup.
> >>>>>
> >>>>> It is due to having this CONFIG item  ON  :
> >>>>>
> >>>>>
> >>>>> CONFIG_ZONE_DMA=y
> >>>>>
> >>>>> Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
> >>>>> use the device tree to specify memory below 1G.
> >>>>>
> >>>>>
> >>>> Disabling ZONE_DMA is temporary solution.  We may need proper solution
> >>>
> >>> Perhaps the Raspberry platform configuration dependencies need separated  from “server class” Arm  equipment ?  Or auto-configured on boot ?  Consult an expert ;-)
> >>>
> >>>
> >>>
> >>>>> I would like to see Chen’s feature added , perhaps as EXPERIMENTAL,  so we can get some configuration testing done on it.   It corrects having a DMA zone in low memory while crash-kernel is above 4GB.  This has been going on for a year now.
> >>>> I will also like this patch to be added in Linux as early as possible.
> >>>>
> >>>> Issue mentioned by me happens with or without this patch.
> >>>>
> >>>> This patch-set can consider fixing because it uses low memory for DMA
> >>>> & swiotlb only.
> >>>> We can consider restricting crashkernel within the required range like below
> >>>>
> >>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> >>>> index 7f9e5a6dc48c..bd67b90d35bd 100644
> >>>> --- a/kernel/crash_core.c
> >>>> +++ b/kernel/crash_core.c
> >>>> @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
> >>>>                       return 0;
> >>>>       }
> >>>>
> >>>> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> >>>> +       low_base = memblock_find_in_range(0,0xc0000000, low_size, CRASH_ALIGN);
> >>>>       if (!low_base) {
> >>>>               pr_err("Cannot reserve %ldMB crashkernel low memory,
> >>>> please try smaller size.\n",
> >>>>                      (unsigned long)(low_size >> 20));
> >>>>
> >>>>
> >>>    I suspect  0xc0000000  would need to be a CONFIG item  and not hard-coded.
> >>>
> >> if you consider this as valid change,  can you please incorporate as
> >> part of your patch-set.
> >
> > After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"),the 0-4G memory is splited
> > to DMA [mem 0x0000000000000000-0x000000003fffffff] and DMA32 [mem 0x0000000040000000-0x00000000ffffffff] on arm64.
> >
> > From the above discussion, on your platform, the low crashkernel fall in DMA32 region, but your environment needs to access DMA
> > region, so there is the call trace.
> >
> > I have a question, why do you choose 0xc0000000 here?
> >
> > Besides, this is common code, we also need to consider about x86.
> >
>
>  + nsaenzjulienne@suse.de
>
>   Exactly .  This is why it needs to be a CONFIG option for  Raspberry ..,  or device tree option.
>
>
>   We could revert 1a8e1cef7 since it broke  Arm kdump too.

Well, unfortunately the patch for commit 1a8e1cef7603 ("arm64: use
both ZONE_DMA and ZONE_DMA32") was not Cc'ed to the kexec mailing
list, thus we couldn't get many eyes on it for a thorough review from
kexec/kdump p-o-v.

Also, historically we have never made a distinction in common arch code
on the basis of the intended end use-case (embedded, server or
automotive), so I am not sure that introducing a Raspberry-specific
CONFIG option would be a good idea.

So, rather than reverting the patch, we can look at addressing the
issue properly this time, especially from a kdump p-o-v.
Some Red Hat arm64 partners have also reported this issue with the
upstream kernel. As we have noticed in the past, hardcoding the
placement of the crashkernel base address (unless the base address is
specified by a crashkernel=X@Y style bootarg) is not a portable
approach either.

I am working on a possible fix and will have more updates on it
in a day or two.

Thanks,
Bhupesh


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-03 19:47                   ` Bhupesh Sharma
@ 2020-06-04  7:14                     ` Will Deacon
  2020-06-04 17:01                     ` Nicolas Saenz Julienne
  1 sibling, 0 replies; 34+ messages in thread
From: Will Deacon @ 2020-06-04  7:14 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: John Donnelly, chenzhou, Simon Horman, Devicetree List,
	Arnd Bergmann, Baoquan He, Linux Doc Mailing List,
	Catalin Marinas, guohanjun, kexec mailing list,
	Linux Kernel Mailing List, Rob Herring, James Morse,
	nsaenzjulienne, Prabhakar Kushwaha, Thomas Gleixner,
	Prabhakar Kushwaha, RuiRui Yang, Ingo Molnar, linux-arm-kernel

On Thu, Jun 04, 2020 at 01:17:06AM +0530, Bhupesh Sharma wrote:
> On Wed, Jun 3, 2020 at 9:03 PM John Donnelly <john.p.donnelly@oracle.com> wrote:
> > > On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@huawei.com> wrote:
> > > On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
> > >>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > >>>> index 7f9e5a6dc48c..bd67b90d35bd 100644
> > >>>> --- a/kernel/crash_core.c
> > >>>> +++ b/kernel/crash_core.c
> > >>>> @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
> > >>>>                       return 0;
> > >>>>       }
> > >>>>
> > >>>> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> > >>>> +       low_base = memblock_find_in_range(0,0xc0000000, low_size, CRASH_ALIGN);
> > >>>>       if (!low_base) {
> > >>>>               pr_err("Cannot reserve %ldMB crashkernel low memory,
> > >>>> please try smaller size.\n",
> > >>>>                      (unsigned long)(low_size >> 20));
> > >>>>
> > >>>>
> > >>>    I suspect  0xc0000000  would need to be a CONFIG item  and not hard-coded.
> > >>>
> > >> if you consider this as valid change,  can you please incorporate as
> > >> part of your patch-set.
> > >
> > > After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and
> > > ZONE_DMA32"),the 0-4G memory is splited to DMA [mem
> > > 0x0000000000000000-0x000000003fffffff] and DMA32 [mem
> > > 0x0000000040000000-0x00000000ffffffff] on arm64.
> > >
> > > From the above discussion, on your platform, the low crashkernel fall
> > > in DMA32 region, but your environment needs to access DMA region, so
> > > there is the call trace.
> > >
> > > I have a question, why do you choose 0xc0000000 here?
> > >
> > > Besides, this is common code, we also need to consider about x86.
> > >
> >
> >  + nsaenzjulienne@suse.de
> >
> >   Exactly .  This is why it needs to be a CONFIG option for  Raspberry
> >   ..,  or device tree option.
> >
> >
> >   We could revert 1a8e1cef7 since it broke  Arm kdump too.
> 
> Well, unfortunately the patch for commit 1a8e1cef7603 ("arm64: use
> both ZONE_DMA and ZONE_DMA32") was not Cc'ed to the kexec mailing
> list, thus we couldn't get many eyes on it for a thorough review from
> kexec/kdump p-o-v.
> 
> Also we historically never had distinction in common arch code on the
> basis of the intended end use-case: embedded, server or automotive, so
> I am not sure introducing a Raspberry specific CONFIG option would be
> a good idea.

Right, we need a fix that works for everybody, since we try hard for a
single Image that works for all platforms.

What I don't really understand is why, with Chen's patches applied, we can't
just keep the crashkernel out of the DMA zones altogether when no base is
specified. I guess I'll just look out for your patch!

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-03 19:47                   ` Bhupesh Sharma
  2020-06-04  7:14                     ` Will Deacon
@ 2020-06-04 17:01                     ` Nicolas Saenz Julienne
  2020-06-05  2:26                       ` John Donnelly
  2020-06-19  2:32                       ` John Donnelly
  1 sibling, 2 replies; 34+ messages in thread
From: Nicolas Saenz Julienne @ 2020-06-04 17:01 UTC (permalink / raw)
  To: Bhupesh Sharma, John Donnelly
  Cc: chenzhou, Simon Horman, Devicetree List, Arnd Bergmann,
	Baoquan He, Linux Doc Mailing List, Catalin Marinas, guohanjun,
	kexec mailing list, Linux Kernel Mailing List, Will Deacon,
	Rob Herring, James Morse, Prabhakar Kushwaha, Thomas Gleixner,
	Prabhakar Kushwaha, RuiRui Yang, Ingo Molnar, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 16696 bytes --]

On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:
> Hi All,
> 
> On Wed, Jun 3, 2020 at 9:03 PM John Donnelly <john.p.donnelly@oracle.com>
> wrote:
> > 
> > 
> > > On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@huawei.com> wrote:
> > > 
> > > Hi,
> > > 
> > > 
> > > On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
> > > > Hi Chen,
> > > > 
> > > > On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <john.p.donnelly@oracle.com
> > > > > wrote:
> > > > > 
> > > > > > On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
> > > > > > prabhakar.pkin@gmail.com> wrote:
> > > > > > 
> > > > > > On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
> > > > > > john.p.donnelly@oracle.com> wrote:
> > > > > > > Hi .  See below !
> > > > > > > 
> > > > > > > > On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com>
> > > > > > > > wrote:
> > > > > > > > 
> > > > > > > > Hi John,
> > > > > > > > 
> > > > > > > > On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
> > > > > > > > John.P.donnelly@oracle.com> wrote:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
> > > > > > > > > > Hi Chen,
> > > > > > > > > > 
> > > > > > > > > > On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
> > > > > > > > > > chenzhou10@huawei.com> wrote:
> > > > > > > > > > > This patch series enable reserving crashkernel above 4G in
> > > > > > > > > > > arm64.
> > > > > > > > > > > 
> > > > > > > > > > > There are following issues in arm64 kdump:
> > > > > > > > > > > 1. We use crashkernel=X to reserve crashkernel below 4G,
> > > > > > > > > > > which will fail
> > > > > > > > > > > when there is no enough low memory.
> > > > > > > > > > > 2. Currently, crashkernel=Y@X can be used to reserve
> > > > > > > > > > > crashkernel above 4G,
> > > > > > > > > > > in this case, if swiotlb or DMA buffers are required,
> > > > > > > > > > > crash dump kernel
> > > > > > > > > > > will boot failure because there is no low memory available
> > > > > > > > > > > for allocation.
> > > > > > > > > > > 
> > > > > > > > > > We are getting "warn_alloc" [1] warning during boot of kdump
> > > > > > > > > > kernel
> > > > > > > > > > with bootargs as [2] of primary kernel.
> > > > > > > > > > This error observed on ThunderX2  ARM64 platform.
> > > > > > > > > > 
> > > > > > > > > > It is observed with latest upstream tag (v5.7-rc3) with this
> > > > > > > > > > patch set
> > > > > > > > > > > and
> > > > > > > > > > > https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbiIAAlzu$
> > > > > > > > > > > Also **without** this patch-set
> > > > > > > > > > > "https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$"
> > > > > > > > > > 
> > > > > > > > > > This issue comes whenever crashkernel memory is reserved
> > > > > > > > > > after 0xc000_0000.
> > > > > > > > > > > More details discussed earlier in
> > > > > > > > > > > https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$
> > > > > > > > > > > without any solution
> > > > > > > > > > 
> > > > > > > > > > This patch-set is expected to solve similar kind of issue.
> > > > > > > > > > i.e. low memory is only targeted for DMA, swiotlb; So above
> > > > > > > > > > mentioned
> > > > > > > > > > observation should be considered/fixed. .
> > > > > > > > > > 
> > > > > > > > > > --pk
> > > > > > > > > > 
> > > > > > > > > > [1]
> > > > > > > > > > [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
> > > > > > > > > > TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> > > > > > > > > > [   30.367696] NET: Registered protocol family 16
> > > > > > > > > > [   30.369973] swapper/0: page allocation failure: order:6,
> > > > > > > > > > mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
> > > > > > > > > > [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > > > > > > > > 5.7.0-rc3+ #121
> > > > > > > > > > [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
> > > > > > > > > > TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> > > > > > > > > > [   30.369984] Call trace:
> > > > > > > > > > [   30.369989]  dump_backtrace+0x0/0x1f8
> > > > > > > > > > [   30.369991]  show_stack+0x20/0x30
> > > > > > > > > > [   30.369997]  dump_stack+0xc0/0x10c
> > > > > > > > > > [   30.370001]  warn_alloc+0x10c/0x178
> > > > > > > > > > [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0
> > > > > > > > > > xb50
> > > > > > > > > > [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
> > > > > > > > > > [   30.370008]  alloc_page_interleave+0x24/0x98
> > > > > > > > > > [   30.370011]  alloc_pages_current+0xe4/0x108
> > > > > > > > > > [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
> > > > > > > > > > [   30.370020]  do_one_initcall+0x54/0x228
> > > > > > > > > > [   30.370027]  kernel_init_freeable+0x228/0x2cc
> > > > > > > > > > [   30.370031]  kernel_init+0x1c/0x110
> > > > > > > > > > [   30.370034]  ret_from_fork+0x10/0x18
> > > > > > > > > > [   30.370036] Mem-Info:
> > > > > > > > > > [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
> > > > > > > > > > [   30.370064]  active_file:0 inactive_file:0
> > > > > > > > > > isolated_file:0
> > > > > > > > > > [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
> > > > > > > > > > [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
> > > > > > > > > > [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
> > > > > > > > > > [   30.370064]  free:1537719 free_pcp:219 free_cma:0
> > > > > > > > > > [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
> > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > isolated(anon):0kB
> > > > > > > > > > isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
> > > > > > > > > > shmem:0kB
> > > > > > > > > > shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
> > > > > > > > > > writeback_tmp:0kB
> > > > > > > > > > unstable:0kB all_unreclaimable? no
> > > > > > > > > > [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
> > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > isolated(anon):0kB
> > > > > > > > > > isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
> > > > > > > > > > shmem:0kB
> > > > > > > > > > shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
> > > > > > > > > > writeback_tmp:0kB
> > > > > > > > > > unstable:0kB all_unreclaimable? no
> > > > > > > > > > [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
> > > > > > > > > > reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > writepending:0kB
> > > > > > > > > > present:128kB managed:0kB mlocked:0kB kernel_stack:0kB
> > > > > > > > > > pagetables:0kB
> > > > > > > > > > bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > > > > > > > > [   30.370084] lowmem_reserve[]: 0 250 6063 6063
> > > > > > > > > > [   30.370090] Node 0 DMA32 free:256000kB min:408kB
> > > > > > > > > > low:664kB
> > > > > > > > > > high:920kB reserved_highatomic:0KB active_anon:0kB
> > > > > > > > > > inactive_anon:0kB
> > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > writepending:0kB
> > > > > > > > > > present:269700kB managed:256000kB mlocked:0kB
> > > > > > > > > > kernel_stack:0kB
> > > > > > > > > > pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
> > > > > > > > > > free_cma:0kB
> > > > > > > > > > [   30.370094] lowmem_reserve[]: 0 0 5813 5813
> > > > > > > > > > [   30.370100] Node 0 Normal free:5894876kB min:9552kB
> > > > > > > > > > low:15504kB
> > > > > > > > > > high:21456kB reserved_highatomic:0KB active_anon:0kB
> > > > > > > > > > inactive_anon:0kB
> > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > writepending:0kB
> > > > > > > > > > present:8388608kB managed:5953112kB mlocked:0kB
> > > > > > > > > > kernel_stack:21672kB
> > > > > > > > > > pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB
> > > > > > > > > > free_cma:0kB
> > > > > > > > > > [   30.370104] lowmem_reserve[]: 0 0 0 0
> > > > > > > > > > [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB
> > > > > > > > > > 0*128kB
> > > > > > > > > > 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> > > > > > > > > > [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB
> > > > > > > > > > 0*64kB 0*128kB
> > > > > > > > > > 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) =
> > > > > > > > > > 256000kB
> > > > > > > > > > [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB
> > > > > > > > > > (UE) 3*32kB
> > > > > > > > > > (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME)
> > > > > > > > > > 3*1024kB (ME)
> > > > > > > > > > 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
> > > > > > > > > > [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
> > > > > > > > > > hugepages_surp=0 hugepages_size=1048576kB
> > > > > > > > > > [   30.370130] 0 total pagecache pages
> > > > > > > > > > [   30.370132] 0 pages in swap cache
> > > > > > > > > > [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
> > > > > > > > > > [   30.370135] Free swap  = 0kB
> > > > > > > > > > [   30.370136] Total swap = 0kB
> > > > > > > > > > [   30.370137] 2164609 pages RAM
> > > > > > > > > > [   30.370139] 0 pages HighMem/MovableOnly
> > > > > > > > > > [   30.370140] 612331 pages reserved
> > > > > > > > > > [   30.370141] 0 pages hwpoisoned
> > > > > > > > > > [   30.370143] DMA: failed to allocate 256 KiB pool for
> > > > > > > > > > atomic
> > > > > > > > > > coherent allocation
> > > > > > > > > 
> > > > > > > > > During my testing I saw the same error and Chen's  solution
> > > > > > > > > corrected it .
> > > > > > > > Which combination you are using on your side? I am using
> > > > > > > > Prabhakar's
> > > > > > > > suggested environment and can reproduce the issue
> > > > > > > > with or without Chen's crashkernel support above 4G patchset.
> > > > > > > > 
> > > > > > > > I am also using a ThunderX2 platform with latest makedumpfile
> > > > > > > > code and
> > > > > > > > kexec-tools (with the suggested patch
> > > > > > > > <
> > > > > > > > 
https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!J6lUig58-Gw6TKZnEEYzEeSU36T-1SqlB1kImU00xtX_lss5Tx-JbUmLE9TJC3foXBLg$
> > > > > > > > >).
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > Bhupesh
> > > > > > > 
> > > > > > > I did this activity 5 months ago and I have moved on to other
> > > > > > > activities. My DMA failures were related to PCI devices that could
> > > > > > > not be enumerated because  low-DMA space was not  available when
> > > > > > > crashkernel was moved above 4G; I don’t recall the exact platform.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > For this failure ,
> > > > > > > 
> > > > > > > > > > DMA: failed to allocate 256 KiB pool for atomic
> > > > > > > > > > coherent allocation
> > > > > > > 
> > > > > > > Is due to :
> > > > > > > 
> > > > > > > 
> > > > > > > 3618082c
> > > > > > > ("arm64 use both ZONE_DMA and ZONE_DMA32")
> > > > > > > 
> > > > > > > With the introduction of ZONE_DMA to support the Raspberry DMA
> > > > > > > region below 1G, the crashkernel is placed in the upper 4G
> > > > > > > ZONE_DMA_32 region. Since the crashkernel does not have access
> > > > > > > to the ZONE_DMA region, it prints out call trace during bootup.
> > > > > > > 
> > > > > > > It is due to having this CONFIG item  ON  :
> > > > > > > 
> > > > > > > 
> > > > > > > CONFIG_ZONE_DMA=y
> > > > > > > 
> > > > > > > Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
> > > > > > > use the device tree to specify memory below 1G.
> > > > > > > 
> > > > > > > 
> > > > > > Disabling ZONE_DMA is temporary solution.  We may need proper
> > > > > > solution
> > > > > 
> > > > > Perhaps the Raspberry platform configuration dependencies need
> > > > > separated  from “server class” Arm  equipment ?  Or auto-configured on
> > > > > boot ?  Consult an expert ;-)
> > > > > 
> > > > > 
> > > > > 
> > > > > > > I would like to see Chen’s feature added , perhaps as
> > > > > > > EXPERIMENTAL,  so we can get some configuration testing done on
> > > > > > > it.   It corrects having a DMA zone in low memory while crash-
> > > > > > > kernel is above 4GB.  This has been going on for a year now.
> > > > > > I will also like this patch to be added in Linux as early as
> > > > > > possible.
> > > > > > 
> > > > > > Issue mentioned by me happens with or without this patch.
> > > > > > 
> > > > > > This patch-set can consider fixing because it uses low memory for
> > > > > > DMA
> > > > > > & swiotlb only.
> > > > > > We can consider restricting crashkernel within the required range
> > > > > > like below
> > > > > > 
> > > > > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > > > > > index 7f9e5a6dc48c..bd67b90d35bd 100644
> > > > > > --- a/kernel/crash_core.c
> > > > > > +++ b/kernel/crash_core.c
> > > > > > @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
> > > > > >                       return 0;
> > > > > >       }
> > > > > > 
> > > > > > -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size,
> > > > > > CRASH_ALIGN);
> > > > > > +       low_base = memblock_find_in_range(0,0xc0000000, low_size,
> > > > > > CRASH_ALIGN);
> > > > > >       if (!low_base) {
> > > > > >               pr_err("Cannot reserve %ldMB crashkernel low memory,
> > > > > > please try smaller size.\n",
> > > > > >                      (unsigned long)(low_size >> 20));
> > > > > > 
> > > > > > 
> > > > >    I suspect  0xc0000000  would need to be a CONFIG item  and not
> > > > > hard-coded.
> > > > > 
> > > > if you consider this as valid change,  can you please incorporate as
> > > > part of your patch-set.
> > > 
> > > After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"),the 0-
> > > 4G memory is splited
> > > to DMA [mem 0x0000000000000000-0x000000003fffffff] and DMA32 [mem
> > > 0x0000000040000000-0x00000000ffffffff] on arm64.
> > > 
> > > From the above discussion, on your platform, the low crashkernel fall in
> > > DMA32 region, but your environment needs to access DMA
> > > region, so there is the call trace.
> > > 
> > > I have a question, why do you choose 0xc0000000 here?
> > > 
> > > Besides, this is common code, we also need to consider about x86.
> > > 
> > 
> >  + nsaenzjulienne@suse.de

Thanks for adding me to the conversation, and sorry for the headaches.

> > 
> >   Exactly .  This is why it needs to be a CONFIG option for  Raspberry
> > ..,  or device tree option.
> > 
> > 
> >   We could revert 1a8e1cef7 since it broke  Arm kdump too.
> 
> Well, unfortunately the patch for commit 1a8e1cef7603 ("arm64: use
> both ZONE_DMA and ZONE_DMA32") was not Cc'ed to the kexec mailing
> list, thus we couldn't get many eyes on it for a thorough review from
> kexec/kdump p-o-v.
> 
> Also we historically never had distinction in common arch code on the
> basis of the intended end use-case: embedded, server or automotive, so
> I am not sure introducing a Raspberry specific CONFIG option would be
> a good idea.

+1

From a distro's perspective it's very important to keep a single kernel image.

> So, rather than reverting the patch, we can look at addressing the
> same properly this time - especially from a kdump p-o-v.
> This issue has been reported by some Red Hat arm64 partners with
> upstream kernel also and as we have noticed in the past as well,
> hardcoding the placement of the crashkernel base address (unless the
> base address is specified by a crashkernel=X@Y like bootargs) is also
> not a portable suggestion.
> 
> I am working on a possible fix and will have more updates on the same
> in a day-or-two.

Please keep me in the loop, we've also had issues pointing to this reported by
SUSE partners. I can do some testing both on the RPi4 and on big servers that
need huge crashkernel sizes.
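
For reference, here is a minimal userspace sketch of the zone split quoted
above (DMA covering the first 1G, DMA32 covering the rest of the first 4G).
It is only an illustration, not kernel code: the macro and helper names
(DMA_ZONE_END, DMA32_ZONE_END, zone_of, bytes_below) are hypothetical, and
the boundaries are simply the ones stated earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Zone boundaries as quoted in the thread for arm64:
 * DMA covers [0, 1G), DMA32 covers [1G, 4G). Assumed, not read from a kernel. */
#define DMA_ZONE_END    0x40000000ULL
#define DMA32_ZONE_END  0x100000000ULL

enum zone_id { Z_DMA, Z_DMA32, Z_NORMAL };

/* Hypothetical helper: which zone a physical address falls in. */
enum zone_id zone_of(uint64_t phys)
{
	if (phys < DMA_ZONE_END)
		return Z_DMA;
	if (phys < DMA32_ZONE_END)
		return Z_DMA32;
	return Z_NORMAL;
}

/* Hypothetical helper: how many bytes of [base, base + size) lie below limit. */
uint64_t bytes_below(uint64_t base, uint64_t size, uint64_t limit)
{
	uint64_t end;

	assert(size <= UINT64_MAX - base);	/* reject ranges that wrap around */
	end = base + size;

	if (base >= limit)
		return 0;
	return (end < limit ? end : limit) - base;
}
```

With these assumptions, a 256M region based at 0xc0000000 gives
bytes_below(0xc0000000, 256M, DMA_ZONE_END) == 0, i.e. such a region would
contain no memory from the DMA zone at all.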

Regards,
Nicolas



* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-04 17:01                     ` Nicolas Saenz Julienne
@ 2020-06-05  2:26                       ` John Donnelly
  2020-06-05  8:21                         ` Nicolas Saenz Julienne
  2020-06-19  2:32                       ` John Donnelly
  1 sibling, 1 reply; 34+ messages in thread
From: John Donnelly @ 2020-06-05  2:26 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, Bhupesh Sharma
  Cc: chenzhou, Simon Horman, Devicetree List, Arnd Bergmann,
	Baoquan He, Linux Doc Mailing List, Catalin Marinas, guohanjun,
	kexec mailing list, Linux Kernel Mailing List, Will Deacon,
	Rob Herring, James Morse, Prabhakar Kushwaha, Thomas Gleixner,
	Prabhakar Kushwaha, RuiRui Yang, Ingo Molnar, linux-arm-kernel


On 6/4/20 12:01 PM, Nicolas Saenz Julienne wrote:
> On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:
>> Hi All,
>>
>> On Wed, Jun 3, 2020 at 9:03 PM John Donnelly <john.p.donnelly@oracle.com>
>> wrote:
>>>
>>>> On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@huawei.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
>>>>> Hi Chen,
>>>>>
>>>>> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <john.p.donnelly@oracle.com
>>>>>> wrote:
>>>>>>
>>>>>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
>>>>>>> prabhakar.pkin@gmail.com> wrote:
>>>>>>>
>>>>>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
>>>>>>> john.p.donnelly@oracle.com> wrote:
>>>>>>>> Hi .  See below !
>>>>>>>>
>>>>>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi John,
>>>>>>>>>
>>>>>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
>>>>>>>>> John.P.donnelly@oracle.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>>>>>>>> Hi Chen,
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
>>>>>>>>>>> chenzhou10@huawei.com> wrote:
>>>>>>>>>>>> This patch series enable reserving crashkernel above 4G in
>>>>>>>>>>>> arm64.
>>>>>>>>>>>>
>>>>>>>>>>>> There are following issues in arm64 kdump:
>>>>>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G,
>>>>>>>>>>>> which will fail
>>>>>>>>>>>> when there is no enough low memory.
>>>>>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve
>>>>>>>>>>>> crashkernel above 4G,
>>>>>>>>>>>> in this case, if swiotlb or DMA buffers are required,
>>>>>>>>>>>> crash dump kernel
>>>>>>>>>>>> will boot failure because there is no low memory available
>>>>>>>>>>>> for allocation.
>>>>>>>>>>>>
>>>>>>>>>>> We are getting "warn_alloc" [1] warning during boot of kdump
>>>>>>>>>>> kernel
>>>>>>>>>>> with bootargs as [2] of primary kernel.
>>>>>>>>>>> This error observed on ThunderX2  ARM64 platform.
>>>>>>>>>>>
>>>>>>>>>>> It is observed with latest upstream tag (v5.7-rc3) with this
>>>>>>>>>>> patch set
>>>>>>>>>>> and
>>>>>>>>>>> https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>>>>>>>>>> Also **without** this patch-set
>>>>>>>>>>> "https://www.spinics.net/lists/arm-kernel/msg806882.html"
>>>>>>>>>>>
>>>>>>>>>>> This issue comes whenever crashkernel memory is reserved
>>>>>>>>>>> after 0xc000_0000.
>>>>>>>>>>> More details discussed earlier in
>>>>>>>>>>> https://www.spinics.net/lists/arm-kernel/msg806882.html
>>>>>>>>>>> without any solution
>>>>>>>>>>>
>>>>>>>>>>> This patch-set is expected to solve similar kind of issue.
>>>>>>>>>>> i.e. low memory is only targeted for DMA, swiotlb; So above
>>>>>>>>>>> mentioned
>>>>>>>>>>> observation should be considered/fixed. .
>>>>>>>>>>>
>>>>>>>>>>> --pk
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
>>>>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>>>>> [   30.367696] NET: Registered protocol family 16
>>>>>>>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
>>>>>>>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>>>>>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>>>>>>>>>>> 5.7.0-rc3+ #121
>>>>>>>>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
>>>>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>>>>> [   30.369984] Call trace:
>>>>>>>>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
>>>>>>>>>>> [   30.369991]  show_stack+0x20/0x30
>>>>>>>>>>> [   30.369997]  dump_stack+0xc0/0x10c
>>>>>>>>>>> [   30.370001]  warn_alloc+0x10c/0x178
>>>>>>>>>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0
>>>>>>>>>>> xb50
>>>>>>>>>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
>>>>>>>>>>> [   30.370008]  alloc_page_interleave+0x24/0x98
>>>>>>>>>>> [   30.370011]  alloc_pages_current+0xe4/0x108
>>>>>>>>>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
>>>>>>>>>>> [   30.370020]  do_one_initcall+0x54/0x228
>>>>>>>>>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
>>>>>>>>>>> [   30.370031]  kernel_init+0x1c/0x110
>>>>>>>>>>> [   30.370034]  ret_from_fork+0x10/0x18
>>>>>>>>>>> [   30.370036] Mem-Info:
>>>>>>>>>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
>>>>>>>>>>> [   30.370064]  active_file:0 inactive_file:0
>>>>>>>>>>> isolated_file:0
>>>>>>>>>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
>>>>>>>>>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
>>>>>>>>>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
>>>>>>>>>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
>>>>>>>>>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>> isolated(anon):0kB
>>>>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
>>>>>>>>>>> shmem:0kB
>>>>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>>>>>>>>>>> writeback_tmp:0kB
>>>>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>>>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>> isolated(anon):0kB
>>>>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
>>>>>>>>>>> shmem:0kB
>>>>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>>>>>>>>>>> writeback_tmp:0kB
>>>>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>>>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
>>>>>>>>>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>> writepending:0kB
>>>>>>>>>>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB
>>>>>>>>>>> pagetables:0kB
>>>>>>>>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>>>>>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
>>>>>>>>>>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB
>>>>>>>>>>> low:664kB
>>>>>>>>>>> high:920kB reserved_highatomic:0KB active_anon:0kB
>>>>>>>>>>> inactive_anon:0kB
>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>> writepending:0kB
>>>>>>>>>>> present:269700kB managed:256000kB mlocked:0kB
>>>>>>>>>>> kernel_stack:0kB
>>>>>>>>>>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
>>>>>>>>>>> free_cma:0kB
>>>>>>>>>>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
>>>>>>>>>>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB
>>>>>>>>>>> low:15504kB
>>>>>>>>>>> high:21456kB reserved_highatomic:0KB active_anon:0kB
>>>>>>>>>>> inactive_anon:0kB
>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>> writepending:0kB
>>>>>>>>>>> present:8388608kB managed:5953112kB mlocked:0kB
>>>>>>>>>>> kernel_stack:21672kB
>>>>>>>>>>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB
>>>>>>>>>>> free_cma:0kB
>>>>>>>>>>> [   30.370104] lowmem_reserve[]: 0 0 0 0
>>>>>>>>>>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB
>>>>>>>>>>> 0*128kB
>>>>>>>>>>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>>>>>>>>>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB
>>>>>>>>>>> 0*64kB 0*128kB
>>>>>>>>>>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) =
>>>>>>>>>>> 256000kB
>>>>>>>>>>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB
>>>>>>>>>>> (UE) 3*32kB
>>>>>>>>>>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME)
>>>>>>>>>>> 3*1024kB (ME)
>>>>>>>>>>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
>>>>>>>>>>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
>>>>>>>>>>> hugepages_surp=0 hugepages_size=1048576kB
>>>>>>>>>>> [   30.370130] 0 total pagecache pages
>>>>>>>>>>> [   30.370132] 0 pages in swap cache
>>>>>>>>>>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>>>>>> [   30.370135] Free swap  = 0kB
>>>>>>>>>>> [   30.370136] Total swap = 0kB
>>>>>>>>>>> [   30.370137] 2164609 pages RAM
>>>>>>>>>>> [   30.370139] 0 pages HighMem/MovableOnly
>>>>>>>>>>> [   30.370140] 612331 pages reserved
>>>>>>>>>>> [   30.370141] 0 pages hwpoisoned
>>>>>>>>>>> [   30.370143] DMA: failed to allocate 256 KiB pool for
>>>>>>>>>>> atomic
>>>>>>>>>>> coherent allocation
>>>>>>>>>> During my testing I saw the same error and Chen's  solution
>>>>>>>>>> corrected it .
>>>>>>>>> Which combination you are using on your side? I am using
>>>>>>>>> Prabhakar's
>>>>>>>>> suggested environment and can reproduce the issue
>>>>>>>>> with or without Chen's crashkernel support above 4G patchset.
>>>>>>>>>
>>>>>>>>> I am also using a ThunderX2 platform with latest makedumpfile
>>>>>>>>> code and
>>>>>>>>> kexec-tools (with the suggested patch
>>>>>>>>> <https://lists.infradead.org/pipermail/kexec/2020-May/025128.html>).
>>>>>>>>> Thanks,
>>>>>>>>> Bhupesh
>>>>>>>> I did this activity 5 months ago and I have moved on to other
>>>>>>>> activities. My DMA failures were related to PCI devices that could
>>>>>>>> not be enumerated because  low-DMA space was not  available when
>>>>>>>> crashkernel was moved above 4G; I don’t recall the exact platform.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> For this failure ,
>>>>>>>>
>>>>>>>>>>> DMA: failed to allocate 256 KiB pool for atomic
>>>>>>>>>>> coherent allocation
>>>>>>>> Is due to :
>>>>>>>>
>>>>>>>>
>>>>>>>> 3618082c
>>>>>>>> ("arm64 use both ZONE_DMA and ZONE_DMA32")
>>>>>>>>
>>>>>>>> With the introduction of ZONE_DMA to support the Raspberry DMA
>>>>>>>> region below 1G, the crashkernel is placed in the upper 4G
>>>>>>>> ZONE_DMA_32 region. Since the crashkernel does not have access
>>>>>>>> to the ZONE_DMA region, it prints out call trace during bootup.
>>>>>>>>
>>>>>>>> It is due to having this CONFIG item  ON  :
>>>>>>>>
>>>>>>>>
>>>>>>>> CONFIG_ZONE_DMA=y
>>>>>>>>
>>>>>>>> Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
>>>>>>>> use the device tree to specify memory below 1G.
>>>>>>>>
>>>>>>>>
>>>>>>> Disabling ZONE_DMA is temporary solution.  We may need proper
>>>>>>> solution
>>>>>> Perhaps the Raspberry platform configuration dependencies need
>>>>>> separated  from “server class” Arm  equipment ?  Or auto-configured on
>>>>>> boot ?  Consult an expert ;-)
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> I would like to see Chen’s feature added , perhaps as
>>>>>>>> EXPERIMENTAL,  so we can get some configuration testing done on
>>>>>>>> it.   It corrects having a DMA zone in low memory while crash-
>>>>>>>> kernel is above 4GB.  This has been going on for a year now.
>>>>>>> I will also like this patch to be added in Linux as early as
>>>>>>> possible.
>>>>>>>
>>>>>>> Issue mentioned by me happens with or without this patch.
>>>>>>>
>>>>>>> This patch-set can consider fixing because it uses low memory for
>>>>>>> DMA
>>>>>>> & swiotlb only.
>>>>>>> We can consider restricting crashkernel within the required range
>>>>>>> like below
>>>>>>>
>>>>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>>>>> index 7f9e5a6dc48c..bd67b90d35bd 100644
>>>>>>> --- a/kernel/crash_core.c
>>>>>>> +++ b/kernel/crash_core.c
>>>>>>> @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
>>>>>>>                        return 0;
>>>>>>>        }
>>>>>>>
>>>>>>> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size,
>>>>>>> CRASH_ALIGN);
>>>>>>> +       low_base = memblock_find_in_range(0,0xc0000000, low_size,
>>>>>>> CRASH_ALIGN);
>>>>>>>        if (!low_base) {
>>>>>>>                pr_err("Cannot reserve %ldMB crashkernel low memory,
>>>>>>> please try smaller size.\n",
>>>>>>>                       (unsigned long)(low_size >> 20));
>>>>>>>
>>>>>>>
>>>>>>     I suspect  0xc0000000  would need to be a CONFIG item  and not
>>>>>> hard-coded.
>>>>>>
>>>>> if you consider this as valid change,  can you please incorporate as
>>>>> part of your patch-set.
>>>> After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"),the 0-
>>>> 4G memory is splited
>>>> to DMA [mem 0x0000000000000000-0x000000003fffffff] and DMA32 [mem
>>>> 0x0000000040000000-0x00000000ffffffff] on arm64.
>>>>
>>>>  From the above discussion, on your platform, the low crashkernel fall in
>>>> DMA32 region, but your environment needs to access DMA
>>>> region, so there is the call trace.
>>>>
>>>> I have a question, why do you choose 0xc0000000 here?
>>>>
>>>> Besides, this is common code, we also need to consider about x86.
>>>>
>>>   + nsaenzjulienne@suse.de
> Thanks for adding me to the conversation, and sorry for the headaches.
>
>>>    Exactly .  This is why it needs to be a CONFIG option for  Raspberry
>>> ..,  or device tree option.
>>>
>>>
>>>    We could revert 1a8e1cef7 since it broke  Arm kdump too.
>> Well, unfortunately the patch for commit 1a8e1cef7603 ("arm64: use
>> both ZONE_DMA and ZONE_DMA32") was not Cc'ed to the kexec mailing
>> list, thus we couldn't get many eyes on it for a thorough review from
>> kexec/kdump p-o-v.
>>
>> Also we historically never had distinction in common arch code on the
>> basis of the intended end use-case: embedded, server or automotive, so
>> I am not sure introducing a Raspberry specific CONFIG option would be
>> a good idea.
> +1
>
>  From the distros perspective it's very important to keep a single kernel image.
>
>> So, rather than reverting the patch, we can look at addressing the
>> same properly this time - especially from a kdump p-o-v.
>> This issue has been reported by some Red Hat arm64 partners with
>> upstream kernel also and as we have noticed in the past as well,
>> hardcoding the placement of the crashkernel base address (unless the
>> base address is specified by a crashkernel=X@Y like bootargs) is also
>> not a portable suggestion.
>>
>> I am working on a possible fix and will have more updates on the same
>> in a day-or-two.
> Please keep me in the loop, we've also had issues pointing to this reported by
> SUSE partners. I can do some testing both on the RPi4 and on big servers that
> need huge crashkernel sizes.
>
> Regards,
> Nicolas
>
Hi Nicolas,


You may want to review this topic across the various email threads. It
has been a long journey.



[1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411

Chen Zhou (5):
   x86: kdump: move reserve_crashkernel_low() into crash_core.c
   arm64: kdump: reserve crashkenel above 4G for crash dump kernel
   arm64: kdump: add memory for devices by DT property, low-memory-range
   kdump: update Documentation about crashkernel on arm64
   dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump

  Documentation/admin-guide/kdump/kdump.rst     | 13 ++-
  .../admin-guide/kernel-parameters.txt         | 12 ++-
  Documentation/devicetree/bindings/chosen.txt  | 25 ++++++
  arch/arm64/kernel/setup.c                     |  8 +-
  arch/arm64/mm/init.c                          | 61 ++++++++++++-
  arch/x86/kernel/setup.c                       | 66 ++------------
  include/linux/crash_core.h                    |  3 +
  include/linux/kexec.h                         |  2 -
  kernel/crash_core.c                           | 85 +++++++++++++++++++
  kernel/kexec_core.c                           | 17 ----
  10 files changed, 208 insertions(+), 84 deletions(-)

-- 
2.20.1




* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-05  2:26                       ` John Donnelly
@ 2020-06-05  8:21                         ` Nicolas Saenz Julienne
  0 siblings, 0 replies; 34+ messages in thread
From: Nicolas Saenz Julienne @ 2020-06-05  8:21 UTC (permalink / raw)
  To: John Donnelly, Bhupesh Sharma
  Cc: chenzhou, Simon Horman, Devicetree List, Arnd Bergmann,
	Baoquan He, Linux Doc Mailing List, Catalin Marinas, guohanjun,
	kexec mailing list, Linux Kernel Mailing List, Will Deacon,
	Rob Herring, James Morse, Prabhakar Kushwaha, Thomas Gleixner,
	Prabhakar Kushwaha, RuiRui Yang, Ingo Molnar, linux-arm-kernel

On Thu, 2020-06-04 at 21:26 -0500, John Donnelly wrote:
> On 6/4/20 12:01 PM, Nicolas Saenz Julienne wrote:
> > On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:
> > > Hi All,
> > > 
> > > On Wed, Jun 3, 2020 at 9:03 PM John Donnelly <john.p.donnelly@oracle.com>
> > > wrote:
> > > > > On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@huawei.com> wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > 
> > > > > On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
> > > > > > Hi Chen,
> > > > > > 
> > > > > > On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <
> > > > > > john.p.donnelly@oracle.com
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
> > > > > > > > prabhakar.pkin@gmail.com> wrote:
> > > > > > > > 
> > > > > > > > On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
> > > > > > > > john.p.donnelly@oracle.com> wrote:
> > > > > > > > > Hi .  See below !
> > > > > > > > > 
> > > > > > > > > > On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <
> > > > > > > > > > bhsharma@redhat.com>
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > Hi John,
> > > > > > > > > > 
> > > > > > > > > > On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
> > > > > > > > > > John.P.donnelly@oracle.com> wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
> > > > > > > > > > > > Hi Chen,
> > > > > > > > > > > > 
> > > > > > > > > > > > On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
> > > > > > > > > > > > chenzhou10@huawei.com> wrote:
> > > > > > > > > > > > > This patch series enable reserving crashkernel above
> > > > > > > > > > > > > 4G in
> > > > > > > > > > > > > arm64.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > There are following issues in arm64 kdump:
> > > > > > > > > > > > > 1. We use crashkernel=X to reserve crashkernel below
> > > > > > > > > > > > > 4G,
> > > > > > > > > > > > > which will fail
> > > > > > > > > > > > > when there is no enough low memory.
> > > > > > > > > > > > > 2. Currently, crashkernel=Y@X can be used to reserve
> > > > > > > > > > > > > crashkernel above 4G,
> > > > > > > > > > > > > in this case, if swiotlb or DMA buffers are required,
> > > > > > > > > > > > > crash dump kernel
> > > > > > > > > > > > > will boot failure because there is no low memory
> > > > > > > > > > > > > available
> > > > > > > > > > > > > for allocation.
> > > > > > > > > > > > > 
> > > > > > > > > > > > We are getting "warn_alloc" [1] warning during boot of
> > > > > > > > > > > > kdump
> > > > > > > > > > > > kernel
> > > > > > > > > > > > with bootargs as [2] of primary kernel.
> > > > > > > > > > > > This error observed on ThunderX2  ARM64 platform.
> > > > > > > > > > > > 
> > > > > > > > > > > > It is observed with latest upstream tag (v5.7-rc3) with
> > > > > > > > > > > > this
> > > > > > > > > > > > patch set
> > > > > > > > > > > > and
> > > > > > > > > > > > > https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> > > > > > > > > > > > Also **without** this patch-set
> > > > > > > > > > > > > "https://www.spinics.net/lists/arm-kernel/msg806882.html"
> > > > > > > > > > > > 
> > > > > > > > > > > > This issue comes whenever crashkernel memory is reserved
> > > > > > > > > > > > after 0xc000_0000.
> > > > > > > > > > > > More details discussed earlier in
> > > > > > > > > > > > > https://www.spinics.net/lists/arm-kernel/msg806882.html
> > > > > > > > > > > > > without any solution
> > > > > > > > > > > > 
> > > > > > > > > > > > This patch-set is expected to solve similar kind of
> > > > > > > > > > > > issue.
> > > > > > > > > > > > i.e. low memory is only targeted for DMA, swiotlb; So
> > > > > > > > > > > > above
> > > > > > > > > > > > mentioned
> > > > > > > > > > > > observation should be considered/fixed. .
> > > > > > > > > > > > 
> > > > > > > > > > > > --pk
> > > > > > > > > > > > 
> > > > > > > > > > > > [1]
> > > > > > > > > > > > [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
> > > > > > > > > > > > TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> > > > > > > > > > > > [   30.367696] NET: Registered protocol family 16
> > > > > > > > > > > > [   30.369973] swapper/0: page allocation failure:
> > > > > > > > > > > > order:6,
> > > > > > > > > > > > mode:0x1(GFP_DMA),
> > > > > > > > > > > > nodemask=(null),cpuset=/,mems_allowed=0
> > > > > > > > > > > > [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > > > > > > > > > > 5.7.0-rc3+ #121
> > > > > > > > > > > > [   30.369981] Hardware name: Cavium Inc. Saber/Saber,
> > > > > > > > > > > > BIOS
> > > > > > > > > > > > TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> > > > > > > > > > > > [   30.369984] Call trace:
> > > > > > > > > > > > [   30.369989]  dump_backtrace+0x0/0x1f8
> > > > > > > > > > > > [   30.369991]  show_stack+0x20/0x30
> > > > > > > > > > > > [   30.369997]  dump_stack+0xc0/0x10c
> > > > > > > > > > > > [   30.370001]  warn_alloc+0x10c/0x178
> > > > > > > > > > > > [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb
> > > > > > > > > > > > 10/0
> > > > > > > > > > > > xb50
> > > > > > > > > > > > [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
> > > > > > > > > > > > [   30.370008]  alloc_page_interleave+0x24/0x98
> > > > > > > > > > > > [   30.370011]  alloc_pages_current+0xe4/0x108
> > > > > > > > > > > > [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
> > > > > > > > > > > > [   30.370020]  do_one_initcall+0x54/0x228
> > > > > > > > > > > > [   30.370027]  kernel_init_freeable+0x228/0x2cc
> > > > > > > > > > > > [   30.370031]  kernel_init+0x1c/0x110
> > > > > > > > > > > > [   30.370034]  ret_from_fork+0x10/0x18
> > > > > > > > > > > > [   30.370036] Mem-Info:
> > > > > > > > > > > > [   30.370064] active_anon:0 inactive_anon:0
> > > > > > > > > > > > isolated_anon:0
> > > > > > > > > > > > [   30.370064]  active_file:0 inactive_file:0
> > > > > > > > > > > > isolated_file:0
> > > > > > > > > > > > [   30.370064]  unevictable:0 dirty:0 writeback:0
> > > > > > > > > > > > unstable:0
> > > > > > > > > > > > [   30.370064]  slab_reclaimable:34
> > > > > > > > > > > > slab_unreclaimable:4438
> > > > > > > > > > > > [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
> > > > > > > > > > > > [   30.370064]  free:1537719 free_pcp:219 free_cma:0
> > > > > > > > > > > > [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
> > > > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > > > isolated(anon):0kB
> > > > > > > > > > > > isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
> > > > > > > > > > > > shmem:0kB
> > > > > > > > > > > > shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
> > > > > > > > > > > > writeback_tmp:0kB
> > > > > > > > > > > > unstable:0kB all_unreclaimable? no
> > > > > > > > > > > > [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
> > > > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > > > isolated(anon):0kB
> > > > > > > > > > > > isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
> > > > > > > > > > > > shmem:0kB
> > > > > > > > > > > > shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
> > > > > > > > > > > > writeback_tmp:0kB
> > > > > > > > > > > > unstable:0kB all_unreclaimable? no
> > > > > > > > > > > > [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB
> > > > > > > > > > > > high:0kB
> > > > > > > > > > > > reserved_highatomic:0KB active_anon:0kB
> > > > > > > > > > > > inactive_anon:0kB
> > > > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > > > writepending:0kB
> > > > > > > > > > > > present:128kB managed:0kB mlocked:0kB kernel_stack:0kB
> > > > > > > > > > > > pagetables:0kB
> > > > > > > > > > > > bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > > > > > > > > > > [   30.370084] lowmem_reserve[]: 0 250 6063 6063
> > > > > > > > > > > > [   30.370090] Node 0 DMA32 free:256000kB min:408kB
> > > > > > > > > > > > low:664kB
> > > > > > > > > > > > high:920kB reserved_highatomic:0KB active_anon:0kB
> > > > > > > > > > > > inactive_anon:0kB
> > > > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > > > writepending:0kB
> > > > > > > > > > > > present:269700kB managed:256000kB mlocked:0kB
> > > > > > > > > > > > kernel_stack:0kB
> > > > > > > > > > > > pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
> > > > > > > > > > > > free_cma:0kB
> > > > > > > > > > > > [   30.370094] lowmem_reserve[]: 0 0 5813 5813
> > > > > > > > > > > > [   30.370100] Node 0 Normal free:5894876kB min:9552kB
> > > > > > > > > > > > low:15504kB
> > > > > > > > > > > > high:21456kB reserved_highatomic:0KB active_anon:0kB
> > > > > > > > > > > > inactive_anon:0kB
> > > > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > > > writepending:0kB
> > > > > > > > > > > > present:8388608kB managed:5953112kB mlocked:0kB
> > > > > > > > > > > > kernel_stack:21672kB
> > > > > > > > > > > > pagetables:56kB bounce:0kB free_pcp:876kB
> > > > > > > > > > > > local_pcp:176kB
> > > > > > > > > > > > free_cma:0kB
> > > > > > > > > > > > [   30.370104] lowmem_reserve[]: 0 0 0 0
> > > > > > > > > > > > [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB
> > > > > > > > > > > > 0*64kB
> > > > > > > > > > > > 0*128kB
> > > > > > > > > > > > 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> > > > > > > > > > > > [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB
> > > > > > > > > > > > 0*64kB 0*128kB
> > > > > > > > > > > > 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) =
> > > > > > > > > > > > 256000kB
> > > > > > > > > > > > [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME)
> > > > > > > > > > > > 2*16kB
> > > > > > > > > > > > (UE) 3*32kB
> > > > > > > > > > > > (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME)
> > > > > > > > > > > > 3*1024kB (ME)
> > > > > > > > > > > > 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
> > > > > > > > > > > > [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
> > > > > > > > > > > > hugepages_surp=0 hugepages_size=1048576kB
> > > > > > > > > > > > [   30.370130] 0 total pagecache pages
> > > > > > > > > > > > [   30.370132] 0 pages in swap cache
> > > > > > > > > > > > [   30.370134] Swap cache stats: add 0, delete 0, find
> > > > > > > > > > > > 0/0
> > > > > > > > > > > > [   30.370135] Free swap  = 0kB
> > > > > > > > > > > > [   30.370136] Total swap = 0kB
> > > > > > > > > > > > [   30.370137] 2164609 pages RAM
> > > > > > > > > > > > [   30.370139] 0 pages HighMem/MovableOnly
> > > > > > > > > > > > [   30.370140] 612331 pages reserved
> > > > > > > > > > > > [   30.370141] 0 pages hwpoisoned
> > > > > > > > > > > > [   30.370143] DMA: failed to allocate 256 KiB pool for
> > > > > > > > > > > > atomic
> > > > > > > > > > > > coherent allocation
> > > > > > > > > > > During my testing I saw the same error and
> > > > > > > > > > > Chen's  solution
> > > > > > > > > > > corrected it .
> > > > > > > > > > Which combination you are using on your side? I am using
> > > > > > > > > > Prabhakar's
> > > > > > > > > > suggested environment and can reproduce the issue
> > > > > > > > > > with or without Chen's crashkernel support above 4G
> > > > > > > > > > patchset.
> > > > > > > > > > 
> > > > > > > > > > I am also using a ThunderX2 platform with latest
> > > > > > > > > > makedumpfile
> > > > > > > > > > code and
> > > > > > > > > > kexec-tools (with the suggested patch
> > > > > > > > > > <
> > > > > > > > > > 
> > 
https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> > > > > > > > > > > ).
> > > > > > > > > > Thanks,
> > > > > > > > > > Bhupesh
> > > > > > > > > I did this activity 5 months ago and I have moved on to other
> > > > > > > > > activities. My DMA failures were related to PCI devices that
> > > > > > > > > could
> > > > > > > > > not be enumerated because  low-DMA space was not  available
> > > > > > > > > when
> > > > > > > > > crashkernel was moved above 4G; I don’t recall the exact
> > > > > > > > > platform.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > For this failure ,
> > > > > > > > > 
> > > > > > > > > > > > DMA: failed to allocate 256 KiB pool for atomic
> > > > > > > > > > > > coherent allocation
> > > > > > > > > Is due to :
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 3618082c
> > > > > > > > > ("arm64 use both ZONE_DMA and ZONE_DMA32")
> > > > > > > > > 
> > > > > > > > > With the introduction of ZONE_DMA to support the Raspberry DMA
> > > > > > > > > region below 1G, the crashkernel is placed in the upper 4G
> > > > > > > > > ZONE_DMA_32 region. Since the crashkernel does not have access
> > > > > > > > > to the ZONE_DMA region, it prints out call trace during
> > > > > > > > > bootup.
> > > > > > > > > 
> > > > > > > > > It is due to having this CONFIG item  ON  :
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > CONFIG_ZONE_DMA=y
> > > > > > > > > 
> > > > > > > > > Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
> > > > > > > > > use the device tree to specify memory below 1G.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > Disabling ZONE_DMA is temporary solution.  We may need proper
> > > > > > > > solution
> > > > > > > Perhaps the Raspberry platform configuration dependencies need
> > > > > > > separated  from “server class” Arm  equipment ?  Or auto-
> > > > > > > configured on
> > > > > > > boot ?  Consult an expert ;-)
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > > I would like to see Chen’s feature added , perhaps as
> > > > > > > > > EXPERIMENTAL,  so we can get some configuration testing done
> > > > > > > > > on
> > > > > > > > > it.   It corrects having a DMA zone in low memory while crash-
> > > > > > > > > kernel is above 4GB.  This has been going on for a year now.
> > > > > > > > I will also like this patch to be added in Linux as early as
> > > > > > > > possible.
> > > > > > > > 
> > > > > > > > Issue mentioned by me happens with or without this patch.
> > > > > > > > 
> > > > > > > > This patch-set can consider fixing because it uses low memory
> > > > > > > > for
> > > > > > > > DMA
> > > > > > > > & swiotlb only.
> > > > > > > > We can consider restricting crashkernel within the required
> > > > > > > > range
> > > > > > > > like below
> > > > > > > > 
> > > > > > > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > > > > > > > index 7f9e5a6dc48c..bd67b90d35bd 100644
> > > > > > > > --- a/kernel/crash_core.c
> > > > > > > > +++ b/kernel/crash_core.c
> > > > > > > > @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
> > > > > > > >                        return 0;
> > > > > > > >        }
> > > > > > > > 
> > > > > > > > -       low_base = memblock_find_in_range(0, 1ULL << 32,
> > > > > > > > low_size,
> > > > > > > > CRASH_ALIGN);
> > > > > > > > +       low_base = memblock_find_in_range(0,0xc0000000,
> > > > > > > > low_size,
> > > > > > > > CRASH_ALIGN);
> > > > > > > >        if (!low_base) {
> > > > > > > >                pr_err("Cannot reserve %ldMB crashkernel low
> > > > > > > > memory,
> > > > > > > > please try smaller size.\n",
> > > > > > > >                       (unsigned long)(low_size >> 20));
> > > > > > > > 
> > > > > > > > 
> > > > > > >     I suspect  0xc0000000  would need to be a CONFIG item  and not
> > > > > > > hard-coded.
> > > > > > > 
> > > > > > if you consider this as valid change,  can you please incorporate as
> > > > > > part of your patch-set.
> > > > > After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"),the
> > > > > 0-
> > > > > 4G memory is splited
> > > > > to DMA [mem 0x0000000000000000-0x000000003fffffff] and DMA32 [mem
> > > > > 0x0000000040000000-0x00000000ffffffff] on arm64.
> > > > > 
> > > > >  From the above discussion, on your platform, the low crashkernel fall
> > > > > in
> > > > > DMA32 region, but your environment needs to access DMA
> > > > > region, so there is the call trace.
> > > > > 
> > > > > I have a question, why do you choose 0xc0000000 here?
> > > > > 
> > > > > Besides, this is common code, we also need to consider about x86.
> > > > > 
> > > >   + nsaenzjulienne@suse.de
> > Thanks for adding me to the conversation, and sorry for the headaches.
> > 
> > > >    Exactly .  This is why it needs to be a CONFIG option for  Raspberry
> > > > ..,  or device tree option.
> > > > 
> > > > 
> > > >    We could revert 1a8e1cef7 since it broke  Arm kdump too.
> > > Well, unfortunately the patch for commit 1a8e1cef7603 ("arm64: use
> > > both ZONE_DMA and ZONE_DMA32") was not Cc'ed to the kexec mailing
> > > list, thus we couldn't get many eyes on it for a thorough review from
> > > kexec/kdump p-o-v.
> > > 
> > > Also we historically never had distinction in common arch code on the
> > > basis of the intended end use-case: embedded, server or automotive, so
> > > I am not sure introducing a Raspberry specific CONFIG option would be
> > > a good idea.
> > +1
> > 
> >  From the distros perspective it's very important to keep a single kernel
> > image.
> > 
> > > So, rather than reverting the patch, we can look at addressing the
> > > same properly this time - especially from a kdump p-o-v.
> > > This issue has been reported by some Red Hat arm64 partners with
> > > upstream kernel also and as we have noticed in the past as well,
> > > hardcoding the placement of the crashkernel base address (unless the
> > > base address is specified by a crashkernel=X@Y like bootargs) is also
> > > not a portable suggestion.
> > > 
> > > I am working on a possible fix and will have more updates on the same
> > > in a day-or-two.
> > Please keep me in the loop, we've also had issues pointing to this reported
> > by
> > SUSE partners. I can do some testing both on the RPi4 and on big servers
> > that
> > need huge crashkernel sizes.
> > 
> > Regards,
> > Nicolas
> > 
> Hi Nicolas,
> 
> 
> You may want to review this topic across the various email threads. It
> has been a long journey.

Will do, thanks!

Regards,
Nicolas


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
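
A note on the numbers in the quoted allocation-failure log, as a minimal
sketch only. It assumes 4 KiB pages (the buddy lists in the log start at
4kB), and the helper names below are hypothetical, not kernel APIs. The
failing request is order:6, which is 64 contiguous pages, the same 256 KiB
that dma_atomic_pool_init() then reports it could not allocate. The DMA32
free-list line likewise adds up to the 256000kB reported as free.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, matching the "0*4kB" entries in the log. */
#define SKETCH_PAGE_KB 4ULL

/* Size in kB of one buddy-allocator block of the given order. */
static uint64_t order_to_kb(unsigned int order)
{
	return (1ULL << order) * SKETCH_PAGE_KB;
}

/* One "count*size" term from a per-zone free-list line in the log. */
struct free_block {
	uint64_t count;
	uint64_t size_kb;
};

/* Sum of all terms on a free-list line, in kB. */
static uint64_t free_list_total_kb(const struct free_block *blocks, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += blocks[i].count * blocks[i].size_kb;
	return total;
}
```

For example, order_to_kb(6) evaluates to 256, and summing the two non-zero
DMA32 terms {1, 2048} and {62, 4096} gives 256000. The log shows
"Node 0 DMA free:0kB", so a GFP_DMA request has nothing to draw from even
though DMA32 and Normal have plenty of free memory.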

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-04 17:01                     ` Nicolas Saenz Julienne
  2020-06-05  2:26                       ` John Donnelly
@ 2020-06-19  2:32                       ` John Donnelly
  2020-06-19  8:21                         ` chenzhou
  1 sibling, 1 reply; 34+ messages in thread
From: John Donnelly @ 2020-06-19  2:32 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, Bhupesh Sharma
  Cc: chenzhou, Simon Horman, Devicetree List, Arnd Bergmann,
	Baoquan He, Linux Doc Mailing List, Catalin Marinas, guohanjun,
	kexec mailing list, Linux Kernel Mailing List, Will Deacon,
	Rob Herring, James Morse, Prabhakar Kushwaha, Thomas Gleixner,
	Prabhakar Kushwaha, RuiRui Yang, Ingo Molnar, linux-arm-kernel,
	john.p.donnelly


On 6/4/20 12:01 PM, Nicolas Saenz Julienne wrote:
> On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:
>> Hi All,
>>
>> On Wed, Jun 3, 2020 at 9:03 PM John Donnelly <john.p.donnelly@oracle.com>
>> wrote:
>>>
>>>> On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@huawei.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
>>>>> Hi Chen,
>>>>>
>>>>> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <john.p.donnelly@oracle.com
>>>>>> wrote:
>>>>>>
>>>>>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
>>>>>>> prabhakar.pkin@gmail.com> wrote:
>>>>>>>
>>>>>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
>>>>>>> john.p.donnelly@oracle.com> wrote:
>>>>>>>> Hi .  See below !
>>>>>>>>
>>>>>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi John,
>>>>>>>>>
>>>>>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
>>>>>>>>> John.P.donnelly@oracle.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>>>>>>>> Hi Chen,
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
>>>>>>>>>>> chenzhou10@huawei.com> wrote:
>>>>>>>>>>>> This patch series enable reserving crashkernel above 4G in
>>>>>>>>>>>> arm64.
>>>>>>>>>>>>
>>>>>>>>>>>> There are following issues in arm64 kdump:
>>>>>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G,
>>>>>>>>>>>> which will fail
>>>>>>>>>>>> when there is no enough low memory.
>>>>>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve
>>>>>>>>>>>> crashkernel above 4G,
>>>>>>>>>>>> in this case, if swiotlb or DMA buffers are required,
>>>>>>>>>>>> crash dump kernel
>>>>>>>>>>>> will boot failure because there is no low memory available
>>>>>>>>>>>> for allocation.
>>>>>>>>>>>>
>>>>>>>>>>> We are getting "warn_alloc" [1] warning during boot of kdump
>>>>>>>>>>> kernel
>>>>>>>>>>> with bootargs as [2] of primary kernel.
>>>>>>>>>>> This error observed on ThunderX2  ARM64 platform.
>>>>>>>>>>>
>>>>>>>>>>> It is observed with latest upstream tag (v5.7-rc3) with this
>>>>>>>>>>> patch set
>>>>>>>>>>> and
>>>>>>>>>>>
> https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>>>>>>>>>> Also **without** this patch-set
>>>>>>>>>>> "
>>>>>>>>>>>
> https://www.spinics.net/lists/arm-kernel/msg806882.html
>>>>>>>>>>> "
>>>>>>>>>>>
>>>>>>>>>>> This issue comes whenever crashkernel memory is reserved
>>>>>>>>>>> after 0xc000_0000.
>>>>>>>>>>> More details discussed earlier in
>>>>>>>>>>>
> https://www.spinics.net/lists/arm-kernel/msg806882.html
>    without
>>>>>>>>>>> any
>>>>>>>>>>> solution
>>>>>>>>>>>
>>>>>>>>>>> This patch-set is expected to solve similar kind of issue.
>>>>>>>>>>> i.e. low memory is only targeted for DMA, swiotlb; So above
>>>>>>>>>>> mentioned
>>>>>>>>>>> observation should be considered/fixed. .
>>>>>>>>>>>
>>>>>>>>>>> --pk
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
>>>>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>>>>> [   30.367696] NET: Registered protocol family 16
>>>>>>>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
>>>>>>>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>>>>>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>>>>>>>>>>> 5.7.0-rc3+ #121
>>>>>>>>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
>>>>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>>>>> [   30.369984] Call trace:
>>>>>>>>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
>>>>>>>>>>> [   30.369991]  show_stack+0x20/0x30
>>>>>>>>>>> [   30.369997]  dump_stack+0xc0/0x10c
>>>>>>>>>>> [   30.370001]  warn_alloc+0x10c/0x178
>>>>>>>>>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
>>>>>>>>>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
>>>>>>>>>>> [   30.370008]  alloc_page_interleave+0x24/0x98
>>>>>>>>>>> [   30.370011]  alloc_pages_current+0xe4/0x108
>>>>>>>>>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
>>>>>>>>>>> [   30.370020]  do_one_initcall+0x54/0x228
>>>>>>>>>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
>>>>>>>>>>> [   30.370031]  kernel_init+0x1c/0x110
>>>>>>>>>>> [   30.370034]  ret_from_fork+0x10/0x18
>>>>>>>>>>> [   30.370036] Mem-Info:
>>>>>>>>>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
>>>>>>>>>>> [   30.370064]  active_file:0 inactive_file:0
>>>>>>>>>>> isolated_file:0
>>>>>>>>>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
>>>>>>>>>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
>>>>>>>>>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
>>>>>>>>>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
>>>>>>>>>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>> isolated(anon):0kB
>>>>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
>>>>>>>>>>> shmem:0kB
>>>>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>>>>>>>>>>> writeback_tmp:0kB
>>>>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>>>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>> isolated(anon):0kB
>>>>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
>>>>>>>>>>> shmem:0kB
>>>>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>>>>>>>>>>> writeback_tmp:0kB
>>>>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>>>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
>>>>>>>>>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>> writepending:0kB
>>>>>>>>>>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB
>>>>>>>>>>> pagetables:0kB
>>>>>>>>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>>>>>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
>>>>>>>>>>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB
>>>>>>>>>>> low:664kB
>>>>>>>>>>> high:920kB reserved_highatomic:0KB active_anon:0kB
>>>>>>>>>>> inactive_anon:0kB
>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>> writepending:0kB
>>>>>>>>>>> present:269700kB managed:256000kB mlocked:0kB
>>>>>>>>>>> kernel_stack:0kB
>>>>>>>>>>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
>>>>>>>>>>> free_cma:0kB
>>>>>>>>>>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
>>>>>>>>>>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB
>>>>>>>>>>> low:15504kB
>>>>>>>>>>> high:21456kB reserved_highatomic:0KB active_anon:0kB
>>>>>>>>>>> inactive_anon:0kB
>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>> writepending:0kB
>>>>>>>>>>> present:8388608kB managed:5953112kB mlocked:0kB
>>>>>>>>>>> kernel_stack:21672kB
>>>>>>>>>>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB
>>>>>>>>>>> free_cma:0kB
>>>>>>>>>>> [   30.370104] lowmem_reserve[]: 0 0 0 0
>>>>>>>>>>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB
>>>>>>>>>>> 0*128kB
>>>>>>>>>>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>>>>>>>>>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB
>>>>>>>>>>> 0*64kB 0*128kB
>>>>>>>>>>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) =
>>>>>>>>>>> 256000kB
>>>>>>>>>>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB
>>>>>>>>>>> (UE) 3*32kB
>>>>>>>>>>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME)
>>>>>>>>>>> 3*1024kB (ME)
>>>>>>>>>>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
>>>>>>>>>>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
>>>>>>>>>>> hugepages_surp=0 hugepages_size=1048576kB
>>>>>>>>>>> [   30.370130] 0 total pagecache pages
>>>>>>>>>>> [   30.370132] 0 pages in swap cache
>>>>>>>>>>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>>>>>> [   30.370135] Free swap  = 0kB
>>>>>>>>>>> [   30.370136] Total swap = 0kB
>>>>>>>>>>> [   30.370137] 2164609 pages RAM
>>>>>>>>>>> [   30.370139] 0 pages HighMem/MovableOnly
>>>>>>>>>>> [   30.370140] 612331 pages reserved
>>>>>>>>>>> [   30.370141] 0 pages hwpoisoned
>>>>>>>>>>> [   30.370143] DMA: failed to allocate 256 KiB pool for
>>>>>>>>>>> atomic
>>>>>>>>>>> coherent allocation
>>>>>>>>>> During my testing I saw the same error and Chen's  solution
>>>>>>>>>> corrected it .
>>>>>>>>> Which combination you are using on your side? I am using
>>>>>>>>> Prabhakar's
>>>>>>>>> suggested environment and can reproduce the issue
>>>>>>>>> with or without Chen's crashkernel support above 4G patchset.
>>>>>>>>>
>>>>>>>>> I am also using a ThunderX2 platform with latest makedumpfile
>>>>>>>>> code and
>>>>>>>>> kexec-tools (with the suggested patch
>>>>>>>>> <
>>>>>>>>>
> https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>>>>>>>>> ).
>>>>>>>>> Thanks,
>>>>>>>>> Bhupesh
>>>>>>>> I did this activity 5 months ago and I have moved on to other
>>>>>>>> activities. My DMA failures were related to PCI devices that could
>>>>>>>> not be enumerated because  low-DMA space was not  available when
>>>>>>>> crashkernel was moved above 4G; I don’t recall the exact platform.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> For this failure ,
>>>>>>>>
>>>>>>>>>>> DMA: failed to allocate 256 KiB pool for atomic
>>>>>>>>>>> coherent allocation
>>>>>>>> Is due to :
>>>>>>>>
>>>>>>>>
>>>>>>>> 3618082c
>>>>>>>> ("arm64 use both ZONE_DMA and ZONE_DMA32")
>>>>>>>>
>>>>>>>> With the introduction of ZONE_DMA to support the Raspberry DMA
>>>>>>>> region below 1G, the crashkernel is placed in the upper 4G
>>>>>>>> ZONE_DMA_32 region. Since the crashkernel does not have access
>>>>>>>> to the ZONE_DMA region, it prints out call trace during bootup.
>>>>>>>>
>>>>>>>> It is due to having this CONFIG item  ON  :
>>>>>>>>
>>>>>>>>
>>>>>>>> CONFIG_ZONE_DMA=y
>>>>>>>>
>>>>>>>> Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
>>>>>>>> use the device tree to specify memory below 1G.
>>>>>>>>
>>>>>>>>
>>>>>>> Disabling ZONE_DMA is temporary solution.  We may need proper
>>>>>>> solution
>>>>>> Perhaps the Raspberry platform configuration dependencies need
>>>>>> separated  from “server class” Arm  equipment ?  Or auto-configured on
>>>>>> boot ?  Consult an expert ;-)
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> I would like to see Chen’s feature added , perhaps as
>>>>>>>> EXPERIMENTAL,  so we can get some configuration testing done on
>>>>>>>> it.   It corrects having a DMA zone in low memory while crash-
>>>>>>>> kernel is above 4GB.  This has been going on for a year now.
>>>>>>> I will also like this patch to be added in Linux as early as
>>>>>>> possible.
>>>>>>>
>>>>>>> Issue mentioned by me happens with or without this patch.
>>>>>>>
>>>>>>> This patch-set can consider fixing because it uses low memory for
>>>>>>> DMA
>>>>>>> & swiotlb only.
>>>>>>> We can consider restricting crashkernel within the required range
>>>>>>> like below
>>>>>>>
>>>>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>>>>> index 7f9e5a6dc48c..bd67b90d35bd 100644
>>>>>>> --- a/kernel/crash_core.c
>>>>>>> +++ b/kernel/crash_core.c
>>>>>>> @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
>>>>>>>                        return 0;
>>>>>>>        }
>>>>>>>
>>>>>>> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size,
>>>>>>> CRASH_ALIGN);
>>>>>>> +       low_base = memblock_find_in_range(0,0xc0000000, low_size,
>>>>>>> CRASH_ALIGN);
>>>>>>>        if (!low_base) {
>>>>>>>                pr_err("Cannot reserve %ldMB crashkernel low memory,
>>>>>>> please try smaller size.\n",
>>>>>>>                       (unsigned long)(low_size >> 20));
>>>>>>>
>>>>>>>
>>>>>>     I suspect  0xc0000000  would need to be a CONFIG item  and not
>>>>>> hard-coded.
>>>>>>
>>>>> if you consider this as valid change,  can you please incorporate as
>>>>> part of your patch-set.
>>>> After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"),the 0-
>>>> 4G memory is splited
>>>> to DMA [mem 0x0000000000000000-0x000000003fffffff] and DMA32 [mem
>>>> 0x0000000040000000-0x00000000ffffffff] on arm64.
>>>>
>>>>  From the above discussion, on your platform, the low crashkernel fall in
>>>> DMA32 region, but your environment needs to access DMA
>>>> region, so there is the call trace.
>>>>
>>>> I have a question, why do you choose 0xc0000000 here?
>>>>
>>>> Besides, this is common code, we also need to consider about x86.
>>>>
>>>   + nsaenzjulienne@suse.de
> Thanks for adding me to the conversation, and sorry for the headaches.
>
>>>    Exactly .  This is why it needs to be a CONFIG option for  Raspberry
>>> ..,  or device tree option.
>>>
>>>
>>>    We could revert 1a8e1cef7 since it broke  Arm kdump too.
>> Well, unfortunately the patch for commit 1a8e1cef7603 ("arm64: use
>> both ZONE_DMA and ZONE_DMA32") was not Cc'ed to the kexec mailing
>> list, thus we couldn't get many eyes on it for a thorough review from
>> kexec/kdump p-o-v.
>>
>> Also we historically never had distinction in common arch code on the
>> basis of the intended end use-case: embedded, server or automotive, so
>> I am not sure introducing a Raspberry specific CONFIG option would be
>> a good idea.
> +1
>
>  From the distros perspective it's very important to keep a single kernel image.
>
>> So, rather than reverting the patch, we can look at addressing the
>> same properly this time - especially from a kdump p-o-v.
>> This issue has been reported by some Red Hat arm64 partners with
>> upstream kernel also and as we have noticed in the past as well,
>> hardcoding the placement of the crashkernel base address (unless the
>> base address is specified by a crashkernel=X@Y like bootargs) is also
>> not a portable suggestion.
>>
>> I am working on a possible fix and will have more updates on the same
>> in a day-or-two.
> Please keep me in the loop, we've also had issues pointing to this reported by
> SUSE partners. I can do some testing both on the RPi4 and on big servers that
> need huge crashkernel sizes.
>
> Regards,
> Nicolas
>

Hi,

Has there been any progress on this? It appears we are stalled
because Nicolas's and Chen's changes are not compatible. One is needed
for RPi4 and the other for server-class equipment.


Thanks,

John
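
For reference, the zone split described in the quoted discussion (after
commit 1a8e1cef7603, DMA covers 0x00000000-0x3fffffff and DMA32 covers
0x40000000-0xffffffff on arm64) can be sketched as below. This is an
illustration under stated assumptions, not kernel code: the enum, the
macros, the function names and the example base addresses are all
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the three zones named in the thread. */
enum sketch_zone {
	SKETCH_ZONE_DMA,
	SKETCH_ZONE_DMA32,
	SKETCH_ZONE_NORMAL,
};

#define SKETCH_DMA_END   0x40000000ULL   /* first byte above ZONE_DMA (1G) */
#define SKETCH_DMA32_END 0x100000000ULL  /* first byte above ZONE_DMA32 (4G) */

/* Zone that a physical address falls into under the split quoted above. */
static enum sketch_zone zone_of(uint64_t phys)
{
	if (phys < SKETCH_DMA_END)
		return SKETCH_ZONE_DMA;
	if (phys < SKETCH_DMA32_END)
		return SKETCH_ZONE_DMA32;
	return SKETCH_ZONE_NORMAL;
}

/* True if [base, base + size) contains at least one byte of ZONE_DMA. */
static bool region_has_zone_dma(uint64_t base, uint64_t size)
{
	return size != 0 && base < SKETCH_DMA_END;
}
```

With these definitions, a 256M low region at a made-up base of 0x80000000
gives region_has_zone_dma() == false. A kdump kernel that only sees such a
region has no ZONE_DMA pages, which is consistent with the GFP_DMA failure
in the quoted log. A region based below 0x40000000 would return true.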



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-19  2:32                       ` John Donnelly
@ 2020-06-19  8:21                         ` chenzhou
  2020-06-20  0:01                           ` John Donnelly
  0 siblings, 1 reply; 34+ messages in thread
From: chenzhou @ 2020-06-19  8:21 UTC (permalink / raw)
  To: John Donnelly, Nicolas Saenz Julienne, Bhupesh Sharma
  Cc: Simon Horman, Devicetree List, Arnd Bergmann, Baoquan He,
	Linux Doc Mailing List, Catalin Marinas, guohanjun,
	kexec mailing list, Linux Kernel Mailing List, Will Deacon,
	Rob Herring, James Morse, Prabhakar Kushwaha, Thomas Gleixner,
	Prabhakar Kushwaha, RuiRui Yang, Ingo Molnar, linux-arm-kernel


On 2020/6/19 10:32, John Donnelly wrote:
>
> On 6/4/20 12:01 PM, Nicolas Saenz Julienne wrote:
>> On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:
>>> Hi All,
>>>
>>> On Wed, Jun 3, 2020 at 9:03 PM John Donnelly <john.p.donnelly@oracle.com>
>>> wrote:
>>>>
>>>>> On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@huawei.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
>>>>>> Hi Chen,
>>>>>>
>>>>>> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <john.p.donnelly@oracle.com
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
>>>>>>>> prabhakar.pkin@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
>>>>>>>> john.p.donnelly@oracle.com> wrote:
>>>>>>>>> Hi .  See below !
>>>>>>>>>
>>>>>>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
>>>>>>>>>> John.P.donnelly@oracle.com> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>>>>>>>>> Hi Chen,
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
>>>>>>>>>>>> chenzhou10@huawei.com> wrote:
>>>>>>>>>>>>> This patch series enable reserving crashkernel above 4G in
>>>>>>>>>>>>> arm64.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are following issues in arm64 kdump:
>>>>>>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G,
>>>>>>>>>>>>> which will fail
>>>>>>>>>>>>> when there is no enough low memory.
>>>>>>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve
>>>>>>>>>>>>> crashkernel above 4G,
>>>>>>>>>>>>> in this case, if swiotlb or DMA buffers are required,
>>>>>>>>>>>>> crash dump kernel
>>>>>>>>>>>>> will boot failure because there is no low memory available
>>>>>>>>>>>>> for allocation.
>>>>>>>>>>>>>
>>>>>>>>>>>> We are getting "warn_alloc" [1] warning during boot of kdump
>>>>>>>>>>>> kernel
>>>>>>>>>>>> with bootargs as [2] of primary kernel.
>>>>>>>>>>>> This error observed on ThunderX2  ARM64 platform.
>>>>>>>>>>>>
>>>>>>>>>>>> It is observed with latest upstream tag (v5.7-rc3) with this
>>>>>>>>>>>> patch set
>>>>>>>>>>>> and
>>>>>>>>>>>>
>> https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>>>>>>>>>>> Also **without** this patch-set
>>>>>>>>>>>> "
>>>>>>>>>>>>
>> https://www.spinics.net/lists/arm-kernel/msg806882.html
>>>>>>>>>>>> "
>>>>>>>>>>>>
>>>>>>>>>>>> This issue comes whenever crashkernel memory is reserved
>>>>>>>>>>>> after 0xc000_0000.
>>>>>>>>>>>> More details discussed earlier in
>>>>>>>>>>>>
>> https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$
>>    without
>>>>>>>>>>>> any
>>>>>>>>>>>> solution
>>>>>>>>>>>>
>>>>>>>>>>>> This patch-set is expected to solve similar kind of issue.
>>>>>>>>>>>> i.e. low memory is only targeted for DMA, swiotlb; So above
>>>>>>>>>>>> mentioned
>>>>>>>>>>>> observation should be considered/fixed. .
>>>>>>>>>>>>
>>>>>>>>>>>> --pk
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
>>>>>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>>>>>> [   30.367696] NET: Registered protocol family 16
>>>>>>>>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
>>>>>>>>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>>>>>>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>>>>>>>>>>>> 5.7.0-rc3+ #121
>>>>>>>>>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
>>>>>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>>>>>> [   30.369984] Call trace:
>>>>>>>>>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
>>>>>>>>>>>> [   30.369991]  show_stack+0x20/0x30
>>>>>>>>>>>> [   30.369997]  dump_stack+0xc0/0x10c
>>>>>>>>>>>> [   30.370001]  warn_alloc+0x10c/0x178
>>>>>>>>>>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
>>>>>>>>>>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
>>>>>>>>>>>> [   30.370008]  alloc_page_interleave+0x24/0x98
>>>>>>>>>>>> [   30.370011]  alloc_pages_current+0xe4/0x108
>>>>>>>>>>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
>>>>>>>>>>>> [   30.370020]  do_one_initcall+0x54/0x228
>>>>>>>>>>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
>>>>>>>>>>>> [   30.370031]  kernel_init+0x1c/0x110
>>>>>>>>>>>> [   30.370034]  ret_from_fork+0x10/0x18
>>>>>>>>>>>> [   30.370036] Mem-Info:
>>>>>>>>>>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
>>>>>>>>>>>> [   30.370064]  active_file:0 inactive_file:0
>>>>>>>>>>>> isolated_file:0
>>>>>>>>>>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
>>>>>>>>>>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
>>>>>>>>>>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
>>>>>>>>>>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
>>>>>>>>>>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
>>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>>> isolated(anon):0kB
>>>>>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
>>>>>>>>>>>> shmem:0kB
>>>>>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>>>>>>>>>>>> writeback_tmp:0kB
>>>>>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>>>>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
>>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>>> isolated(anon):0kB
>>>>>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
>>>>>>>>>>>> shmem:0kB
>>>>>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>>>>>>>>>>>> writeback_tmp:0kB
>>>>>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>>>>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
>>>>>>>>>>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>>> writepending:0kB
>>>>>>>>>>>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB
>>>>>>>>>>>> pagetables:0kB
>>>>>>>>>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>>>>>>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
>>>>>>>>>>>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB
>>>>>>>>>>>> low:664kB
>>>>>>>>>>>> high:920kB reserved_highatomic:0KB active_anon:0kB
>>>>>>>>>>>> inactive_anon:0kB
>>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>>> writepending:0kB
>>>>>>>>>>>> present:269700kB managed:256000kB mlocked:0kB
>>>>>>>>>>>> kernel_stack:0kB
>>>>>>>>>>>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
>>>>>>>>>>>> free_cma:0kB
>>>>>>>>>>>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
>>>>>>>>>>>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB
>>>>>>>>>>>> low:15504kB
>>>>>>>>>>>> high:21456kB reserved_highatomic:0KB active_anon:0kB
>>>>>>>>>>>> inactive_anon:0kB
>>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>>> writepending:0kB
>>>>>>>>>>>> present:8388608kB managed:5953112kB mlocked:0kB
>>>>>>>>>>>> kernel_stack:21672kB
>>>>>>>>>>>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB
>>>>>>>>>>>> free_cma:0kB
>>>>>>>>>>>> [   30.370104] lowmem_reserve[]: 0 0 0 0
>>>>>>>>>>>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB
>>>>>>>>>>>> 0*128kB
>>>>>>>>>>>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>>>>>>>>>>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB
>>>>>>>>>>>> 0*64kB 0*128kB
>>>>>>>>>>>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) =
>>>>>>>>>>>> 256000kB
>>>>>>>>>>>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB
>>>>>>>>>>>> (UE) 3*32kB
>>>>>>>>>>>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME)
>>>>>>>>>>>> 3*1024kB (ME)
>>>>>>>>>>>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
>>>>>>>>>>>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
>>>>>>>>>>>> hugepages_surp=0 hugepages_size=1048576kB
>>>>>>>>>>>> [   30.370130] 0 total pagecache pages
>>>>>>>>>>>> [   30.370132] 0 pages in swap cache
>>>>>>>>>>>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>>>>>>> [   30.370135] Free swap  = 0kB
>>>>>>>>>>>> [   30.370136] Total swap = 0kB
>>>>>>>>>>>> [   30.370137] 2164609 pages RAM
>>>>>>>>>>>> [   30.370139] 0 pages HighMem/MovableOnly
>>>>>>>>>>>> [   30.370140] 612331 pages reserved
>>>>>>>>>>>> [   30.370141] 0 pages hwpoisoned
>>>>>>>>>>>> [   30.370143] DMA: failed to allocate 256 KiB pool for
>>>>>>>>>>>> atomic
>>>>>>>>>>>> coherent allocation
>>>>>>>>>>> During my testing I saw the same error and Chen's  solution
>>>>>>>>>>> corrected it .
>>>>>>>>>> Which combination are you using on your side? I am using
>>>>>>>>>> Prabhakar's
>>>>>>>>>> suggested environment and can reproduce the issue
>>>>>>>>>> with or without Chen's crashkernel support above 4G patchset.
>>>>>>>>>>
>>>>>>>>>> I am also using a ThunderX2 platform with latest makedumpfile
>>>>>>>>>> code and
>>>>>>>>>> kexec-tools (with the suggested patch
>>>>>>>>>> <
>>>>>>>>>>
>> https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!J6lUig58-Gw6TKZnEEYzEeSU36T-1SqlB1kImU00xtX_lss5Tx-JbUmLE9TJC3foXBLg$
>>>>>>>>>>> ).
>>>>>>>>>> Thanks,
>>>>>>>>>> Bhupesh
>>>>>>>>> I did this activity 5 months ago and I have moved on to other
>>>>>>>>> activities. My DMA failures were related to PCI devices that could
>>>>>>>>> not be enumerated because  low-DMA space was not  available when
>>>>>>>>> crashkernel was moved above 4G; I don’t recall the exact platform.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For this failure ,
>>>>>>>>>
>>>>>>>>>>>> DMA: failed to allocate 256 KiB pool for atomic
>>>>>>>>>>>> coherent allocation
>>>>>>>>> Is due to :
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 3618082c
>>>>>>>>> ("arm64 use both ZONE_DMA and ZONE_DMA32")
>>>>>>>>>
>>>>>>>>> With the introduction of ZONE_DMA to support the Raspberry DMA
>>>>>>>>> region below 1G, the crashkernel is placed in the upper 4G
>>>>>>>>> ZONE_DMA_32 region. Since the crashkernel does not have access
>>>>>>>>> to the ZONE_DMA region, it prints out call trace during bootup.
>>>>>>>>>
>>>>>>>>> It is due to having this CONFIG item  ON  :
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> CONFIG_ZONE_DMA=y
>>>>>>>>>
>>>>>>>>> Turning off ZONE_DMA fixes the issue, and the Raspberry Pi 4 will
>>>>>>>>> use the device tree to specify memory below 1G.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Disabling ZONE_DMA is a temporary solution. We may need a proper
>>>>>>>> solution.
>>>>>>> Perhaps the Raspberry platform configuration dependencies need to be
>>>>>>> separated from “server class” Arm equipment? Or auto-configured on
>>>>>>> boot? Consult an expert ;-)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> I would like to see Chen’s feature added , perhaps as
>>>>>>>>> EXPERIMENTAL,  so we can get some configuration testing done on
>>>>>>>>> it.   It corrects having a DMA zone in low memory while crash-
>>>>>>>>> kernel is above 4GB.  This has been going on for a year now.
>>>>>>>> I will also like this patch to be added in Linux as early as
>>>>>>>> possible.
>>>>>>>>
>>>>>>>> Issue mentioned by me happens with or without this patch.
>>>>>>>>
>>>>>>>> This patch-set can consider fixing because it uses low memory for
>>>>>>>> DMA
>>>>>>>> & swiotlb only.
>>>>>>>> We can consider restricting crashkernel within the required range
>>>>>>>> like below
>>>>>>>>
>>>>>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>>>>>> index 7f9e5a6dc48c..bd67b90d35bd 100644
>>>>>>>> --- a/kernel/crash_core.c
>>>>>>>> +++ b/kernel/crash_core.c
>>>>>>>> @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
>>>>>>>>                        return 0;
>>>>>>>>        }
>>>>>>>>
>>>>>>>> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size,
>>>>>>>> CRASH_ALIGN);
>>>>>>>> +       low_base = memblock_find_in_range(0,0xc0000000, low_size,
>>>>>>>> CRASH_ALIGN);
>>>>>>>>        if (!low_base) {
>>>>>>>>                pr_err("Cannot reserve %ldMB crashkernel low memory,
>>>>>>>> please try smaller size.\n",
>>>>>>>>                       (unsigned long)(low_size >> 20));
>>>>>>>>
>>>>>>>>
>>>>>>>     I suspect  0xc0000000  would need to be a CONFIG item  and not
>>>>>>> hard-coded.
>>>>>>>
>>>>>> if you consider this as valid change,  can you please incorporate as
>>>>>> part of your patch-set.
>>>>> After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"), the
>>>>> 0-4G memory is split into
>>>>> DMA [mem 0x0000000000000000-0x000000003fffffff] and DMA32 [mem
>>>>> 0x0000000040000000-0x00000000ffffffff] on arm64.
>>>>>
>>>>> From the above discussion, on your platform the low crashkernel falls in
>>>>> the DMA32 region, but your environment needs to access the DMA
>>>>> region, hence the call trace.
>>>>>
>>>>> I have a question: why did you choose 0xc0000000 here?
>>>>>
>>>>> Besides, this is common code, so we also need to consider x86.
>>>>>
>>>>   + nsaenzjulienne@suse.de
>> Thanks for adding me to the conversation, and sorry for the headaches.
>>
>>>>    Exactly .  This is why it needs to be a CONFIG option for  Raspberry
>>>> ..,  or device tree option.
>>>>
>>>>
>>>>    We could revert 1a8e1cef7 since it broke  Arm kdump too.
>>> Well, unfortunately the patch for commit 1a8e1cef7603 ("arm64: use
>>> both ZONE_DMA and ZONE_DMA32") was not Cc'ed to the kexec mailing
>>> list, thus we couldn't get many eyes on it for a thorough review from
>>> kexec/kdump p-o-v.
>>>
>>> Also we historically never had distinction in common arch code on the
>>> basis of the intended end use-case: embedded, server or automotive, so
>>> I am not sure introducing a Raspberry specific CONFIG option would be
>>> a good idea.
>> +1
>>
>>  From the distros perspective it's very important to keep a single kernel image.
>>
>>> So, rather than reverting the patch, we can look at addressing the
>>> same properly this time - especially from a kdump p-o-v.
>>> This issue has been reported by some Red Hat arm64 partners with
>>> upstream kernel also and as we have noticed in the past as well,
>>> hardcoding the placement of the crashkernel base address (unless the
>>> base address is specified by a crashkernel=X@Y like bootargs) is also
>>> not a portable suggestion.
>>>
>>> I am working on a possible fix and will have more updates on the same
>>> in a day-or-two.
>> Please keep me in the loop, we've also had issues pointing to this reported by
>> SUSE partners. I can do some testing both on the RPi4 and on big servers that
>> need huge crashkernel sizes.
>>
>> Regards,
>> Nicolas
>>
>
>   Hi
>
>   Has there been any progress on this? It appears we are stalled because Nicolas's and Chen's changes are not compatible. One is needed for the RPi4 and the other for server-class equipment.
>
>
> Thanks,
>
> John
>
>
Hi all,

Thanks for the reminder, John.
Commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32") broke arm64 kdump.
There is a simple fix, similar to pk's; see below.

In the crash dump kernel, if the peripherals need to use ZONE_DMA, as on the Raspberry Pi 4, then
on top of my patch set we adjust the memory range passed to memblock_find_in_range.

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index a7580d291c37..eb16c6d54b73 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -320,6 +320,7 @@ int __init reserve_crashkernel_low(void)
        unsigned long long base, low_base = 0, low_size = 0;
        unsigned long total_low_mem;
        int ret;
+       phys_addr_t crash_max = 1ULL << 32;
 
        total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
 
@@ -352,7 +353,12 @@ int __init reserve_crashkernel_low(void)
                        return 0;
        }
 
-       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
+#ifdef CONFIG_ARM64
+       if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+               crash_max = arm64_dma_phys_limit;
+       }
+#endif
+       low_base = memblock_find_in_range(0, crash_max, low_size, CRASH_ALIGN);
        if (!low_base) {
                pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
                       (unsigned long)(low_size >> 20));
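
To make the effect of the change above easier to see, here is a minimal user-space sketch of the same idea. It is only a model, not kernel code: struct free_range, crash_low_limit() and find_low_base() are hypothetical names invented for illustration. It assumes the search runs top-down over free ranges sorted by ascending address and returns 0 on failure, and the caller passes in the DMA limit (1G in this thread) instead of reading arm64_dma_phys_limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of the idea; none of this is kernel code. */
struct free_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Upper bound for the low crashkernel region: the top of ZONE_DMA when that
 * zone is enabled, otherwise 4G (mirrors the crash_max selection in the diff).
 */
static uint64_t crash_low_limit(bool zone_dma_enabled, uint64_t dma_phys_limit)
{
	return zone_dma_enabled ? dma_phys_limit : (1ULL << 32);
}

/*
 * Simplified top-down search: walk the free ranges from the highest one and
 * return the highest aligned base such that [base, base + size) fits in a
 * free range below limit. Returns 0 when nothing fits.
 */
static uint64_t find_low_base(const struct free_range *ranges, size_t nr,
			      uint64_t limit, uint64_t size, uint64_t align)
{
	/* align must be a non-zero power of two for the mask below to work */
	assert(align != 0 && (align & (align - 1)) == 0);

	for (size_t i = nr; i-- > 0; ) {
		/* clamp the range to the allowed upper bound */
		uint64_t end = ranges[i].end < limit ? ranges[i].end : limit;
		uint64_t base;

		if (end <= ranges[i].start || end - ranges[i].start < size)
			continue;
		base = (end - size) & ~(align - 1);	/* round down to align */
		if (base >= ranges[i].start)
			return base;
	}
	return 0;
}
```

With free ranges [0x08000000, 0x80000000) and [0xc0000000, 0x100000000), a 256M request and 2M alignment (values chosen purely for illustration), the 4G limit yields a base of 0xf0000000, which lies in DMA32, while a 1G limit yields 0x30000000, which lies in ZONE_DMA.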


Thanks,
Chen Zhou





* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
  2020-06-19  8:21                         ` chenzhou
@ 2020-06-20  0:01                           ` John Donnelly
  0 siblings, 0 replies; 34+ messages in thread
From: John Donnelly @ 2020-06-20  0:01 UTC (permalink / raw)
  To: chenzhou, Nicolas Saenz Julienne, Bhupesh Sharma
  Cc: Simon Horman, Devicetree List, Arnd Bergmann, Baoquan He,
	Linux Doc Mailing List, Catalin Marinas, guohanjun,
	kexec mailing list, Linux Kernel Mailing List, Will Deacon,
	Rob Herring, James Morse, Prabhakar Kushwaha, Thomas Gleixner,
	Prabhakar Kushwaha, RuiRui Yang, Ingo Molnar, linux-arm-kernel


On 6/19/20 3:21 AM, chenzhou wrote:
> On 2020/6/19 10:32, John Donnelly wrote:
>> On 6/4/20 12:01 PM, Nicolas Saenz Julienne wrote:
>>> On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:
>>>> Hi All,
>>>>
>>>> On Wed, Jun 3, 2020 at 9:03 PM John Donnelly <john.p.donnelly@oracle.com>
>>>> wrote:
>>>>>> On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@huawei.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
>>>>>>> Hi Chen,
>>>>>>>
>>>>>>> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <john.p.donnelly@oracle.com
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
>>>>>>>>> prabhakar.pkin@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
>>>>>>>>> john.p.donnelly@oracle.com> wrote:
>>>>>>>>>> Hi .  See below !
>>>>>>>>>>
>>>>>>>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi John,
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
>>>>>>>>>>> John.P.donnelly@oracle.com> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>>>>>>>>>> Hi Chen,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
>>>>>>>>>>>>> chenzhou10@huawei.com> wrote:
>>>>>>>>>>>>>> This patch series enable reserving crashkernel above 4G in
>>>>>>>>>>>>>> arm64.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are following issues in arm64 kdump:
>>>>>>>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G,
>>>>>>>>>>>>>> which will fail
>>>>>>>>>>>>>> when there is no enough low memory.
>>>>>>>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve
>>>>>>>>>>>>>> crashkernel above 4G,
>>>>>>>>>>>>>> in this case, if swiotlb or DMA buffers are required,
>>>>>>>>>>>>>> crash dump kernel
>>>>>>>>>>>>>> will boot failure because there is no low memory available
>>>>>>>>>>>>>> for allocation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> We are getting "warn_alloc" [1] warning during boot of kdump
>>>>>>>>>>>>> kernel
>>>>>>>>>>>>> with bootargs as [2] of primary kernel.
>>>>>>>>>>>>> This error observed on ThunderX2  ARM64 platform.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is observed with latest upstream tag (v5.7-rc3) with this
>>>>>>>>>>>>> patch set
>>>>>>>>>>>>> and
>>>>>>>>>>>>>
>>> https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbiIAAlzu$
>>>>>>>>>>>>> Also **without** this patch-set
>>>>>>>>>>>>> "
>>>>>>>>>>>>>
>>> https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$
>>>>>>>>>>>>> "
>>>>>>>>>>>>>
>>>>>>>>>>>>> This issue comes whenever crashkernel memory is reserved
>>>>>>>>>>>>> after 0xc000_0000.
>>>>>>>>>>>>> More details discussed earlier in
>>>>>>>>>>>>>
>>> https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$
>>>     without
>>>>>>>>>>>>> any
>>>>>>>>>>>>> solution
>>>>>>>>>>>>>
>>>>>>>>>>>>> This patch-set is expected to solve similar kind of issue.
>>>>>>>>>>>>> i.e. low memory is only targeted for DMA, swiotlb; So above
>>>>>>>>>>>>> mentioned
>>>>>>>>>>>>> observation should be considered/fixed. .
>>>>>>>>>>>>>
>>>>>>>>>>>>> --pk
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
>>>>>>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>>>>>>> [   30.367696] NET: Registered protocol family 16
>>>>>>>>>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
>>>>>>>>>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>>>>>>>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>>>>>>>>>>>>> 5.7.0-rc3+ #121
>>>>>>>>>>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
>>>>>>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>>>>>>> [   30.369984] Call trace:
>>>>>>>>>>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
>>>>>>>>>>>>> [   30.369991]  show_stack+0x20/0x30
>>>>>>>>>>>>> [   30.369997]  dump_stack+0xc0/0x10c
>>>>>>>>>>>>> [   30.370001]  warn_alloc+0x10c/0x178
>>>>>>>>>>>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
>>>>>>>>>>>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
>>>>>>>>>>>>> [   30.370008]  alloc_page_interleave+0x24/0x98
>>>>>>>>>>>>> [   30.370011]  alloc_pages_current+0xe4/0x108
>>>>>>>>>>>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
>>>>>>>>>>>>> [   30.370020]  do_one_initcall+0x54/0x228
>>>>>>>>>>>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
>>>>>>>>>>>>> [   30.370031]  kernel_init+0x1c/0x110
>>>>>>>>>>>>> [   30.370034]  ret_from_fork+0x10/0x18
>>>>>>>>>>>>> [   30.370036] Mem-Info:
>>>>>>>>>>>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
>>>>>>>>>>>>> [   30.370064]  active_file:0 inactive_file:0
>>>>>>>>>>>>> isolated_file:0
>>>>>>>>>>>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
>>>>>>>>>>>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
>>>>>>>>>>>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
>>>>>>>>>>>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
>>>>>>>>>>>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
>>>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>>>> isolated(anon):0kB
>>>>>>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
>>>>>>>>>>>>> shmem:0kB
>>>>>>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>>>>>>>>>>>>> writeback_tmp:0kB
>>>>>>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>>>>>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
>>>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>>>> isolated(anon):0kB
>>>>>>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
>>>>>>>>>>>>> shmem:0kB
>>>>>>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>>>>>>>>>>>>> writeback_tmp:0kB
>>>>>>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>>>>>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
>>>>>>>>>>>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>>>> writepending:0kB
>>>>>>>>>>>>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB
>>>>>>>>>>>>> pagetables:0kB
>>>>>>>>>>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>>>>>>>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
>>>>>>>>>>>>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB
>>>>>>>>>>>>> low:664kB
>>>>>>>>>>>>> high:920kB reserved_highatomic:0KB active_anon:0kB
>>>>>>>>>>>>> inactive_anon:0kB
>>>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>>>> writepending:0kB
>>>>>>>>>>>>> present:269700kB managed:256000kB mlocked:0kB
>>>>>>>>>>>>> kernel_stack:0kB
>>>>>>>>>>>>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
>>>>>>>>>>>>> free_cma:0kB
>>>>>>>>>>>>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
>>>>>>>>>>>>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB
>>>>>>>>>>>>> low:15504kB
>>>>>>>>>>>>> high:21456kB reserved_highatomic:0KB active_anon:0kB
>>>>>>>>>>>>> inactive_anon:0kB
>>>>>>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB
>>>>>>>>>>>>> writepending:0kB
>>>>>>>>>>>>> present:8388608kB managed:5953112kB mlocked:0kB
>>>>>>>>>>>>> kernel_stack:21672kB
>>>>>>>>>>>>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB
>>>>>>>>>>>>> free_cma:0kB
>>>>>>>>>>>>> [   30.370104] lowmem_reserve[]: 0 0 0 0
>>>>>>>>>>>>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB
>>>>>>>>>>>>> 0*128kB
>>>>>>>>>>>>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>>>>>>>>>>>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB
>>>>>>>>>>>>> 0*64kB 0*128kB
>>>>>>>>>>>>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) =
>>>>>>>>>>>>> 256000kB
>>>>>>>>>>>>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB
>>>>>>>>>>>>> (UE) 3*32kB
>>>>>>>>>>>>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME)
>>>>>>>>>>>>> 3*1024kB (ME)
>>>>>>>>>>>>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
>>>>>>>>>>>>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
>>>>>>>>>>>>> hugepages_surp=0 hugepages_size=1048576kB
>>>>>>>>>>>>> [   30.370130] 0 total pagecache pages
>>>>>>>>>>>>> [   30.370132] 0 pages in swap cache
>>>>>>>>>>>>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>>>>>>>> [   30.370135] Free swap  = 0kB
>>>>>>>>>>>>> [   30.370136] Total swap = 0kB
>>>>>>>>>>>>> [   30.370137] 2164609 pages RAM
>>>>>>>>>>>>> [   30.370139] 0 pages HighMem/MovableOnly
>>>>>>>>>>>>> [   30.370140] 612331 pages reserved
>>>>>>>>>>>>> [   30.370141] 0 pages hwpoisoned
>>>>>>>>>>>>> [   30.370143] DMA: failed to allocate 256 KiB pool for
>>>>>>>>>>>>> atomic
>>>>>>>>>>>>> coherent allocation
>>>>>>>>>>>> During my testing I saw the same error and Chen's  solution
>>>>>>>>>>>> corrected it .
>>>>>>>>>>> Which combination are you using on your side? I am using
>>>>>>>>>>> Prabhakar's
>>>>>>>>>>> suggested environment and can reproduce the issue
>>>>>>>>>>> with or without Chen's crashkernel support above 4G patchset.
>>>>>>>>>>>
>>>>>>>>>>> I am also using a ThunderX2 platform with latest makedumpfile
>>>>>>>>>>> code and
>>>>>>>>>>> kexec-tools (with the suggested patch
>>>>>>>>>>> <
>>>>>>>>>>>
>>> https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!J6lUig58-Gw6TKZnEEYzEeSU36T-1SqlB1kImU00xtX_lss5Tx-JbUmLE9TJC3foXBLg$
>>>>>>>>>>>> ).
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Bhupesh
>>>>>>>>>> I did this activity 5 months ago and I have moved on to other
>>>>>>>>>> activities. My DMA failures were related to PCI devices that could
>>>>>>>>>> not be enumerated because  low-DMA space was not  available when
>>>>>>>>>> crashkernel was moved above 4G; I don’t recall the exact platform.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For this failure ,
>>>>>>>>>>
>>>>>>>>>>>>> DMA: failed to allocate 256 KiB pool for atomic
>>>>>>>>>>>>> coherent allocation
>>>>>>>>>> Is due to :
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 3618082c
>>>>>>>>>> ("arm64 use both ZONE_DMA and ZONE_DMA32")
>>>>>>>>>>
>>>>>>>>>> With the introduction of ZONE_DMA to support the Raspberry DMA
>>>>>>>>>> region below 1G, the crashkernel is placed in the upper 4G
>>>>>>>>>> ZONE_DMA_32 region. Since the crashkernel does not have access
>>>>>>>>>> to the ZONE_DMA region, it prints out call trace during bootup.
>>>>>>>>>>
>>>>>>>>>> It is due to having this CONFIG item  ON  :
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> CONFIG_ZONE_DMA=y
>>>>>>>>>>
>>>>>>>>>> Turning off ZONE_DMA fixes the issue, and the Raspberry Pi 4 will
>>>>>>>>>> use the device tree to specify memory below 1G.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Disabling ZONE_DMA is a temporary solution. We may need a proper
>>>>>>>>> solution.
>>>>>>>> Perhaps the Raspberry platform configuration dependencies need to be
>>>>>>>> separated from “server class” Arm equipment? Or auto-configured on
>>>>>>>> boot? Consult an expert ;-)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> I would like to see Chen’s feature added , perhaps as
>>>>>>>>>> EXPERIMENTAL,  so we can get some configuration testing done on
>>>>>>>>>> it.   It corrects having a DMA zone in low memory while crash-
>>>>>>>>>> kernel is above 4GB.  This has been going on for a year now.
>>>>>>>>> I will also like this patch to be added in Linux as early as
>>>>>>>>> possible.
>>>>>>>>>
>>>>>>>>> Issue mentioned by me happens with or without this patch.
>>>>>>>>>
>>>>>>>>> This patch-set can consider fixing because it uses low memory for
>>>>>>>>> DMA
>>>>>>>>> & swiotlb only.
>>>>>>>>> We can consider restricting crashkernel within the required range
>>>>>>>>> like below
>>>>>>>>>
>>>>>>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>>>>>>> index 7f9e5a6dc48c..bd67b90d35bd 100644
>>>>>>>>> --- a/kernel/crash_core.c
>>>>>>>>> +++ b/kernel/crash_core.c
>>>>>>>>> @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
>>>>>>>>>                         return 0;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size,
>>>>>>>>> CRASH_ALIGN);
>>>>>>>>> +       low_base = memblock_find_in_range(0,0xc0000000, low_size,
>>>>>>>>> CRASH_ALIGN);
>>>>>>>>>         if (!low_base) {
>>>>>>>>>                 pr_err("Cannot reserve %ldMB crashkernel low memory,
>>>>>>>>> please try smaller size.\n",
>>>>>>>>>                        (unsigned long)(low_size >> 20));
>>>>>>>>>
>>>>>>>>>
>>>>>>>>      I suspect  0xc0000000  would need to be a CONFIG item  and not
>>>>>>>> hard-coded.
>>>>>>>>
>>>>>>> if you consider this as valid change,  can you please incorporate as
>>>>>>> part of your patch-set.
>>>>>> After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"), the
>>>>>> 0-4G memory is split into
>>>>>> DMA [mem 0x0000000000000000-0x000000003fffffff] and DMA32 [mem
>>>>>> 0x0000000040000000-0x00000000ffffffff] on arm64.
>>>>>>
>>>>>> From the above discussion, on your platform the low crashkernel falls in
>>>>>> the DMA32 region, but your environment needs to access the DMA
>>>>>> region, hence the call trace.
>>>>>>
>>>>>> I have a question: why did you choose 0xc0000000 here?
>>>>>>
>>>>>> Besides, this is common code, so we also need to consider x86.
>>>>>>
>>>>>    + nsaenzjulienne@suse.de
>>> Thanks for adding me to the conversation, and sorry for the headaches.
>>>
>>>>>     Exactly .  This is why it needs to be a CONFIG option for  Raspberry
>>>>> ..,  or device tree option.
>>>>>
>>>>>
>>>>>     We could revert 1a8e1cef7 since it broke  Arm kdump too.
>>>> Well, unfortunately the patch for commit 1a8e1cef7603 ("arm64: use
>>>> both ZONE_DMA and ZONE_DMA32") was not Cc'ed to the kexec mailing
>>>> list, thus we couldn't get many eyes on it for a thorough review from
>>>> kexec/kdump p-o-v.
>>>>
>>>> Also we historically never had distinction in common arch code on the
>>>> basis of the intended end use-case: embedded, server or automotive, so
>>>> I am not sure introducing a Raspberry specific CONFIG option would be
>>>> a good idea.
>>> +1
>>>
>>>   From the distros perspective it's very important to keep a single kernel image.
>>>
>>>> So, rather than reverting the patch, we can look at addressing the
>>>> same properly this time - especially from a kdump p-o-v.
>>>> This issue has been reported by some Red Hat arm64 partners with
>>>> upstream kernel also and as we have noticed in the past as well,
>>>> hardcoding the placement of the crashkernel base address (unless the
>>>> base address is specified by a crashkernel=X@Y like bootargs) is also
>>>> not a portable suggestion.
>>>>
>>>> I am working on a possible fix and will have more updates on the same
>>>> in a day-or-two.
>>> Please keep me in the loop, we've also had issues pointing to this reported by
>>> SUSE partners. I can do some testing both on the RPi4 and on big servers that
>>> need huge crashkernel sizes.
>>>
>>> Regards,
>>> Nicolas
>>>
>>    Hi
>>
>>    Has there been any progress on this? It appears we are stalled because Nicolas's and Chen's changes are not compatible. One is needed for the RPi4 and the other for server-class equipment.
>>
>>
>> Thanks,
>>
>> John
>>
>>
> Hi all,
>
> Thanks for the reminder, John.
> Commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32") broke arm64 kdump.
> There is a simple fix, similar to pk's; see below.
>
> In the crash dump kernel, if the peripherals need to use ZONE_DMA, as on the Raspberry Pi 4, then
> on top of my patch set we adjust the memory range passed to memblock_find_in_range.
>
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index a7580d291c37..eb16c6d54b73 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -320,6 +320,7 @@ int __init reserve_crashkernel_low(void)
>          unsigned long long base, low_base = 0, low_size = 0;
>          unsigned long total_low_mem;
>          int ret;
> +       phys_addr_t crash_max = 1ULL << 32;
>   
>          total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>   
> @@ -352,7 +353,12 @@ int __init reserve_crashkernel_low(void)
>                          return 0;
>          }
>   
> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> +#ifdef CONFIG_ARM64
> +       if (IS_ENABLED(CONFIG_ZONE_DMA)) {
> +               crash_max = arm64_dma_phys_limit;
> +       }
> +#endif
> +       low_base = memblock_find_in_range(0, crash_max, low_size, CRASH_ALIGN);
>          if (!low_base) {
>                  pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>                         (unsigned long)(low_size >> 20));
>
>
> Thanks,
> Chen Zhou
>
Hi,

I don't have any objections to this proposal.








* Re: [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
  2020-05-29 16:11         ` James Morse
@ 2020-06-20  3:54           ` chenzhou
  0 siblings, 0 replies; 34+ messages in thread
From: chenzhou @ 2020-06-20  3:54 UTC (permalink / raw)
  To: James Morse, Rob Herring
  Cc: Thomas Gleixner, Ingo Molnar, Catalin Marinas, Will Deacon,
	dyoung, Baoquan He, Arnd Bergmann, John.p.donnelly, pkushwaha,
	Simon Horman, Hanjun Guo,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	devicetree, Linux Doc Mailing List, linux-kernel, kexec,
	Nicolas Saenz Julienne, Bhupesh Sharma

Hi James, Rob,


On 2020/5/30 0:11, James Morse wrote:
> Hi guys,
>
> On 26/05/2020 22:18, Rob Herring wrote:
>> On Fri, May 22, 2020 at 11:24:11AM +0800, chenzhou wrote:
>>> On 2020/5/21 21:29, Rob Herring wrote:
>>>> On Thu, May 21, 2020 at 3:35 AM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>> Add documentation for DT property used by arm64 kdump:
>>>>> linux,low-memory-range.
>>>>> "linux,low-memory-range" is another memory region used for crash
>>>>> dump kernel devices.
>>>>> diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
>>>>> index 45e79172a646..bfe6fb6976e6 100644
>>>>> --- a/Documentation/devicetree/bindings/chosen.txt
>>>>> +++ b/Documentation/devicetree/bindings/chosen.txt
>>>>> +linux,low-memory-range
>>>>> +----------------------
>>>>> +This property (arm64 only) holds a base address and size, describing a
>>>>> +limited region below 4G. Similar to "linux,usable-memory-range", it is
>>>>> +another memory range which may be considered available for use by the
>>>>> +kernel.
>>>> Why can't you just add a range to "linux,usable-memory-range"? It
>>>> shouldn't be hard to figure out which part is below 4G.
>>> The comments from James:
>>> Won't this break if your kdump kernel doesn't know what the extra parameters are?
>>> Or if it expects two ranges, but only gets one? These DT properties should be treated as
>>> ABI between kernel versions, we can't really change it like this.
>>>
>>> I think the 'low' region is an optional-extra, that is never mapped by the first kernel. I
>>> think the simplest thing to do is to add a 'linux,low-memory-range' that we
>>> memblock_add() after memblock_cap_memory_range() has been called.
>>> If it's missing, or the new kernel doesn't know what it's for, everything keeps working.
>>
>> I don't think there's a compatibility issue here though. The current 
>> kernel doesn't care if the property is longer than 1 base+size. It only 
>> checks if the size is less than 1 base+size.
> Aha! I missed that.
>
>
>> And yes, we can rely on 
>> that implementation detail. It's only an ABI if an existing user 
>> notices.
>>
>> Now, if the low memory is listed first, then an older kdump kernel 
>> would get a different memory range. If that's a problem, then define 
>> that low memory goes last. 
> This first entry would need to be the 'crashkernel' range where the kdump kernel is
> placed, otherwise an older kernel won't boot. The rest can be optional extras, as long as
> we are tolerant of it being missing...
How about this:

1. The low memory region keeps the name "Crash kernel (low)".
2. Userspace finds the "Crash kernel" and "Crash kernel (low)" regions in /proc/iomem
and adds "Crash kernel (low)" as the last range of the "linux,usable-memory-range" property.
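
The ordering in the two steps above can be sketched as follows. This is only an illustration under stated assumptions, not the kexec-tools implementation: struct mem_range, parse_iomem_line() and build_usable_ranges() are hypothetical names, and the input is assumed to be lines in the usual /proc/iomem "start-end : name" form.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: one base+size pair of the property. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Parse one "start-end : name" line. Returns 1 and fills *out only if
 * the name matches exactly, so "Crash kernel" does not also match
 * "Crash kernel (low)".
 */
int parse_iomem_line(const char *line, const char *name,
		     struct mem_range *out)
{
	unsigned long long start, end;
	size_t len = strlen(name);
	int off = 0;

	/* %n is only stored if everything before it matched. */
	if (sscanf(line, " %llx-%llx : %n", &start, &end, &off) < 2 || off == 0)
		return 0;
	if (strncmp(line + off, name, len) != 0)
		return 0;
	if (line[off + len] != '\0' && line[off + len] != '\n')
		return 0;

	out->base = start;
	out->size = end - start + 1;	/* iomem end addresses are inclusive */
	return 1;
}

/*
 * Fill ranges[] (capacity >= 2): the "Crash kernel" region first, so a
 * kernel that reads only the first pair still sees it, and the low
 * region last. Returns the number of entries, or 0 if there is no
 * "Crash kernel" region.
 */
size_t build_usable_ranges(const char *const *lines, size_t nlines,
			   struct mem_range *ranges)
{
	struct mem_range main_r = {0, 0}, low_r = {0, 0}, tmp;
	int have_main = 0, have_low = 0;
	size_t i, n = 0;

	for (i = 0; i < nlines; i++) {
		if (parse_iomem_line(lines[i], "Crash kernel", &tmp)) {
			main_r = tmp;
			have_main = 1;
		} else if (parse_iomem_line(lines[i], "Crash kernel (low)", &tmp)) {
			low_r = tmp;
			have_low = 1;
		}
	}

	if (!have_main)
		return 0;
	ranges[n++] = main_r;
	if (have_low)
		ranges[n++] = low_r;
	return n;
}
```

Whatever order the two regions appear in the input, the low region ends up as the final entry, and it is simply omitted when no "Crash kernel (low)" line is present.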

Thanks,
Chen Zhou
>
> I'll try and look at the rest of this series on Monday,
>
>
> Thanks,
>
> James


end of thread, other threads:[~2020-06-20  3:55 UTC | newest]

Thread overview: 34+ messages
2020-05-21  9:38 [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
2020-05-21  9:38 ` [PATCH v8 1/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c Chen Zhou
2020-05-26  0:56   ` Baoquan He
2020-05-21  9:38 ` [PATCH v8 2/5] arm64: kdump: reserve crashkenel above 4G for crash dump kernel Chen Zhou
2020-05-26  0:59   ` Baoquan He
2020-05-21  9:38 ` [PATCH v8 3/5] arm64: kdump: add memory for devices by DT property, low-memory-range Chen Zhou
2020-05-21  9:38 ` [PATCH v8 4/5] kdump: update Documentation about crashkernel on arm64 Chen Zhou
2020-05-21  9:38 ` [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump Chen Zhou
2020-05-21 13:29   ` Rob Herring
2020-05-22  3:24     ` chenzhou
2020-05-26 21:18       ` Rob Herring
2020-05-29 16:11         ` James Morse
2020-06-20  3:54           ` chenzhou
2020-05-26  1:42 ` [PATCH v8 0/5] support reserving crashkernel above 4G on " Baoquan He
2020-05-26  2:28   ` chenzhou
2020-05-28 22:20   ` John Donnelly
2020-05-29  8:05     ` Will Deacon
2020-06-01 12:02 ` Prabhakar Kushwaha
2020-06-01 19:30   ` John Donnelly
2020-06-01 21:02     ` Bhupesh Sharma
2020-06-01 21:59       ` John Donnelly
2020-06-02  5:38         ` Prabhakar Kushwaha
2020-06-02 14:41           ` John Donnelly
2020-06-03 11:47             ` Prabhakar Kushwaha
2020-06-03 13:20               ` chenzhou
2020-06-03 15:30                 ` John Donnelly
2020-06-03 19:47                   ` Bhupesh Sharma
2020-06-04  7:14                     ` Will Deacon
2020-06-04 17:01                     ` Nicolas Saenz Julienne
2020-06-05  2:26                       ` John Donnelly
2020-06-05  8:21                         ` Nicolas Saenz Julienne
2020-06-19  2:32                       ` John Donnelly
2020-06-19  8:21                         ` chenzhou
2020-06-20  0:01                           ` John Donnelly
