linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump
@ 2019-12-23 15:23 Chen Zhou
  2019-12-23 15:23 ` [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c Chen Zhou
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Chen Zhou @ 2019-12-23 15:23 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, james.morse, dyoung, bhsharma
  Cc: xiexiuqi, chenzhou10, linux-doc, kexec, linux-kernel, horms,
	linux-arm-kernel

This patch series enables reserving crashkernel memory above 4G on arm64.

arm64 kdump currently has the following issues:
1. crashkernel=X reserves the crashkernel region below 4G, which fails
when there is not enough low memory.
2. crashkernel=Y@X can be used to reserve the crashkernel region above 4G,
but if swiotlb or DMA buffers are required, the crash dump kernel fails
to boot because there is no low memory available for allocation.

To solve these issues, introduce crashkernel=X,low to reserve low memory
of the specified size.
crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=Y,low is also specified, first reserve low memory of
the specified size for the crash dump kernel's devices, and then reserve
memory above 4G.

When crashkernel is reserved above 4G, that is, when crashkernel=X,low is
also specified, the kernel should reserve low memory of the specified size
for crash dump kernel devices. So there may be two crash kernel regions, one
below 4G and the other above 4G.
To distinguish the low region from the high region without affecting the use
of kexec-tools, rename the low region to "Crash kernel (low)", and add the DT
property "linux,low-memory-range" to the crash dump kernel's dtb to pass the
low region.

Besides, we need to modify kexec-tools:
arm64: kdump: add another DT property to crash dump kernel's dtb (see [1])

Previous versions and discussions can be retrieved from the links at the
end of this message.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is also specified, first reserve low memory of the
specified size for crash dump kernel devices, and then reserve memory above 4G.
In addition, rename crashk_low_res to "Crash kernel (low)" for arm64, and then
pass it to the crash dump kernel via the DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions. In the two-region case, we
cap the memory range to [min(regs[*].start), max(regs[*].end)] and then
remove the memory range in the middle.
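
The two-region capping described for v1 can be illustrated with a small
user-space sketch. It is not the kernel code: struct region and
span_and_hole() are hypothetical names, and end addresses are treated as
exclusive for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory region; not the kernel's struct. */
struct region {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Given two non-overlapping regions, compute the enclosing span
 * [min(start), max(end)) and the hole between them that would have
 * to be removed again after capping to the span.
 * Returns 1 if the hole is non-empty, 0 otherwise.
 */
int span_and_hole(const struct region regs[2],
		  struct region *span, struct region *hole)
{
	const struct region *lo = &regs[0], *hi = &regs[1];

	/* Order the two regions by start address. */
	if (lo->start > hi->start) {
		const struct region *tmp = lo;

		lo = hi;
		hi = tmp;
	}
	assert(lo->end <= hi->start);	/* regions must not overlap */

	span->start = lo->start;
	span->end = hi->end;
	hole->start = lo->end;
	hole->end = hi->start;

	return hole->start < hole->end;
}
```

With a low region at [2G, 2G+256M) and a high region at [4G, 6G), the span
is [2G, 6G) and the hole to remove is [2G+256M, 4G).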

[1]: http://lists.infradead.org/pipermail/kexec/2019-August/023569.html
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142

Chen Zhou (4):
  x86: kdump: move reserve_crashkernel_low() into crash_core.c
  arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  arm64: kdump: add memory for devices by DT property, low-memory-range
  kdump: update Documentation about crashkernel on arm64

 Documentation/admin-guide/kdump/kdump.rst       | 13 +++-
 Documentation/admin-guide/kernel-parameters.txt | 12 +++-
 arch/arm64/kernel/setup.c                       |  8 ++-
 arch/arm64/mm/init.c                            | 61 ++++++++++++++++-
 arch/x86/kernel/setup.c                         | 62 ++----------------
 include/linux/crash_core.h                      |  3 +
 include/linux/kexec.h                           |  2 -
 kernel/crash_core.c                             | 87 +++++++++++++++++++++++++
 kernel/kexec_core.c                             | 17 -----
 9 files changed, 183 insertions(+), 82 deletions(-)

-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2019-12-23 15:23 [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
@ 2019-12-23 15:23 ` Chen Zhou
  2019-12-27  5:54   ` Dave Young
  2019-12-23 15:23 ` [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel Chen Zhou
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Chen Zhou @ 2019-12-23 15:23 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, james.morse, dyoung, bhsharma
  Cc: kbuild test robot, xiexiuqi, chenzhou10, linux-doc, kexec,
	linux-kernel, horms, linux-arm-kernel

In preparation for supporting reserve_crashkernel_low() on arm64 as
x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.

Note that on arm64 we reserve low memory if and only if crashkernel=X,low
is specified. Unlike x86_64, low memory is not reserved automatically.
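
The difference between the two architectures can be modelled in user space
as follows. This is only a sketch of the size decision: pick_low_size(),
enum arch and the MiB() macro are hypothetical names, the swiotlb argument
stands in for swiotlb_size_or_default(), and -1 stands in for -EINVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

enum arch { ARCH_X86_64, ARCH_ARM64 };

/*
 * Decide how much low memory to reserve.
 *
 * specified: non-zero if crashkernel=Y,low was given on the command line
 * requested: the Y value (only meaningful if specified)
 * swiotlb:   stand-in for swiotlb_size_or_default()
 *
 * Returns 0 and stores the size in *low_size (0 means "reserve nothing"),
 * or -1 when arm64 is booted without crashkernel=Y,low.
 */
int pick_low_size(enum arch arch, int specified, uint64_t requested,
		  uint64_t swiotlb, uint64_t *low_size)
{
	assert(low_size != NULL);

	if (specified) {
		/* crashkernel=0,low disables the low reservation. */
		*low_size = requested;
		return 0;
	}

	if (arch == ARCH_X86_64) {
		/* Default: max(swiotlb + 8M, 256M). */
		uint64_t want = swiotlb + MiB(8);

		*low_size = want > MiB(256) ? want : MiB(256);
		return 0;
	}

	/* arm64: no automatic low reservation. */
	*low_size = 0;
	return -1;
}
```

For example, x86_64 with a 64M swiotlb and no ",low" option yields 256M,
while arm64 in the same situation reserves nothing and reports an error to
the caller.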

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
---
 arch/x86/kernel/setup.c    | 62 ++++-----------------------------
 include/linux/crash_core.h |  3 ++
 include/linux/kexec.h      |  2 --
 kernel/crash_core.c        | 87 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec_core.c        | 17 ---------
 5 files changed, 96 insertions(+), 75 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index cedfe20..5f38942 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -486,59 +486,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_HIGH_MAX	SZ_64T
 #endif
 
-static int __init reserve_crashkernel_low(void)
-{
-#ifdef CONFIG_X86_64
-	unsigned long long base, low_base = 0, low_size = 0;
-	unsigned long total_low_mem;
-	int ret;
-
-	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
-
-	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
-	if (ret) {
-		/*
-		 * two parts from kernel/dma/swiotlb.c:
-		 * -swiotlb size: user-specified with swiotlb= or default.
-		 *
-		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
-		 * to 8M for other buffers that may need to stay low too. Also
-		 * make sure we allocate enough extra low memory so that we
-		 * don't run out of DMA buffers for 32-bit devices.
-		 */
-		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
-	} else {
-		/* passed with crashkernel=0,low ? */
-		if (!low_size)
-			return 0;
-	}
-
-	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
-	if (!low_base) {
-		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
-		       (unsigned long)(low_size >> 20));
-		return -ENOMEM;
-	}
-
-	ret = memblock_reserve(low_base, low_size);
-	if (ret) {
-		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
-		return ret;
-	}
-
-	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
-		(unsigned long)(low_size >> 20),
-		(unsigned long)(low_base >> 20),
-		(unsigned long)(total_low_mem >> 20));
-
-	crashk_low_res.start = low_base;
-	crashk_low_res.end   = low_base + low_size - 1;
-	insert_resource(&iomem_resource, &crashk_low_res);
-#endif
-	return 0;
-}
-
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_size, crash_base, total_mem;
@@ -602,9 +549,12 @@ static void __init reserve_crashkernel(void)
 		return;
 	}
 
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
-		memblock_free(crash_base, crash_size);
-		return;
+	if (crash_base >= (1ULL << 32)) {
+		if (reserve_crashkernel_low()) {
+			memblock_free(crash_base, crash_size);
+			return;
+		}
+		insert_resource(&iomem_resource, &crashk_low_res);
 	}
 
 	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 525510a..4df8c0b 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -63,6 +63,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 extern unsigned char *vmcoreinfo_data;
 extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
+extern struct resource crashk_res;
+extern struct resource crashk_low_res;
 
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len);
@@ -74,5 +76,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
+int __init reserve_crashkernel_low(void);
 
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 1776eb2..5d5d963 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -330,8 +330,6 @@ extern int kexec_load_disabled;
 
 /* Location of a reserved region to hold the crash kernel.
  */
-extern struct resource crashk_res;
-extern struct resource crashk_low_res;
 extern note_buf_t __percpu *crash_notes;
 
 /* flag to track if kexec reboot is in progress */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 9f1557b..eb72fd6 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,6 +7,8 @@
 #include <linux/crash_core.h>
 #include <linux/utsname.h>
 #include <linux/vmalloc.h>
+#include <linux/memblock.h>
+#include <linux/swiotlb.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -19,6 +21,22 @@ u32 *vmcoreinfo_note;
 /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
 static unsigned char *vmcoreinfo_data_safecopy;
 
+/* Location of the reserved area for the crash kernel */
+struct resource crashk_res = {
+	.name  = "Crash kernel",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+	.desc  = IORES_DESC_CRASH_KERNEL
+};
+struct resource crashk_low_res = {
+	.name  = "Crash kernel",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+	.desc  = IORES_DESC_CRASH_KERNEL
+};
+
 /*
  * parsing the "crashkernel" commandline
  *
@@ -292,6 +310,75 @@ int __init parse_crashkernel_low(char *cmdline,
 				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
 }
 
+#if defined(CONFIG_X86_64)
+#define CRASH_ALIGN		SZ_16M
+#elif defined(CONFIG_ARM64)
+#define CRASH_ALIGN		SZ_2M
+#endif
+
+int __init reserve_crashkernel_low(void)
+{
+#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
+	unsigned long long base, low_base = 0, low_size = 0;
+	unsigned long total_low_mem;
+	int ret;
+
+	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
+
+	/* crashkernel=Y,low */
+	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size,
+			&base);
+	if (ret) {
+#ifdef CONFIG_X86_64
+		/*
+		 * two parts from lib/swiotlb.c:
+		 * -swiotlb size: user-specified with swiotlb= or default.
+		 *
+		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
+		 * to 8M for other buffers that may need to stay low too. Also
+		 * make sure we allocate enough extra low memory so that we
+		 * don't run out of DMA buffers for 32-bit devices.
+		 */
+		low_size = max(swiotlb_size_or_default() + (8UL << 20),
+				256UL << 20);
+#else
+		/*
+		 * in arm64, reserve low memory if and only if crashkernel=X,low
+		 * specified.
+		 */
+		return -EINVAL;
+#endif
+	} else {
+		/* passed with crashkernel=0,low ? */
+		if (!low_size)
+			return 0;
+	}
+
+	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
+	if (!low_base) {
+		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
+		       (unsigned long)(low_size >> 20));
+		return -ENOMEM;
+	}
+
+	ret = memblock_reserve(low_base, low_size);
+	if (ret) {
+		pr_err("%s: Error reserving crashkernel low memblock.\n",
+				__func__);
+		return ret;
+	}
+
+	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
+		(unsigned long)(low_size >> 20),
+		(unsigned long)(low_base >> 20),
+		(unsigned long)(total_low_mem >> 20));
+
+	crashk_low_res.start = low_base;
+	crashk_low_res.end   = low_base + low_size - 1;
+#endif
+	return 0;
+}
+
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len)
 {
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 15d70a9..458d093 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
 /* Flag to indicate we are going to kexec a new kernel */
 bool kexec_in_progress = false;
 
-
-/* Location of the reserved area for the crash kernel */
-struct resource crashk_res = {
-	.name  = "Crash kernel",
-	.start = 0,
-	.end   = 0,
-	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
-	.desc  = IORES_DESC_CRASH_KERNEL
-};
-struct resource crashk_low_res = {
-	.name  = "Crash kernel",
-	.start = 0,
-	.end   = 0,
-	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
-	.desc  = IORES_DESC_CRASH_KERNEL
-};
-
 int kexec_should_crash(struct task_struct *p)
 {
 	/*
-- 
2.7.4



* [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2019-12-23 15:23 [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
  2019-12-23 15:23 ` [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c Chen Zhou
@ 2019-12-23 15:23 ` Chen Zhou
  2020-03-05 10:13   ` Prabhakar Kushwaha
  2019-12-23 15:23 ` [PATCH v7 3/4] arm64: kdump: add memory for devices by DT property, low-memory-range Chen Zhou
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Chen Zhou @ 2019-12-23 15:23 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, james.morse, dyoung, bhsharma
  Cc: xiexiuqi, chenzhou10, linux-doc, kexec, linux-kernel, horms,
	linux-arm-kernel

crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=X,low is also specified, first reserve low memory of
the specified size for crash dump kernel devices, and then reserve
memory above 4G.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
---
 arch/arm64/kernel/setup.c |  8 +++++++-
 arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 56f6645..04d1c87 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
 		    kernel_data.end <= res->end)
 			request_resource(res, &kernel_data);
 #ifdef CONFIG_KEXEC_CORE
-		/* Userspace will find "Crash kernel" region in /proc/iomem. */
+		/*
+		 * Userspace will find "Crash kernel" region in /proc/iomem.
+		 * Note: the low region is renamed as Crash kernel (low).
+		 */
+		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+				crashk_low_res.end <= res->end)
+			request_resource(res, &crashk_low_res);
 		if (crashk_res.end && crashk_res.start >= res->start &&
 		    crashk_res.end <= res->end)
 			request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index b65dffd..0d7afd5 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
 	int ret;
+	phys_addr_t crash_max = arm64_dma32_phys_limit;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
 				&crash_size, &crash_base);
@@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
 	if (ret || !crash_size)
 		return;
 
+	ret = reserve_crashkernel_low();
+	if (!ret && crashk_low_res.end) {
+		/*
+		 * If crashkernel=X,low specified, there may be two regions,
+		 * we need to make some changes as follows:
+		 *
+		 * 1. rename the low region as "Crash kernel (low)"
+		 * In order to distinct from the high region and make no effect
+		 * to the use of existing kexec-tools, rename the low region as
+		 * "Crash kernel (low)".
+		 *
+		 * 2. change the upper bound for crash memory
+		 * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
+		 *
+		 * 3. mark the low region as "nomap"
+		 * The low region is intended to be used for crash dump kernel
+		 * devices, just mark the low region as "nomap" simply.
+		 */
+		const char *rename = "Crash kernel (low)";
+
+		crashk_low_res.name = rename;
+		crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
+		memblock_mark_nomap(crashk_low_res.start,
+				    resource_size(&crashk_low_res));
+	}
+
 	crash_size = PAGE_ALIGN(crash_size);
 
 	if (crash_base == 0) {
 		/* Current arm64 boot protocol requires 2MB alignment */
-		crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
-				crash_size, SZ_2M);
+		crash_base = memblock_find_in_range(0, crash_max, crash_size,
+				SZ_2M);
 		if (crash_base == 0) {
 			pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 				crash_size);
-- 
2.7.4



* [PATCH v7 3/4] arm64: kdump: add memory for devices by DT property, low-memory-range
  2019-12-23 15:23 [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
  2019-12-23 15:23 ` [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c Chen Zhou
  2019-12-23 15:23 ` [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel Chen Zhou
@ 2019-12-23 15:23 ` Chen Zhou
  2019-12-23 15:23 ` [PATCH v7 4/4] kdump: update Documentation about crashkernel on arm64 Chen Zhou
  2020-03-26  3:09 ` [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
  4 siblings, 0 replies; 30+ messages in thread
From: Chen Zhou @ 2019-12-23 15:23 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, james.morse, dyoung, bhsharma
  Cc: xiexiuqi, chenzhou10, linux-doc, kexec, linux-kernel, horms,
	linux-arm-kernel

To reserve crashkernel above 4G, use the parameters
"crashkernel=X crashkernel=Y,low". In this case, low memory of the
specified size is reserved for crash dump kernel devices and is never
mapped by the first kernel. This memory range is advertised to the crash
dump kernel via a DT property under /chosen,
	linux,low-memory-range=<BASE SIZE>

The crash dump kernel reads this property at boot time and calls
memblock_add() after memblock_cap_memory_range() has been called.
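
As an illustration of the <BASE SIZE> encoding, here is a minimal user-space
sketch that decodes such a property from raw big-endian cells. All function
names are hypothetical; next_cells() only mimics what dt_mem_next_cell()
does in the kernel, and addr_cells/size_cells stand in for
dt_root_addr_cells/dt_root_size_cells. Unlike the check in the patch, the
sketch compares the length in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit cell from raw property bytes. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Consume ncells (1 or 2) cells, advance *pp, return the combined value. */
static uint64_t next_cells(const uint8_t **pp, int ncells)
{
	uint64_t v = 0;

	assert(ncells == 1 || ncells == 2);
	while (ncells--) {
		v = (v << 32) | be32_at(*pp);
		*pp += 4;
	}
	return v;
}

/*
 * Decode <BASE SIZE> from a property of len bytes.
 * Returns 0 on success, -1 if the property is missing or too short.
 */
int parse_low_memory_range(const uint8_t *prop, size_t len,
			   int addr_cells, int size_cells,
			   uint64_t *base, uint64_t *size)
{
	if (prop == NULL || len < (size_t)(addr_cells + size_cells) * 4)
		return -1;

	*base = next_cells(&prop, addr_cells);
	*size = next_cells(&prop, size_cells);
	return 0;
}
```

With two address cells and two size cells, the 16 bytes
<0x0 0x80000000 0x0 0x10000000> decode to a 256M range starting at 2G.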

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
---
 arch/arm64/mm/init.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 0d7afd5..1c4a6ad 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -322,6 +322,26 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+static int __init early_init_dt_scan_lowmem(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	struct memblock_region *lowmem = data;
+	const __be32 *reg;
+	int len;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,low-memory-range", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	lowmem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	lowmem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
 static int __init early_init_dt_scan_usablemem(unsigned long node,
 		const char *uname, int depth, void *data)
 {
@@ -352,13 +372,21 @@ static void __init fdt_enforce_memory_region(void)
 
 	if (reg.size)
 		memblock_cap_memory_range(reg.base, reg.size);
+
+	of_scan_flat_dt(early_init_dt_scan_lowmem, &reg);
+
+	if (reg.size)
+		memblock_add(reg.base, reg.size);
 }
 
 void __init arm64_memblock_init(void)
 {
 	const s64 linear_region_size = BIT(vabits_actual - 1);
 
-	/* Handle linux,usable-memory-range property */
+	/*
+	 * Handle linux,usable-memory-range and linux,low-memory-range
+	 * properties.
+	 */
 	fdt_enforce_memory_region();
 
 	/* Remove memory above our supported physical address size */
-- 
2.7.4



* [PATCH v7 4/4] kdump: update Documentation about crashkernel on arm64
  2019-12-23 15:23 [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
                   ` (2 preceding siblings ...)
  2019-12-23 15:23 ` [PATCH v7 3/4] arm64: kdump: add memory for devices by DT property, low-memory-range Chen Zhou
@ 2019-12-23 15:23 ` Chen Zhou
  2020-03-26  3:09 ` [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
  4 siblings, 0 replies; 30+ messages in thread
From: Chen Zhou @ 2019-12-23 15:23 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, james.morse, dyoung, bhsharma
  Cc: xiexiuqi, chenzhou10, linux-doc, kexec, linux-kernel, horms,
	linux-arm-kernel

Now that crashkernel=X,[low] is supported on arm64, update the documentation.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
---
 Documentation/admin-guide/kdump/kdump.rst       | 13 +++++++++++--
 Documentation/admin-guide/kernel-parameters.txt | 12 +++++++++++-
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index ac7e131..e55173e 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -299,7 +299,13 @@ Boot into System Kernel
    "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
    starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
 
-   On x86 and x86_64, use "crashkernel=64M@16M".
+   On x86 use "crashkernel=64M@16M".
+
+   On x86_64, use "crashkernel=Y[@X]" to select a region under 4G first, and
+   fall back to reserve region above 4G when '@offset' hasn't been specified.
+   We can also use "crashkernel=X,high" to select a region above 4G, which
+   also tries to allocate at least 256M below 4G automatically and
+   "crashkernel=Y,low" can be used to allocate specified size low memory.
 
    On ppc64, use "crashkernel=128M@32M".
 
@@ -316,8 +322,11 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   On arm64, use "crashkernel=Y[@X]". Note that the start address of
    the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+   If crashkernel=Z,low is specified simultaneously, reserve spcified size
+   low memory for crash kdump kernel devices firstly and then reserve memory
+   above 4G.
 
 Load the Dump-capture Kernel
 ============================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6e..bde3ab4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -706,6 +706,9 @@
 			[KNL, x86_64] select a region under 4G first, and
 			fall back to reserve region above 4G when '@offset'
 			hasn't been specified.
+			[KNL, arm64] If crashkernel=X,low is specified, reserve
+			spcified size low memory for crash kdump kernel devices
+			firstly, and then reserve memory above 4G.
 			See Documentation/admin-guide/kdump/kdump.rst for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -730,12 +733,19 @@
 			requires at least 64M+32K low memory, also enough extra
 			low memory is needed to make sure DMA buffers for 32-bit
 			devices won't run out. Kernel would try to allocate at
-			at least 256M below 4G automatically.
+			least 256M below 4G automatically.
 			This one let user to specify own low range under 4G
 			for second kernel instead.
 			0: to disable low allocation.
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.
+			[KNL, arm64] range under 4G.
+			This one let user to specify own low range under 4G
+			for crash dump kernel instead.
+			Different with x86_64, kernel allocates specified size
+			physical memory region only when this parameter is specified
+			instead of trying to allocate at least 256M below 4G
+			automatically.
 
 	cryptomgr.notests
 			[KNL] Disable crypto self-tests
-- 
2.7.4



* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2019-12-23 15:23 ` [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c Chen Zhou
@ 2019-12-27  5:54   ` Dave Young
  2019-12-27 11:04     ` Chen Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Young @ 2019-12-27  5:54 UTC (permalink / raw)
  To: Chen Zhou
  Cc: kbuild test robot, horms, linux-doc, catalin.marinas, bhsharma,
	xiexiuqi, kexec, linux-kernel, mingo, james.morse, tglx, will,
	linux-arm-kernel

Hi,
On 12/23/19 at 11:23pm, Chen Zhou wrote:
> In preparation for supporting reserve_crashkernel_low in arm64 as
> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
> 
> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
> is specified. Different with x86_64, don't set low memory automatically.

Do you have any reason for the difference?  I'd expect us to have the same
logic if possible and remove some of the ifdefs.

> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> ---
>  arch/x86/kernel/setup.c    | 62 ++++-----------------------------
>  include/linux/crash_core.h |  3 ++
>  include/linux/kexec.h      |  2 --
>  kernel/crash_core.c        | 87 ++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/kexec_core.c        | 17 ---------
>  5 files changed, 96 insertions(+), 75 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index cedfe20..5f38942 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -486,59 +486,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>  # define CRASH_ADDR_HIGH_MAX	SZ_64T
>  #endif
>  
> -static int __init reserve_crashkernel_low(void)
> -{
> -#ifdef CONFIG_X86_64
> -	unsigned long long base, low_base = 0, low_size = 0;
> -	unsigned long total_low_mem;
> -	int ret;
> -
> -	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> -
> -	/* crashkernel=Y,low */
> -	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
> -	if (ret) {
> -		/*
> -		 * two parts from kernel/dma/swiotlb.c:
> -		 * -swiotlb size: user-specified with swiotlb= or default.
> -		 *
> -		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> -		 * to 8M for other buffers that may need to stay low too. Also
> -		 * make sure we allocate enough extra low memory so that we
> -		 * don't run out of DMA buffers for 32-bit devices.
> -		 */
> -		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> -	} else {
> -		/* passed with crashkernel=0,low ? */
> -		if (!low_size)
> -			return 0;
> -	}
> -
> -	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> -	if (!low_base) {
> -		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> -		       (unsigned long)(low_size >> 20));
> -		return -ENOMEM;
> -	}
> -
> -	ret = memblock_reserve(low_base, low_size);
> -	if (ret) {
> -		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
> -		return ret;
> -	}
> -
> -	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
> -		(unsigned long)(low_size >> 20),
> -		(unsigned long)(low_base >> 20),
> -		(unsigned long)(total_low_mem >> 20));
> -
> -	crashk_low_res.start = low_base;
> -	crashk_low_res.end   = low_base + low_size - 1;
> -	insert_resource(&iomem_resource, &crashk_low_res);
> -#endif
> -	return 0;
> -}
> -
>  static void __init reserve_crashkernel(void)
>  {
>  	unsigned long long crash_size, crash_base, total_mem;
> @@ -602,9 +549,12 @@ static void __init reserve_crashkernel(void)
>  		return;
>  	}
>  
> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> -		memblock_free(crash_base, crash_size);
> -		return;
> +	if (crash_base >= (1ULL << 32)) {
> +		if (reserve_crashkernel_low()) {
> +			memblock_free(crash_base, crash_size);
> +			return;
> +		}
> +		insert_resource(&iomem_resource, &crashk_low_res);

Is there a specific reason to move insert_resource() out of the
reserve_crashkernel_low() function?

>  	}
>  
>  	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index 525510a..4df8c0b 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -63,6 +63,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
>  extern unsigned char *vmcoreinfo_data;
>  extern size_t vmcoreinfo_size;
>  extern u32 *vmcoreinfo_note;
> +extern struct resource crashk_res;
> +extern struct resource crashk_low_res;
>  
>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>  			  void *data, size_t data_len);
> @@ -74,5 +76,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>  		unsigned long long *crash_size, unsigned long long *crash_base);
>  int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>  		unsigned long long *crash_size, unsigned long long *crash_base);
> +int __init reserve_crashkernel_low(void);
>  
>  #endif /* LINUX_CRASH_CORE_H */
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 1776eb2..5d5d963 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -330,8 +330,6 @@ extern int kexec_load_disabled;
>  
>  /* Location of a reserved region to hold the crash kernel.
>   */
> -extern struct resource crashk_res;
> -extern struct resource crashk_low_res;
>  extern note_buf_t __percpu *crash_notes;
>  
>  /* flag to track if kexec reboot is in progress */
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 9f1557b..eb72fd6 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -7,6 +7,8 @@
>  #include <linux/crash_core.h>
>  #include <linux/utsname.h>
>  #include <linux/vmalloc.h>
> +#include <linux/memblock.h>
> +#include <linux/swiotlb.h>
>  
>  #include <asm/page.h>
>  #include <asm/sections.h>
> @@ -19,6 +21,22 @@ u32 *vmcoreinfo_note;
>  /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
>  static unsigned char *vmcoreinfo_data_safecopy;
>  
> +/* Location of the reserved area for the crash kernel */
> +struct resource crashk_res = {
> +	.name  = "Crash kernel",
> +	.start = 0,
> +	.end   = 0,
> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> +	.desc  = IORES_DESC_CRASH_KERNEL
> +};
> +struct resource crashk_low_res = {
> +	.name  = "Crash kernel",
> +	.start = 0,
> +	.end   = 0,
> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> +	.desc  = IORES_DESC_CRASH_KERNEL
> +};
> +
>  /*
>   * parsing the "crashkernel" commandline
>   *
> @@ -292,6 +310,75 @@ int __init parse_crashkernel_low(char *cmdline,
>  				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
>  }
>  
> +#if defined(CONFIG_X86_64)
> +#define CRASH_ALIGN		SZ_16M
> +#elif defined(CONFIG_ARM64)
> +#define CRASH_ALIGN		SZ_2M
> +#endif

I don't think the #ifdef is needed. I cannot think of a reason why x86
uses 16M, so maybe move it to 2M as well if there are no objections.
That would make it easier to reserve the crashkernel successfully, which
has become harder nowadays with KASLR and other features.
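
To illustrate the alignment point, here is a minimal user-space sketch (not
kernel code). The names find_aligned() and struct free_range are hypothetical,
and the search order is simplified to lowest-address-first, which may differ
from what memblock_find_in_range() actually does. It only shows that a coarser
alignment can make an otherwise sufficient free range unusable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1M  (1ULL << 20)
#define SZ_2M  (2 * SZ_1M)
#define SZ_16M (16 * SZ_1M)

/* Hypothetical description of one free physical range: [start, end). */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Return the lowest aligned base at which size bytes fit into one of the
 * free ranges, or 0 if nothing fits (0 means "not found", as in the patch).
 */
static uint64_t find_aligned(const struct free_range *r, size_t n,
			     uint64_t size, uint64_t align)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t base = align_up(r[i].start, align);

		if (base < r[i].end && r[i].end - base >= size)
			return base;
	}
	return 0;
}
```

With a single free range of [18M, 50M), a 30M request fits at 18M when the
alignment is 2M, but fails with 16M alignment because the first aligned base
is 32M and only 18M remain above it.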

> +
> +int __init reserve_crashkernel_low(void)
> +{
> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
> +	unsigned long long base, low_base = 0, low_size = 0;
> +	unsigned long total_low_mem;
> +	int ret;
> +
> +	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> +
> +	/* crashkernel=Y,low */
> +	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size,
> +			&base);
> +	if (ret) {
> +#ifdef CONFIG_X86_64
> +		/*
> +		 * two parts from lib/swiotlb.c:
> +		 * -swiotlb size: user-specified with swiotlb= or default.
> +		 *
> +		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> +		 * to 8M for other buffers that may need to stay low too. Also
> +		 * make sure we allocate enough extra low memory so that we
> +		 * don't run out of DMA buffers for 32-bit devices.
> +		 */
> +		low_size = max(swiotlb_size_or_default() + (8UL << 20),
> +				256UL << 20);
> +#else
> +		/*
> +		 * in arm64, reserve low memory if and only if crashkernel=X,low
> +		 * specified.
> +		 */
> +		return -EINVAL;
> +#endif

As I said before, can you look into why arm64 needs different logic? It
would be good to keep the two arches the same.

> +	} else {
> +		/* passed with crashkernel=0,low ? */
> +		if (!low_size)
> +			return 0;
> +	}
> +
> +	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> +	if (!low_base) {
> +		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> +		       (unsigned long)(low_size >> 20));
> +		return -ENOMEM;
> +	}
> +
> +	ret = memblock_reserve(low_base, low_size);
> +	if (ret) {
> +		pr_err("%s: Error reserving crashkernel low memblock.\n",
> +				__func__);
> +		return ret;
> +	}
> +
> +	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
> +		(unsigned long)(low_size >> 20),
> +		(unsigned long)(low_base >> 20),
> +		(unsigned long)(total_low_mem >> 20));
> +
> +	crashk_low_res.start = low_base;
> +	crashk_low_res.end   = low_base + low_size - 1;
> +#endif
> +	return 0;
> +}
> +
>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>  			  void *data, size_t data_len)
>  {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 15d70a9..458d093 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
>  /* Flag to indicate we are going to kexec a new kernel */
>  bool kexec_in_progress = false;
>  
> -
> -/* Location of the reserved area for the crash kernel */
> -struct resource crashk_res = {
> -	.name  = "Crash kernel",
> -	.start = 0,
> -	.end   = 0,
> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> -	.desc  = IORES_DESC_CRASH_KERNEL
> -};
> -struct resource crashk_low_res = {
> -	.name  = "Crash kernel",
> -	.start = 0,
> -	.end   = 0,
> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> -	.desc  = IORES_DESC_CRASH_KERNEL
> -};
> -
>  int kexec_should_crash(struct task_struct *p)
>  {
>  	/*
> -- 
> 2.7.4
> 

Thanks
Dave


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2019-12-27  5:54   ` Dave Young
@ 2019-12-27 11:04     ` Chen Zhou
  2019-12-28  9:32       ` Dave Young
  0 siblings, 1 reply; 30+ messages in thread
From: Chen Zhou @ 2019-12-27 11:04 UTC (permalink / raw)
  To: Dave Young
  Cc: kbuild test robot, horms, linux-doc, catalin.marinas, bhsharma,
	xiexiuqi, kexec, linux-kernel, mingo, james.morse, tglx, will,
	linux-arm-kernel

Hi Dave

On 2019/12/27 13:54, Dave Young wrote:
> Hi,
> On 12/23/19 at 11:23pm, Chen Zhou wrote:
>> In preparation for supporting reserve_crashkernel_low in arm64 as
>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
>>
>> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
>> is specified. Different with x86_64, don't set low memory automatically.
> 
> Do you have any reason for the difference?  I'd expect we have same
> logic if possible and remove some of the ifdefs.

On x86_64, we call reserve_crashkernel_low() to reserve low memory only if the
crashkernel has been reserved above 4G.

On arm64, to keep things simple, we call reserve_crashkernel_low() at the beginning of
reserve_crashkernel() and then relax arm64_dma32_phys_limit if reserve_crashkernel_low()
allocated something. In this flow, if low memory were set automatically as on x86_64,
256M of low memory would be reserved even when the crashkernel is reserved below 4G,
and that needs extra consideration.
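
The size policy in question can be modeled in user space roughly as follows.
This is only a sketch under assumptions: pick_low_size(), enum arch and the
low_given flag are hypothetical (the patch treats a parse_crashkernel_low()
failure as "not given"), and the swiotlb argument stands in for
swiotlb_size_or_default().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1M (1ULL << 20)

enum arch { ARCH_X86_64, ARCH_ARM64 };

/*
 * Decide how much low memory to reserve.
 *
 * low_given: crashkernel=Y,low was present on the command line
 * low_req:   the Y value in bytes (may be 0, meaning "reserve nothing")
 * swiotlb:   stand-in for swiotlb_size_or_default()
 *
 * Returns 0 and stores the size in *low_size, or -EINVAL when the
 * architecture has no automatic default and no ",low" value was given.
 */
static int pick_low_size(enum arch arch, bool low_given, uint64_t low_req,
			 uint64_t swiotlb, uint64_t *low_size)
{
	assert(low_size != NULL);

	if (low_given) {
		*low_size = low_req;
		return 0;
	}

	if (arch == ARCH_X86_64) {
		/* x86_64 default: max(swiotlb + 8M, 256M) */
		uint64_t want = swiotlb + 8 * SZ_1M;

		*low_size = want > 256 * SZ_1M ? want : 256 * SZ_1M;
		return 0;
	}

	/* arm64 in this series: low memory only on explicit request */
	return -EINVAL;
}
```

With a 64M swiotlb and no ",low" option, the x86_64 branch yields 256M, while
the arm64 branch reports an error instead of picking a size.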

Previous discussions:
	https://lkml.org/lkml/2019/6/5/670
	https://lkml.org/lkml/2019/6/13/229

> 
>>
>> Reported-by: kbuild test robot <lkp@intel.com>
>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>> ---
>>  arch/x86/kernel/setup.c    | 62 ++++-----------------------------
>>  include/linux/crash_core.h |  3 ++
>>  include/linux/kexec.h      |  2 --
>>  kernel/crash_core.c        | 87 ++++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/kexec_core.c        | 17 ---------
>>  5 files changed, 96 insertions(+), 75 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index cedfe20..5f38942 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -486,59 +486,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>>  # define CRASH_ADDR_HIGH_MAX	SZ_64T
>>  #endif
>>  
>> -static int __init reserve_crashkernel_low(void)
>> -{
>> -#ifdef CONFIG_X86_64
>> -	unsigned long long base, low_base = 0, low_size = 0;
>> -	unsigned long total_low_mem;
>> -	int ret;
>> -
>> -	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>> -
>> -	/* crashkernel=Y,low */
>> -	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
>> -	if (ret) {
>> -		/*
>> -		 * two parts from kernel/dma/swiotlb.c:
>> -		 * -swiotlb size: user-specified with swiotlb= or default.
>> -		 *
>> -		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>> -		 * to 8M for other buffers that may need to stay low too. Also
>> -		 * make sure we allocate enough extra low memory so that we
>> -		 * don't run out of DMA buffers for 32-bit devices.
>> -		 */
>> -		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
>> -	} else {
>> -		/* passed with crashkernel=0,low ? */
>> -		if (!low_size)
>> -			return 0;
>> -	}
>> -
>> -	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>> -	if (!low_base) {
>> -		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>> -		       (unsigned long)(low_size >> 20));
>> -		return -ENOMEM;
>> -	}
>> -
>> -	ret = memblock_reserve(low_base, low_size);
>> -	if (ret) {
>> -		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
>> -		return ret;
>> -	}
>> -
>> -	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
>> -		(unsigned long)(low_size >> 20),
>> -		(unsigned long)(low_base >> 20),
>> -		(unsigned long)(total_low_mem >> 20));
>> -
>> -	crashk_low_res.start = low_base;
>> -	crashk_low_res.end   = low_base + low_size - 1;
>> -	insert_resource(&iomem_resource, &crashk_low_res);
>> -#endif
>> -	return 0;
>> -}
>> -
>>  static void __init reserve_crashkernel(void)
>>  {
>>  	unsigned long long crash_size, crash_base, total_mem;
>> @@ -602,9 +549,12 @@ static void __init reserve_crashkernel(void)
>>  		return;
>>  	}
>>  
>> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>> -		memblock_free(crash_base, crash_size);
>> -		return;
>> +	if (crash_base >= (1ULL << 32)) {
>> +		if (reserve_crashkernel_low()) {
>> +			memblock_free(crash_base, crash_size);
>> +			return;
>> +		}
>> +		insert_resource(&iomem_resource, &crashk_low_res);
> 
> Some specific reason to move insert_resouce out of the
> reserve_crashkernel_low function?

No specific reason.
I just exposed the arm64 "Crash kernel low" resource in request_standard_resources(),
like the other resources, so I made this change.

> 
>>  	}
>>  
>>  	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
>> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
>> index 525510a..4df8c0b 100644
>> --- a/include/linux/crash_core.h
>> +++ b/include/linux/crash_core.h
>> @@ -63,6 +63,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
>>  extern unsigned char *vmcoreinfo_data;
>>  extern size_t vmcoreinfo_size;
>>  extern u32 *vmcoreinfo_note;
>> +extern struct resource crashk_res;
>> +extern struct resource crashk_low_res;
>>  
>>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>>  			  void *data, size_t data_len);
>> @@ -74,5 +76,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>>  		unsigned long long *crash_size, unsigned long long *crash_base);
>>  int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>>  		unsigned long long *crash_size, unsigned long long *crash_base);
>> +int __init reserve_crashkernel_low(void);
>>  
>>  #endif /* LINUX_CRASH_CORE_H */
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 1776eb2..5d5d963 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -330,8 +330,6 @@ extern int kexec_load_disabled;
>>  
>>  /* Location of a reserved region to hold the crash kernel.
>>   */
>> -extern struct resource crashk_res;
>> -extern struct resource crashk_low_res;
>>  extern note_buf_t __percpu *crash_notes;
>>  
>>  /* flag to track if kexec reboot is in progress */
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index 9f1557b..eb72fd6 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -7,6 +7,8 @@
>>  #include <linux/crash_core.h>
>>  #include <linux/utsname.h>
>>  #include <linux/vmalloc.h>
>> +#include <linux/memblock.h>
>> +#include <linux/swiotlb.h>
>>  
>>  #include <asm/page.h>
>>  #include <asm/sections.h>
>> @@ -19,6 +21,22 @@ u32 *vmcoreinfo_note;
>>  /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
>>  static unsigned char *vmcoreinfo_data_safecopy;
>>  
>> +/* Location of the reserved area for the crash kernel */
>> +struct resource crashk_res = {
>> +	.name  = "Crash kernel",
>> +	.start = 0,
>> +	.end   = 0,
>> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>> +	.desc  = IORES_DESC_CRASH_KERNEL
>> +};
>> +struct resource crashk_low_res = {
>> +	.name  = "Crash kernel",
>> +	.start = 0,
>> +	.end   = 0,
>> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>> +	.desc  = IORES_DESC_CRASH_KERNEL
>> +};
>> +
>>  /*
>>   * parsing the "crashkernel" commandline
>>   *
>> @@ -292,6 +310,75 @@ int __init parse_crashkernel_low(char *cmdline,
>>  				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
>>  }
>>  
>> +#if defined(CONFIG_X86_64)
>> +#define CRASH_ALIGN		SZ_16M
>> +#elif defined(CONFIG_ARM64)
>> +#define CRASH_ALIGN		SZ_2M
>> +#endif
> 
> I think no need to have the #ifdef, although I can not think out of
> reason we have 16M for X86, maybe move it to 2M as well if no other
> objections.  Then it will be easier to reserve crashkernel successfully
> considering nowadays we have KASLR and other stuff it becomes harder.

I also haven't figured out why it is 16M on x86.

> 
>> +
>> +int __init reserve_crashkernel_low(void)
>> +{
>> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
>> +	unsigned long long base, low_base = 0, low_size = 0;
>> +	unsigned long total_low_mem;
>> +	int ret;
>> +
>> +	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>> +
>> +	/* crashkernel=Y,low */
>> +	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size,
>> +			&base);
>> +	if (ret) {
>> +#ifdef CONFIG_X86_64
>> +		/*
>> +		 * two parts from lib/swiotlb.c:
>> +		 * -swiotlb size: user-specified with swiotlb= or default.
>> +		 *
>> +		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>> +		 * to 8M for other buffers that may need to stay low too. Also
>> +		 * make sure we allocate enough extra low memory so that we
>> +		 * don't run out of DMA buffers for 32-bit devices.
>> +		 */
>> +		low_size = max(swiotlb_size_or_default() + (8UL << 20),
>> +				256UL << 20);
>> +#else
>> +		/*
>> +		 * in arm64, reserve low memory if and only if crashkernel=X,low
>> +		 * specified.
>> +		 */
>> +		return -EINVAL;
>> +#endif
> 
> As said before, can you explore about why it needs different logic, it
> would be good to keep two arches same.
> 
>> +	} else {
>> +		/* passed with crashkernel=0,low ? */
>> +		if (!low_size)
>> +			return 0;
>> +	}
>> +
>> +	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>> +	if (!low_base) {
>> +		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>> +		       (unsigned long)(low_size >> 20));
>> +		return -ENOMEM;
>> +	}
>> +
>> +	ret = memblock_reserve(low_base, low_size);
>> +	if (ret) {
>> +		pr_err("%s: Error reserving crashkernel low memblock.\n",
>> +				__func__);
>> +		return ret;
>> +	}
>> +
>> +	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
>> +		(unsigned long)(low_size >> 20),
>> +		(unsigned long)(low_base >> 20),
>> +		(unsigned long)(total_low_mem >> 20));
>> +
>> +	crashk_low_res.start = low_base;
>> +	crashk_low_res.end   = low_base + low_size - 1;
>> +#endif
>> +	return 0;
>> +}
>> +
>>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>>  			  void *data, size_t data_len)
>>  {
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 15d70a9..458d093 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
>>  /* Flag to indicate we are going to kexec a new kernel */
>>  bool kexec_in_progress = false;
>>  
>> -
>> -/* Location of the reserved area for the crash kernel */
>> -struct resource crashk_res = {
>> -	.name  = "Crash kernel",
>> -	.start = 0,
>> -	.end   = 0,
>> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>> -	.desc  = IORES_DESC_CRASH_KERNEL
>> -};
>> -struct resource crashk_low_res = {
>> -	.name  = "Crash kernel",
>> -	.start = 0,
>> -	.end   = 0,
>> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>> -	.desc  = IORES_DESC_CRASH_KERNEL
>> -};
>> -
>>  int kexec_should_crash(struct task_struct *p)
>>  {
>>  	/*
>> -- 
>> 2.7.4
>>
> 
> Thanks
> Dave
> 
> 
> .
> 
Thanks,
Chen Zhou



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2019-12-27 11:04     ` Chen Zhou
@ 2019-12-28  9:32       ` Dave Young
  2019-12-31  1:39         ` Chen Zhou
  2020-01-16 15:17         ` James Morse
  0 siblings, 2 replies; 30+ messages in thread
From: Dave Young @ 2019-12-28  9:32 UTC (permalink / raw)
  To: Chen Zhou
  Cc: kbuild test robot, horms, linux-doc, catalin.marinas, bhsharma,
	xiexiuqi, kexec, linux-kernel, mingo, james.morse, tglx, will,
	linux-arm-kernel

On 12/27/19 at 07:04pm, Chen Zhou wrote:
> Hi Dave
> 
> On 2019/12/27 13:54, Dave Young wrote:
> > Hi,
> > On 12/23/19 at 11:23pm, Chen Zhou wrote:
> >> In preparation for supporting reserve_crashkernel_low in arm64 as
> >> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
> >>
> >> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
> >> is specified. Different with x86_64, don't set low memory automatically.
> > 
> > Do you have any reason for the difference?  I'd expect we have same
> > logic if possible and remove some of the ifdefs.
> 
> In x86_64, if we reserve crashkernel above 4G, then we call reserve_crashkernel_low()
> to reserve low memory.
> 
> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of reserve_crashkernel()
> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() allocated something.
> In this case, if reserve crashkernel below 4G there will be 256M low memory set automatically
> and this needs extra considerations.

Sorry, I did not read the details of the old thread and thought this was
arch dependent.  Thinking about it again, it would be better to have the
same semantics for the crashkernel parameters across arches.  If we make
them different, it causes confusion, especially for distributions.

OTOH, I would think that if we reserve high memory then low memory is
needed as well.  There might be some exceptions, but I do not know
exactly which.  Can we make the behavior the same and special-case those
systems which do not need a low memory reservation?

> 
> previous discusses:
> 	https://lkml.org/lkml/2019/6/5/670
> 	https://lkml.org/lkml/2019/6/13/229

Another concern from James:
"
With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
"Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
find "Crash kernel", you are always going to get the kernel placed in the lower portion.
"

The kexec-tools code iterates over all "Crash kernel" ranges and adds them
to an array.  The x86 code then uses the higher range to locate memory.
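
As a rough illustration of that approach (this is not the actual kexec-tools
code), the sketch below collects every "Crash kernel" line from
/proc/iomem-style text and then picks the range with the highest start
address. All function and type names here are hypothetical, and the exact
name comparison is an assumption of the sketch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_CRASH_RANGES 4

struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive, as printed in /proc/iomem */
};

/*
 * Parse one line such as "  6c000000-7bffffff : Crash kernel".
 * Returns 1 and fills *out if the entry is named exactly "Crash kernel".
 */
static int parse_crash_line(const char *line, struct mem_range *out)
{
	static const char name[] = "Crash kernel";
	uint64_t start, end;
	int consumed = 0;

	assert(out != NULL);
	if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %n",
		   &start, &end, &consumed) != 2 || consumed == 0)
		return 0;

	/* Compare the resource name, ignoring a trailing newline. */
	const char *p = line + consumed;
	size_t len = strcspn(p, "\n");

	if (len != strlen(name) || strncmp(p, name, len) != 0)
		return 0;

	out->start = start;
	out->end = end;
	return 1;
}

/* Collect up to max matching ranges; returns how many were stored. */
static size_t collect_crash_ranges(const char *const *lines, size_t nlines,
				   struct mem_range *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < nlines && n < max; i++)
		if (parse_crash_line(lines[i], &out[n]))
			n++;
	return n;
}

/* Return the range with the highest start address, or NULL if n == 0. */
static const struct mem_range *highest_range(const struct mem_range *r,
					     size_t n)
{
	const struct mem_range *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || r[i].start > best->start)
			best = &r[i];
	return best;
}
```

Collecting all entries first and choosing afterwards avoids depending on the
address ordering of /proc/iomem, which is the concern in the quoted text.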

> 
> > 
> >>
> >> Reported-by: kbuild test robot <lkp@intel.com>
> >> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> >> ---
> >>  arch/x86/kernel/setup.c    | 62 ++++-----------------------------
> >>  include/linux/crash_core.h |  3 ++
> >>  include/linux/kexec.h      |  2 --
> >>  kernel/crash_core.c        | 87 ++++++++++++++++++++++++++++++++++++++++++++++
> >>  kernel/kexec_core.c        | 17 ---------
> >>  5 files changed, 96 insertions(+), 75 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >> index cedfe20..5f38942 100644
> >> --- a/arch/x86/kernel/setup.c
> >> +++ b/arch/x86/kernel/setup.c
> >> @@ -486,59 +486,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
> >>  # define CRASH_ADDR_HIGH_MAX	SZ_64T
> >>  #endif
> >>  
> >> -static int __init reserve_crashkernel_low(void)
> >> -{
> >> -#ifdef CONFIG_X86_64
> >> -	unsigned long long base, low_base = 0, low_size = 0;
> >> -	unsigned long total_low_mem;
> >> -	int ret;
> >> -
> >> -	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> >> -
> >> -	/* crashkernel=Y,low */
> >> -	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
> >> -	if (ret) {
> >> -		/*
> >> -		 * two parts from kernel/dma/swiotlb.c:
> >> -		 * -swiotlb size: user-specified with swiotlb= or default.
> >> -		 *
> >> -		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> >> -		 * to 8M for other buffers that may need to stay low too. Also
> >> -		 * make sure we allocate enough extra low memory so that we
> >> -		 * don't run out of DMA buffers for 32-bit devices.
> >> -		 */
> >> -		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> >> -	} else {
> >> -		/* passed with crashkernel=0,low ? */
> >> -		if (!low_size)
> >> -			return 0;
> >> -	}
> >> -
> >> -	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> >> -	if (!low_base) {
> >> -		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> >> -		       (unsigned long)(low_size >> 20));
> >> -		return -ENOMEM;
> >> -	}
> >> -
> >> -	ret = memblock_reserve(low_base, low_size);
> >> -	if (ret) {
> >> -		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
> >> -		return ret;
> >> -	}
> >> -
> >> -	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
> >> -		(unsigned long)(low_size >> 20),
> >> -		(unsigned long)(low_base >> 20),
> >> -		(unsigned long)(total_low_mem >> 20));
> >> -
> >> -	crashk_low_res.start = low_base;
> >> -	crashk_low_res.end   = low_base + low_size - 1;
> >> -	insert_resource(&iomem_resource, &crashk_low_res);
> >> -#endif
> >> -	return 0;
> >> -}
> >> -
> >>  static void __init reserve_crashkernel(void)
> >>  {
> >>  	unsigned long long crash_size, crash_base, total_mem;
> >> @@ -602,9 +549,12 @@ static void __init reserve_crashkernel(void)
> >>  		return;
> >>  	}
> >>  
> >> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> >> -		memblock_free(crash_base, crash_size);
> >> -		return;
> >> +	if (crash_base >= (1ULL << 32)) {
> >> +		if (reserve_crashkernel_low()) {
> >> +			memblock_free(crash_base, crash_size);
> >> +			return;
> >> +		}
> >> +		insert_resource(&iomem_resource, &crashk_low_res);
> > 
> > Some specific reason to move insert_resouce out of the
> > reserve_crashkernel_low function?
> 
> No specific reason.
> I just exposed arm64 "Crash kernel low" in request_standard_resources() as other resources,
> so did this change.

Ok.

> 
> > 
> >>  	}
> >>  
> >>  	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> >> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> >> index 525510a..4df8c0b 100644
> >> --- a/include/linux/crash_core.h
> >> +++ b/include/linux/crash_core.h
> >> @@ -63,6 +63,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
> >>  extern unsigned char *vmcoreinfo_data;
> >>  extern size_t vmcoreinfo_size;
> >>  extern u32 *vmcoreinfo_note;
> >> +extern struct resource crashk_res;
> >> +extern struct resource crashk_low_res;
> >>  
> >>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
> >>  			  void *data, size_t data_len);
> >> @@ -74,5 +76,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
> >>  		unsigned long long *crash_size, unsigned long long *crash_base);
> >>  int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
> >>  		unsigned long long *crash_size, unsigned long long *crash_base);
> >> +int __init reserve_crashkernel_low(void);
> >>  
> >>  #endif /* LINUX_CRASH_CORE_H */
> >> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> >> index 1776eb2..5d5d963 100644
> >> --- a/include/linux/kexec.h
> >> +++ b/include/linux/kexec.h
> >> @@ -330,8 +330,6 @@ extern int kexec_load_disabled;
> >>  
> >>  /* Location of a reserved region to hold the crash kernel.
> >>   */
> >> -extern struct resource crashk_res;
> >> -extern struct resource crashk_low_res;
> >>  extern note_buf_t __percpu *crash_notes;
> >>  
> >>  /* flag to track if kexec reboot is in progress */
> >> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> >> index 9f1557b..eb72fd6 100644
> >> --- a/kernel/crash_core.c
> >> +++ b/kernel/crash_core.c
> >> @@ -7,6 +7,8 @@
> >>  #include <linux/crash_core.h>
> >>  #include <linux/utsname.h>
> >>  #include <linux/vmalloc.h>
> >> +#include <linux/memblock.h>
> >> +#include <linux/swiotlb.h>
> >>  
> >>  #include <asm/page.h>
> >>  #include <asm/sections.h>
> >> @@ -19,6 +21,22 @@ u32 *vmcoreinfo_note;
> >>  /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
> >>  static unsigned char *vmcoreinfo_data_safecopy;
> >>  
> >> +/* Location of the reserved area for the crash kernel */
> >> +struct resource crashk_res = {
> >> +	.name  = "Crash kernel",
> >> +	.start = 0,
> >> +	.end   = 0,
> >> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> >> +	.desc  = IORES_DESC_CRASH_KERNEL
> >> +};
> >> +struct resource crashk_low_res = {
> >> +	.name  = "Crash kernel",
> >> +	.start = 0,
> >> +	.end   = 0,
> >> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> >> +	.desc  = IORES_DESC_CRASH_KERNEL
> >> +};
> >> +
> >>  /*
> >>   * parsing the "crashkernel" commandline
> >>   *
> >> @@ -292,6 +310,75 @@ int __init parse_crashkernel_low(char *cmdline,
> >>  				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
> >>  }
> >>  
> >> +#if defined(CONFIG_X86_64)
> >> +#define CRASH_ALIGN		SZ_16M
> >> +#elif defined(CONFIG_ARM64)
> >> +#define CRASH_ALIGN		SZ_2M
> >> +#endif
> > 
> > I think no need to have the #ifdef, although I can not think out of
> > reason we have 16M for X86, maybe move it to 2M as well if no other
> > objections.  Then it will be easier to reserve crashkernel successfully
> > considering nowadays we have KASLR and other stuff it becomes harder.
> 
> I also don't figure out why it is 16M in x86.

IMHO, if we do not know why and in theory it should work with 2M, can
you do some basic testing and move it to 2M?

We can easily move back to 16M if someone actually reports a problem, but
if we do not change it, it will stay there forever without anyone knowing
why.

> 
> > 
> >> +
> >> +int __init reserve_crashkernel_low(void)
> >> +{
> >> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
> >> +	unsigned long long base, low_base = 0, low_size = 0;
> >> +	unsigned long total_low_mem;
> >> +	int ret;
> >> +
> >> +	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> >> +
> >> +	/* crashkernel=Y,low */
> >> +	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size,
> >> +			&base);
> >> +	if (ret) {
> >> +#ifdef CONFIG_X86_64
> >> +		/*
> >> +		 * two parts from lib/swiotlb.c:
> >> +		 * -swiotlb size: user-specified with swiotlb= or default.
> >> +		 *
> >> +		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> >> +		 * to 8M for other buffers that may need to stay low too. Also
> >> +		 * make sure we allocate enough extra low memory so that we
> >> +		 * don't run out of DMA buffers for 32-bit devices.
> >> +		 */
> >> +		low_size = max(swiotlb_size_or_default() + (8UL << 20),
> >> +				256UL << 20);
> >> +#else
> >> +		/*
> >> +		 * in arm64, reserve low memory if and only if crashkernel=X,low
> >> +		 * specified.
> >> +		 */
> >> +		return -EINVAL;
> >> +#endif
> > 
> > As said before, can you explore about why it needs different logic, it
> > would be good to keep two arches same.
> > 
> >> +	} else {
> >> +		/* passed with crashkernel=0,low ? */
> >> +		if (!low_size)
> >> +			return 0;
> >> +	}
> >> +
> >> +	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> >> +	if (!low_base) {
> >> +		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> >> +		       (unsigned long)(low_size >> 20));
> >> +		return -ENOMEM;
> >> +	}
> >> +
> >> +	ret = memblock_reserve(low_base, low_size);
> >> +	if (ret) {
> >> +		pr_err("%s: Error reserving crashkernel low memblock.\n",
> >> +				__func__);
> >> +		return ret;
> >> +	}
> >> +
> >> +	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
> >> +		(unsigned long)(low_size >> 20),
> >> +		(unsigned long)(low_base >> 20),
> >> +		(unsigned long)(total_low_mem >> 20));
> >> +
> >> +	crashk_low_res.start = low_base;
> >> +	crashk_low_res.end   = low_base + low_size - 1;
> >> +#endif
> >> +	return 0;
> >> +}
> >> +
> >>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
> >>  			  void *data, size_t data_len)
> >>  {
> >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> >> index 15d70a9..458d093 100644
> >> --- a/kernel/kexec_core.c
> >> +++ b/kernel/kexec_core.c
> >> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
> >>  /* Flag to indicate we are going to kexec a new kernel */
> >>  bool kexec_in_progress = false;
> >>  
> >> -
> >> -/* Location of the reserved area for the crash kernel */
> >> -struct resource crashk_res = {
> >> -	.name  = "Crash kernel",
> >> -	.start = 0,
> >> -	.end   = 0,
> >> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> >> -	.desc  = IORES_DESC_CRASH_KERNEL
> >> -};
> >> -struct resource crashk_low_res = {
> >> -	.name  = "Crash kernel",
> >> -	.start = 0,
> >> -	.end   = 0,
> >> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> >> -	.desc  = IORES_DESC_CRASH_KERNEL
> >> -};
> >> -
> >>  int kexec_should_crash(struct task_struct *p)
> >>  {
> >>  	/*
> >> -- 
> >> 2.7.4
> >>
> > 
> > Thanks
> > Dave
> > 
> > 
> > .
> > 
> Thanks,
> Chen Zhou
> 

Thanks
Dave



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2019-12-28  9:32       ` Dave Young
@ 2019-12-31  1:39         ` Chen Zhou
  2020-04-03  7:13           ` Chen Zhou
  2020-01-16 15:17         ` James Morse
  1 sibling, 1 reply; 30+ messages in thread
From: Chen Zhou @ 2019-12-31  1:39 UTC (permalink / raw)
  To: Dave Young
  Cc: kbuild test robot, horms, linux-doc, catalin.marinas, bhsharma,
	xiexiuqi, kexec, linux-kernel, mingo, james.morse, tglx, will,
	linux-arm-kernel

Hi Dave,

On 2019/12/28 17:32, Dave Young wrote:
> On 12/27/19 at 07:04pm, Chen Zhou wrote:
>> Hi Dave
>>
>> On 2019/12/27 13:54, Dave Young wrote:
>>> Hi,
>>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
>>>> In preparation for supporting reserve_crashkernel_low in arm64 as
>>>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>
>>>> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
>>>> is specified. Different with x86_64, don't set low memory automatically.
>>>
>>> Do you have any reason for the difference?  I'd expect we have same
>>> logic if possible and remove some of the ifdefs.
>>
>> In x86_64, if we reserve crashkernel above 4G, then we call reserve_crashkernel_low()
>> to reserve low memory.
>>
>> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of reserve_crashkernel()
>> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() allocated something.
>> In this case, if reserve crashkernel below 4G there will be 256M low memory set automatically
>> and this needs extra considerations.
> 
> Sorry that I did not read the old thread details and thought that is
> arch dependent.  But rethink about that, it would be better that we can
> have same semantic about crashkernel parameters across arches.  If we
> make them different then it causes confusion, especially for
> distributions.
> 
> OTOH, I thought if we reserve high memory then the low memory should be
> needed.  There might be some exceptions, but I do not know the exact
> one, can we make the behavior same, and special case those systems which
> do not need low memory reservation.
> 
I thought the same and did implement the crashkernel parameters in an arch-independent way.
This is my v4: https://lkml.org/lkml/2019/5/6/1361, which I implemented according to x86_64's
behavior.

>>
>> previous discusses:
>> 	https://lkml.org/lkml/2019/6/5/670
>> 	https://lkml.org/lkml/2019/6/13/229
> 
> Another concern from James:
> "
> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
> "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
> find "Crash kernel", you are always going to get the kernel placed in the lower portion.
> "
> 
> The kexec-tools code is iterating all "Crash kernel" ranges and add them
> in an array.  In X86 code, it uses the higher range to locate memory.

We also discussed this: https://lkml.org/lkml/2019/6/13/227.
I guess James's opinion is that kexec-tools should take forward compatibility into account.
"But we can't rely on people updating user-space when they update the kernel!" -- James

> 
>>
>>>
>>>>
>>>> Reported-by: kbuild test robot <lkp@intel.com>
>>>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>>>> ---
>>>>  arch/x86/kernel/setup.c    | 62 ++++-----------------------------
>>>>  include/linux/crash_core.h |  3 ++
>>>>  include/linux/kexec.h      |  2 --
>>>>  kernel/crash_core.c        | 87 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>  kernel/kexec_core.c        | 17 ---------
>>>>  5 files changed, 96 insertions(+), 75 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>>> index cedfe20..5f38942 100644
>>>> --- a/arch/x86/kernel/setup.c
>>>> +++ b/arch/x86/kernel/setup.c
>>>> @@ -486,59 +486,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>>>>  # define CRASH_ADDR_HIGH_MAX	SZ_64T
>>>>  #endif
>>>>  
>>>> -static int __init reserve_crashkernel_low(void)
>>>> -{
>>>> -#ifdef CONFIG_X86_64
>>>> -	unsigned long long base, low_base = 0, low_size = 0;
>>>> -	unsigned long total_low_mem;
>>>> -	int ret;
>>>> -
>>>> -	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>>>> -
>>>> -	/* crashkernel=Y,low */
>>>> -	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
>>>> -	if (ret) {
>>>> -		/*
>>>> -		 * two parts from kernel/dma/swiotlb.c:
>>>> -		 * -swiotlb size: user-specified with swiotlb= or default.
>>>> -		 *
>>>> -		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>>>> -		 * to 8M for other buffers that may need to stay low too. Also
>>>> -		 * make sure we allocate enough extra low memory so that we
>>>> -		 * don't run out of DMA buffers for 32-bit devices.
>>>> -		 */
>>>> -		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
>>>> -	} else {
>>>> -		/* passed with crashkernel=0,low ? */
>>>> -		if (!low_size)
>>>> -			return 0;
>>>> -	}
>>>> -
>>>> -	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>>>> -	if (!low_base) {
>>>> -		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>>>> -		       (unsigned long)(low_size >> 20));
>>>> -		return -ENOMEM;
>>>> -	}
>>>> -
>>>> -	ret = memblock_reserve(low_base, low_size);
>>>> -	if (ret) {
>>>> -		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
>>>> -		return ret;
>>>> -	}
>>>> -
>>>> -	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
>>>> -		(unsigned long)(low_size >> 20),
>>>> -		(unsigned long)(low_base >> 20),
>>>> -		(unsigned long)(total_low_mem >> 20));
>>>> -
>>>> -	crashk_low_res.start = low_base;
>>>> -	crashk_low_res.end   = low_base + low_size - 1;
>>>> -	insert_resource(&iomem_resource, &crashk_low_res);
>>>> -#endif
>>>> -	return 0;
>>>> -}
>>>> -
>>>>  static void __init reserve_crashkernel(void)
>>>>  {
>>>>  	unsigned long long crash_size, crash_base, total_mem;
>>>> @@ -602,9 +549,12 @@ static void __init reserve_crashkernel(void)
>>>>  		return;
>>>>  	}
>>>>  
>>>> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>>>> -		memblock_free(crash_base, crash_size);
>>>> -		return;
>>>> +	if (crash_base >= (1ULL << 32)) {
>>>> +		if (reserve_crashkernel_low()) {
>>>> +			memblock_free(crash_base, crash_size);
>>>> +			return;
>>>> +		}
>>>> +		insert_resource(&iomem_resource, &crashk_low_res);
>>>
>>> Some specific reason to move insert_resouce out of the
>>> reserve_crashkernel_low function?
>>
>> No specific reason.
>> I just exposed arm64 "Crash kernel low" in request_standard_resources() as other resources,
>> so did this change.
> 
> Ok.
> 
>>
>>>
>>>>  	}
>>>>  
>>>>  	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
>>>> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
>>>> index 525510a..4df8c0b 100644
>>>> --- a/include/linux/crash_core.h
>>>> +++ b/include/linux/crash_core.h
>>>> @@ -63,6 +63,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
>>>>  extern unsigned char *vmcoreinfo_data;
>>>>  extern size_t vmcoreinfo_size;
>>>>  extern u32 *vmcoreinfo_note;
>>>> +extern struct resource crashk_res;
>>>> +extern struct resource crashk_low_res;
>>>>  
>>>>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>>>>  			  void *data, size_t data_len);
>>>> @@ -74,5 +76,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>>>>  		unsigned long long *crash_size, unsigned long long *crash_base);
>>>>  int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>>>>  		unsigned long long *crash_size, unsigned long long *crash_base);
>>>> +int __init reserve_crashkernel_low(void);
>>>>  
>>>>  #endif /* LINUX_CRASH_CORE_H */
>>>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>>>> index 1776eb2..5d5d963 100644
>>>> --- a/include/linux/kexec.h
>>>> +++ b/include/linux/kexec.h
>>>> @@ -330,8 +330,6 @@ extern int kexec_load_disabled;
>>>>  
>>>>  /* Location of a reserved region to hold the crash kernel.
>>>>   */
>>>> -extern struct resource crashk_res;
>>>> -extern struct resource crashk_low_res;
>>>>  extern note_buf_t __percpu *crash_notes;
>>>>  
>>>>  /* flag to track if kexec reboot is in progress */
>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>> index 9f1557b..eb72fd6 100644
>>>> --- a/kernel/crash_core.c
>>>> +++ b/kernel/crash_core.c
>>>> @@ -7,6 +7,8 @@
>>>>  #include <linux/crash_core.h>
>>>>  #include <linux/utsname.h>
>>>>  #include <linux/vmalloc.h>
>>>> +#include <linux/memblock.h>
>>>> +#include <linux/swiotlb.h>
>>>>  
>>>>  #include <asm/page.h>
>>>>  #include <asm/sections.h>
>>>> @@ -19,6 +21,22 @@ u32 *vmcoreinfo_note;
>>>>  /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
>>>>  static unsigned char *vmcoreinfo_data_safecopy;
>>>>  
>>>> +/* Location of the reserved area for the crash kernel */
>>>> +struct resource crashk_res = {
>>>> +	.name  = "Crash kernel",
>>>> +	.start = 0,
>>>> +	.end   = 0,
>>>> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>> +	.desc  = IORES_DESC_CRASH_KERNEL
>>>> +};
>>>> +struct resource crashk_low_res = {
>>>> +	.name  = "Crash kernel",
>>>> +	.start = 0,
>>>> +	.end   = 0,
>>>> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>> +	.desc  = IORES_DESC_CRASH_KERNEL
>>>> +};
>>>> +
>>>>  /*
>>>>   * parsing the "crashkernel" commandline
>>>>   *
>>>> @@ -292,6 +310,75 @@ int __init parse_crashkernel_low(char *cmdline,
>>>>  				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
>>>>  }
>>>>  
>>>> +#if defined(CONFIG_X86_64)
>>>> +#define CRASH_ALIGN		SZ_16M
>>>> +#elif defined(CONFIG_ARM64)
>>>> +#define CRASH_ALIGN		SZ_2M
>>>> +#endif
>>>
>>> I think no need to have the #ifdef, although I can not think out of
>>> reason we have 16M for X86, maybe move it to 2M as well if no other
>>> objections.  Then it will be easier to reserve crashkernel successfully
>>> considering nowadays we have KASLR and other stuff it becomes harder.
>>
>> I also don't figure out why it is 16M in x86.
> 
> IMHO, if we do not know why and in theory it should work with 2M, can
> you do some basic testing and move it to 2M?
> 
> We can easily move back to 16M if someone really report something, but
> if we do not change it will always stay there but we do not know why.

OK. I will do some testing later.
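
As a rough illustration of why a smaller CRASH_ALIGN makes the reservation easier to satisfy, the sketch below tries to place a region in a single free range with 2M and with 16M alignment. It is a userspace toy under stated assumptions: find_base() is a hypothetical helper that looks at one range only and takes the lowest aligned base, which is not how memblock_find_in_range() actually searches.

```c
#include <stdint.h>

#define SZ_2M  (UINT64_C(2) << 20)
#define SZ_16M (UINT64_C(16) << 20)

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: lowest aligned base inside the free range
 * [start, end) that can hold size bytes. Returns 0 when nothing fits,
 * mirroring the "!low_base means failure" convention in the patch.
 */
static uint64_t find_base(uint64_t start, uint64_t end,
			  uint64_t size, uint64_t align)
{
	uint64_t base = align_up(start, align);

	/* base < start catches wrap-around in align_up() */
	if (base < start || base > end || end - base < size)
		return 0;
	return base;
}
```

For a free range from 19M to 47M and a 24M request, 2M alignment finds a base at 20M, while 16M alignment pushes the base to 32M, where only 15M remain, so the request fails.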

> 
>>
>>>
>>>> +
>>>> +int __init reserve_crashkernel_low(void)
>>>> +{
>>>> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
>>>> +	unsigned long long base, low_base = 0, low_size = 0;
>>>> +	unsigned long total_low_mem;
>>>> +	int ret;
>>>> +
>>>> +	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>>>> +
>>>> +	/* crashkernel=Y,low */
>>>> +	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size,
>>>> +			&base);
>>>> +	if (ret) {
>>>> +#ifdef CONFIG_X86_64
>>>> +		/*
>>>> +		 * two parts from lib/swiotlb.c:
>>>> +		 * -swiotlb size: user-specified with swiotlb= or default.
>>>> +		 *
>>>> +		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>>>> +		 * to 8M for other buffers that may need to stay low too. Also
>>>> +		 * make sure we allocate enough extra low memory so that we
>>>> +		 * don't run out of DMA buffers for 32-bit devices.
>>>> +		 */
>>>> +		low_size = max(swiotlb_size_or_default() + (8UL << 20),
>>>> +				256UL << 20);
>>>> +#else
>>>> +		/*
>>>> +		 * in arm64, reserve low memory if and only if crashkernel=X,low
>>>> +		 * specified.
>>>> +		 */
>>>> +		return -EINVAL;
>>>> +#endif
>>>
>>> As said before, can you explore about why it needs different logic, it
>>> would be good to keep two arches same.
>>>
>>>> +	} else {
>>>> +		/* passed with crashkernel=0,low ? */
>>>> +		if (!low_size)
>>>> +			return 0;
>>>> +	}
>>>> +
>>>> +	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>>>> +	if (!low_base) {
>>>> +		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>>>> +		       (unsigned long)(low_size >> 20));
>>>> +		return -ENOMEM;
>>>> +	}
>>>> +
>>>> +	ret = memblock_reserve(low_base, low_size);
>>>> +	if (ret) {
>>>> +		pr_err("%s: Error reserving crashkernel low memblock.\n",
>>>> +				__func__);
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
>>>> +		(unsigned long)(low_size >> 20),
>>>> +		(unsigned long)(low_base >> 20),
>>>> +		(unsigned long)(total_low_mem >> 20));
>>>> +
>>>> +	crashk_low_res.start = low_base;
>>>> +	crashk_low_res.end   = low_base + low_size - 1;
>>>> +#endif
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>>>>  			  void *data, size_t data_len)
>>>>  {
>>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>>> index 15d70a9..458d093 100644
>>>> --- a/kernel/kexec_core.c
>>>> +++ b/kernel/kexec_core.c
>>>> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
>>>>  /* Flag to indicate we are going to kexec a new kernel */
>>>>  bool kexec_in_progress = false;
>>>>  
>>>> -
>>>> -/* Location of the reserved area for the crash kernel */
>>>> -struct resource crashk_res = {
>>>> -	.name  = "Crash kernel",
>>>> -	.start = 0,
>>>> -	.end   = 0,
>>>> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>> -	.desc  = IORES_DESC_CRASH_KERNEL
>>>> -};
>>>> -struct resource crashk_low_res = {
>>>> -	.name  = "Crash kernel",
>>>> -	.start = 0,
>>>> -	.end   = 0,
>>>> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>> -	.desc  = IORES_DESC_CRASH_KERNEL
>>>> -};
>>>> -
>>>>  int kexec_should_crash(struct task_struct *p)
>>>>  {
>>>>  	/*
>>>> -- 
>>>> 2.7.4
>>>>
>>>
>>> Thanks
>>> Dave
>>>
>>>
>>> .
>>>
>> Thanks,
>> Chen Zhou
>>
> 
> Thanks
> Dave
> 
> 

Thanks,
Chen Zhou


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2019-12-28  9:32       ` Dave Young
  2019-12-31  1:39         ` Chen Zhou
@ 2020-01-16 15:17         ` James Morse
  2020-01-16 15:47           ` John Donnelly
  2020-01-17  3:58           ` Dave Young
  1 sibling, 2 replies; 30+ messages in thread
From: James Morse @ 2020-01-16 15:17 UTC (permalink / raw)
  To: Dave Young, Chen Zhou
  Cc: kbuild test robot, horms, linux-doc, catalin.marinas, bhsharma,
	xiexiuqi, kexec, linux-kernel, mingo, tglx, will,
	linux-arm-kernel

Hi guys,

On 28/12/2019 09:32, Dave Young wrote:
> On 12/27/19 at 07:04pm, Chen Zhou wrote:
>> On 2019/12/27 13:54, Dave Young wrote:
>>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
>>>> In preparation for supporting reserve_crashkernel_low in arm64 as
>>>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>
>>>> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
>>>> is specified. Different with x86_64, don't set low memory automatically.
>>>
>>> Do you have any reason for the difference?  I'd expect we have same
>>> logic if possible and remove some of the ifdefs.
>>
>> In x86_64, if we reserve crashkernel above 4G, then we call reserve_crashkernel_low()
>> to reserve low memory.
>>
>> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of reserve_crashkernel()
>> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() allocated something.
>> In this case, if reserve crashkernel below 4G there will be 256M low memory set automatically
>> and this needs extra considerations.

> Sorry that I did not read the old thread details and thought that is
> arch dependent.  But rethink about that, it would be better that we can
> have same semantic about crashkernel parameters across arches.  If we
> make them different then it causes confusion, especially for
> distributions.

Surely distros also want one crashkernel* string they can use on all platforms without
having to detect the kernel version, platform or changeable memory layout...


> OTOH, I thought if we reserve high memory then the low memory should be
> needed.  There might be some exceptions, but I do not know the exact
> one,

> can we make the behavior same, and special case those systems which
> do not need low memory reservation.

It's tricky to work out which systems are the 'normal' ones.

We don't have a fixed memory layout for arm64. Some systems have no memory below 4G.
Others have no memory above 4G.

Chen Zhou's machine has some memory below 4G, but it's too precious to reserve a large
chunk for kdump. Without any memory below 4G some of the drivers won't work.

I don't see what distros can set as their default for all platforms if high/low are
mutually exclusive with the 'crashkernel=' in use today. How did x86 navigate this, ... or
was it so long ago?

No one else has reported a problem with the existing placement logic, hence treating this
'low' thing as the 'in addition' special case.


>> previous discusses:
>> 	https://lkml.org/lkml/2019/6/5/670
>> 	https://lkml.org/lkml/2019/6/13/229
> 
> Another concern from James:
> "
> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
> "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
> find "Crash kernel", you are always going to get the kernel placed in the lower portion.
> "
> 
> The kexec-tools code is iterating all "Crash kernel" ranges and add them
> in an array.  In X86 code, it uses the higher range to locate memory.

Then my hurried reading of what the user-space code does was wrong!

If kexec-tools places the kernel in the low region, there may not be enough memory left
for whatever purpose it was reserved for. This was the motivation for giving it a
different name.
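
The behavior Dave describes (collect every "Crash kernel" entry and use the higher one) can be illustrated with the sketch below. It is not the kexec-tools code: parse_crash_line() and pick_highest_crash_range() are hypothetical helpers, and the input is assumed to be lines in the usual /proc/iomem "start-end : name" form. The name is matched exactly, so an entry renamed to "Crash kernel (low)" is skipped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct range {
	uint64_t start;
	uint64_t end;	/* inclusive, as printed in /proc/iomem */
};

/* Parse "start-end : name"; return 1 only if name is exactly "Crash kernel". */
static int parse_crash_line(const char *line, struct range *out)
{
	unsigned long long s, e;
	int off = 0;
	char next;

	/* %n is only stored if the " : " separator was matched too */
	if (sscanf(line, " %llx-%llx : %n", &s, &e, &off) < 2 || off == 0)
		return 0;
	if (strncmp(line + off, "Crash kernel", 12) != 0)
		return 0;
	/* reject longer names such as "Crash kernel (low)" */
	next = line[off + 12];
	if (next != '\0' && next != '\n')
		return 0;

	out->start = s;
	out->end = e;
	return 1;
}

/* Scan all lines, keep the range with the highest start; return the match count. */
static size_t pick_highest_crash_range(const char *const *lines, size_t n,
				       struct range *best)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		struct range r;

		if (!parse_crash_line(lines[i], &r))
			continue;
		if (found == 0 || r.start > best->start)
			*best = r;
		found++;
	}
	return found;
}
```

A parser that instead stopped at the first match would always end up in the low region, which is the case the distinct resource name is meant to guard against.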


Thanks,

James


* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2020-01-16 15:17         ` James Morse
@ 2020-01-16 15:47           ` John Donnelly
  2020-02-24 15:25             ` John Donnelly
  2020-01-17  3:58           ` Dave Young
  1 sibling, 1 reply; 30+ messages in thread
From: John Donnelly @ 2020-01-16 15:47 UTC (permalink / raw)
  To: James Morse
  Cc: kbuild test robot, will, linux-doc, Chen Zhou, catalin.marinas,
	bhsharma, xiexiuqi, kexec, linux-kernel, horms, tglx, Dave Young,
	mingo, linux-arm-kernel



> On Jan 16, 2020, at 9:17 AM, James Morse <james.morse@arm.com> wrote:
> 
> Hi guys,
> 
> On 28/12/2019 09:32, Dave Young wrote:
>> On 12/27/19 at 07:04pm, Chen Zhou wrote:
>>> On 2019/12/27 13:54, Dave Young wrote:
>>>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
>>>>> In preparation for supporting reserve_crashkernel_low in arm64 as
>>>>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>> 
>>>>> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
>>>>> is specified. Different with x86_64, don't set low memory automatically.
>>>> 
>>>> Do you have any reason for the difference?  I'd expect we have same
>>>> logic if possible and remove some of the ifdefs.
>>> 
>>> In x86_64, if we reserve crashkernel above 4G, then we call reserve_crashkernel_low()
>>> to reserve low memory.
>>> 
>>> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of reserve_crashkernel()
>>> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() allocated something.
>>> In this case, if reserve crashkernel below 4G there will be 256M low memory set automatically
>>> and this needs extra considerations.
> 
>> Sorry that I did not read the old thread details and thought that is
>> arch dependent.  But rethink about that, it would be better that we can
>> have same semantic about crashkernel parameters across arches.  If we
>> make them different then it causes confusion, especially for
>> distributions.
> 
> Surely distros also want one crashkernel* string they can use on all platforms without
> having to detect the kernel version, platform or changeable memory layout...
> 
> 
>> OTOH, I thought if we reserve high memory then the low memory should be
>> needed.  There might be some exceptions, but I do not know the exact
>> one,
> 
>> can we make the behavior same, and special case those systems which
>> do not need low memory reservation.
> 
> Its tricky to work out which systems are the 'normal' ones.
> 
> We don't have a fixed memory layout for arm64. Some systems have no memory below 4G.
> Others have no memory above 4G.
> 
> Chen Zhou's machine has some memory below 4G, but its too precious to reserve a large
> chunk for kdump. Without any memory below 4G some of the drivers won't work.
> 
> I don't see what distros can set as their default for all platforms if high/low are
> mutually exclusive with the 'crashkernel=' in use today. How did x86 navigate this, ... or
> was it so long ago?
> 
> No one else has reported a problem with the existing placement logic, hence treating this
> 'low' thing as the 'in addition' special case.


Hi,

I am seeing similar Arm crash dump issues on 5.4 kernels, where we need a rather large amount of crashkernel memory reserved that is not available below 4GB (the maximum reserved size appears to be around 768M). When I pick a memory range higher than 4GB, I see adapters that fail to initialize:


There is no low (<4G) memory for DMA:

[   11.506792] kworker/0:14: page allocation failure: order:0, mode:0x104(GFP_DMA32|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[   11.518793] CPU: 0 PID: 150 Comm: kworker/0:14 Not tainted 5.4.0-1948.3.el8uek.aarch64 #1
[   11.526955] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL025 01/18/2019
[   11.534948] Workqueue: events work_for_cpu_fn 
[   11.539291] Call trace: 
[   11.541727]  dump_backtrace+0x0/0x18c 
[   11.545376]  show_stack+0x24/0x30 
[   11.548679]  dump_stack+0xbc/0xe0 
[   11.551982]  warn_alloc+0xf0/0x15c 
[   11.555370]  __alloc_pages_slowpath+0xb4c/0xb84 
[   11.559887]  __alloc_pages_nodemask+0x2d0/0x330 
[   11.564405]  alloc_pages_current+0x8c/0xf8 
[   11.568496]  ttm_bo_device_init+0x188/0x220 [ttm] 
[   11.573187]  drm_vram_mm_init+0x58/0x80 [drm_vram_helper] 
[   11.578572]  drm_vram_helper_alloc_mm+0x64/0xb0 [drm_vram_helper] 
[   11.584655]  ast_mm_init+0x38/0x80 [ast] 
[   11.588566]  ast_driver_load+0x474/0xa70 [ast] 
[   11.593029]  drm_dev_register+0x144/0x1c8 [drm] 
[   11.597573]  drm_get_pci_dev+0xa4/0x168 [drm] 
[   11.601919]  ast_pci_probe+0x8c/0x9c [ast] 
[   11.606004]  local_pci_probe+0x44/0x98 
[   11.609739]  work_for_cpu_fn+0x20/0x30 
[   11.613474]  process_one_work+0x1c4/0x41c 
[   11.617470]  worker_thread+0x150/0x4b0 
[   11.621206]  kthread+0x110/0x114 
[   11.624422]  ret_from_fork+0x10/0x18 

This failure is related to a graphics adapter. 
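
A minimal sketch of why the GFP_DMA32 allocation above has nothing to draw from, assuming (as this thread does) that DMA32 allocations must come from memory below 4G: if every range handed to the crash kernel starts at or above 4G, the amount of usable low memory is zero. The struct and the helper name bytes_below_4g() are hypothetical and only model the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

#define SZ_4G (UINT64_C(1) << 32)

struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive; assumed to be >= start */
};

/* Sum the bytes of the given usable ranges that lie below 4 GiB. */
static uint64_t bytes_below_4g(const struct mem_range *r, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t last;

		if (r[i].start >= SZ_4G)
			continue;	/* entirely above 4G */
		/* clamp ranges that straddle the 4G boundary */
		last = r[i].end < SZ_4G ? r[i].end : SZ_4G - 1;
		total += last - r[i].start + 1;
	}
	return total;
}
```

With only a high crash kernel region the result is 0; adding a separate low region, as crashkernel=Y,low is meant to do, makes it non-zero.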

The more complex kdump configurations, which use the networking stack to NFS-mount a filesystem to dump to or use ssh to copy to another machine, require larger crashkernel memory reservations than perhaps the "default*" settings of a minimal kdump that creates a minimal vmcore on local storage in /var/crash. If crashkernel is too small I get Out of Memory issues and the entire vmcore process fails.

(*I assume the default kdump settings are a minimal vmcore to /var/crash, using the primary boot device where /root is located.)




> 
> 
>>> previous discusses:
>>> 	https://lkml.org/lkml/2019/6/5/670
>>> 	https://lkml.org/lkml/2019/6/13/229
>> 
>> Another concern from James:
>> "
>> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
>> "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
>> find "Crash kernel", you are always going to get the kernel placed in the lower portion.
>> "
>> 
>> The kexec-tools code is iterating all "Crash kernel" ranges and add them
>> in an array.  In X86 code, it uses the higher range to locate memory.
> 
> Then my hurried reading of what the user-space code does was wrong!
> 
> If kexec-tools places the kernel in the low region, there may not be enough memory left
> for whatever purpose it was reserved for. This was the motivation for giving it a
> different name.
> 
> 
> Thanks,
> 
> James
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec



* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2020-01-16 15:17         ` James Morse
  2020-01-16 15:47           ` John Donnelly
@ 2020-01-17  3:58           ` Dave Young
  2020-04-03  7:29             ` Chen Zhou
  1 sibling, 1 reply; 30+ messages in thread
From: Dave Young @ 2020-01-17  3:58 UTC (permalink / raw)
  To: James Morse
  Cc: kbuild test robot, horms, linux-doc, Chen Zhou, catalin.marinas,
	bhsharma, xiexiuqi, kexec, linux-kernel, mingo, tglx, will,
	linux-arm-kernel

On 01/16/20 at 03:17pm, James Morse wrote:
> Hi guys,
> 
> On 28/12/2019 09:32, Dave Young wrote:
> > On 12/27/19 at 07:04pm, Chen Zhou wrote:
> >> On 2019/12/27 13:54, Dave Young wrote:
> >>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
> >>>> In preparation for supporting reserve_crashkernel_low in arm64 as
> >>>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
> >>>>
> >>>> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
> >>>> is specified. Different with x86_64, don't set low memory automatically.
> >>>
> >>> Do you have any reason for the difference?  I'd expect we have same
> >>> logic if possible and remove some of the ifdefs.
> >>
> >> In x86_64, if we reserve crashkernel above 4G, then we call reserve_crashkernel_low()
> >> to reserve low memory.
> >>
> >> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of reserve_crashkernel()
> >> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() allocated something.
> >> In this case, if reserve crashkernel below 4G there will be 256M low memory set automatically
> >> and this needs extra considerations.
> 
> > Sorry that I did not read the old thread details and thought that is
> > arch dependent.  But rethink about that, it would be better that we can
> > have same semantic about crashkernel parameters across arches.  If we
> > make them different then it causes confusion, especially for
> > distributions.
> 
> Surely distros also want one crashkernel* string they can use on all platforms without
> having to detect the kernel version, platform or changeable memory layout...
> 
> 
> > OTOH, I thought if we reserve high memory then the low memory should be
> > needed.  There might be some exceptions, but I do not know the exact
> > one,
> 
> > can we make the behavior same, and special case those systems which
> > do not need low memory reservation.
> 
> Its tricky to work out which systems are the 'normal' ones.
> 
> We don't have a fixed memory layout for arm64. Some systems have no memory below 4G.
> Others have no memory above 4G.
> 
> Chen Zhou's machine has some memory below 4G, but its too precious to reserve a large
> chunk for kdump. Without any memory below 4G some of the drivers won't work.
> 
> I don't see what distros can set as their default for all platforms if high/low are
> mutually exclusive with the 'crashkernel=' in use today. How did x86 navigate this, ... or
> was it so long ago?

Machines without any low memory are very rare on X86, at least from what
I know, so the current way just works fine.

Since arm64 is quite different, I would agree with the current way
proposed in the patch. But a question: on those arm64 systems, how can an
admin know whether low crashkernel memory is needed or not, and so skip the
low reservation on machines with only high memory installed?

> 
> No one else has reported a problem with the existing placement logic, hence treating this
> 'low' thing as the 'in addition' special case.
> 
> 
> >> previous discusses:
> >> 	https://lkml.org/lkml/2019/6/5/670
> >> 	https://lkml.org/lkml/2019/6/13/229
> > 
> > Another concern from James:
> > "
> > With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
> > "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
> > find "Crash kernel", you are always going to get the kernel placed in the lower portion.
> > "
> > 
> > The kexec-tools code is iterating all "Crash kernel" ranges and add them
> > in an array.  In X86 code, it uses the higher range to locate memory.
> 
> Then my hurried reading of what the user-space code does was wrong!
> 
> If kexec-tools places the kernel in the low region, there may not be enough memory left
> for whatever purpose it was reserved for. This was the motivation for giving it a
> different name.

Agreed, though it is still a potential problem.  Say we have both low
and high reserved.  When the kdump kernel boots up, the kernel, drivers and
applications will all use memory, and I'm not sure whether there is a memory
allocation policy that makes them all use high memory first.  Anyway, that is
beyond kexec-tools and the resource name.

Thanks
Dave



* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2020-01-16 15:47           ` John Donnelly
@ 2020-02-24 15:25             ` John Donnelly
  2020-03-02  1:29               ` Chen Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: John Donnelly @ 2020-02-24 15:25 UTC (permalink / raw)
  To: James Morse
  Cc: kbuild test robot, linux-doc, Chen Zhou, catalin.marinas,
	bhsharma, xiexiuqi, kexec, linux-kernel, Dave Young, horms, tglx,
	will, mingo, linux-arm-kernel



> On Jan 16, 2020, at 9:47 AM, John Donnelly <john.p.donnelly@oracle.com> wrote:
> 
> 
> 
>> On Jan 16, 2020, at 9:17 AM, James Morse <james.morse@arm.com> wrote:
>> 
>> Hi guys,
>> 
>> On 28/12/2019 09:32, Dave Young wrote:
>>> On 12/27/19 at 07:04pm, Chen Zhou wrote:
>>>> On 2019/12/27 13:54, Dave Young wrote:
>>>>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
>>>>>> In preparation for supporting reserve_crashkernel_low in arm64 as
>>>>>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>>> 
>>>>>> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
>>>>>> is specified. Different with x86_64, don't set low memory automatically.
>>>>> 
>>>>> Do you have any reason for the difference?  I'd expect we have same
>>>>> logic if possible and remove some of the ifdefs.
>>>> 
>>>> In x86_64, if we reserve crashkernel above 4G, then we call reserve_crashkernel_low()
>>>> to reserve low memory.
>>>> 
>>>> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of reserve_crashkernel()
>>>> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() allocated something.
>>>> In this case, if reserve crashkernel below 4G there will be 256M low memory set automatically
>>>> and this needs extra considerations.
>> 
>>> Sorry that I did not read the old thread details and thought that is
>>> arch dependent.  But rethink about that, it would be better that we can
>>> have same semantic about crashkernel parameters across arches.  If we
>>> make them different then it causes confusion, especially for
>>> distributions.
>> 
>> Surely distros also want one crashkernel* string they can use on all platforms without
>> having to detect the kernel version, platform or changeable memory layout...
>> 
>> 
>>> OTOH, I thought if we reserve high memory then the low memory should be
>>> needed.  There might be some exceptions, but I do not know the exact
>>> one,
>> 
>>> can we make the behavior same, and special case those systems which
>>> do not need low memory reservation.
>> 
>> Its tricky to work out which systems are the 'normal' ones.
>> 
>> We don't have a fixed memory layout for arm64. Some systems have no memory below 4G.
>> Others have no memory above 4G.
>> 
>> Chen Zhou's machine has some memory below 4G, but its too precious to reserve a large
>> chunk for kdump. Without any memory below 4G some of the drivers won't work.
>> 
>> I don't see what distros can set as their default for all platforms if high/low are
>> mutually exclusive with the 'crashkernel=' in use today. How did x86 navigate this, ... or
>> was it so long ago?
>> 
>> No one else has reported a problem with the existing placement logic, hence treating this
>> 'low' thing as the 'in addition' special case.
> 
> 
> Hi,
> 
> I am seeing similar  Arm crash dump issues  on  5.4 kernels  where we need  rather large amount of crashkernel memory reserved that is not available below 4GB ( The maximum reserved size appears to be around 768M ) . When I pick memory range higher than 4GB , I see  adapters that fail to initialize :
> 
> 
> There is no low-memory  <4G  memory for DMA ;     
> 
> [   11.506792] kworker/0:14: page allocation failure: order:0, 
> mode:0x104(GFP_DMA32|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 
> [   11.518793] CPU: 0 PID: 150 Comm: kworker/0:14 Not tainted 
> 5.4.0-1948.3.el8uek.aarch64 #1 
> [   11.526955] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 
> 0ACKL025 01/18/2019 
> [   11.534948] Workqueue: events work_for_cpu_fn 
> [   11.539291] Call trace: 
> [   11.541727]  dump_backtrace+0x0/0x18c 
> [   11.545376]  show_stack+0x24/0x30 
> [   11.548679]  dump_stack+0xbc/0xe0 
> [   11.551982]  warn_alloc+0xf0/0x15c 
> [   11.555370]  __alloc_pages_slowpath+0xb4c/0xb84 
> [   11.559887]  __alloc_pages_nodemask+0x2d0/0x330 
> [   11.564405]  alloc_pages_current+0x8c/0xf8 
> [   11.568496]  ttm_bo_device_init+0x188/0x220 [ttm] 
> [   11.573187]  drm_vram_mm_init+0x58/0x80 [drm_vram_helper] 
> [   11.578572]  drm_vram_helper_alloc_mm+0x64/0xb0 [drm_vram_helper] 
> [   11.584655]  ast_mm_init+0x38/0x80 [ast] 
> [   11.588566]  ast_driver_load+0x474/0xa70 [ast] 
> [   11.593029]  drm_dev_register+0x144/0x1c8 [drm] 
> [   11.597573]  drm_get_pci_dev+0xa4/0x168 [drm] 
> [   11.601919]  ast_pci_probe+0x8c/0x9c [ast] 
> [   11.606004]  local_pci_probe+0x44/0x98 
> [   11.609739]  work_for_cpu_fn+0x20/0x30 
> [   11.613474]  process_one_work+0x1c4/0x41c 
> [   11.617470]  worker_thread+0x150/0x4b0 
> [   11.621206]  kthread+0x110/0x114 
> [   11.624422]  ret_from_fork+0x10/0x18 
> 
> This failure is related to a graphics adapter. 
> 
> The more complex kdump configurations that use networking stack to NFS mount a filesystem to dump to , or use ssh to copy to another machine,  require more crashkernel memory reservations than perhaps the “default*” settings of  a minimal kdump that creates a minimal  vmcore to local storage in  /var/crash. If crashkernel is too small I get Out of Memory issues and the entire vmcore  process fails. 
> 
> ( *default kdump setting I assume are a minimal vmcore to /var/crash using primary boot device where /root is located  ) 
> 
Hi Chen,


I was able to unit test this series of kernel patches, applied to a 5.4.17 test kernel, along with the kexec CLI change:

0001-arm64-kdump-add-another-DT-property-to-crash-dump-ke.patch

Applied to:

kexec-tools-2.0.19-12.0.4.el8.src.rpm

And obtained a vmcore using this cmdline:

BOOT_IMAGE=(hd6,gpt2)/vmlinuz-5.4.17-4-uek6m_ol8-jpdonnel+ root=/dev/mapper/ol01-root ro crashkernel=2048M@35G crashkernel=250M,low rd.lvm.lv=ol01/root rd.lvm.lv=ol01/swap console=ttyS4 loglevel=7
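
For readers decoding the two crashkernel= options on that command line, here is a minimal sketch. It assumes binary size suffixes (K/M/G/T), as the kernel's memparse() uses, and handles only the simple size[@base] and size,low forms. The helper names parse_size and parse_crashkernel_words are illustrative and are not taken from the kernel or kexec-tools.

```python
import re

# Binary multipliers for the size suffixes (assumed to match memparse()).
_SUFFIX = {"": 1, "k": 1 << 10, "m": 1 << 20, "g": 1 << 30, "t": 1 << 40}


def parse_size(text):
    """Convert strings such as '250M', '35G' or '0xb81200000' to bytes."""
    m = re.fullmatch(r"(0x[0-9a-f]+|\d+)([kmgt]?)", text.lower())
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(m.group(1), 0) * _SUFFIX[m.group(2)]


def parse_crashkernel_words(cmdline):
    """Collect the crashkernel= options found on a kernel command line.

    Illustrative only: the range syntax and the ',high' form are ignored.
    """
    result = {"size": None, "base": None, "low": None}
    prefix = "crashkernel="
    for word in cmdline.split():
        if not word.startswith(prefix):
            continue
        value = word[len(prefix):]
        if value.endswith(",low"):
            # Extra low-memory region for devices that need 32-bit DMA.
            result["low"] = parse_size(value[: -len(",low")])
        elif "@" in value:
            # Fixed placement: size@base.
            size, base = value.split("@", 1)
            result["size"] = parse_size(size)
            result["base"] = parse_size(base)
        else:
            # Size only; the kernel chooses the base.
            result["size"] = parse_size(value)
    return result
```

With the command line above, this yields a 2 GiB main region at a 35 GiB base plus a 250 MiB low region.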

Can you add:

Tested-by: John Donnelly <John.p.donnelly@oracle.com>


How can we get these changes included in an rc kernel release?

Thanks,

John.


> 
> 
> 
>> 
>> 
>>>> previous discusses:
>>>> 	https://lkml.org/lkml/2019/6/5/670
>>>> 	https://lkml.org/lkml/2019/6/13/229
>>> 
>>> Another concern from James:
>>> "
>>> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
>>> "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
>>> find "Crash kernel", you are always going to get the kernel placed in the lower portion.
>>> "
>>> 
>>> The kexec-tools code is iterating all "Crash kernel" ranges and add them
>>> in an array.  In X86 code, it uses the higher range to locate memory.
>> 
>> Then my hurried reading of what the user-space code does was wrong!
>> 
>> If kexec-tools places the kernel in the low region, there may not be enough memory left
>> for whatever purpose it was reserved for. This was the motivation for giving it a
>> different name.
>> 
>> 
>> Thanks,
>> 
>> James
>> 
>> _______________________________________________
>> kexec mailing list
>> kexec@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec
> 
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2020-02-24 15:25             ` John Donnelly
@ 2020-03-02  1:29               ` Chen Zhou
  0 siblings, 0 replies; 30+ messages in thread
From: Chen Zhou @ 2020-03-02  1:29 UTC (permalink / raw)
  To: John Donnelly, James Morse
  Cc: kbuild test robot, xiexiuqi, catalin.marinas, bhsharma,
	linux-doc, kexec, linux-kernel, Dave Young, horms, tglx, will,
	mingo, linux-arm-kernel



On 2020/2/24 23:25, John Donnelly wrote:
> 
> 
>> On Jan 16, 2020, at 9:47 AM, John Donnelly <john.p.donnelly@oracle.com> wrote:
>>
>>
>>
>>> On Jan 16, 2020, at 9:17 AM, James Morse <james.morse@arm.com> wrote:
>>>
>>> Hi guys,
>>>
>>> On 28/12/2019 09:32, Dave Young wrote:
>>>> On 12/27/19 at 07:04pm, Chen Zhou wrote:
>>>>> On 2019/12/27 13:54, Dave Young wrote:
>>>>>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
>>>>>>> In preparation for supporting reserve_crashkernel_low in arm64 as
>>>>>>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>>>>
>>>>>>> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
>>>>>>> is specified. Different with x86_64, don't set low memory automatically.
>>>>>>
>>>>>> Do you have any reason for the difference?  I'd expect we have same
>>>>>> logic if possible and remove some of the ifdefs.
>>>>>
>>>>> In x86_64, if we reserve crashkernel above 4G, then we call reserve_crashkernel_low()
>>>>> to reserve low memory.
>>>>>
>>>>> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of reserve_crashkernel()
>>>>> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() allocated something.
>>>>> In this case, if reserve crashkernel below 4G there will be 256M low memory set automatically
>>>>> and this needs extra considerations.
>>>
>>>> Sorry that I did not read the old thread details and thought that is
>>>> arch dependent.  But rethink about that, it would be better that we can
>>>> have same semantic about crashkernel parameters across arches.  If we
>>>> make them different then it causes confusion, especially for
>>>> distributions.
>>>
>>> Surely distros also want one crashkernel* string they can use on all platforms without
>>> having to detect the kernel version, platform or changeable memory layout...
>>>
>>>
>>>> OTOH, I thought if we reserve high memory then the low memory should be
>>>> needed.  There might be some exceptions, but I do not know the exact
>>>> one,
>>>
>>>> can we make the behavior same, and special case those systems which
>>>> do not need low memory reservation.
>>>
>>> Its tricky to work out which systems are the 'normal' ones.
>>>
>>> We don't have a fixed memory layout for arm64. Some systems have no memory below 4G.
>>> Others have no memory above 4G.
>>>
>>> Chen Zhou's machine has some memory below 4G, but its too precious to reserve a large
>>> chunk for kdump. Without any memory below 4G some of the drivers won't work.
>>>
>>> I don't see what distros can set as their default for all platforms if high/low are
>>> mutually exclusive with the 'crashkernel=' in use today. How did x86 navigate this, ... or
>>> was it so long ago?
>>>
>>> No one else has reported a problem with the existing placement logic, hence treating this
>>> 'low' thing as the 'in addition' special case.
>>
>>
>> Hi,
>>
>> I am seeing similar  Arm crash dump issues  on  5.4 kernels  where we need  rather large amount of crashkernel memory reserved that is not available below 4GB ( The maximum reserved size appears to be around 768M ) . When I pick memory range higher than 4GB , I see  adapters that fail to initialize :
>>
>>
>> There is no low-memory  <4G  memory for DMA ;     
>>
>> [   11.506792] kworker/0:14: page allocation failure: order:0, 
>> mode:0x104(GFP_DMA32|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 
>> [   11.518793] CPU: 0 PID: 150 Comm: kworker/0:14 Not tainted 
>> 5.4.0-1948.3.el8uek.aarch64 #1 
>> [   11.526955] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 
>> 0ACKL025 01/18/2019 
>> [   11.534948] Workqueue: events work_for_cpu_fn 
>> [   11.539291] Call trace: 
>> [   11.541727]  dump_backtrace+0x0/0x18c 
>> [   11.545376]  show_stack+0x24/0x30 
>> [   11.548679]  dump_stack+0xbc/0xe0 
>> [   11.551982]  warn_alloc+0xf0/0x15c 
>> [   11.555370]  __alloc_pages_slowpath+0xb4c/0xb84 
>> [   11.559887]  __alloc_pages_nodemask+0x2d0/0x330 
>> [   11.564405]  alloc_pages_current+0x8c/0xf8 
>> [   11.568496]  ttm_bo_device_init+0x188/0x220 [ttm] 
>> [   11.573187]  drm_vram_mm_init+0x58/0x80 [drm_vram_helper] 
>> [   11.578572]  drm_vram_helper_alloc_mm+0x64/0xb0 [drm_vram_helper] 
>> [   11.584655]  ast_mm_init+0x38/0x80 [ast] 
>> [   11.588566]  ast_driver_load+0x474/0xa70 [ast] 
>> [   11.593029]  drm_dev_register+0x144/0x1c8 [drm] 
>> [   11.597573]  drm_get_pci_dev+0xa4/0x168 [drm] 
>> [   11.601919]  ast_pci_probe+0x8c/0x9c [ast] 
>> [   11.606004]  local_pci_probe+0x44/0x98 
>> [   11.609739]  work_for_cpu_fn+0x20/0x30 
>> [   11.613474]  process_one_work+0x1c4/0x41c 
>> [   11.617470]  worker_thread+0x150/0x4b0 
>> [   11.621206]  kthread+0x110/0x114 
>> [   11.624422]  ret_from_fork+0x10/0x18 
>>
>> This failure is related to a graphics adapter. 
>>
>> The more complex kdump configurations that use networking stack to NFS mount a filesystem to dump to , or use ssh to copy to another machine,  require more crashkernel memory reservations than perhaps the “default*” settings of  a minimal kdump that creates a minimal  vmcore to local storage in  /var/crash. If crashkernel is too small I get Out of Memory issues and the entire vmcore  process fails. 
>>
>> ( *default kdump setting I assume are a minimal vmcore to /var/crash using primary boot device where /root is located  ) 
>>
> Hi Chen,
> 
> 
> I was able to unit test these series of kernel  patches  applied to a 5.4.17 test kernel  along with the kexec CLI  change :
> 
> 0001-arm64-kdump-add-another-DT-property-to-crash-dump-ke.patch
> 
> Applied to :
> 
> kexec-tools-2.0.19-12.0.4.el8.src.rpm
> 
> And obtained a vmcore using this cmdline :
> 
> BOOT_IMAGE=(hd6,gpt2)/vmlinuz-5.4.17-4-uek6m_ol8-jpdonnel+ root=/dev/mapper/ol01-root ro crashkernel=2048M@35G crashkernel=250M,low rd.lvm.lv=ol01/root rd.lvm.lv=ol01/swap console=ttyS4 loglevel=7
> 
> Can you add :
> 
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> 
> 
> How can we  get these changes included into an rc kernel release  ?
> 
> Thanks,
> 
> John.

Hi all,

Friendly ping...

> 
> 
>>
>>
>>
>>>
>>>
>>>>> previous discusses:
>>>>> 	https://lkml.org/lkml/2019/6/5/670
>>>>> 	https://lkml.org/lkml/2019/6/13/229
>>>>
>>>> Another concern from James:
>>>> "
>>>> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
>>>> "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
>>>> find "Crash kernel", you are always going to get the kernel placed in the lower portion.
>>>> "
>>>>
>>>> The kexec-tools code is iterating all "Crash kernel" ranges and add them
>>>> in an array.  In X86 code, it uses the higher range to locate memory.
>>>
>>> Then my hurried reading of what the user-space code does was wrong!
>>>
>>> If kexec-tools places the kernel in the low region, there may not be enough memory left
>>> for whatever purpose it was reserved for. This was the motivation for giving it a
>>> different name.
>>>
>>>
>>> Thanks,
>>>
>>> James
>>>
> 
> 
> .
> 



* Re: [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2019-12-23 15:23 ` [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel Chen Zhou
@ 2020-03-05 10:13   ` Prabhakar Kushwaha
  2020-03-07 11:06     ` Chen Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Prabhakar Kushwaha @ 2020-03-05 10:13 UTC (permalink / raw)
  To: Chen Zhou
  Cc: horms, Ganapatrao Prabhakerrao Kulkarni, Will Deacon, xiexiuqi,
	Catalin Marinas, Bhupesh Sharma, Linux Doc Mailing List,
	kexec mailing list, Linux Kernel Mailing List, mingo,
	James Morse, Thomas Gleixner, dyoung, linux-arm-kernel

On Mon, Dec 23, 2019 at 8:57 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>
> Crashkernel=X tries to reserve memory for the crash dump kernel under
> 4G. If crashkernel=X,low is specified simultaneously, reserve spcified
> size low memory for crash kdump kernel devices firstly and then reserve
> memory above 4G.
>
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> ---
>  arch/arm64/kernel/setup.c |  8 +++++++-
>  arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
>  2 files changed, 36 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 56f6645..04d1c87 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
>                     kernel_data.end <= res->end)
>                         request_resource(res, &kernel_data);
>  #ifdef CONFIG_KEXEC_CORE
> -               /* Userspace will find "Crash kernel" region in /proc/iomem. */
> +               /*
> +                * Userspace will find "Crash kernel" region in /proc/iomem.
> +                * Note: the low region is renamed as Crash kernel (low).
> +                */
> +               if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> +                               crashk_low_res.end <= res->end)
> +                       request_resource(res, &crashk_low_res);
>                 if (crashk_res.end && crashk_res.start >= res->start &&
>                     crashk_res.end <= res->end)
>                         request_resource(res, &crashk_res);
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index b65dffd..0d7afd5 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
>  {
>         unsigned long long crash_base, crash_size;
>         int ret;
> +       phys_addr_t crash_max = arm64_dma32_phys_limit;
>
>         ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>                                 &crash_size, &crash_base);
> @@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
>         if (ret || !crash_size)
>                 return;
>
> +       ret = reserve_crashkernel_low();
> +       if (!ret && crashk_low_res.end) {
> +               /*
> +                * If crashkernel=X,low specified, there may be two regions,
> +                * we need to make some changes as follows:
> +                *
> +                * 1. rename the low region as "Crash kernel (low)"
> +                * In order to distinct from the high region and make no effect
> +                * to the use of existing kexec-tools, rename the low region as
> +                * "Crash kernel (low)".
> +                *
> +                * 2. change the upper bound for crash memory
> +                * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
> +                *
> +                * 3. mark the low region as "nomap"
> +                * The low region is intended to be used for crash dump kernel
> +                * devices, just mark the low region as "nomap" simply.
> +                */
> +               const char *rename = "Crash kernel (low)";
> +
> +               crashk_low_res.name = rename;
> +               crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
> +               memblock_mark_nomap(crashk_low_res.start,
> +                                   resource_size(&crashk_low_res));
> +       }
> +
>         crash_size = PAGE_ALIGN(crash_size);
>
>         if (crash_base == 0) {
>                 /* Current arm64 boot protocol requires 2MB alignment */
> -               crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
> -                               crash_size, SZ_2M);
> +               crash_base = memblock_find_in_range(0, crash_max, crash_size,
> +                               SZ_2M);
>                 if (crash_base == 0) {
>                         pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>                                 crash_size);
> --

I tested this patch series on an ARM64 ThunderX2 and saw no issues with
the bootargs crashkernel=X@Y crashkernel=250M,low:

$ dmesg | grep crash
[    0.000000] crashkernel reserved: 0x0000000b81200000 - 0x0000000c81200000 (4096 MB)
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+ root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G@0xb81200000 crashkernel=250M,low nowatchdog earlycon
[   29.310209]     crashkernel=250M,low

$ kexec -p -i /boot/vmlinuz-`uname -r` --initrd=/boot/initrd.img-`uname -r` --reuse-cmdline
$ echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger

But when I tried crashkernel=4G crashkernel=250M,low as bootargs, the
kernel was not able to allocate the memory:

[    0.000000] cannot allocate crashkernel (size:0x100000000)
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+ root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G crashkernel=250M,low nowatchdog
[   29.332081]     crashkernel=250M,low

Is crashkernel=X@Y mandatory for the reservation to be allocated beyond 4G?
Am I missing something?

--pk


* Re: [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2020-03-05 10:13   ` Prabhakar Kushwaha
@ 2020-03-07 11:06     ` Chen Zhou
  2020-03-07 18:43       ` John Donnelly
  2020-03-09  4:48       ` Prabhakar Kushwaha
  0 siblings, 2 replies; 30+ messages in thread
From: Chen Zhou @ 2020-03-07 11:06 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: horms, Ganapatrao Prabhakerrao Kulkarni, Will Deacon, xiexiuqi,
	Catalin Marinas, Bhupesh Sharma, Linux Doc Mailing List,
	kexec mailing list, Linux Kernel Mailing List, mingo,
	James Morse, Thomas Gleixner, dyoung, linux-arm-kernel



On 2020/3/5 18:13, Prabhakar Kushwaha wrote:
> On Mon, Dec 23, 2019 at 8:57 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>
>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>> 4G. If crashkernel=X,low is specified simultaneously, reserve spcified
>> size low memory for crash kdump kernel devices firstly and then reserve
>> memory above 4G.
>>
>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>> ---
>>  arch/arm64/kernel/setup.c |  8 +++++++-
>>  arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
>>  2 files changed, 36 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 56f6645..04d1c87 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
>>                     kernel_data.end <= res->end)
>>                         request_resource(res, &kernel_data);
>>  #ifdef CONFIG_KEXEC_CORE
>> -               /* Userspace will find "Crash kernel" region in /proc/iomem. */
>> +               /*
>> +                * Userspace will find "Crash kernel" region in /proc/iomem.
>> +                * Note: the low region is renamed as Crash kernel (low).
>> +                */
>> +               if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>> +                               crashk_low_res.end <= res->end)
>> +                       request_resource(res, &crashk_low_res);
>>                 if (crashk_res.end && crashk_res.start >= res->start &&
>>                     crashk_res.end <= res->end)
>>                         request_resource(res, &crashk_res);
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index b65dffd..0d7afd5 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
>>  {
>>         unsigned long long crash_base, crash_size;
>>         int ret;
>> +       phys_addr_t crash_max = arm64_dma32_phys_limit;
>>
>>         ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>>                                 &crash_size, &crash_base);
>> @@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
>>         if (ret || !crash_size)
>>                 return;
>>
>> +       ret = reserve_crashkernel_low();
>> +       if (!ret && crashk_low_res.end) {
>> +               /*
>> +                * If crashkernel=X,low specified, there may be two regions,
>> +                * we need to make some changes as follows:
>> +                *
>> +                * 1. rename the low region as "Crash kernel (low)"
>> +                * In order to distinct from the high region and make no effect
>> +                * to the use of existing kexec-tools, rename the low region as
>> +                * "Crash kernel (low)".
>> +                *
>> +                * 2. change the upper bound for crash memory
>> +                * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
>> +                *
>> +                * 3. mark the low region as "nomap"
>> +                * The low region is intended to be used for crash dump kernel
>> +                * devices, just mark the low region as "nomap" simply.
>> +                */
>> +               const char *rename = "Crash kernel (low)";
>> +
>> +               crashk_low_res.name = rename;
>> +               crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
>> +               memblock_mark_nomap(crashk_low_res.start,
>> +                                   resource_size(&crashk_low_res));
>> +       }
>> +
>>         crash_size = PAGE_ALIGN(crash_size);
>>
>>         if (crash_base == 0) {
>>                 /* Current arm64 boot protocol requires 2MB alignment */
>> -               crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
>> -                               crash_size, SZ_2M);
>> +               crash_base = memblock_find_in_range(0, crash_max, crash_size,
>> +                               SZ_2M);
>>                 if (crash_base == 0) {
>>                         pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>>                                 crash_size);
>> --
> 
> I tested this patch series on ARM64-ThunderX2 with no issue with
> bootargs crashkenel=X@Y crashkernel=250M,low
> 
> $ dmesg | grep crash
> [    0.000000] crashkernel reserved: 0x0000000b81200000 -
> 0x0000000c81200000 (4096 MB)
> [    0.000000] Kernel command line:
> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro
> crashkernel=4G@0xb81200000 crashkernel=250M,low nowatchdog earlycon
> [   29.310209]     crashkernel=250M,low
> 
> $  kexec -p -i /boot/vmlinuz-`uname -r`
> --initrd=/boot/initrd.img-`uname -r` --reuse-cmdline
> $ echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
> 
> But when i tried with crashkernel=4G crashkernel=250M,low as bootargs.
> Kernel is not able to allocate memory.
> [    0.000000] cannot allocate crashkernel (size:0x100000000)
> [    0.000000] Kernel command line:
> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G
> crashkernel=250M,low nowatchdog
> [   29.332081]     crashkernel=250M,low
> 
> does crashkernel=X@Y mandatory to get allocated beyond 4G?
> am I missing something?

I can't reproduce the problem in my environment. Can you test with another size,
such as "crashkernel=1G crashkernel=250M,low", and see whether the same issue occurs?

Besides, crashkernel=X@Y isn't mandatory for the reservation to be placed above 4G.
Can you show the whole /proc/iomem file?
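
To make the intended behaviour concrete, here is a rough userspace model of the upper bound chosen by the patch. It is only a sketch: the function names and the example memory layout are hypothetical, the top-down search is an assumption about how memblock picks a range, and MEMBLOCK_ALLOC_ACCESSIBLE is modelled simply as "no limit". It does not explain the failure reported above.

```python
SZ_2M = 2 << 20
SZ_4G = 4 << 30


def find_in_range(free_ranges, limit, size, align=SZ_2M):
    """Return the highest aligned base below `limit` that fits `size` bytes.

    free_ranges is a list of (start, end) pairs with an exclusive end;
    limit=None means no upper bound. Returns None when nothing fits.
    """
    best = None
    for start, end in free_ranges:
        if limit is not None:
            end = min(end, limit)
        if end - start < size:
            continue
        base = (end - size) // align * align  # align the base downwards
        if base >= start and (best is None or base > best):
            best = base
    return best


def place_crashkernel(free_ranges, size, low_reserved, dma32_limit=SZ_4G):
    """Model of the patched reserve_crashkernel() when no base is given.

    Without a low region the search stops at the DMA32 limit; once
    crashkernel=Y,low has been reserved, all of memory is eligible.
    """
    limit = None if low_reserved else dma32_limit
    return find_in_range(free_ranges, limit, size)


# Hypothetical layout: RAM at 2G-4G and at 34G-64G.
EXAMPLE_RANGES = [(0x80000000, 0x100000000), (0x880000000, 0x1000000000)]
```

In this model a 4G request fails when the search is capped at 4G, and succeeds in the high range once the low region has been reserved, without needing an explicit @Y base.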

Thanks,
Chen Zhou

> 
> --pk
> 
> .
> 



* Re: [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2020-03-07 11:06     ` Chen Zhou
@ 2020-03-07 18:43       ` John Donnelly
  2020-03-09  4:59         ` Prabhakar Kushwaha
  2020-03-09  4:48       ` Prabhakar Kushwaha
  1 sibling, 1 reply; 30+ messages in thread
From: John Donnelly @ 2020-03-07 18:43 UTC (permalink / raw)
  To: Chen Zhou
  Cc: Ganapatrao Prabhakerrao Kulkarni, xiexiuqi, Catalin Marinas,
	Bhupesh Sharma, Linux Doc Mailing List, kexec mailing list,
	Linux Kernel Mailing List, dyoung, horms, James Morse,
	Prabhakar Kushwaha, Thomas Gleixner, Will Deacon, mingo,
	linux-arm-kernel



> On Mar 7, 2020, at 5:06 AM, Chen Zhou <chenzhou10@huawei.com> wrote:
> 
> 
> 
> On 2020/3/5 18:13, Prabhakar Kushwaha wrote:
>> On Mon, Dec 23, 2019 at 8:57 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>> 
>>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>>> 4G. If crashkernel=X,low is specified simultaneously, reserve spcified
>>> size low memory for crash kdump kernel devices firstly and then reserve
>>> memory above 4G.
>>> 
>>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>>> ---
>>> arch/arm64/kernel/setup.c |  8 +++++++-
>>> arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
>>> 2 files changed, 36 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>> index 56f6645..04d1c87 100644
>>> --- a/arch/arm64/kernel/setup.c
>>> +++ b/arch/arm64/kernel/setup.c
>>> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
>>>                    kernel_data.end <= res->end)
>>>                        request_resource(res, &kernel_data);
>>> #ifdef CONFIG_KEXEC_CORE
>>> -               /* Userspace will find "Crash kernel" region in /proc/iomem. */
>>> +               /*
>>> +                * Userspace will find "Crash kernel" region in /proc/iomem.
>>> +                * Note: the low region is renamed as Crash kernel (low).
>>> +                */
>>> +               if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>>> +                               crashk_low_res.end <= res->end)
>>> +                       request_resource(res, &crashk_low_res);
>>>                if (crashk_res.end && crashk_res.start >= res->start &&
>>>                    crashk_res.end <= res->end)
>>>                        request_resource(res, &crashk_res);
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index b65dffd..0d7afd5 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
>>> {
>>>        unsigned long long crash_base, crash_size;
>>>        int ret;
>>> +       phys_addr_t crash_max = arm64_dma32_phys_limit;
>>> 
>>>        ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>>>                                &crash_size, &crash_base);
>>> @@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
>>>        if (ret || !crash_size)
>>>                return;
>>> 
>>> +       ret = reserve_crashkernel_low();
>>> +       if (!ret && crashk_low_res.end) {
>>> +               /*
>>> +                * If crashkernel=X,low specified, there may be two regions,
>>> +                * we need to make some changes as follows:
>>> +                *
>>> +                * 1. rename the low region as "Crash kernel (low)"
>>> +                * In order to distinct from the high region and make no effect
>>> +                * to the use of existing kexec-tools, rename the low region as
>>> +                * "Crash kernel (low)".
>>> +                *
>>> +                * 2. change the upper bound for crash memory
>>> +                * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
>>> +                *
>>> +                * 3. mark the low region as "nomap"
>>> +                * The low region is intended to be used for crash dump kernel
>>> +                * devices, just mark the low region as "nomap" simply.
>>> +                */
>>> +               const char *rename = "Crash kernel (low)";
>>> +
>>> +               crashk_low_res.name = rename;
>>> +               crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
>>> +               memblock_mark_nomap(crashk_low_res.start,
>>> +                                   resource_size(&crashk_low_res));
>>> +       }
>>> +
>>>        crash_size = PAGE_ALIGN(crash_size);
>>> 
>>>        if (crash_base == 0) {
>>>                /* Current arm64 boot protocol requires 2MB alignment */
>>> -               crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
>>> -                               crash_size, SZ_2M);
>>> +               crash_base = memblock_find_in_range(0, crash_max, crash_size,
>>> +                               SZ_2M);
>>>                if (crash_base == 0) {
>>>                        pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>>>                                crash_size);
>>> --
>> 
>> I tested this patch series on ARM64-ThunderX2 with no issue with
>> bootargs crashkenel=X@Y crashkernel=250M,low
>> 
>> $ dmesg | grep crash
>> [    0.000000] crashkernel reserved: 0x0000000b81200000 -
>> 0x0000000c81200000 (4096 MB)
>> [    0.000000] Kernel command line:
>> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
>> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro
>> crashkernel=4G@0xb81200000 crashkernel=250M,low nowatchdog earlycon
>> [   29.310209]     crashkernel=250M,low
>> 
>> $  kexec -p -i /boot/vmlinuz-`uname -r`
>> --initrd=/boot/initrd.img-`uname -r` --reuse-cmdline
>> $ echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
>> 
>> But when i tried with crashkernel=4G crashkernel=250M,low as bootargs.
>> Kernel is not able to allocate memory.
>> [    0.000000] cannot allocate crashkernel (size:0x100000000)
>> [    0.000000] Kernel command line:
>> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
>> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G
>> crashkernel=250M,low nowatchdog
>> [   29.332081]     crashkernel=250M,low
>> 
>> does crashkernel=X@Y mandatory to get allocated beyond 4G?
>> am I missing something?
> 

   crashkernel=4G

   You need to look at the memory map on node 0 from dmesg (or /proc/iomem) to determine whether there is any memory in that range. 0x100000000 is the first byte above 4G.

On the Arm server-class machines I’ve seen, the first usable memory range above 4G is in the 32G area. Where the first range starts is platform dependent.
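
A minimal sketch of that check, for illustration only: it takes lines in the /proc/iomem format and reports how much of a top-level "System RAM" range lies at or above 4G. The helper names (parse_system_ram, bytes_above_4g) are made up for this example. The input has to come from a privileged read of /proc/iomem, since unprivileged reads show zeroed addresses.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SZ_4G 0x100000000ULL

/*
 * Parse one /proc/iomem line such as "880000000-ffcffffff : System RAM".
 * Returns true and fills start/end only for top-level "System RAM" entries.
 * (Hypothetical helper, not part of the kernel or kexec-tools.)
 */
static bool parse_system_ram(const char *line, uint64_t *start, uint64_t *end)
{
	char name[64];

	/* Nested entries are indented; only top-level ranges matter here. */
	if (line[0] == ' ')
		return false;
	if (sscanf(line, "%" SCNx64 "-%" SCNx64 " : %63[^\n]",
		   start, end, name) != 3)
		return false;
	return strcmp(name, "System RAM") == 0;
}

/* Bytes of the inclusive range [start, end] that lie at or above 4G. */
static uint64_t bytes_above_4g(uint64_t start, uint64_t end)
{
	if (end < SZ_4G)
		return 0;
	if (start < SZ_4G)
		start = SZ_4G;
	return end - start + 1;
}
```

Feeding each line of /proc/iomem through parse_system_ram() and summing bytes_above_4g() shows whether any RAM exists above 0x100000000. It says nothing about whether that RAM is still free when reserve_crashkernel() runs.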

> I can't reproduce the problem in my environment, can you test with other size,
> such as "crashkernel=1G crashkernel=250M,low", see if there is the same issue.
> 
> Besides, crashkernel=X@Y isn't mandatory to get allocated beyond 4G,
> can you show the whole file /proc/iomem.
> 
> Thanks,
> Chen Zhou
> 
>> 
>> --pk
>> 
>> .
>> 
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2020-03-07 11:06     ` Chen Zhou
  2020-03-07 18:43       ` John Donnelly
@ 2020-03-09  4:48       ` Prabhakar Kushwaha
  2020-03-09 15:51         ` Prabhakar Kushwaha
  1 sibling, 1 reply; 30+ messages in thread
From: Prabhakar Kushwaha @ 2020-03-09  4:48 UTC (permalink / raw)
  To: Chen Zhou
  Cc: horms, Ganapatrao Prabhakerrao Kulkarni, Will Deacon, xiexiuqi,
	Catalin Marinas, Bhupesh Sharma, Linux Doc Mailing List,
	kexec mailing list, Linux Kernel Mailing List, mingo,
	James Morse, Thomas Gleixner, dyoung, linux-arm-kernel

Hi Chen,

On Sat, Mar 7, 2020 at 4:36 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>
>
>
> On 2020/3/5 18:13, Prabhakar Kushwaha wrote:
> > On Mon, Dec 23, 2019 at 8:57 PM Chen Zhou <chenzhou10@huawei.com> wrote:
> >>
> >> Crashkernel=X tries to reserve memory for the crash dump kernel under
> >> 4G. If crashkernel=X,low is specified simultaneously, reserve spcified
> >> size low memory for crash kdump kernel devices firstly and then reserve
> >> memory above 4G.
> >>
> >> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> >> ---
> >>  arch/arm64/kernel/setup.c |  8 +++++++-
> >>  arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
> >>  2 files changed, 36 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> >> index 56f6645..04d1c87 100644
> >> --- a/arch/arm64/kernel/setup.c
> >> +++ b/arch/arm64/kernel/setup.c
> >> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
> >>                     kernel_data.end <= res->end)
> >>                         request_resource(res, &kernel_data);
> >>  #ifdef CONFIG_KEXEC_CORE
> >> -               /* Userspace will find "Crash kernel" region in /proc/iomem. */
> >> +               /*
> >> +                * Userspace will find "Crash kernel" region in /proc/iomem.
> >> +                * Note: the low region is renamed as Crash kernel (low).
> >> +                */
> >> +               if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> >> +                               crashk_low_res.end <= res->end)
> >> +                       request_resource(res, &crashk_low_res);
> >>                 if (crashk_res.end && crashk_res.start >= res->start &&
> >>                     crashk_res.end <= res->end)
> >>                         request_resource(res, &crashk_res);
> >> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> index b65dffd..0d7afd5 100644
> >> --- a/arch/arm64/mm/init.c
> >> +++ b/arch/arm64/mm/init.c
> >> @@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
> >>  {
> >>         unsigned long long crash_base, crash_size;
> >>         int ret;
> >> +       phys_addr_t crash_max = arm64_dma32_phys_limit;
> >>
> >>         ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> >>                                 &crash_size, &crash_base);
> >> @@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
> >>         if (ret || !crash_size)
> >>                 return;
> >>
> >> +       ret = reserve_crashkernel_low();
> >> +       if (!ret && crashk_low_res.end) {
> >> +               /*
> >> +                * If crashkernel=X,low specified, there may be two regions,
> >> +                * we need to make some changes as follows:
> >> +                *
> >> +                * 1. rename the low region as "Crash kernel (low)"
> >> +                * In order to distinct from the high region and make no effect
> >> +                * to the use of existing kexec-tools, rename the low region as
> >> +                * "Crash kernel (low)".
> >> +                *
> >> +                * 2. change the upper bound for crash memory
> >> +                * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
> >> +                *
> >> +                * 3. mark the low region as "nomap"
> >> +                * The low region is intended to be used for crash dump kernel
> >> +                * devices, just mark the low region as "nomap" simply.
> >> +                */
> >> +               const char *rename = "Crash kernel (low)";
> >> +
> >> +               crashk_low_res.name = rename;
> >> +               crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
> >> +               memblock_mark_nomap(crashk_low_res.start,
> >> +                                   resource_size(&crashk_low_res));
> >> +       }
> >> +
> >>         crash_size = PAGE_ALIGN(crash_size);
> >>
> >>         if (crash_base == 0) {
> >>                 /* Current arm64 boot protocol requires 2MB alignment */
> >> -               crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
> >> -                               crash_size, SZ_2M);
> >> +               crash_base = memblock_find_in_range(0, crash_max, crash_size,
> >> +                               SZ_2M);
> >>                 if (crash_base == 0) {
> >>                         pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> >>                                 crash_size);
> >> --
> >
> > I tested this patch series on ARM64-ThunderX2 with no issue with
> > bootargs crashkernel=X@Y crashkernel=250M,low
> >
> > $ dmesg | grep crash
> > [    0.000000] crashkernel reserved: 0x0000000b81200000 -
> > 0x0000000c81200000 (4096 MB)
> > [    0.000000] Kernel command line:
> > BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
> > root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro
> > crashkernel=4G@0xb81200000 crashkernel=250M,low nowatchdog earlycon
> > [   29.310209]     crashkernel=250M,low
> >
> > $  kexec -p -i /boot/vmlinuz-`uname -r`
> > --initrd=/boot/initrd.img-`uname -r` --reuse-cmdline
> > $ echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
> >
> > But when i tried with crashkernel=4G crashkernel=250M,low as bootargs.
> > Kernel is not able to allocate memory.
> > [    0.000000] cannot allocate crashkernel (size:0x100000000)
> > [    0.000000] Kernel command line:
> > BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
> > root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G
> > crashkernel=250M,low nowatchdog
> > [   29.332081]     crashkernel=250M,low
> >
> > does crashkernel=X@Y mandatory to get allocated beyond 4G?
> > am I missing something?
>
> I can't reproduce the problem in my environment, can you test with other size,
> such as "crashkernel=1G crashkernel=250M,low", see if there is the same issue.
>
I tried 1G as well and got the same error. Please find the logs below:

$ dmesg | grep crash
[    0.000000] cannot allocate crashkernel (size:0x40000000)
[    0.000000] Kernel command line:
BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro nowatchdog earlycon
crashkernel=1G crashkernel=250M,low
[   29.326916]     crashkernel=250M,low
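
The sizes in the warnings line up with the requested values: 0x40000000 is 1G and 0x100000000 is 4G. So the size on the command line appears to have been parsed as intended, and the failure is in finding a free range. A small user-space sketch of the suffix arithmetic, assuming K/M/G mean powers of 1024 as in the kernel's memparse(); parse_size is a hypothetical helper, not a kernel function:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * User-space approximation of size parsing for crashkernel= values:
 * a number followed by an optional K, M or G suffix (powers of 1024).
 * Anything after the suffix (for example ",low") is ignored here.
 */
static uint64_t parse_size(const char *s)
{
	char *end;
	uint64_t val = strtoull(s, &end, 0);
	unsigned int shift = 0;

	switch (*end) {
	case 'G': case 'g': shift = 30; break;
	case 'M': case 'm': shift = 20; break;
	case 'K': case 'k': shift = 10; break;
	default: break;
	}
	return val << shift;
}
```

parse_size("1G") gives 0x40000000 and parse_size("4G") gives 0x100000000, the two values printed by the "cannot allocate crashkernel" warning in this thread.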


> Besides, crashkernel=X@Y isn't mandatory to get allocated beyond 4G,

That was my understanding as well.

> can you show the whole file /proc/iomem.
>

$ cat /proc/iomem
00000000-00000000 : PCI ECAM
00000000-00000000 : PCI ECAM
00000000-00000000 : PCI Bus 0000:00
  00000000-00000000 : PCI Bus 0000:0f
    00000000-00000000 : PCI Bus 0000:10
      00000000-00000000 : 0000:10:00.0
      00000000-00000000 : 0000:10:00.0
  00000000-00000000 : PCI Bus 0000:01
    00000000-00000000 : 0000:01:00.0
    00000000-00000000 : 0000:01:00.1
  00000000-00000000 : PCI Bus 0000:05
    00000000-00000000 : 0000:05:00.0
    00000000-00000000 : 0000:05:00.1
  00000000-00000000 : PCI Bus 0000:09
    00000000-00000000 : 0000:09:00.0
    00000000-00000000 : 0000:09:00.1
  00000000-00000000 : 0000:00:10.0
    00000000-00000000 : ahci
  00000000-00000000 : 0000:00:10.1
    00000000-00000000 : ahci
00000000-00000000 : PCI Bus 0000:80
  00000000-00000000 : PCI Bus 0000:83
    00000000-00000000 : 0000:83:00.0
    00000000-00000000 : 0000:83:00.0
      00000000-00000000 : nvme
  00000000-00000000 : PCI Bus 0000:89
    00000000-00000000 : 0000:89:00.0
      00000000-00000000 : e1000e
    00000000-00000000 : 0000:89:00.0
    00000000-00000000 : 0000:89:00.0
      00000000-00000000 : e1000e
    00000000-00000000 : 0000:89:00.0
      00000000-00000000 : e1000e
  00000000-00000000 : PCI Bus 0000:8d
    00000000-00000000 : 0000:8d:00.0
    00000000-00000000 : 0000:8d:00.0
      00000000-00000000 : mpt3sas
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : Kernel code
  00000000-00000000 : reserved
  00000000-00000000 : Kernel data
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
00000000-00000000 : CAV901C:00
00000000-00000000 : CAV901D:00
  00000000-00000000 : CAV901C:00
00000000-00000000 : CAV901E:00
  00000000-00000000 : CAV901C:00
00000000-00000000 : CAV901F:00
  00000000-00000000 : CAV901C:00
00000000-00000000 : CAV9006:00
  00000000-00000000 : CAV9006:00
00000000-00000000 : ARMH0011:00
  00000000-00000000 : ARMH0011:00
00000000-00000000 : arm-smmu-v3.0.auto
  00000000-00000000 : arm-smmu-v3.0.auto
00000000-00000000 : arm-smmu-v3.1.auto
  00000000-00000000 : arm-smmu-v3.1.auto
00000000-00000000 : arm-smmu-v3.2.auto
  00000000-00000000 : arm-smmu-v3.2.auto
00000000-00000000 : CAV901C:01
00000000-00000000 : CAV901D:01
  00000000-00000000 : CAV901C:01
00000000-00000000 : CAV901E:01
  00000000-00000000 : CAV901C:01
00000000-00000000 : CAV901F:01
  00000000-00000000 : CAV901C:01
00000000-00000000 : CAV9007:06
  00000000-00000000 : CAV9007:06
00000000-00000000 : arm-smmu-v3.3.auto
  00000000-00000000 : arm-smmu-v3.3.auto
00000000-00000000 : arm-smmu-v3.4.auto
  00000000-00000000 : arm-smmu-v3.4.auto
00000000-00000000 : arm-smmu-v3.5.auto
  00000000-00000000 : arm-smmu-v3.5.auto
00000000-00000000 : System RAM
00000000-00000000 : System RAM
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
00000000-00000000 : PCI Bus 0000:00
  00000000-00000000 : PCI Bus 0000:01
    00000000-00000000 : 0000:01:00.0
    00000000-00000000 : 0000:01:00.1
    00000000-00000000 : 0000:01:00.0
    00000000-00000000 : 0000:01:00.1
    00000000-00000000 : 0000:01:00.0
    00000000-00000000 : 0000:01:00.1
  00000000-00000000 : PCI Bus 0000:05
    00000000-00000000 : 0000:05:00.0
      00000000-00000000 : bnx2x
    00000000-00000000 : 0000:05:00.1
      00000000-00000000 : bnx2x
    00000000-00000000 : 0000:05:00.0
      00000000-00000000 : bnx2x
    00000000-00000000 : 0000:05:00.0
      00000000-00000000 : bnx2x
    00000000-00000000 : 0000:05:00.1
      00000000-00000000 : bnx2x
    00000000-00000000 : 0000:05:00.1
      00000000-00000000 : bnx2x
  00000000-00000000 : PCI Bus 0000:09
    00000000-00000000 : 0000:09:00.0
      00000000-00000000 : i40e
    00000000-00000000 : 0000:09:00.1
      00000000-00000000 : i40e
    00000000-00000000 : 0000:09:00.0
    00000000-00000000 : 0000:09:00.1
    00000000-00000000 : 0000:09:00.0
      00000000-00000000 : i40e
    00000000-00000000 : 0000:09:00.1
      00000000-00000000 : i40e
    00000000-00000000 : 0000:09:00.0
    00000000-00000000 : 0000:09:00.1
  00000000-00000000 : 0000:00:0f.0
    00000000-00000000 : xhci-hcd
  00000000-00000000 : 0000:00:0f.0
  00000000-00000000 : 0000:00:0f.1
    00000000-00000000 : xhci-hcd
  00000000-00000000 : 0000:00:0f.1
  00000000-00000000 : 0000:00:10.0
    00000000-00000000 : ahci
  00000000-00000000 : 0000:00:10.1
    00000000-00000000 : ahci
00000000-00000000 : PCI Bus 0000:80

--pk


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2020-03-07 18:43       ` John Donnelly
@ 2020-03-09  4:59         ` Prabhakar Kushwaha
  0 siblings, 0 replies; 30+ messages in thread
From: Prabhakar Kushwaha @ 2020-03-09  4:59 UTC (permalink / raw)
  To: John Donnelly
  Cc: Ganapatrao Prabhakerrao Kulkarni, xiexiuqi, Chen Zhou,
	Catalin Marinas, Bhupesh Sharma, Linux Doc Mailing List,
	kexec mailing list, Linux Kernel Mailing List, dyoung, horms,
	James Morse, Thomas Gleixner, Will Deacon, mingo,
	linux-arm-kernel

Hi John,

On Sun, Mar 8, 2020 at 12:13 AM John Donnelly
<john.p.donnelly@oracle.com> wrote:
>
>
>
> > On Mar 7, 2020, at 5:06 AM, Chen Zhou <chenzhou10@huawei.com> wrote:
> >
> >
> >
> > On 2020/3/5 18:13, Prabhakar Kushwaha wrote:
> >> On Mon, Dec 23, 2019 at 8:57 PM Chen Zhou <chenzhou10@huawei.com> wrote:
> >>>
> >>> Crashkernel=X tries to reserve memory for the crash dump kernel under
> >>> 4G. If crashkernel=X,low is specified simultaneously, reserve spcified
> >>> size low memory for crash kdump kernel devices firstly and then reserve
> >>> memory above 4G.
> >>>
> >>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> >>> ---
> >>> arch/arm64/kernel/setup.c |  8 +++++++-
> >>> arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
> >>> 2 files changed, 36 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> >>> index 56f6645..04d1c87 100644
> >>> --- a/arch/arm64/kernel/setup.c
> >>> +++ b/arch/arm64/kernel/setup.c
> >>> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
> >>>                    kernel_data.end <= res->end)
> >>>                        request_resource(res, &kernel_data);
> >>> #ifdef CONFIG_KEXEC_CORE
> >>> -               /* Userspace will find "Crash kernel" region in /proc/iomem. */
> >>> +               /*
> >>> +                * Userspace will find "Crash kernel" region in /proc/iomem.
> >>> +                * Note: the low region is renamed as Crash kernel (low).
> >>> +                */
> >>> +               if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> >>> +                               crashk_low_res.end <= res->end)
> >>> +                       request_resource(res, &crashk_low_res);
> >>>                if (crashk_res.end && crashk_res.start >= res->start &&
> >>>                    crashk_res.end <= res->end)
> >>>                        request_resource(res, &crashk_res);
> >>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >>> index b65dffd..0d7afd5 100644
> >>> --- a/arch/arm64/mm/init.c
> >>> +++ b/arch/arm64/mm/init.c
> >>> @@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
> >>> {
> >>>        unsigned long long crash_base, crash_size;
> >>>        int ret;
> >>> +       phys_addr_t crash_max = arm64_dma32_phys_limit;
> >>>
> >>>        ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> >>>                                &crash_size, &crash_base);
> >>> @@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
> >>>        if (ret || !crash_size)
> >>>                return;
> >>>
> >>> +       ret = reserve_crashkernel_low();
> >>> +       if (!ret && crashk_low_res.end) {
> >>> +               /*
> >>> +                * If crashkernel=X,low specified, there may be two regions,
> >>> +                * we need to make some changes as follows:
> >>> +                *
> >>> +                * 1. rename the low region as "Crash kernel (low)"
> >>> +                * In order to distinct from the high region and make no effect
> >>> +                * to the use of existing kexec-tools, rename the low region as
> >>> +                * "Crash kernel (low)".
> >>> +                *
> >>> +                * 2. change the upper bound for crash memory
> >>> +                * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
> >>> +                *
> >>> +                * 3. mark the low region as "nomap"
> >>> +                * The low region is intended to be used for crash dump kernel
> >>> +                * devices, just mark the low region as "nomap" simply.
> >>> +                */
> >>> +               const char *rename = "Crash kernel (low)";
> >>> +
> >>> +               crashk_low_res.name = rename;
> >>> +               crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
> >>> +               memblock_mark_nomap(crashk_low_res.start,
> >>> +                                   resource_size(&crashk_low_res));
> >>> +       }
> >>> +
> >>>        crash_size = PAGE_ALIGN(crash_size);
> >>>
> >>>        if (crash_base == 0) {
> >>>                /* Current arm64 boot protocol requires 2MB alignment */
> >>> -               crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
> >>> -                               crash_size, SZ_2M);
> >>> +               crash_base = memblock_find_in_range(0, crash_max, crash_size,
> >>> +                               SZ_2M);
> >>>                if (crash_base == 0) {
> >>>                        pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> >>>                                crash_size);
> >>> --
> >>
> >> I tested this patch series on ARM64-ThunderX2 with no issue with
> >> bootargs crashkernel=X@Y crashkernel=250M,low
> >>
> >> $ dmesg | grep crash
> >> [    0.000000] crashkernel reserved: 0x0000000b81200000 -
> >> 0x0000000c81200000 (4096 MB)
> >> [    0.000000] Kernel command line:
> >> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
> >> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro
> >> crashkernel=4G@0xb81200000 crashkernel=250M,low nowatchdog earlycon
> >> [   29.310209]     crashkernel=250M,low
> >>
> >> $  kexec -p -i /boot/vmlinuz-`uname -r`
> >> --initrd=/boot/initrd.img-`uname -r` --reuse-cmdline
> >> $ echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
> >>
> >> But when i tried with crashkernel=4G crashkernel=250M,low as bootargs.
> >> Kernel is not able to allocate memory.
> >> [    0.000000] cannot allocate crashkernel (size:0x100000000)
> >> [    0.000000] Kernel command line:
> >> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
> >> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G
> >> crashkernel=250M,low nowatchdog
> >> [   29.332081]     crashkernel=250M,low
> >>
> >> does crashkernel=X@Y mandatory to get allocated beyond 4G?
> >> am I missing something?
> >
>
>    crashkernel=4G
>
>    You need to look at the memory map on node 0  from dmesg     ( or /proc/iomem ) to determine if there is any memory in that range  - 0x100000000 == 1st byte above 4G .
>

I believe I have enough free memory. Please find the log below:

$ dmesg | grep "node 0"
[    0.000000] Initmem setup node 0 [mem 0x00000000802f0000-0x0000009ffcffffff]
[    0.000000] On node 0 totalpages: 33537296
[   12.335714] pci_bus 0000:00: on NUMA node 0
$

I am passing 4G@0xb81200000 in the working scenario; 0xb81200000 is
well within the node 0 range.
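
That can be checked with simple arithmetic. A minimal sketch, assuming the node 0 span from the dmesg line above and treating it as one contiguous range (which a real node span need not be); range_within and is_2m_aligned are hypothetical helpers:

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M 0x200000ULL

/* True if [base, base + size) lies inside the inclusive range [lo, hi]. */
static bool range_within(uint64_t base, uint64_t size,
			 uint64_t lo, uint64_t hi)
{
	return size != 0 && base >= lo && base <= hi &&
	       size - 1 <= hi - base;
}

/* The arm64 boot protocol comment in the patch asks for 2MB alignment. */
static bool is_2m_aligned(uint64_t addr)
{
	return (addr & (SZ_2M - 1)) == 0;
}
```

With base 0xb81200000 and size 0x100000000 the region ends at 0xc81200000, matching the "crashkernel reserved" line. It lies inside 0x802f0000-0x9ffcffffff and the base is 2MB aligned. Containment in the node span alone does not show that the range is free or even populated.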

The /proc/iomem output is below:

$ cat /proc/iomem
00000000-00000000 : PCI ECAM
00000000-00000000 : PCI ECAM
00000000-00000000 : PCI Bus 0000:00
  00000000-00000000 : PCI Bus 0000:0f
    00000000-00000000 : PCI Bus 0000:10
      00000000-00000000 : 0000:10:00.0
      00000000-00000000 : 0000:10:00.0
  00000000-00000000 : PCI Bus 0000:01
    00000000-00000000 : 0000:01:00.0
    00000000-00000000 : 0000:01:00.1
  00000000-00000000 : PCI Bus 0000:05
    00000000-00000000 : 0000:05:00.0
    00000000-00000000 : 0000:05:00.1
  00000000-00000000 : PCI Bus 0000:09
    00000000-00000000 : 0000:09:00.0
    00000000-00000000 : 0000:09:00.1
  00000000-00000000 : 0000:00:10.0
    00000000-00000000 : ahci
  00000000-00000000 : 0000:00:10.1
    00000000-00000000 : ahci
00000000-00000000 : PCI Bus 0000:80
  00000000-00000000 : PCI Bus 0000:83
    00000000-00000000 : 0000:83:00.0
    00000000-00000000 : 0000:83:00.0
      00000000-00000000 : nvme
  00000000-00000000 : PCI Bus 0000:89
    00000000-00000000 : 0000:89:00.0
      00000000-00000000 : e1000e
    00000000-00000000 : 0000:89:00.0
    00000000-00000000 : 0000:89:00.0
      00000000-00000000 : e1000e
    00000000-00000000 : 0000:89:00.0
      00000000-00000000 : e1000e
  00000000-00000000 : PCI Bus 0000:8d
    00000000-00000000 : 0000:8d:00.0
    00000000-00000000 : 0000:8d:00.0
      00000000-00000000 : mpt3sas
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : Kernel code
  00000000-00000000 : reserved
  00000000-00000000 : Kernel data
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
00000000-00000000 : CAV901C:00
00000000-00000000 : CAV901D:00
  00000000-00000000 : CAV901C:00
00000000-00000000 : CAV901E:00
  00000000-00000000 : CAV901C:00
00000000-00000000 : CAV901F:00
  00000000-00000000 : CAV901C:00
00000000-00000000 : CAV9006:00
  00000000-00000000 : CAV9006:00
00000000-00000000 : ARMH0011:00
  00000000-00000000 : ARMH0011:00
00000000-00000000 : arm-smmu-v3.0.auto
  00000000-00000000 : arm-smmu-v3.0.auto
00000000-00000000 : arm-smmu-v3.1.auto
  00000000-00000000 : arm-smmu-v3.1.auto
00000000-00000000 : arm-smmu-v3.2.auto
  00000000-00000000 : arm-smmu-v3.2.auto
00000000-00000000 : CAV901C:01
00000000-00000000 : CAV901D:01
  00000000-00000000 : CAV901C:01
00000000-00000000 : CAV901E:01
  00000000-00000000 : CAV901C:01
00000000-00000000 : CAV901F:01
  00000000-00000000 : CAV901C:01
00000000-00000000 : CAV9007:06
  00000000-00000000 : CAV9007:06
00000000-00000000 : arm-smmu-v3.3.auto
  00000000-00000000 : arm-smmu-v3.3.auto
00000000-00000000 : arm-smmu-v3.4.auto
  00000000-00000000 : arm-smmu-v3.4.auto
00000000-00000000 : arm-smmu-v3.5.auto
  00000000-00000000 : arm-smmu-v3.5.auto
00000000-00000000 : System RAM
00000000-00000000 : System RAM
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
00000000-00000000 : System RAM
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
  00000000-00000000 : reserved
00000000-00000000 : PCI Bus 0000:00
  00000000-00000000 : PCI Bus 0000:01
    00000000-00000000 : 0000:01:00.0
    00000000-00000000 : 0000:01:00.1
    00000000-00000000 : 0000:01:00.0
    00000000-00000000 : 0000:01:00.1
    00000000-00000000 : 0000:01:00.0
    00000000-00000000 : 0000:01:00.1
  00000000-00000000 : PCI Bus 0000:05
    00000000-00000000 : 0000:05:00.0
      00000000-00000000 : bnx2x
    00000000-00000000 : 0000:05:00.1
      00000000-00000000 : bnx2x
    00000000-00000000 : 0000:05:00.0
      00000000-00000000 : bnx2x
    00000000-00000000 : 0000:05:00.0
      00000000-00000000 : bnx2x
    00000000-00000000 : 0000:05:00.1
      00000000-00000000 : bnx2x
    00000000-00000000 : 0000:05:00.1
      00000000-00000000 : bnx2x
  00000000-00000000 : PCI Bus 0000:09
    00000000-00000000 : 0000:09:00.0
      00000000-00000000 : i40e
    00000000-00000000 : 0000:09:00.1
      00000000-00000000 : i40e
    00000000-00000000 : 0000:09:00.0
    00000000-00000000 : 0000:09:00.1
    00000000-00000000 : 0000:09:00.0
      00000000-00000000 : i40e
    00000000-00000000 : 0000:09:00.1
      00000000-00000000 : i40e
    00000000-00000000 : 0000:09:00.0
    00000000-00000000 : 0000:09:00.1
  00000000-00000000 : 0000:00:0f.0
    00000000-00000000 : xhci-hcd
  00000000-00000000 : 0000:00:0f.0
  00000000-00000000 : 0000:00:0f.1
    00000000-00000000 : xhci-hcd
  00000000-00000000 : 0000:00:0f.1
  00000000-00000000 : 0000:00:10.0
    00000000-00000000 : ahci
  00000000-00000000 : 0000:00:10.1
    00000000-00000000 : ahci
00000000-00000000 : PCI Bus 0000:80

--pk


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2020-03-09  4:48       ` Prabhakar Kushwaha
@ 2020-03-09 15:51         ` Prabhakar Kushwaha
  2020-03-10  1:30           ` chenzhou
  0 siblings, 1 reply; 30+ messages in thread
From: Prabhakar Kushwaha @ 2020-03-09 15:51 UTC (permalink / raw)
  To: Chen Zhou
  Cc: horms, Ganapatrao Prabhakerrao Kulkarni, Will Deacon, xiexiuqi,
	Catalin Marinas, Bhupesh Sharma, Linux Doc Mailing List,
	kexec mailing list, Linux Kernel Mailing List, mingo,
	James Morse, Thomas Gleixner, dyoung, linux-arm-kernel

On 3/9/2020 10:18 AM, Prabhakar Kushwaha wrote:
> Hi Chen,
> 
> On Sat, Mar 7, 2020 at 4:36 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>
>>
>>
>> On 2020/3/5 18:13, Prabhakar Kushwaha wrote:
>>> On Mon, Dec 23, 2019 at 8:57 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>
>>>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>>>> 4G. If crashkernel=X,low is specified simultaneously, reserve the specified
>>>> size of low memory for crash dump kernel devices first, and then reserve
>>>> memory above 4G.
>>>>
>>>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>>>> ---
>>>>  arch/arm64/kernel/setup.c |  8 +++++++-
>>>>  arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
>>>>  2 files changed, 36 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>>> index 56f6645..04d1c87 100644
>>>> --- a/arch/arm64/kernel/setup.c
>>>> +++ b/arch/arm64/kernel/setup.c
>>>> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
>>>>                     kernel_data.end <= res->end)
>>>>                         request_resource(res, &kernel_data);
>>>>  #ifdef CONFIG_KEXEC_CORE
>>>> -               /* Userspace will find "Crash kernel" region in /proc/iomem. */
>>>> +               /*
>>>> +                * Userspace will find "Crash kernel" region in /proc/iomem.
>>>> +                * Note: the low region is renamed as Crash kernel (low).
>>>> +                */
>>>> +               if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>>>> +                               crashk_low_res.end <= res->end)
>>>> +                       request_resource(res, &crashk_low_res);
>>>>                 if (crashk_res.end && crashk_res.start >= res->start &&
>>>>                     crashk_res.end <= res->end)
>>>>                         request_resource(res, &crashk_res);
>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>>> index b65dffd..0d7afd5 100644
>>>> --- a/arch/arm64/mm/init.c
>>>> +++ b/arch/arm64/mm/init.c
>>>> @@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
>>>>  {
>>>>         unsigned long long crash_base, crash_size;
>>>>         int ret;
>>>> +       phys_addr_t crash_max = arm64_dma32_phys_limit;
>>>>
>>>>         ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>>>>                                 &crash_size, &crash_base);
>>>> @@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
>>>>         if (ret || !crash_size)
>>>>                 return;
>>>>
>>>> +       ret = reserve_crashkernel_low();
>>>> +       if (!ret && crashk_low_res.end) {
>>>> +               /*
>>>> +                * If crashkernel=X,low specified, there may be two regions,
>>>> +                * we need to make some changes as follows:
>>>> +                *
>>>> +                * 1. rename the low region as "Crash kernel (low)"
>>>> +                * To distinguish it from the high region without affecting
>>>> +                * the use of existing kexec-tools, rename the low region as
>>>> +                * "Crash kernel (low)".
>>>> +                *
>>>> +                * 2. change the upper bound for crash memory
>>>> +                * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
>>>> +                *
>>>> +                * 3. mark the low region as "nomap"
>>>> +                * The low region is intended to be used for crash dump kernel
>>>> +                * devices, just mark the low region as "nomap" simply.
>>>> +                */
>>>> +               const char *rename = "Crash kernel (low)";
>>>> +
>>>> +               crashk_low_res.name = rename;
>>>> +               crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
>>>> +               memblock_mark_nomap(crashk_low_res.start,
>>>> +                                   resource_size(&crashk_low_res));
>>>> +       }
>>>> +
>>>>         crash_size = PAGE_ALIGN(crash_size);
>>>>
>>>>         if (crash_base == 0) {
>>>>                 /* Current arm64 boot protocol requires 2MB alignment */
>>>> -               crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
>>>> -                               crash_size, SZ_2M);
>>>> +               crash_base = memblock_find_in_range(0, crash_max, crash_size,
>>>> +                               SZ_2M);
>>>>                 if (crash_base == 0) {
>>>>                         pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>>>>                                 crash_size);
>>>> --
>>>
>>> I tested this patch series on ARM64-ThunderX2 and saw no issues with
>>> bootargs crashkernel=X@Y crashkernel=250M,low
>>>
>>> $ dmesg | grep crash
>>> [    0.000000] crashkernel reserved: 0x0000000b81200000 - 0x0000000c81200000 (4096 MB)
>>> [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+ root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G@0xb81200000 crashkernel=250M,low nowatchdog earlycon
>>> [   29.310209]     crashkernel=250M,low
>>>
>>> $ kexec -p -i /boot/vmlinuz-`uname -r` --initrd=/boot/initrd.img-`uname -r` --reuse-cmdline
>>> $ echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
>>>
>>> But when I tried crashkernel=4G crashkernel=250M,low as bootargs, the
>>> kernel was not able to allocate memory:
>>> [    0.000000] cannot allocate crashkernel (size:0x100000000)
>>> [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+ root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G crashkernel=250M,low nowatchdog
>>> [   29.332081]     crashkernel=250M,low
>>>
>>> Is crashkernel=X@Y mandatory to get memory allocated beyond 4G?
>>> Am I missing something?
>>
>> I can't reproduce the problem in my environment. Can you test with another size,
>> such as "crashkernel=1G crashkernel=250M,low", and see whether the same issue occurs?
>>
> I tried 1G as well and got the same error. Please find the logs below:
> 
> $ dmesg | grep crash
> [    0.000000] cannot allocate crashkernel (size:0x40000000)
> [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+ root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro nowatchdog earlycon crashkernel=1G crashkernel=250M,low
> [   29.326916]     crashkernel=250M,low
> 
> 
>> Besides, crashkernel=X@Y isn't mandatory to get allocated beyond 4G,
> 
> This was my understanding as well.
> 
>> can you show the whole /proc/iomem file?
>>
> 
> $ cat /proc/iomem
> 00000000-00000000 : PCI ECAM
> 00000000-00000000 : PCI ECAM
> 00000000-00000000 : PCI Bus 0000:00
>   00000000-00000000 : PCI Bus 0000:0f
>     00000000-00000000 : PCI Bus 0000:10
>       00000000-00000000 : 0000:10:00.0
>       00000000-00000000 : 0000:10:00.0
>   00000000-00000000 : PCI Bus 0000:01
>     00000000-00000000 : 0000:01:00.0
>     00000000-00000000 : 0000:01:00.1
>   00000000-00000000 : PCI Bus 0000:05
>     00000000-00000000 : 0000:05:00.0
>     00000000-00000000 : 0000:05:00.1
>   00000000-00000000 : PCI Bus 0000:09
>     00000000-00000000 : 0000:09:00.0
>     00000000-00000000 : 0000:09:00.1
>   00000000-00000000 : 0000:00:10.0
>     00000000-00000000 : ahci
>   00000000-00000000 : 0000:00:10.1
>     00000000-00000000 : ahci
> 00000000-00000000 : PCI Bus 0000:80
>   00000000-00000000 : PCI Bus 0000:83
>     00000000-00000000 : 0000:83:00.0
>     00000000-00000000 : 0000:83:00.0
>       00000000-00000000 : nvme
>   00000000-00000000 : PCI Bus 0000:89
>     00000000-00000000 : 0000:89:00.0
>       00000000-00000000 : e1000e
>     00000000-00000000 : 0000:89:00.0
>     00000000-00000000 : 0000:89:00.0
>       00000000-00000000 : e1000e
>     00000000-00000000 : 0000:89:00.0
>       00000000-00000000 : e1000e
>   00000000-00000000 : PCI Bus 0000:8d
>     00000000-00000000 : 0000:8d:00.0
>     00000000-00000000 : 0000:8d:00.0
>       00000000-00000000 : mpt3sas
> 00000000-00000000 : reserved
> 00000000-00000000 : System RAM
>   00000000-00000000 : Kernel code
>   00000000-00000000 : reserved
>   00000000-00000000 : Kernel data
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
> 00000000-00000000 : reserved
> 00000000-00000000 : System RAM
> 00000000-00000000 : reserved
> 00000000-00000000 : System RAM
>   00000000-00000000 : reserved
> 00000000-00000000 : reserved
> 00000000-00000000 : System RAM
>   00000000-00000000 : reserved
> 00000000-00000000 : reserved
> 00000000-00000000 : System RAM
>   00000000-00000000 : reserved
> 00000000-00000000 : reserved
> 00000000-00000000 : System RAM
>   00000000-00000000 : reserved
> 00000000-00000000 : reserved
> 00000000-00000000 : System RAM
>   00000000-00000000 : reserved
> 00000000-00000000 : CAV901C:00
> 00000000-00000000 : CAV901D:00
>   00000000-00000000 : CAV901C:00
> 00000000-00000000 : CAV901E:00
>   00000000-00000000 : CAV901C:00
> 00000000-00000000 : CAV901F:00
>   00000000-00000000 : CAV901C:00
> 00000000-00000000 : CAV9006:00
>   00000000-00000000 : CAV9006:00
> 00000000-00000000 : ARMH0011:00
>   00000000-00000000 : ARMH0011:00
> 00000000-00000000 : arm-smmu-v3.0.auto
>   00000000-00000000 : arm-smmu-v3.0.auto
> 00000000-00000000 : arm-smmu-v3.1.auto
>   00000000-00000000 : arm-smmu-v3.1.auto
> 00000000-00000000 : arm-smmu-v3.2.auto
>   00000000-00000000 : arm-smmu-v3.2.auto
> 00000000-00000000 : CAV901C:01
> 00000000-00000000 : CAV901D:01
>   00000000-00000000 : CAV901C:01
> 00000000-00000000 : CAV901E:01
>   00000000-00000000 : CAV901C:01
> 00000000-00000000 : CAV901F:01
>   00000000-00000000 : CAV901C:01
> 00000000-00000000 : CAV9007:06
>   00000000-00000000 : CAV9007:06
> 00000000-00000000 : arm-smmu-v3.3.auto
>   00000000-00000000 : arm-smmu-v3.3.auto
> 00000000-00000000 : arm-smmu-v3.4.auto
>   00000000-00000000 : arm-smmu-v3.4.auto
> 00000000-00000000 : arm-smmu-v3.5.auto
>   00000000-00000000 : arm-smmu-v3.5.auto
> 00000000-00000000 : System RAM
> 00000000-00000000 : System RAM
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
> 00000000-00000000 : System RAM
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
>   00000000-00000000 : reserved
> 00000000-00000000 : PCI Bus 0000:00
>   00000000-00000000 : PCI Bus 0000:01
>     00000000-00000000 : 0000:01:00.0
>     00000000-00000000 : 0000:01:00.1
>     00000000-00000000 : 0000:01:00.0
>     00000000-00000000 : 0000:01:00.1
>     00000000-00000000 : 0000:01:00.0
>     00000000-00000000 : 0000:01:00.1
>   00000000-00000000 : PCI Bus 0000:05
>     00000000-00000000 : 0000:05:00.0
>       00000000-00000000 : bnx2x
>     00000000-00000000 : 0000:05:00.1
>       00000000-00000000 : bnx2x
>     00000000-00000000 : 0000:05:00.0
>       00000000-00000000 : bnx2x
>     00000000-00000000 : 0000:05:00.0
>       00000000-00000000 : bnx2x
>     00000000-00000000 : 0000:05:00.1
>       00000000-00000000 : bnx2x
>     00000000-00000000 : 0000:05:00.1
>       00000000-00000000 : bnx2x
>   00000000-00000000 : PCI Bus 0000:09
>     00000000-00000000 : 0000:09:00.0
>       00000000-00000000 : i40e
>     00000000-00000000 : 0000:09:00.1
>       00000000-00000000 : i40e
>     00000000-00000000 : 0000:09:00.0
>     00000000-00000000 : 0000:09:00.1
>     00000000-00000000 : 0000:09:00.0
>       00000000-00000000 : i40e
>     00000000-00000000 : 0000:09:00.1
>       00000000-00000000 : i40e
>     00000000-00000000 : 0000:09:00.0
>     00000000-00000000 : 0000:09:00.1
>   00000000-00000000 : 0000:00:0f.0
>     00000000-00000000 : xhci-hcd
>   00000000-00000000 : 0000:00:0f.0
>   00000000-00000000 : 0000:00:0f.1
>     00000000-00000000 : xhci-hcd
>   00000000-00000000 : 0000:00:0f.1
>   00000000-00000000 : 0000:00:10.0
>     00000000-00000000 : ahci
>   00000000-00000000 : 0000:00:10.1
>     00000000-00000000 : ahci
> 00000000-00000000 : PCI Bus 0000:80
> 

Resending with the correct /proc/iomem output, captured after logging in as
root (the earlier listing showed all-zero addresses):

$ cat /proc/iomem
30000000-37ffffff : PCI ECAM
38000000-3fffffff : PCI ECAM
40000000-5fffffff : PCI Bus 0000:00
  40000000-417fffff : PCI Bus 0000:0f
    40000000-417fffff : PCI Bus 0000:10
      40000000-40ffffff : 0000:10:00.0
      41000000-4101ffff : 0000:10:00.0
  41800000-418fffff : PCI Bus 0000:01
    41800000-4183ffff : 0000:01:00.0
    41840000-4187ffff : 0000:01:00.1
  41900000-419fffff : PCI Bus 0000:05
    41900000-4197ffff : 0000:05:00.0
    41980000-419fffff : 0000:05:00.1
  41a00000-41afffff : PCI Bus 0000:09
    41a00000-41a7ffff : 0000:09:00.0
    41a80000-41afffff : 0000:09:00.1
  41b00000-41b0ffff : 0000:00:10.0
    41b00000-41b0ffff : ahci
  41b10000-41b1ffff : 0000:00:10.1
    41b10000-41b1ffff : ahci
60000000-7fffffff : PCI Bus 0000:80
  60000000-600fffff : PCI Bus 0000:83
    60000000-6001ffff : 0000:83:00.0
    60020000-60023fff : 0000:83:00.0
      60020000-60023fff : nvme
  60100000-601fffff : PCI Bus 0000:89
    60100000-6017ffff : 0000:89:00.0
      60100000-6017ffff : e1000e
    60180000-601bffff : 0000:89:00.0
    601c0000-601dffff : 0000:89:00.0
      601c0000-601dffff : e1000e
    601e0000-601e3fff : 0000:89:00.0
      601e0000-601e3fff : e1000e
  60200000-603fffff : PCI Bus 0000:8d
    60200000-602fffff : 0000:8d:00.0
    60300000-6030ffff : 0000:8d:00.0
      60300000-6030ffff : mpt3sas
802f0000-8030ffff : reserved
e6247000-e6247fff : reserved
e6720000-e690ffff : reserved
e6a90000-e6a9ffff : reserved
e6ab0000-e721ffff : reserved
e7240000-e7240fff : reserved
fac00000-fafdffff : reserved
400040400-40004041f : CAV901C:00
400040480-400040567 : CAV901D:00
  400040480-400040567 : CAV901C:00
400040600-40004073b : CAV901E:00
  400040600-40004073b : CAV901C:00
400041400-40004177f : CAV901F:00
  400041400-40004177f : CAV901C:00
402000100-402000fff : CAV9006:00
  402000100-402000fff : CAV9006:00
402020000-40202ffff : ARMH0011:00
  402020000-40202ffff : ARMH0011:00
402300000-40230ffff : arm-smmu-v3.0.auto
  402300000-40230ffff : arm-smmu-v3.0.auto
402320000-40232ffff : arm-smmu-v3.1.auto
  402320000-40232ffff : arm-smmu-v3.1.auto
402340000-40234ffff : arm-smmu-v3.2.auto
  402340000-40234ffff : arm-smmu-v3.2.auto
440040400-44004041f : CAV901C:01
440040480-440040567 : CAV901D:01
  440040480-440040567 : CAV901C:01
440040600-44004073b : CAV901E:01
  440040600-44004073b : CAV901C:01
440041400-44004177f : CAV901F:01
  440041400-44004177f : CAV901C:01
4421a0000-4421affff : CAV9007:06
  4421a0000-4421affff : CAV9007:06
442300000-44230ffff : arm-smmu-v3.3.auto
  442300000-44230ffff : arm-smmu-v3.3.auto
442320000-44232ffff : arm-smmu-v3.4.auto
  442320000-44232ffff : arm-smmu-v3.4.auto
442340000-44234ffff : arm-smmu-v3.5.auto
  442340000-44234ffff : arm-smmu-v3.5.auto
b81200000-c811fffff : System RAM
  b81280000-b8270ffff : Kernel code
  b82710000-b82dfffff : reserved
  b82e00000-b83168fff : Kernel data
  b83169000-baccd7fff : reserved
  c78a00000-c7fffffff : reserved
  c80129000-c801a9fff : reserved
  c801aa000-c809e9fff : reserved
  c809ec000-c809eefff : reserved
  c809ef000-c811fffff : reserved
10000000000-13fffffffff : PCI Bus 0000:00
  10000000000-100013fffff : PCI Bus 0000:01
    10000000000-100007fffff : 0000:01:00.0
    10000800000-10000ffffff : 0000:01:00.1
    10001000000-1000101ffff : 0000:01:00.0
    10001020000-1000103ffff : 0000:01:00.1
    10001040000-1000104ffff : 0000:01:00.0
    10001050000-1000105ffff : 0000:01:00.1
  10001400000-100037fffff : PCI Bus 0000:05
    10001400000-1000140ffff : 0000:05:00.0
      10001400000-1000140ffff : bnx2x
    10001410000-1000141ffff : 0000:05:00.1
      10001410000-1000141ffff : bnx2x
    10001800000-10001ffffff : 0000:05:00.0
      10001800000-10001ffffff : bnx2x
    10002000000-100027fffff : 0000:05:00.0
      10002000000-100027fffff : bnx2x
    10002800000-10002ffffff : 0000:05:00.1
      10002800000-10002ffffff : bnx2x
    10003000000-100037fffff : 0000:05:00.1
      10003000000-100037fffff : bnx2x
  10003800000-100053fffff : PCI Bus 0000:09
    10003800000-10003ffffff : 0000:09:00.0
      10003800000-10003ffffff : i40e
    10004000000-100047fffff : 0000:09:00.1
      10004000000-100047fffff : i40e
    10004800000-10004bfffff : 0000:09:00.0
    10004c00000-10004ffffff : 0000:09:00.1
    10005000000-10005007fff : 0000:09:00.0
      10005000000-10005007fff : i40e
    10005008000-1000500ffff : 0000:09:00.1
      10005008000-1000500ffff : i40e
    10005010000-1000510ffff : 0000:09:00.0
    10005110000-1000520ffff : 0000:09:00.1
  10005400000-1000540ffff : 0000:00:0f.0
    10005400000-1000540ffff : xhci-hcd
  10005410000-1000541ffff : 0000:00:0f.0
  10005420000-1000542ffff : 0000:00:0f.1
    10005420000-1000542ffff : xhci-hcd
  10005430000-1000543ffff : 0000:00:0f.1
  10005440000-1000544ffff : 0000:00:10.0
    10005440000-1000544ffff : ahci
  10005450000-1000545ffff : 0000:00:10.1
    10005450000-1000545ffff : ahci
14000000000-17fffffffff : PCI Bus 0000:80
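
To pick just the reservation-related lines out of a listing like the one
above, something along these lines could be used. This is only a sketch:
crash_regions is a hypothetical helper name, and the pattern assumes the
"System RAM" name shown above plus the "Crash kernel" and "Crash kernel (low)"
names mentioned in the quoted patch.

```shell
#!/bin/sh
# Hypothetical helper: print only the System RAM and crash kernel lines
# from an iomem-style listing. $1 is the file to read; it defaults to
# /proc/iomem, which shows real addresses only when read as root.
crash_regions() {
    grep -E 'System RAM|Crash kernel' "${1:-/proc/iomem}"
}
```

Called with no argument as root, crash_regions reads /proc/iomem. Applied to
the listing above it would print only the System RAM line, since no
"Crash kernel" entry appears there.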


Failure with crashkernel=1G:

$ dmesg | grep crash
[    0.000000] cannot allocate crashkernel (size:0x40000000)
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+ root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro nowatchdog earlycon crashkernel=1G crashkernel=250M,low
[   29.326916]     crashkernel=250M,low
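
For reference, the hex sizes in these messages and the ranges in /proc/iomem
can be cross-checked with plain shell arithmetic. A minimal sketch follows;
size_mb and range_mb are hypothetical helper names, not part of any existing
tool.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count (hex, as printed by the
# kernel) to MiB.
size_mb() {
    echo $(( $1 >> 20 ))
}

# Hypothetical helper: size in MiB of an inclusive "start-end" range
# written the way /proc/iomem prints it (hex, no 0x prefix).
range_mb() {
    start=${1%-*}
    end=${1#*-}
    echo $(( (0x$end - 0x$start + 1) >> 20 ))
}

size_mb 0x40000000              # size from the crashkernel=1G failure
size_mb 0x100000000             # size from the crashkernel=4G failure
range_mb b81200000-c811fffff    # the System RAM line in the listing above
```

This prints 1024, 4096 and 4096: the 1G and 4G requests, and the 4096 MB span
of the System RAM range, which has the same size as the region in the
"crashkernel reserved" line from the crashkernel=4G@0xb81200000 boot.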

--pk

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2020-03-09 15:51         ` Prabhakar Kushwaha
@ 2020-03-10  1:30           ` chenzhou
  2020-03-10 17:08             ` Prabhakar Kushwaha
  0 siblings, 1 reply; 30+ messages in thread
From: chenzhou @ 2020-03-10  1:30 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: horms, Ganapatrao Prabhakerrao Kulkarni, Will Deacon, xiexiuqi,
	Catalin Marinas, Bhupesh Sharma, Linux Doc Mailing List,
	kexec mailing list, Linux Kernel Mailing List, mingo,
	James Morse, Thomas Gleixner, dyoung, linux-arm-kernel

Hi,

On 2020/3/9 23:51, Prabhakar Kushwaha wrote:
> On 3/9/2020 10:18 AM, Prabhakar Kushwaha wrote:
>> Hi Chen,
>>
>> On Sat, Mar 7, 2020 at 4:36 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>
>>>
>>>
>>> On 2020/3/5 18:13, Prabhakar Kushwaha wrote:
>>>> On Mon, Dec 23, 2019 at 8:57 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>>
>>>>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>>>>> 4G. If crashkernel=X,low is specified simultaneously, reserve the specified
>>>>> size of low memory for crash dump kernel devices first, and then reserve
>>>>> memory above 4G.
>>>>>
>>>>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>>>>> ---
>>>>>  arch/arm64/kernel/setup.c |  8 +++++++-
>>>>>  arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
>>>>>  2 files changed, 36 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>>>> index 56f6645..04d1c87 100644
>>>>> --- a/arch/arm64/kernel/setup.c
>>>>> +++ b/arch/arm64/kernel/setup.c
>>>>> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
>>>>>                     kernel_data.end <= res->end)
>>>>>                         request_resource(res, &kernel_data);
>>>>>  #ifdef CONFIG_KEXEC_CORE
>>>>> -               /* Userspace will find "Crash kernel" region in /proc/iomem. */
>>>>> +               /*
>>>>> +                * Userspace will find "Crash kernel" region in /proc/iomem.
>>>>> +                * Note: the low region is renamed as Crash kernel (low).
>>>>> +                */
>>>>> +               if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>>>>> +                               crashk_low_res.end <= res->end)
>>>>> +                       request_resource(res, &crashk_low_res);
>>>>>                 if (crashk_res.end && crashk_res.start >= res->start &&
>>>>>                     crashk_res.end <= res->end)
>>>>>                         request_resource(res, &crashk_res);
>>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>>>> index b65dffd..0d7afd5 100644
>>>>> --- a/arch/arm64/mm/init.c
>>>>> +++ b/arch/arm64/mm/init.c
>>>>> @@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
>>>>>  {
>>>>>         unsigned long long crash_base, crash_size;
>>>>>         int ret;
>>>>> +       phys_addr_t crash_max = arm64_dma32_phys_limit;
>>>>>
>>>>>         ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>>>>>                                 &crash_size, &crash_base);
>>>>> @@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
>>>>>         if (ret || !crash_size)
>>>>>                 return;
>>>>>
>>>>> +       ret = reserve_crashkernel_low();
>>>>> +       if (!ret && crashk_low_res.end) {
>>>>> +               /*
>>>>> +                * If crashkernel=X,low specified, there may be two regions,
>>>>> +                * we need to make some changes as follows:
>>>>> +                *
>>>>> +                * 1. rename the low region as "Crash kernel (low)"
>>>>> +                * To distinguish it from the high region without affecting
>>>>> +                * the use of existing kexec-tools, rename the low region as
>>>>> +                * "Crash kernel (low)".
>>>>> +                *
>>>>> +                * 2. change the upper bound for crash memory
>>>>> +                * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
>>>>> +                *
>>>>> +                * 3. mark the low region as "nomap"
>>>>> +                * The low region is intended to be used for crash dump kernel
>>>>> +                * devices, just mark the low region as "nomap" simply.
>>>>> +                */
>>>>> +               const char *rename = "Crash kernel (low)";
>>>>> +
>>>>> +               crashk_low_res.name = rename;
>>>>> +               crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
>>>>> +               memblock_mark_nomap(crashk_low_res.start,
>>>>> +                                   resource_size(&crashk_low_res));
>>>>> +       }
>>>>> +
>>>>>         crash_size = PAGE_ALIGN(crash_size);
>>>>>
>>>>>         if (crash_base == 0) {
>>>>>                 /* Current arm64 boot protocol requires 2MB alignment */
>>>>> -               crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
>>>>> -                               crash_size, SZ_2M);
>>>>> +               crash_base = memblock_find_in_range(0, crash_max, crash_size,
>>>>> +                               SZ_2M);
>>>>>                 if (crash_base == 0) {
>>>>>                         pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>>>>>                                 crash_size);
>>>>> --
>>>>
>>>> I tested this patch series on ARM64-ThunderX2 and saw no issues with
>>>> bootargs crashkernel=X@Y crashkernel=250M,low
>>>>
>>>> $ dmesg | grep crash
>>>> [    0.000000] crashkernel reserved: 0x0000000b81200000 - 0x0000000c81200000 (4096 MB)
>>>> [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+ root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G@0xb81200000 crashkernel=250M,low nowatchdog earlycon
>>>> [   29.310209]     crashkernel=250M,low
>>>>
>>>> $ kexec -p -i /boot/vmlinuz-`uname -r` --initrd=/boot/initrd.img-`uname -r` --reuse-cmdline
>>>> $ echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
>>>>
>>>> But when I tried crashkernel=4G crashkernel=250M,low as bootargs, the
>>>> kernel was not able to allocate memory:
>>>> [    0.000000] cannot allocate crashkernel (size:0x100000000)
>>>> [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+ root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G crashkernel=250M,low nowatchdog
>>>> [   29.332081]     crashkernel=250M,low
>>>>
>>>> Is crashkernel=X@Y mandatory to get memory allocated beyond 4G?
>>>> Am I missing something?
>>>
>>> I can't reproduce the problem in my environment. Can you test with another size,
>>> such as "crashkernel=1G crashkernel=250M,low", and see whether the same issue occurs?
>>>
>> I tried 1G as well and got the same error. Please find the logs below:
>>
>> $ dmesg | grep crash
>> [    0.000000] cannot allocate crashkernel (size:0x40000000)
>> [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+ root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro nowatchdog earlycon crashkernel=1G crashkernel=250M,low
>> [   29.326916]     crashkernel=250M,low
>>
>>
>>> Besides, crashkernel=X@Y isn't mandatory to get allocated beyond 4G,
>>
>> This was my understanding as well.
>>
>>> can you show the whole /proc/iomem file?
>>>
>>
>> $ cat /proc/iomem
>> 00000000-00000000 : PCI ECAM
>> 00000000-00000000 : PCI ECAM
>> 00000000-00000000 : PCI Bus 0000:00
>>   00000000-00000000 : PCI Bus 0000:0f
>>     00000000-00000000 : PCI Bus 0000:10
>>       00000000-00000000 : 0000:10:00.0
>>       00000000-00000000 : 0000:10:00.0
>>   00000000-00000000 : PCI Bus 0000:01
>>     00000000-00000000 : 0000:01:00.0
>>     00000000-00000000 : 0000:01:00.1
>>   00000000-00000000 : PCI Bus 0000:05
>>     00000000-00000000 : 0000:05:00.0
>>     00000000-00000000 : 0000:05:00.1
>>   00000000-00000000 : PCI Bus 0000:09
>>     00000000-00000000 : 0000:09:00.0
>>     00000000-00000000 : 0000:09:00.1
>>   00000000-00000000 : 0000:00:10.0
>>     00000000-00000000 : ahci
>>   00000000-00000000 : 0000:00:10.1
>>     00000000-00000000 : ahci
>> 00000000-00000000 : PCI Bus 0000:80
>>   00000000-00000000 : PCI Bus 0000:83
>>     00000000-00000000 : 0000:83:00.0
>>     00000000-00000000 : 0000:83:00.0
>>       00000000-00000000 : nvme
>>   00000000-00000000 : PCI Bus 0000:89
>>     00000000-00000000 : 0000:89:00.0
>>       00000000-00000000 : e1000e
>>     00000000-00000000 : 0000:89:00.0
>>     00000000-00000000 : 0000:89:00.0
>>       00000000-00000000 : e1000e
>>     00000000-00000000 : 0000:89:00.0
>>       00000000-00000000 : e1000e
>>   00000000-00000000 : PCI Bus 0000:8d
>>     00000000-00000000 : 0000:8d:00.0
>>     00000000-00000000 : 0000:8d:00.0
>>       00000000-00000000 : mpt3sas
>> 00000000-00000000 : reserved
>> 00000000-00000000 : System RAM
>>   00000000-00000000 : Kernel code
>>   00000000-00000000 : reserved
>>   00000000-00000000 : Kernel data
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>> 00000000-00000000 : reserved
>> 00000000-00000000 : System RAM
>> 00000000-00000000 : reserved
>> 00000000-00000000 : System RAM
>>   00000000-00000000 : reserved
>> 00000000-00000000 : reserved
>> 00000000-00000000 : System RAM
>>   00000000-00000000 : reserved
>> 00000000-00000000 : reserved
>> 00000000-00000000 : System RAM
>>   00000000-00000000 : reserved
>> 00000000-00000000 : reserved
>> 00000000-00000000 : System RAM
>>   00000000-00000000 : reserved
>> 00000000-00000000 : reserved
>> 00000000-00000000 : System RAM
>>   00000000-00000000 : reserved
>> 00000000-00000000 : CAV901C:00
>> 00000000-00000000 : CAV901D:00
>>   00000000-00000000 : CAV901C:00
>> 00000000-00000000 : CAV901E:00
>>   00000000-00000000 : CAV901C:00
>> 00000000-00000000 : CAV901F:00
>>   00000000-00000000 : CAV901C:00
>> 00000000-00000000 : CAV9006:00
>>   00000000-00000000 : CAV9006:00
>> 00000000-00000000 : ARMH0011:00
>>   00000000-00000000 : ARMH0011:00
>> 00000000-00000000 : arm-smmu-v3.0.auto
>>   00000000-00000000 : arm-smmu-v3.0.auto
>> 00000000-00000000 : arm-smmu-v3.1.auto
>>   00000000-00000000 : arm-smmu-v3.1.auto
>> 00000000-00000000 : arm-smmu-v3.2.auto
>>   00000000-00000000 : arm-smmu-v3.2.auto
>> 00000000-00000000 : CAV901C:01
>> 00000000-00000000 : CAV901D:01
>>   00000000-00000000 : CAV901C:01
>> 00000000-00000000 : CAV901E:01
>>   00000000-00000000 : CAV901C:01
>> 00000000-00000000 : CAV901F:01
>>   00000000-00000000 : CAV901C:01
>> 00000000-00000000 : CAV9007:06
>>   00000000-00000000 : CAV9007:06
>> 00000000-00000000 : arm-smmu-v3.3.auto
>>   00000000-00000000 : arm-smmu-v3.3.auto
>> 00000000-00000000 : arm-smmu-v3.4.auto
>>   00000000-00000000 : arm-smmu-v3.4.auto
>> 00000000-00000000 : arm-smmu-v3.5.auto
>>   00000000-00000000 : arm-smmu-v3.5.auto
>> 00000000-00000000 : System RAM
>> 00000000-00000000 : System RAM
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>> 00000000-00000000 : System RAM
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>>   00000000-00000000 : reserved
>> 00000000-00000000 : PCI Bus 0000:00
>>   00000000-00000000 : PCI Bus 0000:01
>>     00000000-00000000 : 0000:01:00.0
>>     00000000-00000000 : 0000:01:00.1
>>     00000000-00000000 : 0000:01:00.0
>>     00000000-00000000 : 0000:01:00.1
>>     00000000-00000000 : 0000:01:00.0
>>     00000000-00000000 : 0000:01:00.1
>>   00000000-00000000 : PCI Bus 0000:05
>>     00000000-00000000 : 0000:05:00.0
>>       00000000-00000000 : bnx2x
>>     00000000-00000000 : 0000:05:00.1
>>       00000000-00000000 : bnx2x
>>     00000000-00000000 : 0000:05:00.0
>>       00000000-00000000 : bnx2x
>>     00000000-00000000 : 0000:05:00.0
>>       00000000-00000000 : bnx2x
>>     00000000-00000000 : 0000:05:00.1
>>       00000000-00000000 : bnx2x
>>     00000000-00000000 : 0000:05:00.1
>>       00000000-00000000 : bnx2x
>>   00000000-00000000 : PCI Bus 0000:09
>>     00000000-00000000 : 0000:09:00.0
>>       00000000-00000000 : i40e
>>     00000000-00000000 : 0000:09:00.1
>>       00000000-00000000 : i40e
>>     00000000-00000000 : 0000:09:00.0
>>     00000000-00000000 : 0000:09:00.1
>>     00000000-00000000 : 0000:09:00.0
>>       00000000-00000000 : i40e
>>     00000000-00000000 : 0000:09:00.1
>>       00000000-00000000 : i40e
>>     00000000-00000000 : 0000:09:00.0
>>     00000000-00000000 : 0000:09:00.1
>>   00000000-00000000 : 0000:00:0f.0
>>     00000000-00000000 : xhci-hcd
>>   00000000-00000000 : 0000:00:0f.0
>>   00000000-00000000 : 0000:00:0f.1
>>     00000000-00000000 : xhci-hcd
>>   00000000-00000000 : 0000:00:0f.1
>>   00000000-00000000 : 0000:00:10.0
>>     00000000-00000000 : ahci
>>   00000000-00000000 : 0000:00:10.1
>>     00000000-00000000 : ahci
>> 00000000-00000000 : PCI Bus 0000:80
>>
> 
> resending with correct logs (after login as root)
> 
> $ cat /proc/iomem
> 30000000-37ffffff : PCI ECAM
> 38000000-3fffffff : PCI ECAM
> 40000000-5fffffff : PCI Bus 0000:00
>   40000000-417fffff : PCI Bus 0000:0f
>     40000000-417fffff : PCI Bus 0000:10
>       40000000-40ffffff : 0000:10:00.0
>       41000000-4101ffff : 0000:10:00.0
>   41800000-418fffff : PCI Bus 0000:01
>     41800000-4183ffff : 0000:01:00.0
>     41840000-4187ffff : 0000:01:00.1
>   41900000-419fffff : PCI Bus 0000:05
>     41900000-4197ffff : 0000:05:00.0
>     41980000-419fffff : 0000:05:00.1
>   41a00000-41afffff : PCI Bus 0000:09
>     41a00000-41a7ffff : 0000:09:00.0
>     41a80000-41afffff : 0000:09:00.1
>   41b00000-41b0ffff : 0000:00:10.0
>     41b00000-41b0ffff : ahci
>   41b10000-41b1ffff : 0000:00:10.1
>     41b10000-41b1ffff : ahci
> 60000000-7fffffff : PCI Bus 0000:80
>   60000000-600fffff : PCI Bus 0000:83
>     60000000-6001ffff : 0000:83:00.0
>     60020000-60023fff : 0000:83:00.0
>       60020000-60023fff : nvme
>   60100000-601fffff : PCI Bus 0000:89
>     60100000-6017ffff : 0000:89:00.0
>       60100000-6017ffff : e1000e
>     60180000-601bffff : 0000:89:00.0
>     601c0000-601dffff : 0000:89:00.0
>       601c0000-601dffff : e1000e
>     601e0000-601e3fff : 0000:89:00.0
>       601e0000-601e3fff : e1000e
>   60200000-603fffff : PCI Bus 0000:8d
>     60200000-602fffff : 0000:8d:00.0
>     60300000-6030ffff : 0000:8d:00.0
>       60300000-6030ffff : mpt3sas
> 802f0000-8030ffff : reserved
> e6247000-e6247fff : reserved
> e6720000-e690ffff : reserved
> e6a90000-e6a9ffff : reserved
> e6ab0000-e721ffff : reserved
> e7240000-e7240fff : reserved
> fac00000-fafdffff : reserved
> 400040400-40004041f : CAV901C:00
> 400040480-400040567 : CAV901D:00
>   400040480-400040567 : CAV901C:00
> 400040600-40004073b : CAV901E:00
>   400040600-40004073b : CAV901C:00
> 400041400-40004177f : CAV901F:00
>   400041400-40004177f : CAV901C:00
> 402000100-402000fff : CAV9006:00
>   402000100-402000fff : CAV9006:00
> 402020000-40202ffff : ARMH0011:00
>   402020000-40202ffff : ARMH0011:00
> 402300000-40230ffff : arm-smmu-v3.0.auto
>   402300000-40230ffff : arm-smmu-v3.0.auto
> 402320000-40232ffff : arm-smmu-v3.1.auto
>   402320000-40232ffff : arm-smmu-v3.1.auto
> 402340000-40234ffff : arm-smmu-v3.2.auto
>   402340000-40234ffff : arm-smmu-v3.2.auto
> 440040400-44004041f : CAV901C:01
> 440040480-440040567 : CAV901D:01
>   440040480-440040567 : CAV901C:01
> 440040600-44004073b : CAV901E:01
>   440040600-44004073b : CAV901C:01
> 440041400-44004177f : CAV901F:01
>   440041400-44004177f : CAV901C:01
> 4421a0000-4421affff : CAV9007:06
>   4421a0000-4421affff : CAV9007:06
> 442300000-44230ffff : arm-smmu-v3.3.auto
>   442300000-44230ffff : arm-smmu-v3.3.auto
> 442320000-44232ffff : arm-smmu-v3.4.auto
>   442320000-44232ffff : arm-smmu-v3.4.auto
> 442340000-44234ffff : arm-smmu-v3.5.auto
>   442340000-44234ffff : arm-smmu-v3.5.auto
> b81200000-c811fffff : System RAM
>   b81280000-b8270ffff : Kernel code
>   b82710000-b82dfffff : reserved
>   b82e00000-b83168fff : Kernel data
>   b83169000-baccd7fff : reserved
>   c78a00000-c7fffffff : reserved
>   c80129000-c801a9fff : reserved
>   c801aa000-c809e9fff : reserved
>   c809ec000-c809eefff : reserved
>   c809ef000-c811fffff : reserved
> 10000000000-13fffffffff : PCI Bus 0000:00
>   10000000000-100013fffff : PCI Bus 0000:01
>     10000000000-100007fffff : 0000:01:00.0
>     10000800000-10000ffffff : 0000:01:00.1
>     10001000000-1000101ffff : 0000:01:00.0
>     10001020000-1000103ffff : 0000:01:00.1
>     10001040000-1000104ffff : 0000:01:00.0
>     10001050000-1000105ffff : 0000:01:00.1
>   10001400000-100037fffff : PCI Bus 0000:05
>     10001400000-1000140ffff : 0000:05:00.0
>       10001400000-1000140ffff : bnx2x
>     10001410000-1000141ffff : 0000:05:00.1
>       10001410000-1000141ffff : bnx2x
>     10001800000-10001ffffff : 0000:05:00.0
>       10001800000-10001ffffff : bnx2x
>     10002000000-100027fffff : 0000:05:00.0
>       10002000000-100027fffff : bnx2x
>     10002800000-10002ffffff : 0000:05:00.1
>       10002800000-10002ffffff : bnx2x
>     10003000000-100037fffff : 0000:05:00.1
>       10003000000-100037fffff : bnx2x
>   10003800000-100053fffff : PCI Bus 0000:09
>     10003800000-10003ffffff : 0000:09:00.0
>       10003800000-10003ffffff : i40e
>     10004000000-100047fffff : 0000:09:00.1
>       10004000000-100047fffff : i40e
>     10004800000-10004bfffff : 0000:09:00.0
>     10004c00000-10004ffffff : 0000:09:00.1
>     10005000000-10005007fff : 0000:09:00.0
>       10005000000-10005007fff : i40e
>     10005008000-1000500ffff : 0000:09:00.1
>       10005008000-1000500ffff : i40e
>     10005010000-1000510ffff : 0000:09:00.0
>     10005110000-1000520ffff : 0000:09:00.1
>   10005400000-1000540ffff : 0000:00:0f.0
>     10005400000-1000540ffff : xhci-hcd
>   10005410000-1000541ffff : 0000:00:0f.0
>   10005420000-1000542ffff : 0000:00:0f.1
>     10005420000-1000542ffff : xhci-hcd
>   10005430000-1000543ffff : 0000:00:0f.1
>   10005440000-1000544ffff : 0000:00:10.0
>     10005440000-1000544ffff : ahci
>   10005450000-1000545ffff : 0000:00:10.1
>     10005450000-1000545ffff : ahci
> 14000000000-17fffffffff : PCI Bus 0000:80
> 
> 
> failure with crashkernel=1G
> 
> :~$ dmesg | grep crash
> [    0.000000] cannot allocate crashkernel (size:0x40000000)
> [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro nowatchdog earlycon
> crashkernel=1G crashkernel=250M,low
> [   29.326916]     crashkernel=250M,low
> 

I read these logs carefully and found that all of your test cases failed,
including "crashkernel=4G@0xb81200000 crashkernel=250M,low".

None of your tests shows a "Crash kernel (low)" region, which means there is
not enough low memory. In these cases the parameters are effectively the same
as plain crashkernel=4G@0xb81200000, crashkernel=4G and crashkernel=1G.

crashkernel=4G and crashkernel=1G both failed because there is no low memory.
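
One way to check both points on a given machine is to scan /proc/iomem for a
"Crash kernel (low)" entry and for top-level System RAM below 4G. The following
is only a minimal userspace sketch: the names iomem_summary, scan_iomem_line and
scan_iomem_file are made up for illustration, and it has to run as root because
unprivileged reads show every range as 00000000-00000000.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SZ_4G 0x100000000ULL

/* Hypothetical summary of what a /proc/iomem scan found. */
struct iomem_summary {
    uint64_t low_ram_bytes;   /* top-level "System RAM" bytes below 4G */
    int has_crash_low;        /* 1 if a "Crash kernel (low)" entry was seen */
};

/* Parse one "start-end : name" line; returns 1 if it parsed, 0 otherwise. */
int scan_iomem_line(const char *line, struct iomem_summary *s)
{
    unsigned long long start, end;
    char name[64];

    if (sscanf(line, " %llx-%llx : %63[^\n]", &start, &end, name) != 3)
        return 0;
    if (start == 0 && end == 0)
        return 1;             /* range masked for a non-root reader */
    if (strcmp(name, "Crash kernel (low)") == 0)
        s->has_crash_low = 1;
    /* Count only unindented (top-level) System RAM, clamped at 4G. */
    if (line[0] != ' ' && strcmp(name, "System RAM") == 0 && start < SZ_4G) {
        unsigned long long last = end < SZ_4G ? end : SZ_4G - 1;

        s->low_ram_bytes += last - start + 1;
    }
    return 1;
}

/* Scan a whole file such as /proc/iomem; returns -1 if it cannot be opened. */
int scan_iomem_file(const char *path, struct iomem_summary *s)
{
    char buf[256];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(buf, sizeof(buf), f))
        scan_iomem_line(buf, s);
    fclose(f);
    return 0;
}
```

Fed the root dump quoted above, where the only System RAM entry is
b81200000-c811fffff, such a scan would report zero bytes of low RAM and no
"Crash kernel (low)" entry.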

Thanks,
Chen Zhou

> --pk
> 
> .
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2020-03-10  1:30           ` chenzhou
@ 2020-03-10 17:08             ` Prabhakar Kushwaha
  2020-03-11  1:44               ` chenzhou
  0 siblings, 1 reply; 30+ messages in thread
From: Prabhakar Kushwaha @ 2020-03-10 17:08 UTC (permalink / raw)
  To: chenzhou
  Cc: horms, Ganapatrao Prabhakerrao Kulkarni, Will Deacon, xiexiuqi,
	Catalin Marinas, Bhupesh Sharma, Linux Doc Mailing List,
	kexec mailing list, Linux Kernel Mailing List, mingo,
	James Morse, Thomas Gleixner, dyoung, linux-arm-kernel

On 3/10/2020 7:00 AM, chenzhou wrote:
> Hi,
> 
> On 2020/3/9 23:51, Prabhakar Kushwaha wrote:
>> On 3/9/2020 10:18 AM, Prabhakar Kushwaha wrote:
>>> Hi Chen,
>>>
>>> On Sat, Mar 7, 2020 at 4:36 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2020/3/5 18:13, Prabhakar Kushwaha wrote:
>>>>> On Mon, Dec 23, 2019 at 8:57 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>>>
>>>>>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>>>>>> 4G. If crashkernel=X,low is specified simultaneously, reserve specified
>>>>>> size low memory for crash kdump kernel devices firstly and then reserve
>>>>>> memory above 4G.
>>>>>>
>>>>>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>>>>>> ---
>>>>>>  arch/arm64/kernel/setup.c |  8 +++++++-
>>>>>>  arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
>>>>>>  2 files changed, 36 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>>>>> index 56f6645..04d1c87 100644
>>>>>> --- a/arch/arm64/kernel/setup.c
>>>>>> +++ b/arch/arm64/kernel/setup.c
>>>>>> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
>>>>>>                     kernel_data.end <= res->end)
>>>>>>                         request_resource(res, &kernel_data);
>>>>>>  #ifdef CONFIG_KEXEC_CORE
>>>>>> -               /* Userspace will find "Crash kernel" region in /proc/iomem. */
>>>>>> +               /*
>>>>>> +                * Userspace will find "Crash kernel" region in /proc/iomem.
>>>>>> +                * Note: the low region is renamed as Crash kernel (low).
>>>>>> +                */
>>>>>> +               if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>>>>>> +                               crashk_low_res.end <= res->end)
>>>>>> +                       request_resource(res, &crashk_low_res);
>>>>>>                 if (crashk_res.end && crashk_res.start >= res->start &&
>>>>>>                     crashk_res.end <= res->end)
>>>>>>                         request_resource(res, &crashk_res);
>>>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>>>>> index b65dffd..0d7afd5 100644
>>>>>> --- a/arch/arm64/mm/init.c
>>>>>> +++ b/arch/arm64/mm/init.c
>>>>>> @@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
>>>>>>  {
>>>>>>         unsigned long long crash_base, crash_size;
>>>>>>         int ret;
>>>>>> +       phys_addr_t crash_max = arm64_dma32_phys_limit;
>>>>>>
>>>>>>         ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>>>>>>                                 &crash_size, &crash_base);
>>>>>> @@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
>>>>>>         if (ret || !crash_size)
>>>>>>                 return;
>>>>>>
>>>>>> +       ret = reserve_crashkernel_low();
>>>>>> +       if (!ret && crashk_low_res.end) {
>>>>>> +               /*
>>>>>> +                * If crashkernel=X,low specified, there may be two regions,
>>>>>> +                * we need to make some changes as follows:
>>>>>> +                *
>>>>>> +                * 1. rename the low region as "Crash kernel (low)"
>>>>>> +                * In order to distinct from the high region and make no effect
>>>>>> +                * to the use of existing kexec-tools, rename the low region as
>>>>>> +                * "Crash kernel (low)".
>>>>>> +                *
>>>>>> +                * 2. change the upper bound for crash memory
>>>>>> +                * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
>>>>>> +                *
>>>>>> +                * 3. mark the low region as "nomap"
>>>>>> +                * The low region is intended to be used for crash dump kernel
>>>>>> +                * devices, just mark the low region as "nomap" simply.
>>>>>> +                */
>>>>>> +               const char *rename = "Crash kernel (low)";
>>>>>> +
>>>>>> +               crashk_low_res.name = rename;
>>>>>> +               crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
>>>>>> +               memblock_mark_nomap(crashk_low_res.start,
>>>>>> +                                   resource_size(&crashk_low_res));
>>>>>> +       }
>>>>>> +
>>>>>>         crash_size = PAGE_ALIGN(crash_size);
>>>>>>
>>>>>>         if (crash_base == 0) {
>>>>>>                 /* Current arm64 boot protocol requires 2MB alignment */
>>>>>> -               crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
>>>>>> -                               crash_size, SZ_2M);
>>>>>> +               crash_base = memblock_find_in_range(0, crash_max, crash_size,
>>>>>> +                               SZ_2M);
>>>>>>                 if (crash_base == 0) {
>>>>>>                         pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>>>>>>                                 crash_size);
>>>>>> --
>>>>>
>>>>> I tested this patch series on an ARM64 ThunderX2 and saw no issue with
>>>>> bootargs crashkernel=X@Y crashkernel=250M,low
>>>>>
>>>>> $ dmesg | grep crash
>>>>> [    0.000000] crashkernel reserved: 0x0000000b81200000 -
>>>>> 0x0000000c81200000 (4096 MB)
>>>>> [    0.000000] Kernel command line:
>>>>> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
>>>>> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro
>>>>> crashkernel=4G@0xb81200000 crashkernel=250M,low nowatchdog earlycon
>>>>> [   29.310209]     crashkernel=250M,low
>>>>>
>>>>> $  kexec -p -i /boot/vmlinuz-`uname -r`
>>>>> --initrd=/boot/initrd.img-`uname -r` --reuse-cmdline
>>>>> $ echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
>>>>>
>>>>> But when I tried crashkernel=4G crashkernel=250M,low as bootargs,
>>>>> the kernel was not able to allocate memory.
>>>>> [    0.000000] cannot allocate crashkernel (size:0x100000000)
>>>>> [    0.000000] Kernel command line:
>>>>> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
>>>>> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G
>>>>> crashkernel=250M,low nowatchdog
>>>>> [   29.332081]     crashkernel=250M,low
>>>>>
>>>>> Is crashkernel=X@Y mandatory to get an allocation beyond 4G?
>>>>> Am I missing something?
>>>>
>>>> I can't reproduce the problem in my environment, can you test with other size,
>>>> such as "crashkernel=1G crashkernel=250M,low", see if there is the same issue.
>>>>
>>> I tried 1G also. Same error, please find the logs
>>>
>>> $ dmesg | grep crash
>>> [    0.000000] cannot allocate crashkernel (size:0x40000000)
>>> [    0.000000] Kernel command line:
>>> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
>>> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro nowatchdog earlycon
>>> crashkernel=1G crashkernel=250M,low
>>> [   29.326916]     crashkernel=250M,low
>>>
>>>
>>>> Besides, crashkernel=X@Y isn't mandatory to get allocated beyond 4G,
>>>
>>> this was my understanding also.
>>>
>>>> can you show the whole file /proc/iomem.
>>>>
>>>
>>> $ cat /proc/iomem
>>> 00000000-00000000 : PCI ECAM
>>> 00000000-00000000 : PCI ECAM
>>> 00000000-00000000 : PCI Bus 0000:00
>>>   00000000-00000000 : PCI Bus 0000:0f
>>>     00000000-00000000 : PCI Bus 0000:10
>>>       00000000-00000000 : 0000:10:00.0
>>>       00000000-00000000 : 0000:10:00.0
>>>   00000000-00000000 : PCI Bus 0000:01
>>>     00000000-00000000 : 0000:01:00.0
>>>     00000000-00000000 : 0000:01:00.1
>>>   00000000-00000000 : PCI Bus 0000:05
>>>     00000000-00000000 : 0000:05:00.0
>>>     00000000-00000000 : 0000:05:00.1
>>>   00000000-00000000 : PCI Bus 0000:09
>>>     00000000-00000000 : 0000:09:00.0
>>>     00000000-00000000 : 0000:09:00.1
>>>   00000000-00000000 : 0000:00:10.0
>>>     00000000-00000000 : ahci
>>>   00000000-00000000 : 0000:00:10.1
>>>     00000000-00000000 : ahci
>>> 00000000-00000000 : PCI Bus 0000:80
>>>   00000000-00000000 : PCI Bus 0000:83
>>>     00000000-00000000 : 0000:83:00.0
>>>     00000000-00000000 : 0000:83:00.0
>>>       00000000-00000000 : nvme
>>>   00000000-00000000 : PCI Bus 0000:89
>>>     00000000-00000000 : 0000:89:00.0
>>>       00000000-00000000 : e1000e
>>>     00000000-00000000 : 0000:89:00.0
>>>     00000000-00000000 : 0000:89:00.0
>>>       00000000-00000000 : e1000e
>>>     00000000-00000000 : 0000:89:00.0
>>>       00000000-00000000 : e1000e
>>>   00000000-00000000 : PCI Bus 0000:8d
>>>     00000000-00000000 : 0000:8d:00.0
>>>     00000000-00000000 : 0000:8d:00.0
>>>       00000000-00000000 : mpt3sas
>>> 00000000-00000000 : reserved
>>> 00000000-00000000 : System RAM
>>>   00000000-00000000 : Kernel code
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : Kernel data
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>> 00000000-00000000 : reserved
>>> 00000000-00000000 : System RAM
>>> 00000000-00000000 : reserved
>>> 00000000-00000000 : System RAM
>>>   00000000-00000000 : reserved
>>> 00000000-00000000 : reserved
>>> 00000000-00000000 : System RAM
>>>   00000000-00000000 : reserved
>>> 00000000-00000000 : reserved
>>> 00000000-00000000 : System RAM
>>>   00000000-00000000 : reserved
>>> 00000000-00000000 : reserved
>>> 00000000-00000000 : System RAM
>>>   00000000-00000000 : reserved
>>> 00000000-00000000 : reserved
>>> 00000000-00000000 : System RAM
>>>   00000000-00000000 : reserved
>>> 00000000-00000000 : CAV901C:00
>>> 00000000-00000000 : CAV901D:00
>>>   00000000-00000000 : CAV901C:00
>>> 00000000-00000000 : CAV901E:00
>>>   00000000-00000000 : CAV901C:00
>>> 00000000-00000000 : CAV901F:00
>>>   00000000-00000000 : CAV901C:00
>>> 00000000-00000000 : CAV9006:00
>>>   00000000-00000000 : CAV9006:00
>>> 00000000-00000000 : ARMH0011:00
>>>   00000000-00000000 : ARMH0011:00
>>> 00000000-00000000 : arm-smmu-v3.0.auto
>>>   00000000-00000000 : arm-smmu-v3.0.auto
>>> 00000000-00000000 : arm-smmu-v3.1.auto
>>>   00000000-00000000 : arm-smmu-v3.1.auto
>>> 00000000-00000000 : arm-smmu-v3.2.auto
>>>   00000000-00000000 : arm-smmu-v3.2.auto
>>> 00000000-00000000 : CAV901C:01
>>> 00000000-00000000 : CAV901D:01
>>>   00000000-00000000 : CAV901C:01
>>> 00000000-00000000 : CAV901E:01
>>>   00000000-00000000 : CAV901C:01
>>> 00000000-00000000 : CAV901F:01
>>>   00000000-00000000 : CAV901C:01
>>> 00000000-00000000 : CAV9007:06
>>>   00000000-00000000 : CAV9007:06
>>> 00000000-00000000 : arm-smmu-v3.3.auto
>>>   00000000-00000000 : arm-smmu-v3.3.auto
>>> 00000000-00000000 : arm-smmu-v3.4.auto
>>>   00000000-00000000 : arm-smmu-v3.4.auto
>>> 00000000-00000000 : arm-smmu-v3.5.auto
>>>   00000000-00000000 : arm-smmu-v3.5.auto
>>> 00000000-00000000 : System RAM
>>> 00000000-00000000 : System RAM
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>> 00000000-00000000 : System RAM
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>>   00000000-00000000 : reserved
>>> 00000000-00000000 : PCI Bus 0000:00
>>>   00000000-00000000 : PCI Bus 0000:01
>>>     00000000-00000000 : 0000:01:00.0
>>>     00000000-00000000 : 0000:01:00.1
>>>     00000000-00000000 : 0000:01:00.0
>>>     00000000-00000000 : 0000:01:00.1
>>>     00000000-00000000 : 0000:01:00.0
>>>     00000000-00000000 : 0000:01:00.1
>>>   00000000-00000000 : PCI Bus 0000:05
>>>     00000000-00000000 : 0000:05:00.0
>>>       00000000-00000000 : bnx2x
>>>     00000000-00000000 : 0000:05:00.1
>>>       00000000-00000000 : bnx2x
>>>     00000000-00000000 : 0000:05:00.0
>>>       00000000-00000000 : bnx2x
>>>     00000000-00000000 : 0000:05:00.0
>>>       00000000-00000000 : bnx2x
>>>     00000000-00000000 : 0000:05:00.1
>>>       00000000-00000000 : bnx2x
>>>     00000000-00000000 : 0000:05:00.1
>>>       00000000-00000000 : bnx2x
>>>   00000000-00000000 : PCI Bus 0000:09
>>>     00000000-00000000 : 0000:09:00.0
>>>       00000000-00000000 : i40e
>>>     00000000-00000000 : 0000:09:00.1
>>>       00000000-00000000 : i40e
>>>     00000000-00000000 : 0000:09:00.0
>>>     00000000-00000000 : 0000:09:00.1
>>>     00000000-00000000 : 0000:09:00.0
>>>       00000000-00000000 : i40e
>>>     00000000-00000000 : 0000:09:00.1
>>>       00000000-00000000 : i40e
>>>     00000000-00000000 : 0000:09:00.0
>>>     00000000-00000000 : 0000:09:00.1
>>>   00000000-00000000 : 0000:00:0f.0
>>>     00000000-00000000 : xhci-hcd
>>>   00000000-00000000 : 0000:00:0f.0
>>>   00000000-00000000 : 0000:00:0f.1
>>>     00000000-00000000 : xhci-hcd
>>>   00000000-00000000 : 0000:00:0f.1
>>>   00000000-00000000 : 0000:00:10.0
>>>     00000000-00000000 : ahci
>>>   00000000-00000000 : 0000:00:10.1
>>>     00000000-00000000 : ahci
>>> 00000000-00000000 : PCI Bus 0000:80
>>>
>>
>> resending with correct logs (after login as root)
>>
>> $ cat /proc/iomem
>> 30000000-37ffffff : PCI ECAM
>> 38000000-3fffffff : PCI ECAM
>> 40000000-5fffffff : PCI Bus 0000:00
>>   40000000-417fffff : PCI Bus 0000:0f
>>     40000000-417fffff : PCI Bus 0000:10
>>       40000000-40ffffff : 0000:10:00.0
>>       41000000-4101ffff : 0000:10:00.0
>>   41800000-418fffff : PCI Bus 0000:01
>>     41800000-4183ffff : 0000:01:00.0
>>     41840000-4187ffff : 0000:01:00.1
>>   41900000-419fffff : PCI Bus 0000:05
>>     41900000-4197ffff : 0000:05:00.0
>>     41980000-419fffff : 0000:05:00.1
>>   41a00000-41afffff : PCI Bus 0000:09
>>     41a00000-41a7ffff : 0000:09:00.0
>>     41a80000-41afffff : 0000:09:00.1
>>   41b00000-41b0ffff : 0000:00:10.0
>>     41b00000-41b0ffff : ahci
>>   41b10000-41b1ffff : 0000:00:10.1
>>     41b10000-41b1ffff : ahci
>> 60000000-7fffffff : PCI Bus 0000:80
>>   60000000-600fffff : PCI Bus 0000:83
>>     60000000-6001ffff : 0000:83:00.0
>>     60020000-60023fff : 0000:83:00.0
>>       60020000-60023fff : nvme
>>   60100000-601fffff : PCI Bus 0000:89
>>     60100000-6017ffff : 0000:89:00.0
>>       60100000-6017ffff : e1000e
>>     60180000-601bffff : 0000:89:00.0
>>     601c0000-601dffff : 0000:89:00.0
>>       601c0000-601dffff : e1000e
>>     601e0000-601e3fff : 0000:89:00.0
>>       601e0000-601e3fff : e1000e
>>   60200000-603fffff : PCI Bus 0000:8d
>>     60200000-602fffff : 0000:8d:00.0
>>     60300000-6030ffff : 0000:8d:00.0
>>       60300000-6030ffff : mpt3sas
>> 802f0000-8030ffff : reserved
>> e6247000-e6247fff : reserved
>> e6720000-e690ffff : reserved
>> e6a90000-e6a9ffff : reserved
>> e6ab0000-e721ffff : reserved
>> e7240000-e7240fff : reserved
>> fac00000-fafdffff : reserved
>> 400040400-40004041f : CAV901C:00
>> 400040480-400040567 : CAV901D:00
>>   400040480-400040567 : CAV901C:00
>> 400040600-40004073b : CAV901E:00
>>   400040600-40004073b : CAV901C:00
>> 400041400-40004177f : CAV901F:00
>>   400041400-40004177f : CAV901C:00
>> 402000100-402000fff : CAV9006:00
>>   402000100-402000fff : CAV9006:00
>> 402020000-40202ffff : ARMH0011:00
>>   402020000-40202ffff : ARMH0011:00
>> 402300000-40230ffff : arm-smmu-v3.0.auto
>>   402300000-40230ffff : arm-smmu-v3.0.auto
>> 402320000-40232ffff : arm-smmu-v3.1.auto
>>   402320000-40232ffff : arm-smmu-v3.1.auto
>> 402340000-40234ffff : arm-smmu-v3.2.auto
>>   402340000-40234ffff : arm-smmu-v3.2.auto
>> 440040400-44004041f : CAV901C:01
>> 440040480-440040567 : CAV901D:01
>>   440040480-440040567 : CAV901C:01
>> 440040600-44004073b : CAV901E:01
>>   440040600-44004073b : CAV901C:01
>> 440041400-44004177f : CAV901F:01
>>   440041400-44004177f : CAV901C:01
>> 4421a0000-4421affff : CAV9007:06
>>   4421a0000-4421affff : CAV9007:06
>> 442300000-44230ffff : arm-smmu-v3.3.auto
>>   442300000-44230ffff : arm-smmu-v3.3.auto
>> 442320000-44232ffff : arm-smmu-v3.4.auto
>>   442320000-44232ffff : arm-smmu-v3.4.auto
>> 442340000-44234ffff : arm-smmu-v3.5.auto
>>   442340000-44234ffff : arm-smmu-v3.5.auto
>> b81200000-c811fffff : System RAM
>>   b81280000-b8270ffff : Kernel code
>>   b82710000-b82dfffff : reserved
>>   b82e00000-b83168fff : Kernel data
>>   b83169000-baccd7fff : reserved
>>   c78a00000-c7fffffff : reserved
>>   c80129000-c801a9fff : reserved
>>   c801aa000-c809e9fff : reserved
>>   c809ec000-c809eefff : reserved
>>   c809ef000-c811fffff : reserved
>> 10000000000-13fffffffff : PCI Bus 0000:00
>>   10000000000-100013fffff : PCI Bus 0000:01
>>     10000000000-100007fffff : 0000:01:00.0
>>     10000800000-10000ffffff : 0000:01:00.1
>>     10001000000-1000101ffff : 0000:01:00.0
>>     10001020000-1000103ffff : 0000:01:00.1
>>     10001040000-1000104ffff : 0000:01:00.0
>>     10001050000-1000105ffff : 0000:01:00.1
>>   10001400000-100037fffff : PCI Bus 0000:05
>>     10001400000-1000140ffff : 0000:05:00.0
>>       10001400000-1000140ffff : bnx2x
>>     10001410000-1000141ffff : 0000:05:00.1
>>       10001410000-1000141ffff : bnx2x
>>     10001800000-10001ffffff : 0000:05:00.0
>>       10001800000-10001ffffff : bnx2x
>>     10002000000-100027fffff : 0000:05:00.0
>>       10002000000-100027fffff : bnx2x
>>     10002800000-10002ffffff : 0000:05:00.1
>>       10002800000-10002ffffff : bnx2x
>>     10003000000-100037fffff : 0000:05:00.1
>>       10003000000-100037fffff : bnx2x
>>   10003800000-100053fffff : PCI Bus 0000:09
>>     10003800000-10003ffffff : 0000:09:00.0
>>       10003800000-10003ffffff : i40e
>>     10004000000-100047fffff : 0000:09:00.1
>>       10004000000-100047fffff : i40e
>>     10004800000-10004bfffff : 0000:09:00.0
>>     10004c00000-10004ffffff : 0000:09:00.1
>>     10005000000-10005007fff : 0000:09:00.0
>>       10005000000-10005007fff : i40e
>>     10005008000-1000500ffff : 0000:09:00.1
>>       10005008000-1000500ffff : i40e
>>     10005010000-1000510ffff : 0000:09:00.0
>>     10005110000-1000520ffff : 0000:09:00.1
>>   10005400000-1000540ffff : 0000:00:0f.0
>>     10005400000-1000540ffff : xhci-hcd
>>   10005410000-1000541ffff : 0000:00:0f.0
>>   10005420000-1000542ffff : 0000:00:0f.1
>>     10005420000-1000542ffff : xhci-hcd
>>   10005430000-1000543ffff : 0000:00:0f.1
>>   10005440000-1000544ffff : 0000:00:10.0
>>     10005440000-1000544ffff : ahci
>>   10005450000-1000545ffff : 0000:00:10.1
>>     10005450000-1000545ffff : ahci
>> 14000000000-17fffffffff : PCI Bus 0000:80
>>
>>
>> failure with crashkernel=1G
>>
>> :~$ dmesg | grep crash
>> [    0.000000] cannot allocate crashkernel (size:0x40000000)
>> [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
>> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro nowatchdog earlycon
>> crashkernel=1G crashkernel=250M,low
>> [   29.326916]     crashkernel=250M,low
>>
> 
> I read these logs carefully and found that all of your test cases failed,
> including "crashkernel=4G@0xb81200000 crashkernel=250M,low".
> 

I tested 4 scenarios:
1) crashkernel=4G@0xb81200000 crashkernel=250M,low  : Allocation Passed
2) crashkernel=4G crashkernel=250M,low : Allocation failed
3) crashkernel=2G crashkernel=250M,low : Allocation failed
4) crashkernel=1G crashkernel=250M,low : Allocation failed
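
For reference, the sizes printed in the failure messages (0x40000000 for 1G,
0x100000000 for 4G) follow from the usual K/M/G suffix handling. The helper
below is only a userspace sketch written in the spirit of the kernel's
memparse(); parse_size is a name invented here, not a kernel function.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert a size string such as "1G" or "250M,low"
 * into bytes. Parsing stops at the first character after the suffix, so a
 * trailing ",low" is simply ignored here.
 */
uint64_t parse_size(const char *s)
{
    char *end;
    uint64_t v = strtoull(s, &end, 0);

    switch (*end) {
    case 'G':
    case 'g':
        v <<= 10;
        /* fall through */
    case 'M':
    case 'm':
        v <<= 10;
        /* fall through */
    case 'K':
    case 'k':
        v <<= 10;
        break;
    default:
        break;
    }
    return v;
}
```

With this, "1G" gives 0x40000000 and "4G" gives 0x100000000, which matches the
"cannot allocate crashkernel (size:...)" lines in the logs, and "250M" gives
0xfa00000.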


> None of your tests shows a "Crash kernel (low)" region, which means there is
> not enough low memory. In these cases the parameters are effectively the same
> as plain crashkernel=4G@0xb81200000, crashkernel=4G and crashkernel=1G.
> 
> crashkernel=4G and crashkernel=1G both failed because there is no low memory.
> 


As per my understanding, this patch series allows memory to be allocated
above the 4G range. So for crashkernel=1G, if no memory is found in the low
memory range, it will automatically be allocated from above the 4G range.
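
To compare that expectation with the quoted v7 hunk, the control flow of
reserve_crashkernel() can be modelled in userspace roughly as below. This is a
sketch under assumptions, not the kernel code: find_below() is a hypothetical
stand-in for memblock_find_in_range(), SZ_4G stands in for
arm64_dma32_phys_limit, UINT64_MAX stands in for MEMBLOCK_ALLOC_ACCESSIBLE, and
the RAM list is assumed to be sorted by address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_2M 0x200000ULL
#define SZ_4G 0x100000000ULL

/* Half-open physical range [start, end). */
struct region {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical stand-in for memblock_find_in_range(): top-down search for a
 * 2M-aligned block of `size` bytes below `limit`, avoiding `busy` if it is
 * non-NULL. Returns 0 on failure, as the kernel helper does.
 */
uint64_t find_below(const struct region *ram, size_t n, uint64_t limit,
                    uint64_t size, const struct region *busy)
{
    for (size_t i = n; i-- > 0; ) {
        uint64_t end = ram[i].end < limit ? ram[i].end : limit;

        while (end > ram[i].start && end - ram[i].start >= size) {
            uint64_t base = (end - size) & ~(SZ_2M - 1);

            if (base < ram[i].start)
                break;
            if (busy && base < busy->end && busy->start < base + size) {
                end = busy->start;   /* retry below the busy range */
                continue;
            }
            return base;
        }
    }
    return 0;
}

/*
 * Model of the quoted hunk: the search bound is lifted above 4G only when
 * the low reservation succeeded. Returns the crash base, or 0 on failure.
 */
uint64_t model_reserve(const struct region *ram, size_t n,
                       uint64_t crash_size, uint64_t low_size, bool *low_ok)
{
    uint64_t crash_max = SZ_4G;      /* models arm64_dma32_phys_limit */
    struct region low = { 0, 0 };

    if (low_size)
        low.start = find_below(ram, n, SZ_4G, low_size, NULL);
    *low_ok = low.start != 0;
    if (*low_ok) {
        low.end = low.start + low_size;
        crash_max = UINT64_MAX;      /* models MEMBLOCK_ALLOC_ACCESSIBLE */
    }
    return find_below(ram, n, crash_max, crash_size, *low_ok ? &low : NULL);
}
```

In this model, a machine whose only RAM is b81200000-c811fffff fails the low
reservation, keeps the 4G bound and then fails the main allocation too, which
is what the logs show. If that reading of the hunk is right, there is no
automatic fallback to memory above 4G when nothing is available below it.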


--pk



* Re: [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  2020-03-10 17:08             ` Prabhakar Kushwaha
@ 2020-03-11  1:44               ` chenzhou
  0 siblings, 0 replies; 30+ messages in thread
From: chenzhou @ 2020-03-11  1:44 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: horms, Ganapatrao Prabhakerrao Kulkarni, Will Deacon, xiexiuqi,
	Catalin Marinas, Bhupesh Sharma, Linux Doc Mailing List,
	kexec mailing list, Linux Kernel Mailing List, mingo,
	James Morse, Thomas Gleixner, dyoung, linux-arm-kernel



On 2020/3/11 1:08, Prabhakar Kushwaha wrote:
> On 3/10/2020 7:00 AM, chenzhou wrote:
>> Hi,
>>
>> On 2020/3/9 23:51, Prabhakar Kushwaha wrote:
>>> On 3/9/2020 10:18 AM, Prabhakar Kushwaha wrote:
>>>> Hi Chen,
>>>>
>>>> On Sat, Mar 7, 2020 at 4:36 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2020/3/5 18:13, Prabhakar Kushwaha wrote:
>>>>>> On Mon, Dec 23, 2019 at 8:57 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>>>>
>>>>>>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>>>>>>> 4G. If crashkernel=X,low is specified simultaneously, first reserve the
>>>>>>> specified amount of low memory for crash dump kernel devices and then
>>>>>>> reserve memory above 4G.
>>>>>>>
>>>>>>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>>>>>>> ---
>>>>>>>  arch/arm64/kernel/setup.c |  8 +++++++-
>>>>>>>  arch/arm64/mm/init.c      | 31 +++++++++++++++++++++++++++++--
>>>>>>>  2 files changed, 36 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>>>>>> index 56f6645..04d1c87 100644
>>>>>>> --- a/arch/arm64/kernel/setup.c
>>>>>>> +++ b/arch/arm64/kernel/setup.c
>>>>>>> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
>>>>>>>                     kernel_data.end <= res->end)
>>>>>>>                         request_resource(res, &kernel_data);
>>>>>>>  #ifdef CONFIG_KEXEC_CORE
>>>>>>> -               /* Userspace will find "Crash kernel" region in /proc/iomem. */
>>>>>>> +               /*
>>>>>>> +                * Userspace will find "Crash kernel" region in /proc/iomem.
>>>>>>> +                * Note: the low region is renamed as Crash kernel (low).
>>>>>>> +                */
>>>>>>> +               if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>>>>>>> +                               crashk_low_res.end <= res->end)
>>>>>>> +                       request_resource(res, &crashk_low_res);
>>>>>>>                 if (crashk_res.end && crashk_res.start >= res->start &&
>>>>>>>                     crashk_res.end <= res->end)
>>>>>>>                         request_resource(res, &crashk_res);
>>>>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>>>>>> index b65dffd..0d7afd5 100644
>>>>>>> --- a/arch/arm64/mm/init.c
>>>>>>> +++ b/arch/arm64/mm/init.c
>>>>>>> @@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
>>>>>>>  {
>>>>>>>         unsigned long long crash_base, crash_size;
>>>>>>>         int ret;
>>>>>>> +       phys_addr_t crash_max = arm64_dma32_phys_limit;
>>>>>>>
>>>>>>>         ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>>>>>>>                                 &crash_size, &crash_base);
>>>>>>> @@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
>>>>>>>         if (ret || !crash_size)
>>>>>>>                 return;
>>>>>>>
>>>>>>> +       ret = reserve_crashkernel_low();
>>>>>>> +       if (!ret && crashk_low_res.end) {
>>>>>>> +               /*
>>>>>>> +                * If crashkernel=X,low specified, there may be two regions,
>>>>>>> +                * we need to make some changes as follows:
>>>>>>> +                *
>>>>>>> +                * 1. rename the low region as "Crash kernel (low)"
>>>>>>> +                * In order to distinct from the high region and make no effect
>>>>>>> +                * to the use of existing kexec-tools, rename the low region as
>>>>>>> +                * "Crash kernel (low)".
>>>>>>> +                *
>>>>>>> +                * 2. change the upper bound for crash memory
>>>>>>> +                * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
>>>>>>> +                *
>>>>>>> +                * 3. mark the low region as "nomap"
>>>>>>> +                * The low region is intended to be used for crash dump kernel
>>>>>>> +                * devices, just mark the low region as "nomap" simply.
>>>>>>> +                */
>>>>>>> +               const char *rename = "Crash kernel (low)";
>>>>>>> +
>>>>>>> +               crashk_low_res.name = rename;
>>>>>>> +               crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
>>>>>>> +               memblock_mark_nomap(crashk_low_res.start,
>>>>>>> +                                   resource_size(&crashk_low_res));
>>>>>>> +       }
>>>>>>> +
>>>>>>>         crash_size = PAGE_ALIGN(crash_size);
>>>>>>>
>>>>>>>         if (crash_base == 0) {
>>>>>>>                 /* Current arm64 boot protocol requires 2MB alignment */
>>>>>>> -               crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
>>>>>>> -                               crash_size, SZ_2M);
>>>>>>> +               crash_base = memblock_find_in_range(0, crash_max, crash_size,
>>>>>>> +                               SZ_2M);
>>>>>>>                 if (crash_base == 0) {
>>>>>>>                         pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>>>>>>>                                 crash_size);
>>>>>>> --
>>>>>>
>>>>>> I tested this patch series on ARM64-ThunderX2 and saw no issue with
>>>>>> bootargs crashkernel=X@Y crashkernel=250M,low
>>>>>>
>>>>>> $ dmesg | grep crash
>>>>>> [    0.000000] crashkernel reserved: 0x0000000b81200000 -
>>>>>> 0x0000000c81200000 (4096 MB)
>>>>>> [    0.000000] Kernel command line:
>>>>>> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
>>>>>> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro
>>>>>> crashkernel=4G@0xb81200000 crashkernel=250M,low nowatchdog earlycon
>>>>>> [   29.310209]     crashkernel=250M,low
>>>>>>
>>>>>> $  kexec -p -i /boot/vmlinuz-`uname -r`
>>>>>> --initrd=/boot/initrd.img-`uname -r` --reuse-cmdline
>>>>>> $ echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
>>>>>>
>>>>>> But when I tried crashkernel=4G crashkernel=250M,low as bootargs,
>>>>>> the kernel was not able to allocate the memory.
>>>>>> [    0.000000] cannot allocate crashkernel (size:0x100000000)
>>>>>> [    0.000000] Kernel command line:
>>>>>> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
>>>>>> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G
>>>>>> crashkernel=250M,low nowatchdog
>>>>>> [   29.332081]     crashkernel=250M,low
>>>>>>
>>>>>> Is crashkernel=X@Y mandatory to get memory allocated beyond 4G?
>>>>>> Am I missing something?
>>>>>
>>>>> I can't reproduce the problem in my environment. Can you test with another size,
>>>>> such as "crashkernel=1G crashkernel=250M,low", and see if the same issue occurs?
>>>>>
>>>> I tried 1G also. Same error, please find the logs
>>>>
>>>> $ dmesg | grep crash
>>>> [    0.000000] cannot allocate crashkernel (size:0x40000000)
>>>> [    0.000000] Kernel command line:
>>>> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
>>>> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro nowatchdog earlycon
>>>> crashkernel=1G crashkernel=250M,low
>>>> [   29.326916]     crashkernel=250M,low
>>>>
>>>>
>>>>> Besides, crashkernel=X@Y isn't mandatory to get allocated beyond 4G,
>>>>
>>>> this was my understanding also.
>>>>
>>>>> can you show the whole file /proc/iomem.
>>>>>
>>>>
>>>> $ cat /proc/iomem
>>>> 00000000-00000000 : PCI ECAM
>>>> 00000000-00000000 : PCI ECAM
>>>> 00000000-00000000 : PCI Bus 0000:00
>>>>   00000000-00000000 : PCI Bus 0000:0f
>>>>     00000000-00000000 : PCI Bus 0000:10
>>>>       00000000-00000000 : 0000:10:00.0
>>>>       00000000-00000000 : 0000:10:00.0
>>>>   00000000-00000000 : PCI Bus 0000:01
>>>>     00000000-00000000 : 0000:01:00.0
>>>>     00000000-00000000 : 0000:01:00.1
>>>>   00000000-00000000 : PCI Bus 0000:05
>>>>     00000000-00000000 : 0000:05:00.0
>>>>     00000000-00000000 : 0000:05:00.1
>>>>   00000000-00000000 : PCI Bus 0000:09
>>>>     00000000-00000000 : 0000:09:00.0
>>>>     00000000-00000000 : 0000:09:00.1
>>>>   00000000-00000000 : 0000:00:10.0
>>>>     00000000-00000000 : ahci
>>>>   00000000-00000000 : 0000:00:10.1
>>>>     00000000-00000000 : ahci
>>>> 00000000-00000000 : PCI Bus 0000:80
>>>>   00000000-00000000 : PCI Bus 0000:83
>>>>     00000000-00000000 : 0000:83:00.0
>>>>     00000000-00000000 : 0000:83:00.0
>>>>       00000000-00000000 : nvme
>>>>   00000000-00000000 : PCI Bus 0000:89
>>>>     00000000-00000000 : 0000:89:00.0
>>>>       00000000-00000000 : e1000e
>>>>     00000000-00000000 : 0000:89:00.0
>>>>     00000000-00000000 : 0000:89:00.0
>>>>       00000000-00000000 : e1000e
>>>>     00000000-00000000 : 0000:89:00.0
>>>>       00000000-00000000 : e1000e
>>>>   00000000-00000000 : PCI Bus 0000:8d
>>>>     00000000-00000000 : 0000:8d:00.0
>>>>     00000000-00000000 : 0000:8d:00.0
>>>>       00000000-00000000 : mpt3sas
>>>> 00000000-00000000 : reserved
>>>> 00000000-00000000 : System RAM
>>>>   00000000-00000000 : Kernel code
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : Kernel data
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>> 00000000-00000000 : reserved
>>>> 00000000-00000000 : System RAM
>>>> 00000000-00000000 : reserved
>>>> 00000000-00000000 : System RAM
>>>>   00000000-00000000 : reserved
>>>> 00000000-00000000 : reserved
>>>> 00000000-00000000 : System RAM
>>>>   00000000-00000000 : reserved
>>>> 00000000-00000000 : reserved
>>>> 00000000-00000000 : System RAM
>>>>   00000000-00000000 : reserved
>>>> 00000000-00000000 : reserved
>>>> 00000000-00000000 : System RAM
>>>>   00000000-00000000 : reserved
>>>> 00000000-00000000 : reserved
>>>> 00000000-00000000 : System RAM
>>>>   00000000-00000000 : reserved
>>>> 00000000-00000000 : CAV901C:00
>>>> 00000000-00000000 : CAV901D:00
>>>>   00000000-00000000 : CAV901C:00
>>>> 00000000-00000000 : CAV901E:00
>>>>   00000000-00000000 : CAV901C:00
>>>> 00000000-00000000 : CAV901F:00
>>>>   00000000-00000000 : CAV901C:00
>>>> 00000000-00000000 : CAV9006:00
>>>>   00000000-00000000 : CAV9006:00
>>>> 00000000-00000000 : ARMH0011:00
>>>>   00000000-00000000 : ARMH0011:00
>>>> 00000000-00000000 : arm-smmu-v3.0.auto
>>>>   00000000-00000000 : arm-smmu-v3.0.auto
>>>> 00000000-00000000 : arm-smmu-v3.1.auto
>>>>   00000000-00000000 : arm-smmu-v3.1.auto
>>>> 00000000-00000000 : arm-smmu-v3.2.auto
>>>>   00000000-00000000 : arm-smmu-v3.2.auto
>>>> 00000000-00000000 : CAV901C:01
>>>> 00000000-00000000 : CAV901D:01
>>>>   00000000-00000000 : CAV901C:01
>>>> 00000000-00000000 : CAV901E:01
>>>>   00000000-00000000 : CAV901C:01
>>>> 00000000-00000000 : CAV901F:01
>>>>   00000000-00000000 : CAV901C:01
>>>> 00000000-00000000 : CAV9007:06
>>>>   00000000-00000000 : CAV9007:06
>>>> 00000000-00000000 : arm-smmu-v3.3.auto
>>>>   00000000-00000000 : arm-smmu-v3.3.auto
>>>> 00000000-00000000 : arm-smmu-v3.4.auto
>>>>   00000000-00000000 : arm-smmu-v3.4.auto
>>>> 00000000-00000000 : arm-smmu-v3.5.auto
>>>>   00000000-00000000 : arm-smmu-v3.5.auto
>>>> 00000000-00000000 : System RAM
>>>> 00000000-00000000 : System RAM
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>> 00000000-00000000 : System RAM
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>>   00000000-00000000 : reserved
>>>> 00000000-00000000 : PCI Bus 0000:00
>>>>   00000000-00000000 : PCI Bus 0000:01
>>>>     00000000-00000000 : 0000:01:00.0
>>>>     00000000-00000000 : 0000:01:00.1
>>>>     00000000-00000000 : 0000:01:00.0
>>>>     00000000-00000000 : 0000:01:00.1
>>>>     00000000-00000000 : 0000:01:00.0
>>>>     00000000-00000000 : 0000:01:00.1
>>>>   00000000-00000000 : PCI Bus 0000:05
>>>>     00000000-00000000 : 0000:05:00.0
>>>>       00000000-00000000 : bnx2x
>>>>     00000000-00000000 : 0000:05:00.1
>>>>       00000000-00000000 : bnx2x
>>>>     00000000-00000000 : 0000:05:00.0
>>>>       00000000-00000000 : bnx2x
>>>>     00000000-00000000 : 0000:05:00.0
>>>>       00000000-00000000 : bnx2x
>>>>     00000000-00000000 : 0000:05:00.1
>>>>       00000000-00000000 : bnx2x
>>>>     00000000-00000000 : 0000:05:00.1
>>>>       00000000-00000000 : bnx2x
>>>>   00000000-00000000 : PCI Bus 0000:09
>>>>     00000000-00000000 : 0000:09:00.0
>>>>       00000000-00000000 : i40e
>>>>     00000000-00000000 : 0000:09:00.1
>>>>       00000000-00000000 : i40e
>>>>     00000000-00000000 : 0000:09:00.0
>>>>     00000000-00000000 : 0000:09:00.1
>>>>     00000000-00000000 : 0000:09:00.0
>>>>       00000000-00000000 : i40e
>>>>     00000000-00000000 : 0000:09:00.1
>>>>       00000000-00000000 : i40e
>>>>     00000000-00000000 : 0000:09:00.0
>>>>     00000000-00000000 : 0000:09:00.1
>>>>   00000000-00000000 : 0000:00:0f.0
>>>>     00000000-00000000 : xhci-hcd
>>>>   00000000-00000000 : 0000:00:0f.0
>>>>   00000000-00000000 : 0000:00:0f.1
>>>>     00000000-00000000 : xhci-hcd
>>>>   00000000-00000000 : 0000:00:0f.1
>>>>   00000000-00000000 : 0000:00:10.0
>>>>     00000000-00000000 : ahci
>>>>   00000000-00000000 : 0000:00:10.1
>>>>     00000000-00000000 : ahci
>>>> 00000000-00000000 : PCI Bus 0000:80
>>>>
>>>
>>> resending with correct logs (after login as root)
>>>
>>> $ cat /proc/iomem
>>> 30000000-37ffffff : PCI ECAM
>>> 38000000-3fffffff : PCI ECAM
>>> 40000000-5fffffff : PCI Bus 0000:00
>>>   40000000-417fffff : PCI Bus 0000:0f
>>>     40000000-417fffff : PCI Bus 0000:10
>>>       40000000-40ffffff : 0000:10:00.0
>>>       41000000-4101ffff : 0000:10:00.0
>>>   41800000-418fffff : PCI Bus 0000:01
>>>     41800000-4183ffff : 0000:01:00.0
>>>     41840000-4187ffff : 0000:01:00.1
>>>   41900000-419fffff : PCI Bus 0000:05
>>>     41900000-4197ffff : 0000:05:00.0
>>>     41980000-419fffff : 0000:05:00.1
>>>   41a00000-41afffff : PCI Bus 0000:09
>>>     41a00000-41a7ffff : 0000:09:00.0
>>>     41a80000-41afffff : 0000:09:00.1
>>>   41b00000-41b0ffff : 0000:00:10.0
>>>     41b00000-41b0ffff : ahci
>>>   41b10000-41b1ffff : 0000:00:10.1
>>>     41b10000-41b1ffff : ahci
>>> 60000000-7fffffff : PCI Bus 0000:80
>>>   60000000-600fffff : PCI Bus 0000:83
>>>     60000000-6001ffff : 0000:83:00.0
>>>     60020000-60023fff : 0000:83:00.0
>>>       60020000-60023fff : nvme
>>>   60100000-601fffff : PCI Bus 0000:89
>>>     60100000-6017ffff : 0000:89:00.0
>>>       60100000-6017ffff : e1000e
>>>     60180000-601bffff : 0000:89:00.0
>>>     601c0000-601dffff : 0000:89:00.0
>>>       601c0000-601dffff : e1000e
>>>     601e0000-601e3fff : 0000:89:00.0
>>>       601e0000-601e3fff : e1000e
>>>   60200000-603fffff : PCI Bus 0000:8d
>>>     60200000-602fffff : 0000:8d:00.0
>>>     60300000-6030ffff : 0000:8d:00.0
>>>       60300000-6030ffff : mpt3sas
>>> 802f0000-8030ffff : reserved
>>> e6247000-e6247fff : reserved
>>> e6720000-e690ffff : reserved
>>> e6a90000-e6a9ffff : reserved
>>> e6ab0000-e721ffff : reserved
>>> e7240000-e7240fff : reserved
>>> fac00000-fafdffff : reserved
>>> 400040400-40004041f : CAV901C:00
>>> 400040480-400040567 : CAV901D:00
>>>   400040480-400040567 : CAV901C:00
>>> 400040600-40004073b : CAV901E:00
>>>   400040600-40004073b : CAV901C:00
>>> 400041400-40004177f : CAV901F:00
>>>   400041400-40004177f : CAV901C:00
>>> 402000100-402000fff : CAV9006:00
>>>   402000100-402000fff : CAV9006:00
>>> 402020000-40202ffff : ARMH0011:00
>>>   402020000-40202ffff : ARMH0011:00
>>> 402300000-40230ffff : arm-smmu-v3.0.auto
>>>   402300000-40230ffff : arm-smmu-v3.0.auto
>>> 402320000-40232ffff : arm-smmu-v3.1.auto
>>>   402320000-40232ffff : arm-smmu-v3.1.auto
>>> 402340000-40234ffff : arm-smmu-v3.2.auto
>>>   402340000-40234ffff : arm-smmu-v3.2.auto
>>> 440040400-44004041f : CAV901C:01
>>> 440040480-440040567 : CAV901D:01
>>>   440040480-440040567 : CAV901C:01
>>> 440040600-44004073b : CAV901E:01
>>>   440040600-44004073b : CAV901C:01
>>> 440041400-44004177f : CAV901F:01
>>>   440041400-44004177f : CAV901C:01
>>> 4421a0000-4421affff : CAV9007:06
>>>   4421a0000-4421affff : CAV9007:06
>>> 442300000-44230ffff : arm-smmu-v3.3.auto
>>>   442300000-44230ffff : arm-smmu-v3.3.auto
>>> 442320000-44232ffff : arm-smmu-v3.4.auto
>>>   442320000-44232ffff : arm-smmu-v3.4.auto
>>> 442340000-44234ffff : arm-smmu-v3.5.auto
>>>   442340000-44234ffff : arm-smmu-v3.5.auto
>>> b81200000-c811fffff : System RAM
>>>   b81280000-b8270ffff : Kernel code
>>>   b82710000-b82dfffff : reserved
>>>   b82e00000-b83168fff : Kernel data
>>>   b83169000-baccd7fff : reserved
>>>   c78a00000-c7fffffff : reserved
>>>   c80129000-c801a9fff : reserved
>>>   c801aa000-c809e9fff : reserved
>>>   c809ec000-c809eefff : reserved
>>>   c809ef000-c811fffff : reserved
>>> 10000000000-13fffffffff : PCI Bus 0000:00
>>>   10000000000-100013fffff : PCI Bus 0000:01
>>>     10000000000-100007fffff : 0000:01:00.0
>>>     10000800000-10000ffffff : 0000:01:00.1
>>>     10001000000-1000101ffff : 0000:01:00.0
>>>     10001020000-1000103ffff : 0000:01:00.1
>>>     10001040000-1000104ffff : 0000:01:00.0
>>>     10001050000-1000105ffff : 0000:01:00.1
>>>   10001400000-100037fffff : PCI Bus 0000:05
>>>     10001400000-1000140ffff : 0000:05:00.0
>>>       10001400000-1000140ffff : bnx2x
>>>     10001410000-1000141ffff : 0000:05:00.1
>>>       10001410000-1000141ffff : bnx2x
>>>     10001800000-10001ffffff : 0000:05:00.0
>>>       10001800000-10001ffffff : bnx2x
>>>     10002000000-100027fffff : 0000:05:00.0
>>>       10002000000-100027fffff : bnx2x
>>>     10002800000-10002ffffff : 0000:05:00.1
>>>       10002800000-10002ffffff : bnx2x
>>>     10003000000-100037fffff : 0000:05:00.1
>>>       10003000000-100037fffff : bnx2x
>>>   10003800000-100053fffff : PCI Bus 0000:09
>>>     10003800000-10003ffffff : 0000:09:00.0
>>>       10003800000-10003ffffff : i40e
>>>     10004000000-100047fffff : 0000:09:00.1
>>>       10004000000-100047fffff : i40e
>>>     10004800000-10004bfffff : 0000:09:00.0
>>>     10004c00000-10004ffffff : 0000:09:00.1
>>>     10005000000-10005007fff : 0000:09:00.0
>>>       10005000000-10005007fff : i40e
>>>     10005008000-1000500ffff : 0000:09:00.1
>>>       10005008000-1000500ffff : i40e
>>>     10005010000-1000510ffff : 0000:09:00.0
>>>     10005110000-1000520ffff : 0000:09:00.1
>>>   10005400000-1000540ffff : 0000:00:0f.0
>>>     10005400000-1000540ffff : xhci-hcd
>>>   10005410000-1000541ffff : 0000:00:0f.0
>>>   10005420000-1000542ffff : 0000:00:0f.1
>>>     10005420000-1000542ffff : xhci-hcd
>>>   10005430000-1000543ffff : 0000:00:0f.1
>>>   10005440000-1000544ffff : 0000:00:10.0
>>>     10005440000-1000544ffff : ahci
>>>   10005450000-1000545ffff : 0000:00:10.1
>>>     10005450000-1000545ffff : ahci
>>> 14000000000-17fffffffff : PCI Bus 0000:80
>>>
>>>
>>> failure with crashkernel=1G
>>>
>>> :~$ dmesg | grep crash
>>> [    0.000000] cannot allocate crashkernel (size:0x40000000)
>>> [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
>>> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro nowatchdog earlycon
>>> crashkernel=1G crashkernel=250M,low
>>> [   29.326916]     crashkernel=250M,low
>>>
>>
>> I read these carefully and found that all your test cases failed, including
>> "crashkernel=4G@0xb81200000 crashkernel=250M,low".
>>
> 
> I tested 4 scenarios:
> 1) crashkernel=4G@0xb81200000 crashkernel=250M,low  : Allocation Passed
> 2) crashkernel=4G crashkernel=250M,low : Allocation failed
> 3) crashkernel=2G crashkernel=250M,low : Allocation failed
> 4) crashkernel=1G crashkernel=250M,low : Allocation failed
> 
> 
>> There is no "Crash kernel (low)" in any of your tests, that is, there is not enough
>> low memory. In these cases, the parameters are equivalent to crashkernel=4G@0xb81200000,
>> crashkernel=4G and crashkernel=1G.
>>
>> crashkernel=4G and crashkernel=1G both failed because there is no low memory.
>>
> 
> 
> As I understand it, this patch series allows memory to be allocated
> above the 4G range. So for crashkernel=1G: if no memory is found in the
> low memory range, it should automatically be allocated from above 4G.
> 

Yeah, you are right. We discussed this as well, and I implemented it that way
in my v5 (https://lkml.org/lkml/2019/5/6/1361). The current version is simplified.

Thanks,
Chen Zhou

> 
> --pk
> 
> 
> .
> 



* Re: [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump
  2019-12-23 15:23 [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
                   ` (3 preceding siblings ...)
  2019-12-23 15:23 ` [PATCH v7 4/4] kdump: update Documentation about crashkernel on arm64 Chen Zhou
@ 2020-03-26  3:09 ` Chen Zhou
  2020-05-19 10:21   ` Arnd Bergmann
  4 siblings, 1 reply; 30+ messages in thread
From: Chen Zhou @ 2020-03-26  3:09 UTC (permalink / raw)
  To: tglx, mingo, catalin.marinas, will, james.morse, dyoung,
	bhsharma, john.p.donnelly, pkushwaha
  Cc: horms, kexec, linux-kernel, linux-arm-kernel, linux-doc

Hi all,

Friendly ping...

On 2019/12/23 23:23, Chen Zhou wrote:
> This patch series enables reserving crashkernel memory above 4G on arm64.
> 
> There are the following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel memory below 4G, which fails
> when there is not enough low memory.
> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel memory above
> 4G. In this case, if swiotlb or DMA buffers are required, the crash dump kernel
> will fail to boot because there is no low memory available for allocation.
> 
> To solve these issues, introduce crashkernel=X,low to reserve a specified
> amount of low memory.
> Crashkernel=X tries to reserve memory for the crash dump kernel under
> 4G. If crashkernel=Y,low is specified simultaneously, first reserve the
> specified amount of low memory for crash dump kernel devices and then
> reserve memory above 4G.
> 
> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
> is specified simultaneously, the kernel should reserve the specified amount of
> low memory for crash dump kernel devices. So there may be two crash kernel
> regions, one below 4G and the other above 4G.
> To distinguish it from the high region and to avoid affecting the use of
> kexec-tools, rename the low region to "Crash kernel (low)", and add the DT
> property "linux,low-memory-range" to the crash dump kernel's dtb to pass the
> low region.
> 
> Besides, we need to modify kexec-tools:
> arm64: kdump: add another DT property to crash dump kernel's dtb (see [1])
> 
> The previous changes and discussions can be retrieved from:
> 
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
> 
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, first reserve the specified
> amount of low memory for crash dump kernel devices and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
> 
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> 
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
> 
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
> 
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() I added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions. In the two-region case,
> we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
> 
> [1]: http://lists.infradead.org/pipermail/kexec/2019-August/023569.html
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> 
> Chen Zhou (4):
>   x86: kdump: move reserve_crashkernel_low() into crash_core.c
>   arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>   arm64: kdump: add memory for devices by DT property, low-memory-range
>   kdump: update Documentation about crashkernel on arm64
> 
>  Documentation/admin-guide/kdump/kdump.rst       | 13 +++-
>  Documentation/admin-guide/kernel-parameters.txt | 12 +++-
>  arch/arm64/kernel/setup.c                       |  8 ++-
>  arch/arm64/mm/init.c                            | 61 ++++++++++++++++-
>  arch/x86/kernel/setup.c                         | 62 ++----------------
>  include/linux/crash_core.h                      |  3 +
>  include/linux/kexec.h                           |  2 -
>  kernel/crash_core.c                             | 87 +++++++++++++++++++++++++
>  kernel/kexec_core.c                             | 17 -----
>  9 files changed, 183 insertions(+), 82 deletions(-)
> 



* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2019-12-31  1:39         ` Chen Zhou
@ 2020-04-03  7:13           ` Chen Zhou
  0 siblings, 0 replies; 30+ messages in thread
From: Chen Zhou @ 2020-04-03  7:13 UTC (permalink / raw)
  To: Dave Young
  Cc: kbuild test robot, horms, linux-doc, catalin.marinas, bhsharma,
	xiexiuqi, kexec, linux-kernel, mingo, james.morse, tglx, will,
	linux-arm-kernel

Hi Dave,

On 2019/12/31 9:39, Chen Zhou wrote:
> Hi Dave,
> 
> On 2019/12/28 17:32, Dave Young wrote:
>> On 12/27/19 at 07:04pm, Chen Zhou wrote:
>>> Hi Dave
>>>
>>> On 2019/12/27 13:54, Dave Young wrote:
>>>> Hi,
>>>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
>>>>> In preparation for supporting reserve_crashkernel_low in arm64 as
>>>>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>>
>>>>> Note: on arm64, we reserve low memory if and only if crashkernel=X,low
>>>>> is specified. Unlike x86_64, low memory is not reserved automatically.
>>>>
>>>> Do you have any reason for the difference?  I'd expect we have same
>>>> logic if possible and remove some of the ifdefs.
>>>
>>> On x86_64, if we reserve crashkernel above 4G, we then call reserve_crashkernel_low()
>>> to reserve low memory.
>>>
>>> On arm64, to keep things simple, we call reserve_crashkernel_low() at the beginning of
>>> reserve_crashkernel() and then relax arm64_dma32_phys_limit if reserve_crashkernel_low()
>>> allocated something. With this ordering, if low memory were set automatically, a crashkernel
>>> reserved below 4G would also get 256M of low memory, and this needs extra consideration.
>>
>> Sorry, I did not read the details of the old thread and thought that it was
>> arch dependent.  But thinking about it again, it would be better to
>> have the same semantics for the crashkernel parameters across arches.  If we
>> make them different, it causes confusion, especially for
>> distributions.
>>
>> OTOH, I thought that if we reserve high memory then low memory should be
>> needed as well.  There might be some exceptions, but I do not know the exact
>> ones. Can we make the behavior the same, and special-case those systems which
>> do not need a low memory reservation?
>>
> I thought the same and did implement the crashkernel parameters in an arch-independent way.
> This is my v5: https://lkml.org/lkml/2019/5/6/1361, which I implemented according to x86_64's
> behavior.
> 
>>>
>>> Previous discussions:
>>> 	https://lkml.org/lkml/2019/6/5/670
>>> 	https://lkml.org/lkml/2019/6/13/229
>>
>> Another concern from James:
>> "
>> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
>> "Crash kernel". Because it's sorted by address, and kexec-tools stops searching when it
>> finds "Crash kernel", you are always going to get the kernel placed in the lower portion.
>> "
>>
>> The kexec-tools code iterates over all "Crash kernel" ranges and adds them
>> to an array.  In the x86 code, it uses the higher range to locate memory.
> 
> We also discussed this: https://lkml.org/lkml/2019/6/13/227.
> I guess James's opinion is that kexec-tools should take forward compatibility into account.
> "But we can't rely on people updating user-space when they update the kernel!" -- James
> 
>>
>>>
>>>>
>>>>>
>>>>> Reported-by: kbuild test robot <lkp@intel.com>
>>>>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>>>>> ---
>>>>>  arch/x86/kernel/setup.c    | 62 ++++-----------------------------
>>>>>  include/linux/crash_core.h |  3 ++
>>>>>  include/linux/kexec.h      |  2 --
>>>>>  kernel/crash_core.c        | 87 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  kernel/kexec_core.c        | 17 ---------
>>>>>  5 files changed, 96 insertions(+), 75 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>>>> index cedfe20..5f38942 100644
>>>>> --- a/arch/x86/kernel/setup.c
>>>>> +++ b/arch/x86/kernel/setup.c
>>>>> @@ -486,59 +486,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>>>>>  # define CRASH_ADDR_HIGH_MAX	SZ_64T
>>>>>  #endif
>>>>>  
>>>>> -static int __init reserve_crashkernel_low(void)
>>>>> -{
>>>>> -#ifdef CONFIG_X86_64
>>>>> -	unsigned long long base, low_base = 0, low_size = 0;
>>>>> -	unsigned long total_low_mem;
>>>>> -	int ret;
>>>>> -
>>>>> -	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>>>>> -
>>>>> -	/* crashkernel=Y,low */
>>>>> -	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
>>>>> -	if (ret) {
>>>>> -		/*
>>>>> -		 * two parts from kernel/dma/swiotlb.c:
>>>>> -		 * -swiotlb size: user-specified with swiotlb= or default.
>>>>> -		 *
>>>>> -		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>>>>> -		 * to 8M for other buffers that may need to stay low too. Also
>>>>> -		 * make sure we allocate enough extra low memory so that we
>>>>> -		 * don't run out of DMA buffers for 32-bit devices.
>>>>> -		 */
>>>>> -		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
>>>>> -	} else {
>>>>> -		/* passed with crashkernel=0,low ? */
>>>>> -		if (!low_size)
>>>>> -			return 0;
>>>>> -	}
>>>>> -
>>>>> -	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>>>>> -	if (!low_base) {
>>>>> -		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>>>>> -		       (unsigned long)(low_size >> 20));
>>>>> -		return -ENOMEM;
>>>>> -	}
>>>>> -
>>>>> -	ret = memblock_reserve(low_base, low_size);
>>>>> -	if (ret) {
>>>>> -		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
>>>>> -		return ret;
>>>>> -	}
>>>>> -
>>>>> -	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
>>>>> -		(unsigned long)(low_size >> 20),
>>>>> -		(unsigned long)(low_base >> 20),
>>>>> -		(unsigned long)(total_low_mem >> 20));
>>>>> -
>>>>> -	crashk_low_res.start = low_base;
>>>>> -	crashk_low_res.end   = low_base + low_size - 1;
>>>>> -	insert_resource(&iomem_resource, &crashk_low_res);
>>>>> -#endif
>>>>> -	return 0;
>>>>> -}
>>>>> -
>>>>>  static void __init reserve_crashkernel(void)
>>>>>  {
>>>>>  	unsigned long long crash_size, crash_base, total_mem;
>>>>> @@ -602,9 +549,12 @@ static void __init reserve_crashkernel(void)
>>>>>  		return;
>>>>>  	}
>>>>>  
>>>>> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>>>>> -		memblock_free(crash_base, crash_size);
>>>>> -		return;
>>>>> +	if (crash_base >= (1ULL << 32)) {
>>>>> +		if (reserve_crashkernel_low()) {
>>>>> +			memblock_free(crash_base, crash_size);
>>>>> +			return;
>>>>> +		}
>>>>> +		insert_resource(&iomem_resource, &crashk_low_res);
>>>>
>>>> Some specific reason to move insert_resouce out of the
>>>> reserve_crashkernel_low function?
>>>
>>> No specific reason.
>>> I just exposed arm64 "Crash kernel low" in request_standard_resources() as other resources,
>>> so did this change.
>>
>> Ok.
>>
>>>
>>>>
>>>>>  	}
>>>>>  
>>>>>  	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
>>>>> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
>>>>> index 525510a..4df8c0b 100644
>>>>> --- a/include/linux/crash_core.h
>>>>> +++ b/include/linux/crash_core.h
>>>>> @@ -63,6 +63,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
>>>>>  extern unsigned char *vmcoreinfo_data;
>>>>>  extern size_t vmcoreinfo_size;
>>>>>  extern u32 *vmcoreinfo_note;
>>>>> +extern struct resource crashk_res;
>>>>> +extern struct resource crashk_low_res;
>>>>>  
>>>>>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>>>>>  			  void *data, size_t data_len);
>>>>> @@ -74,5 +76,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>>>>>  		unsigned long long *crash_size, unsigned long long *crash_base);
>>>>>  int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>>>>>  		unsigned long long *crash_size, unsigned long long *crash_base);
>>>>> +int __init reserve_crashkernel_low(void);
>>>>>  
>>>>>  #endif /* LINUX_CRASH_CORE_H */
>>>>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>>>>> index 1776eb2..5d5d963 100644
>>>>> --- a/include/linux/kexec.h
>>>>> +++ b/include/linux/kexec.h
>>>>> @@ -330,8 +330,6 @@ extern int kexec_load_disabled;
>>>>>  
>>>>>  /* Location of a reserved region to hold the crash kernel.
>>>>>   */
>>>>> -extern struct resource crashk_res;
>>>>> -extern struct resource crashk_low_res;
>>>>>  extern note_buf_t __percpu *crash_notes;
>>>>>  
>>>>>  /* flag to track if kexec reboot is in progress */
>>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>>> index 9f1557b..eb72fd6 100644
>>>>> --- a/kernel/crash_core.c
>>>>> +++ b/kernel/crash_core.c
>>>>> @@ -7,6 +7,8 @@
>>>>>  #include <linux/crash_core.h>
>>>>>  #include <linux/utsname.h>
>>>>>  #include <linux/vmalloc.h>
>>>>> +#include <linux/memblock.h>
>>>>> +#include <linux/swiotlb.h>
>>>>>  
>>>>>  #include <asm/page.h>
>>>>>  #include <asm/sections.h>
>>>>> @@ -19,6 +21,22 @@ u32 *vmcoreinfo_note;
>>>>>  /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
>>>>>  static unsigned char *vmcoreinfo_data_safecopy;
>>>>>  
>>>>> +/* Location of the reserved area for the crash kernel */
>>>>> +struct resource crashk_res = {
>>>>> +	.name  = "Crash kernel",
>>>>> +	.start = 0,
>>>>> +	.end   = 0,
>>>>> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>>> +	.desc  = IORES_DESC_CRASH_KERNEL
>>>>> +};
>>>>> +struct resource crashk_low_res = {
>>>>> +	.name  = "Crash kernel",
>>>>> +	.start = 0,
>>>>> +	.end   = 0,
>>>>> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>>> +	.desc  = IORES_DESC_CRASH_KERNEL
>>>>> +};
>>>>> +
>>>>>  /*
>>>>>   * parsing the "crashkernel" commandline
>>>>>   *
>>>>> @@ -292,6 +310,75 @@ int __init parse_crashkernel_low(char *cmdline,
>>>>>  				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
>>>>>  }
>>>>>  
>>>>> +#if defined(CONFIG_X86_64)
>>>>> +#define CRASH_ALIGN		SZ_16M
>>>>> +#elif defined(CONFIG_ARM64)
>>>>> +#define CRASH_ALIGN		SZ_2M
>>>>> +#endif
>>>>
>>>> I think no need to have the #ifdef, although I can not think out of
>>>> reason we have 16M for X86, maybe move it to 2M as well if no other
>>>> objections.  Then it will be easier to reserve crashkernel successfully
>>>> considering nowadays we have KASLR and other stuff it becomes harder.
>>>
>>> I also don't figure out why it is 16M in x86.
>>
>> IMHO, if we do not know why and in theory it should work with 2M, can
>> you do some basic testing and move it to 2M?
>>
>> We can easily move back to 16M if someone really report something, but
>> if we do not change it will always stay there but we do not know why.
> 
> Ok. I will do some test later.

Recently, I tested 2M alignment on x86 and the system works well.

Besides, I found that memblock_find_in_range() in reserve_crashkernel()
restricts the lower bound of the range to CRASH_ALIGN.
Could we make memblock_find_in_range() search from the start of memory instead?

The code is as follows:

static void __init reserve_crashkernel(void)
{
	...
	if (!high)
		crash_base = memblock_find_in_range(CRASH_ALIGN,
					CRASH_ADDR_LOW_MAX,
					crash_size, CRASH_ALIGN);
	if (!crash_base)
		crash_base = memblock_find_in_range(CRASH_ALIGN,
					CRASH_ADDR_HIGH_MAX,
					crash_size, CRASH_ALIGN);
	...
}
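
To make the lower-bound question concrete, here is a minimal user-space sketch. It is not the memblock implementation: find_lowest_base() and align_up() are hypothetical helpers, and the model ignores memblock's real search order and region bookkeeping. It assumes only what the excerpt shows: the returned base is aligned to CRASH_ALIGN, and a base of 0 is treated as "not found" by the "if (!crash_base)" check. Under those assumptions, the lowest usable base is the same whether the search starts at 0 or at CRASH_ALIGN.

```c
#include <stdint.h>

#define SZ_2M	(2ULL << 20)
#define SZ_16M	(16ULL << 20)

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model: lowest aligned base in [lo, hi) that fits size bytes,
 * or 0 on failure. Base 0 is never returned as a valid result, because the
 * caller in the excerpt cannot tell it apart from "not found".
 */
static uint64_t find_lowest_base(uint64_t lo, uint64_t hi,
				 uint64_t size, uint64_t align)
{
	uint64_t base = align_up(lo, align);

	if (base == 0)
		base = align;	/* skip 0: it would read as a failure */
	if (base > hi || size > hi - base)
		return 0;
	return base;
}
```

In this model, find_lowest_base(0, 1ULL << 32, 256ULL << 20, SZ_2M) and find_lowest_base(SZ_2M, 1ULL << 32, 256ULL << 20, SZ_2M) both return SZ_2M, so the choice of lower bound only matters if base 0 is ever a legal result.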

Thanks,
Chen Zhou
> 
>>
>>>
>>>>
>>>>> +
>>>>> +int __init reserve_crashkernel_low(void)
>>>>> +{
>>>>> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
>>>>> +	unsigned long long base, low_base = 0, low_size = 0;
>>>>> +	unsigned long total_low_mem;
>>>>> +	int ret;
>>>>> +
>>>>> +	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>>>>> +
>>>>> +	/* crashkernel=Y,low */
>>>>> +	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size,
>>>>> +			&base);
>>>>> +	if (ret) {
>>>>> +#ifdef CONFIG_X86_64
>>>>> +		/*
>>>>> +		 * two parts from lib/swiotlb.c:
>>>>> +		 * -swiotlb size: user-specified with swiotlb= or default.
>>>>> +		 *
>>>>> +		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>>>>> +		 * to 8M for other buffers that may need to stay low too. Also
>>>>> +		 * make sure we allocate enough extra low memory so that we
>>>>> +		 * don't run out of DMA buffers for 32-bit devices.
>>>>> +		 */
>>>>> +		low_size = max(swiotlb_size_or_default() + (8UL << 20),
>>>>> +				256UL << 20);
>>>>> +#else
>>>>> +		/*
>>>>> +		 * in arm64, reserve low memory if and only if crashkernel=X,low
>>>>> +		 * specified.
>>>>> +		 */
>>>>> +		return -EINVAL;
>>>>> +#endif
>>>>
>>>> As said before, can you explore about why it needs different logic, it
>>>> would be good to keep two arches same.
>>>>
>>>>> +	} else {
>>>>> +		/* passed with crashkernel=0,low ? */
>>>>> +		if (!low_size)
>>>>> +			return 0;
>>>>> +	}
>>>>> +
>>>>> +	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>>>>> +	if (!low_base) {
>>>>> +		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>>>>> +		       (unsigned long)(low_size >> 20));
>>>>> +		return -ENOMEM;
>>>>> +	}
>>>>> +
>>>>> +	ret = memblock_reserve(low_base, low_size);
>>>>> +	if (ret) {
>>>>> +		pr_err("%s: Error reserving crashkernel low memblock.\n",
>>>>> +				__func__);
>>>>> +		return ret;
>>>>> +	}
>>>>> +
>>>>> +	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
>>>>> +		(unsigned long)(low_size >> 20),
>>>>> +		(unsigned long)(low_base >> 20),
>>>>> +		(unsigned long)(total_low_mem >> 20));
>>>>> +
>>>>> +	crashk_low_res.start = low_base;
>>>>> +	crashk_low_res.end   = low_base + low_size - 1;
>>>>> +#endif
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>>>>>  			  void *data, size_t data_len)
>>>>>  {
>>>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>>>> index 15d70a9..458d093 100644
>>>>> --- a/kernel/kexec_core.c
>>>>> +++ b/kernel/kexec_core.c
>>>>> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
>>>>>  /* Flag to indicate we are going to kexec a new kernel */
>>>>>  bool kexec_in_progress = false;
>>>>>  
>>>>> -
>>>>> -/* Location of the reserved area for the crash kernel */
>>>>> -struct resource crashk_res = {
>>>>> -	.name  = "Crash kernel",
>>>>> -	.start = 0,
>>>>> -	.end   = 0,
>>>>> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>>> -	.desc  = IORES_DESC_CRASH_KERNEL
>>>>> -};
>>>>> -struct resource crashk_low_res = {
>>>>> -	.name  = "Crash kernel",
>>>>> -	.start = 0,
>>>>> -	.end   = 0,
>>>>> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>>> -	.desc  = IORES_DESC_CRASH_KERNEL
>>>>> -};
>>>>> -
>>>>>  int kexec_should_crash(struct task_struct *p)
>>>>>  {
>>>>>  	/*
>>>>> -- 
>>>>> 2.7.4
>>>>>
>>>>
>>>> Thanks
>>>> Dave
>>>>
>>>>
>>>> .
>>>>
>>> Thanks,
>>> Chen Zhou
>>>
>>
>> Thanks
>> Dave
>>
>>
> 
> Thanks,
> Chen Zhou
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c
  2020-01-17  3:58           ` Dave Young
@ 2020-04-03  7:29             ` Chen Zhou
  0 siblings, 0 replies; 30+ messages in thread
From: Chen Zhou @ 2020-04-03  7:29 UTC (permalink / raw)
  To: Dave Young, James Morse
  Cc: kbuild test robot, horms, linux-doc, catalin.marinas, bhsharma,
	xiexiuqi, kexec, linux-kernel, mingo, tglx, will,
	linux-arm-kernel

Hi Dave/James,

On 2020/1/17 11:58, Dave Young wrote:
> On 01/16/20 at 03:17pm, James Morse wrote:
>> Hi guys,
>>
>> On 28/12/2019 09:32, Dave Young wrote:
>>> On 12/27/19 at 07:04pm, Chen Zhou wrote:
>>>> On 2019/12/27 13:54, Dave Young wrote:
>>>>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
>>>>>> In preparation for supporting reserve_crashkernel_low in arm64 as
>>>>>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>>>
>>>>>> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
>>>>>> is specified. Different with x86_64, don't set low memory automatically.
>>>>>
>>>>> Do you have any reason for the difference?  I'd expect we have same
>>>>> logic if possible and remove some of the ifdefs.
>>>>
>>>> In x86_64, if we reserve crashkernel above 4G, then we call reserve_crashkernel_low()
>>>> to reserve low memory.
>>>>
>>>> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of reserve_crashkernel()
>>>> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() allocated something.
>>>> In this case, if reserve crashkernel below 4G there will be 256M low memory set automatically
>>>> and this needs extra considerations.
>>
>>> Sorry that I did not read the old thread details and thought that is
>>> arch dependent.  But rethink about that, it would be better that we can
>>> have same semantic about crashkernel parameters across arches.  If we
>>> make them different then it causes confusion, especially for
>>> distributions.
>>
>> Surely distros also want one crashkernel* string they can use on all platforms without
>> having to detect the kernel version, platform or changeable memory layout...
>>
>>
>>> OTOH, I thought if we reserve high memory then the low memory should be
>>> needed.  There might be some exceptions, but I do not know the exact
>>> one,
>>
>>> can we make the behavior same, and special case those systems which
>>> do not need low memory reservation.
>>
>> Its tricky to work out which systems are the 'normal' ones.
>>
>> We don't have a fixed memory layout for arm64. Some systems have no memory below 4G.
>> Others have no memory above 4G.
>>
>> Chen Zhou's machine has some memory below 4G, but its too precious to reserve a large
>> chunk for kdump. Without any memory below 4G some of the drivers won't work.
>>
>> I don't see what distros can set as their default for all platforms if high/low are
>> mutually exclusive with the 'crashkernel=' in use today. How did x86 navigate this, ... or
>> was it so long ago?
> 
> It is very rare for such machine without any low memory in X86, at least
> from what I know,  so the current way just works fine.
> 
> Since arm64 is quite different, I would agree with current way
> proposed in the patch, but a question is, for those arm64 systems how can
> admin know if low crashkernel memory is needed or not?  and just skip the
> low reservation for machine with high memory installed only?

The low memory of the specified size is reserved for the devices used by the crash dump kernel.
I think the admin should know whether any devices in the crash dump kernel need low memory.
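
For reference, the sizing policy of reserve_crashkernel_low() in this v7 patch can be modeled roughly as below. This is a user-space sketch, not kernel code: pick_low_size(), enum arch, and the swiotlb_bytes parameter (standing in for swiotlb_size_or_default()) are names made up for illustration. It shows only the difference being discussed: x86_64 falls back to a default size when crashkernel=Y,low is absent, while arm64 returns -EINVAL and reserves nothing.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum arch { ARCH_X86_64, ARCH_ARM64 };

/*
 * Hypothetical model of the low-memory sizing decision.
 * low_given: whether crashkernel=Y,low was on the command line.
 * low_arg:   the Y value in bytes (only meaningful if low_given).
 * Returns 0 and stores the size to reserve in *low_size (0 means
 * "reserve nothing"), or -EINVAL on arm64 when no size was given.
 */
static int pick_low_size(enum arch arch, bool low_given, uint64_t low_arg,
			 uint64_t swiotlb_bytes, uint64_t *low_size)
{
	const uint64_t min_default = 256ULL << 20;	/* 256M floor on x86_64 */
	uint64_t want;

	if (low_given) {
		/* crashkernel=0,low explicitly disables the low reservation */
		*low_size = low_arg;
		return 0;
	}

	if (arch == ARCH_ARM64)
		return -EINVAL;	/* arm64: reserve only if explicitly asked */

	/* x86_64 default: swiotlb size plus 8M slack, at least 256M */
	want = swiotlb_bytes + (8ULL << 20);
	*low_size = want > min_default ? want : min_default;
	return 0;
}
```

With a 64M swiotlb, the x86_64 default works out to max(64M + 8M, 256M) = 256M, whereas the same call for arm64 fails with -EINVAL unless the admin passes an explicit crashkernel=Y,low.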

James, any suggestions?

Thanks,
Chen Zhou

> 
>>
>> No one else has reported a problem with the existing placement logic, hence treating this
>> 'low' thing as the 'in addition' special case.
>>
>>
>>>> previous discusses:
>>>> 	https://lkml.org/lkml/2019/6/5/670
>>>> 	https://lkml.org/lkml/2019/6/13/229
>>>
>>> Another concern from James:
>>> "
>>> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
>>> "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
>>> find "Crash kernel", you are always going to get the kernel placed in the lower portion.
>>> "
>>>
>>> The kexec-tools code is iterating all "Crash kernel" ranges and add them
>>> in an array.  In X86 code, it uses the higher range to locate memory.
>>
>> Then my hurried reading of what the user-space code does was wrong!
>>
>> If kexec-tools places the kernel in the low region, there may not be enough memory left
>> for whatever purpose it was reserved for. This was the motivation for giving it a
>> different name.
> 
> Agreed,  it is still a potential problem though.  Say we have both low
> and high reserved.  Kdump kernel boots up, the kernel and drivers,
> applications will use memory, I'm not sure if there is a memory
> allocation policy to let them all use high mem first..  Anyway that is
> beyond the kexec-tools and resource name.
> 
> Thanks
> Dave
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump
  2020-03-26  3:09 ` [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
@ 2020-05-19 10:21   ` Arnd Bergmann
  2020-05-19 20:21     ` John Donnelly
  2020-05-20  3:30     ` chenzhou
  0 siblings, 2 replies; 30+ messages in thread
From: Arnd Bergmann @ 2020-05-19 10:21 UTC (permalink / raw)
  To: Chen Zhou
  Cc: Simon Horman, john.p.donnelly, Will Deacon,
	open list:DOCUMENTATION, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-kernel, Ingo Molnar, James Morse, Thomas Gleixner,
	pkushwaha, Dave Young, Linux ARM

On Thu, Mar 26, 2020 at 4:10 AM Chen Zhou <chenzhou10@huawei.com> wrote:
>
> Hi all,
>
> Friendly ping...

I was asked about this patch series, and see that you last posted it in
December. I think you should rebase it to linux-5.7-rc6 and post the
entire series again to make progress, as it's unlikely that any maintainer
would pick up the patches from last year.

For the contents, everything seems reasonable to me, but I noticed that
you are adding a property to the /chosen node without adding the
corresponding documentation to
Documentation/devicetree/bindings/chosen.txt

Please add that, and Cc the devicetree maintainers on the updated
patch.

         Arnd

> On 2019/12/23 23:23, Chen Zhou wrote:
> > This patch series enable reserving crashkernel above 4G in arm64.
> >
> > There are following issues in arm64 kdump:
> > 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> > when there is no enough low memory.
> > 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
> > in this case, if swiotlb or DMA buffers are required, crash dump kernel
> > will boot failure because there is no low memory available for allocation.
> >
> > The previous changes and discussions can be retrieved from:
> >
> > Changes since [v6]
> > - Fix build errors reported by kbuild test robot.
...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump
  2020-05-19 10:21   ` Arnd Bergmann
@ 2020-05-19 20:21     ` John Donnelly
  2020-05-20  8:32       ` Bhupesh Sharma
  2020-05-20  3:30     ` chenzhou
  1 sibling, 1 reply; 30+ messages in thread
From: John Donnelly @ 2020-05-19 20:21 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: John Donnelly, open list:DOCUMENTATION, Chen Zhou,
	Catalin Marinas, Bhupesh Sharma, Dave Young, kexec mailing list,
	linux-kernel, Simon Horman, James Morse, Thomas Gleixner,
	Prabhakar Kushwaha, Will Deacon, Ingo Molnar, Linux ARM



> On May 19, 2020, at 5:21 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> 
> On Thu, Mar 26, 2020 at 4:10 AM Chen Zhou <chenzhou10@huawei.com> wrote:
>> 
>> Hi all,
>> 
>> Friendly ping...
> 
> I was asked about this patch series, and see that you last posted it in
> December. I think you should rebase it to linux-5.7-rc6 and post the
> entire series again to make progress, as it's unlikely that any maintainer
> would pick up the patches from last year.
> 
> For the contents, everything seems reasonable to me, but I noticed that
> you are adding a property to the /chosen node without adding the
> corresponding documentation to
> Documentation/devicetree/bindings/chosen.txt
> 
> Please add that, and Cc the devicetree maintainers on the updated
> patch.
> 
>         Arnd
> 
>> On 2019/12/23 23:23, Chen Zhou wrote:
>>> This patch series enable reserving crashkernel above 4G in arm64.
>>> 
>>> There are following issues in arm64 kdump:
>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>> when there is no enough low memory.
>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>>> will boot failure because there is no low memory available for allocation.
>>> 
>>> The previous changes and discussions can be retrieved from:
>>> 
>>> Changes since [v6]
>>> - Fix build errors reported by kbuild test robot.
> ...


Hi,

We found that the following patch has cured our Out-Of-Memory kdump failures:

https://lkml.org/lkml/2020/4/30/1583

From	Henry Willard
Subject	[PATCH] mm: Limit boost_watermark on small zones.

I am currently not on the linux-kernel@vger.kernel.org distribution list, so not everyone may see this message. You may want to rebase and see whether this cures your OOM issue, and share the results.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump
  2020-05-19 10:21   ` Arnd Bergmann
  2020-05-19 20:21     ` John Donnelly
@ 2020-05-20  3:30     ` chenzhou
  1 sibling, 0 replies; 30+ messages in thread
From: chenzhou @ 2020-05-20  3:30 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Simon Horman, john.p.donnelly, Will Deacon,
	open list:DOCUMENTATION, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-kernel, Ingo Molnar, James Morse, Thomas Gleixner,
	pkushwaha, Dave Young, Linux ARM

Hi Arnd,

On 2020/5/19 18:21, Arnd Bergmann wrote:
> On Thu, Mar 26, 2020 at 4:10 AM Chen Zhou <chenzhou10@huawei.com> wrote:
>> Hi all,
>>
>> Friendly ping...
> I was asked about this patch series, and see that you last posted it in
> December. I think you should rebase it to linux-5.7-rc6 and post the
> entire series again to make progress, as it's unlikely that any maintainer
> would pick up the patches from last year.
>
> For the contents, everything seems reasonable to me, but I noticed that
> you are adding a property to the /chosen node without adding the
> corresponding documentation to
> Documentation/devicetree/bindings/chosen.txt
>
> Please add that, and Cc the devicetree maintainers on the updated
> patch.
>
>          Arnd

Thanks for your review and comments. I will rebase the series onto linux-5.7-rc6 and add the
corresponding documentation.

Thanks,
Chen Zhou

>> On 2019/12/23 23:23, Chen Zhou wrote:
>>> This patch series enable reserving crashkernel above 4G in arm64.
>>>
>>> There are following issues in arm64 kdump:
>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>> when there is no enough low memory.
>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>>> will boot failure because there is no low memory available for allocation.
>>>
>>> The previous changes and discussions can be retrieved from:
>>>
>>> Changes since [v6]
>>> - Fix build errors reported by kbuild test robot.
> ...
>
> .
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump
  2020-05-19 20:21     ` John Donnelly
@ 2020-05-20  8:32       ` Bhupesh Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Bhupesh Sharma @ 2020-05-20  8:32 UTC (permalink / raw)
  To: John Donnelly
  Cc: Arnd Bergmann, open list:DOCUMENTATION, Chen Zhou,
	Catalin Marinas, kexec mailing list, linux-kernel, Will Deacon,
	Simon Horman, James Morse, Thomas Gleixner, Prabhakar Kushwaha,
	Dave Young, Ingo Molnar, Linux ARM

Hi John,

On Wed, May 20, 2020 at 1:53 AM John Donnelly
<john.p.donnelly@oracle.com> wrote:
>
>
>
> > On May 19, 2020, at 5:21 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Thu, Mar 26, 2020 at 4:10 AM Chen Zhou <chenzhou10@huawei.com> wrote:
> >>
> >> Hi all,
> >>
> >> Friendly ping...
> >
> > I was asked about this patch series, and see that you last posted it in
> > December. I think you should rebase it to linux-5.7-rc6 and post the
> > entire series again to make progress, as it's unlikely that any maintainer
> > would pick up the patches from last year.
> >
> > For the contents, everything seems reasonable to me, but I noticed that
> > you are adding a property to the /chosen node without adding the
> > corresponding documentation to
> > Documentation/devicetree/bindings/chosen.txt
> >
> > Please add that, and Cc the devicetree maintainers on the updated
> > patch.
> >
> >         Arnd
> >
> >> On 2019/12/23 23:23, Chen Zhou wrote:
> >>> This patch series enable reserving crashkernel above 4G in arm64.
> >>>
> >>> There are following issues in arm64 kdump:
> >>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> >>> when there is no enough low memory.
> >>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
> >>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
> >>> will boot failure because there is no low memory available for allocation.
> >>>
> >>> The previous changes and discussions can be retrieved from:
> >>>
> >>> Changes since [v6]
> >>> - Fix build errors reported by kbuild test robot.
> > ...
>
>
>  Hi
>
> We found
>
> https://lkml.org/lkml/2020/4/30/1583
>
> Has cured our Out-Of-Memory kdump failures.
>
> From    Henry Willard
> Subject [PATCH] mm: Limit boost_watermark on small zones.
>
> I am currently not on linux-kernel@vger.kernel.org. dlist for all to see  this message so you may want to rebase and see if this cures your OoM issue and share the results.

This is a very interesting finding. Thanks a lot for sharing it.
I am working on further avoiding OOM issues with arm64 kdump kernels.
I will experiment more with this patch and get back with more details.

Regards,
Bhupesh


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2020-05-20  8:32 UTC | newest]

Thread overview: 30+ messages
2019-12-23 15:23 [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
2019-12-23 15:23 ` [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c Chen Zhou
2019-12-27  5:54   ` Dave Young
2019-12-27 11:04     ` Chen Zhou
2019-12-28  9:32       ` Dave Young
2019-12-31  1:39         ` Chen Zhou
2020-04-03  7:13           ` Chen Zhou
2020-01-16 15:17         ` James Morse
2020-01-16 15:47           ` John Donnelly
2020-02-24 15:25             ` John Donnelly
2020-03-02  1:29               ` Chen Zhou
2020-01-17  3:58           ` Dave Young
2020-04-03  7:29             ` Chen Zhou
2019-12-23 15:23 ` [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel Chen Zhou
2020-03-05 10:13   ` Prabhakar Kushwaha
2020-03-07 11:06     ` Chen Zhou
2020-03-07 18:43       ` John Donnelly
2020-03-09  4:59         ` Prabhakar Kushwaha
2020-03-09  4:48       ` Prabhakar Kushwaha
2020-03-09 15:51         ` Prabhakar Kushwaha
2020-03-10  1:30           ` chenzhou
2020-03-10 17:08             ` Prabhakar Kushwaha
2020-03-11  1:44               ` chenzhou
2019-12-23 15:23 ` [PATCH v7 3/4] arm64: kdump: add memory for devices by DT property, low-memory-range Chen Zhou
2019-12-23 15:23 ` [PATCH v7 4/4] kdump: update Documentation about crashkernel on arm64 Chen Zhou
2020-03-26  3:09 ` [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump Chen Zhou
2020-05-19 10:21   ` Arnd Bergmann
2020-05-19 20:21     ` John Donnelly
2020-05-20  8:32       ` Bhupesh Sharma
2020-05-20  3:30     ` chenzhou
