linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v22 0/8] arm64: add kdump support
@ 2016-07-12  5:05 AKASHI Takahiro
  2016-07-12  5:05 ` [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-12  5:05 UTC (permalink / raw)
  To: linux-arm-kernel

This patch series adds kdump support on arm64.

To load a crash-dump kernel to the systems, a series of patches to
kexec-tools [1], which have not yet been merged upstream, are needed.
Please update to the latest if you have been using an older version.

To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
  - crash utility (coming v7.1.6 or later) [2]
    (Necessary patches have already been queued in the master.)

[1]  https://git.kernel.org/cgit/linux/kernel/git/geoff/kexec-tools.git
     (Please rename "linux,usable-memory" to "linux,usable-memory-range"
      in kexec/arch/arm64/kexec-arm64.c)
[2]  https://github.com/crash-utility/crash.git

Changes for v22 (July 12, 2016):

  o Export "crashkernel-base" and "crashkernel-size" via device-tree,
    and add some descriptions about them in chosen.txt
  o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
    with powerpc's "usable-memory"
  o Make cosmetic changes regarding "ifdef" usage
  o Correct some wordings in kdump.txt

Changes for v21 (July 6, 2016):

  o Remove kexec patches.
  o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
  o Clarify the description about kvm in kdump.txt.

See the following link [3] for older changes:
[3]  http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html

AKASHI Takahiro (7):
  arm64: kdump: reserve memory for crash dump kernel
  arm64: limit memory regions based on DT property, usable-memory-range
  arm64: kdump: implement machine_crash_shutdown()
  arm64: kdump: add kdump support
  arm64: kdump: add VMCOREINFO's for user-space coredump tools
  arm64: kdump: enable kdump in the arm64 defconfig
  arm64: kdump: update a kernel doc

James Morse (1):
  Documentation: dt: chosen properties for arm64 kdump

 Documentation/devicetree/bindings/chosen.txt |  45 ++++++
 Documentation/kdump/kdump.txt                |  16 ++-
 arch/arm64/Kconfig                           |  11 ++
 arch/arm64/configs/defconfig                 |   1 +
 arch/arm64/include/asm/hardirq.h             |   2 +-
 arch/arm64/include/asm/kexec.h               |  41 +++++-
 arch/arm64/include/asm/smp.h                 |   2 +
 arch/arm64/kernel/Makefile                   |   1 +
 arch/arm64/kernel/crash_dump.c               |  71 +++++++++
 arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
 arch/arm64/kernel/setup.c                    |   7 +-
 arch/arm64/kernel/smp.c                      |  64 +++++++++
 arch/arm64/mm/init.c                         | 207 +++++++++++++++++++++++++++
 13 files changed, 528 insertions(+), 7 deletions(-)
 create mode 100644 arch/arm64/kernel/crash_dump.c

-- 
2.9.0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-12  5:05 [PATCH v22 0/8] arm64: add kdump support AKASHI Takahiro
@ 2016-07-12  5:05 ` AKASHI Takahiro
  2016-07-13  9:12   ` Suzuki K Poulose
  2016-07-19  9:39   ` Dennis Chen
  2016-07-12  5:05 ` [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-12  5:05 UTC (permalink / raw)
  To: linux-arm-kernel

On the startup of primary kernel, the memory region used by crash dump
kernel must be specified by "crashkernel=" kernel parameter.
reserve_crashkernel() will allocate and reserve the region for later use.

User space tools, like kexec-tools, will be able to find that region as
	- "Crash kernel" in /proc/iomem, or
	- "linux,crashkernel-base" and "linux,crashkernel-size" under
	  /sys/firmware/devicetree/base/chosen

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Pratyush Anand <panand@redhat.com>
Reviewed-by: James Morse <james.morse@arm.com>
---
 arch/arm64/kernel/setup.c |   7 ++-
 arch/arm64/mm/init.c      | 115 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index c1509e6..cb5eee0 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
 #include <linux/screen_info.h>
 #include <linux/init.h>
 #include <linux/kexec.h>
-#include <linux/crash_dump.h>
 #include <linux/root_dev.h>
 #include <linux/cpu.h>
 #include <linux/interrupt.h>
@@ -222,6 +221,12 @@ static void __init request_standard_resources(void)
 		    kernel_data.end <= res->end)
 			request_resource(res, &kernel_data);
 	}
+
+#ifdef CONFIG_KEXEC_CORE
+	/* User space tools will find "Crash kernel" region in /proc/iomem. */
+	if (crashk_res.end)
+		insert_resource(&iomem_resource, &crashk_res);
+#endif
 }
 
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 2ade7a6..51b1302 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,11 +29,13 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
+#include <linux/kexec.h>
 
 #include <asm/boot.h>
 #include <asm/fixmap.h>
@@ -76,6 +78,117 @@ static int __init early_initrd(char *p)
 early_param("initrd", early_initrd);
 #endif
 
+#ifdef CONFIG_KEXEC_CORE
+static unsigned long long crash_size, crash_base;
+static struct property crash_base_prop = {
+	.name = "linux,crashkernel-base",
+	.length = sizeof(u64),
+	.value = &crash_base
+};
+static struct property crash_size_prop = {
+	.name = "linux,crashkernel-size",
+	.length = sizeof(u64),
+	.value = &crash_size,
+};
+
+static int __init export_crashkernel(void)
+{
+	struct device_node *node;
+	int ret;
+
+	if (!crashk_res.end)
+		return 0;
+
+	crash_base = cpu_to_be64(crashk_res.start);
+	crash_size = cpu_to_be64(crashk_res.end - crashk_res.start + 1);
+
+	/* Add /chosen/linux,crashkernel-* properties */
+	node = of_find_node_by_path("/chosen");
+	if (!node)
+		return -ENOENT;
+
+	/*
+	 * There might be existing crash kernel properties, but we can't
+	 * be sure what's in them, so remove them.
+	 */
+	of_remove_property(node, of_find_property(node,
+				"linux,crashkernel-base", NULL));
+	of_remove_property(node, of_find_property(node,
+				"linux,crashkernel-size", NULL));
+
+	ret = of_add_property(node, &crash_base_prop);
+	if (ret)
+		goto ret_err;
+
+	ret = of_add_property(node, &crash_size_prop);
+	if (ret)
+		goto ret_err;
+
+	return 0;
+
+ret_err:
+	pr_warn("Exporting crashkernel region to device tree failed\n");
+	return ret;
+}
+late_initcall(export_crashkernel);
+
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+	int ret;
+
+	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+				&crash_size, &crash_base);
+	/* no crashkernel= or invalid value specified */
+	if (ret || !crash_size)
+		return;
+
+	if (crash_base == 0) {
+		/* Current arm64 boot protocol requires 2MB alignment */
+		crash_base = memblock_find_in_range(0,
+				MEMBLOCK_ALLOC_ACCESSIBLE, crash_size, SZ_2M);
+		if (crash_base == 0) {
+			pr_warn("Unable to allocate crashkernel (size:%llx)\n",
+				crash_size);
+			return;
+		}
+		memblock_reserve(crash_base, crash_size);
+
+	} else {
+		/* User specifies base address explicitly. */
+		if (!memblock_is_region_memory(crash_base, crash_size) ||
+			memblock_is_region_reserved(crash_base, crash_size)) {
+			pr_warn("crashkernel has wrong address or size\n");
+			return;
+		}
+
+		if (!IS_ALIGNED(crash_base, SZ_2M)) {
+			pr_warn("crashkernel base address is not 2MB aligned\n");
+			return;
+		}
+
+		memblock_reserve(crash_base, crash_size);
+	}
+
+	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
+		crash_size >> 20, crash_base >> 20);
+
+	crashk_res.start = crash_base;
+	crashk_res.end = crash_base + crash_size - 1;
+}
+#else
+static void __init reserve_crashkernel(void)
+{
+	;
+}
+#endif /* CONFIG_KEXEC_CORE */
+
 /*
  * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
  * currently assumes that for memory starting above 4G, 32-bit devices will
@@ -289,6 +402,8 @@ void __init arm64_memblock_init(void)
 	}
 #endif
 
+	reserve_crashkernel();
+
 	early_init_fdt_scan_reserved_mem();
 
 	/* 4GB maximum for 32-bit only capable devices */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-12  5:05 [PATCH v22 0/8] arm64: add kdump support AKASHI Takahiro
  2016-07-12  5:05 ` [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
@ 2016-07-12  5:05 ` AKASHI Takahiro
  2016-07-18 18:04   ` James Morse
  2016-07-12  5:05 ` [PATCH v22 3/8] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-12  5:05 UTC (permalink / raw)
  To: linux-arm-kernel

Crash dump kernel will be run with a limited range of memory as System
RAM.

On arm64, we will use a device-tree property under /chosen,
   linux,usable-memory-range = <BASE SIZE>
in order for primary kernel either on uefi or non-uefi (device tree only)
system to hand over the information about usable memory region to crash
dump kernel. This property will supercede entries in uefi memory map table
and "memory" nodes in a device tree.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Geoff Levand <geoff@infradead.org>
---
 arch/arm64/mm/init.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 51b1302..d8b296f 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -300,10 +300,48 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+static int __init early_init_dt_scan_usablemem(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	struct memblock_region *usablemem = (struct memblock_region *)data;
+	const __be32 *reg;
+	int len;
+
+	usablemem->size = 0;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
+static void __init fdt_enforce_memory_region(void)
+{
+	struct memblock_region reg;
+
+	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
+
+	if (reg.size) {
+		memblock_remove(0, PAGE_ALIGN(reg.base));
+		memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
+				ULLONG_MAX);
+	}
+}
+
 void __init arm64_memblock_init(void)
 {
 	const s64 linear_region_size = -(s64)PAGE_OFFSET;
 
+	/* Handle linux,usable-memory-range property */
+	fdt_enforce_memory_region();
+
 	/*
 	 * Ensure that the linear region takes up exactly half of the kernel
 	 * virtual address space. This way, we can distinguish a linear address
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v22 3/8] arm64: kdump: implement machine_crash_shutdown()
  2016-07-12  5:05 [PATCH v22 0/8] arm64: add kdump support AKASHI Takahiro
  2016-07-12  5:05 ` [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
  2016-07-12  5:05 ` [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
@ 2016-07-12  5:05 ` AKASHI Takahiro
  2016-07-13  9:32   ` Suzuki K Poulose
  2016-07-12  5:05 ` [PATCH v22 4/8] arm64: kdump: add kdump support AKASHI Takahiro
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-12  5:05 UTC (permalink / raw)
  To: linux-arm-kernel

Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
and save registers' status in per-cpu ELF notes before starting crash
dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we do kdump anyway.

As we don't have to make non-boot(crashed) cpus offline (to preserve
correct status of cpus at crash dump) before shutting down, this patch
also adds a variant of smp_send_stop().

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/include/asm/hardirq.h  |  2 +-
 arch/arm64/include/asm/kexec.h    | 41 ++++++++++++++++++++++++-
 arch/arm64/include/asm/smp.h      |  2 ++
 arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
 arch/arm64/kernel/smp.c           | 64 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 160 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 8740297..1473fc2 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
 #include <linux/threads.h>
 #include <asm/irq.h>
 
-#define NR_IPI	6
+#define NR_IPI	7
 
 typedef struct {
 	unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc..a908958 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,46 @@
 static inline void crash_setup_regs(struct pt_regs *newregs,
 				    struct pt_regs *oldregs)
 {
-	/* Empty routine needed to avoid build errors. */
+	if (oldregs) {
+		memcpy(newregs, oldregs, sizeof(*newregs));
+	} else {
+		u64 tmp1, tmp2;
+
+		__asm__ __volatile__ (
+			"stp	 x0,   x1, [%2, #16 *  0]\n"
+			"stp	 x2,   x3, [%2, #16 *  1]\n"
+			"stp	 x4,   x5, [%2, #16 *  2]\n"
+			"stp	 x6,   x7, [%2, #16 *  3]\n"
+			"stp	 x8,   x9, [%2, #16 *  4]\n"
+			"stp	x10,  x11, [%2, #16 *  5]\n"
+			"stp	x12,  x13, [%2, #16 *  6]\n"
+			"stp	x14,  x15, [%2, #16 *  7]\n"
+			"stp	x16,  x17, [%2, #16 *  8]\n"
+			"stp	x18,  x19, [%2, #16 *  9]\n"
+			"stp	x20,  x21, [%2, #16 * 10]\n"
+			"stp	x22,  x23, [%2, #16 * 11]\n"
+			"stp	x24,  x25, [%2, #16 * 12]\n"
+			"stp	x26,  x27, [%2, #16 * 13]\n"
+			"stp	x28,  x29, [%2, #16 * 14]\n"
+			"mov	 %0,  sp\n"
+			"stp	x30,  %0,  [%2, #16 * 15]\n"
+
+			"/* faked current PSTATE */\n"
+			"mrs	 %0, CurrentEL\n"
+			"mrs	 %1, DAIF\n"
+			"orr	 %0, %0, %1\n"
+			"mrs	 %1, NZCV\n"
+			"orr	 %0, %0, %1\n"
+
+			/* pc */
+			"adr	 %1, 1f\n"
+		"1:\n"
+			"stp	 %1, %0,   [%2, #16 * 16]\n"
+			: "=r" (tmp1), "=r" (tmp2), "+r" (newregs)
+			:
+			: "memory"
+		);
+	}
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 0226447..6b0f2c7 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -136,6 +136,8 @@ static inline void cpu_panic_kernel(void)
  */
 bool cpus_are_stuck_in_kernel(void);
 
+extern void smp_send_crash_stop(void);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a..8ac9dba8 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,9 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
 #include <linux/kexec.h>
 #include <linux/smp.h>
 
@@ -22,6 +25,7 @@
 extern const unsigned char arm64_relocate_new_kernel[];
 extern const unsigned long arm64_relocate_new_kernel_size;
 
+bool in_crash_kexec;
 static unsigned long kimage_start;
 
 /**
@@ -148,7 +152,8 @@ void machine_kexec(struct kimage *kimage)
 	/*
 	 * New cpus may have become stuck_in_kernel after we loaded the image.
 	 */
-	BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
+	BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
+			!WARN_ON(in_crash_kexec));
 
 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
 	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
@@ -200,13 +205,58 @@ void machine_kexec(struct kimage *kimage)
 	 * relocation is complete.
 	 */
 
-	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
+	cpu_soft_restart(!in_crash_kexec, reboot_code_buffer_phys, kimage->head,
 		kimage_start, 0);
 
 	BUG(); /* Should never get here. */
 }
 
+static void machine_kexec_mask_interrupts(void)
+{
+	unsigned int i;
+	struct irq_desc *desc;
+
+	for_each_irq_desc(i, desc) {
+		struct irq_chip *chip;
+		int ret;
+
+		chip = irq_desc_get_chip(desc);
+		if (!chip)
+			continue;
+
+		/*
+		 * First try to remove the active state. If this
+		 * fails, try to EOI the interrupt.
+		 */
+		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+		    chip->irq_eoi)
+			chip->irq_eoi(&desc->irq_data);
+
+		if (chip->irq_mask)
+			chip->irq_mask(&desc->irq_data);
+
+		if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+			chip->irq_disable(&desc->irq_data);
+	}
+}
+
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	/* Empty routine needed to avoid build errors. */
+	local_irq_disable();
+
+	in_crash_kexec = true;
+
+	/* shutdown non-crashing cpus */
+	smp_send_crash_stop();
+
+	/* for crashing cpu */
+	crash_save_cpu(regs, smp_processor_id());
+	machine_kexec_mask_interrupts();
+
+	pr_info("Starting crashdump kernel...\n");
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 62ff3c0..66923db 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
 #include <linux/completion.h>
 #include <linux/of.h>
 #include <linux/irq_work.h>
+#include <linux/kexec.h>
 
 #include <asm/alternative.h>
 #include <asm/atomic.h>
@@ -71,6 +72,7 @@ enum ipi_msg_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_CPU_CRASH_STOP,
 	IPI_TIMER,
 	IPI_IRQ_WORK,
 	IPI_WAKEUP
@@ -726,6 +728,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
 	S(IPI_RESCHEDULE, "Rescheduling interrupts"),
 	S(IPI_CALL_FUNC, "Function call interrupts"),
 	S(IPI_CPU_STOP, "CPU stop interrupts"),
+	S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
 	S(IPI_TIMER, "Timer broadcast interrupts"),
 	S(IPI_IRQ_WORK, "IRQ work interrupts"),
 	S(IPI_WAKEUP, "CPU wake-up interrupts"),
@@ -800,6 +803,30 @@ static void ipi_cpu_stop(unsigned int cpu)
 		cpu_relax();
 }
 
+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi;
+#endif
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+	crash_save_cpu(regs, cpu);
+
+	atomic_dec(&waiting_for_crash_ipi);
+
+	local_irq_disable();
+
+#ifdef CONFIG_HOTPLUG_CPU
+	if (cpu_ops[cpu]->cpu_die)
+		cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+	/* just in case */
+	while (1)
+		wfi();
+#endif
+}
+
 /*
  * Main handler for inter-processor interrupts
  */
@@ -830,6 +857,15 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 		irq_exit();
 		break;
 
+	case IPI_CPU_CRASH_STOP:
+		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+			irq_enter();
+			ipi_cpu_crash_stop(cpu, regs);
+
+			unreachable();
+		}
+		break;
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 	case IPI_TIMER:
 		irq_enter();
@@ -902,6 +938,34 @@ void smp_send_stop(void)
 			   cpumask_pr_args(cpu_online_mask));
 }
 
+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+	cpumask_t mask;
+	unsigned long timeout;
+
+	if (num_online_cpus() == 1)
+		return;
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &mask);
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+	pr_crit("SMP: stopping secondary CPUs\n");
+	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+	/* Wait up to one second for other CPUs to stop */
+	timeout = USEC_PER_SEC;
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+		udelay(1);
+
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+			   cpumask_pr_args(cpu_online_mask));
+}
+#endif
+
 /*
  * not supported here
  */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v22 4/8] arm64: kdump: add kdump support
  2016-07-12  5:05 [PATCH v22 0/8] arm64: add kdump support AKASHI Takahiro
                   ` (2 preceding siblings ...)
  2016-07-12  5:05 ` [PATCH v22 3/8] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
@ 2016-07-12  5:05 ` AKASHI Takahiro
  2016-07-12  5:05 ` [PATCH v22 5/8] arm64: kdump: add VMCOREINFO's for user-space coredump tools AKASHI Takahiro
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-12  5:05 UTC (permalink / raw)
  To: linux-arm-kernel

On crash dump kernel, all the information about primary kernel's system
memory (core image) is available in elf core header.
The primary kernel will set aside this header with reserve_elfcorehdr()
at boot time and inform crash dump kernel of its location via a new
device-tree property, "linux,elfcorehdr".

Please note that all other architectures use traditional "elfcorehdr="
kernel parameter for this purpose.

Then crash dump kernel will access the primary kernel's memory with
copy_oldmem_page(), which reads one page by ioremap'ing it since it does
not reside in linear mapping on crash dump kernel.

We also need our own elfcorehdr_read() here since the header is placed
within crash dump kernel's usable memory.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/Kconfig             | 11 +++++++
 arch/arm64/kernel/Makefile     |  1 +
 arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
 arch/arm64/mm/init.c           | 54 ++++++++++++++++++++++++++++++++
 4 files changed, 137 insertions(+)
 create mode 100644 arch/arm64/kernel/crash_dump.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1b196bf..ad2c53e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -675,6 +675,17 @@ config KEXEC
 	  but it is independent of the system firmware.   And like a reboot
 	  you can start any kernel with it, not just Linux.
 
+config CRASH_DUMP
+	bool "Build kdump crash kernel"
+	help
+	  Generate crash dump after being started by kexec. This should
+	  be normally only set in special crash dump kernels which are
+	  loaded in the main kernel with kexec-tools into a specially
+	  reserved region and then later executed after a crash by
+	  kdump/kexec.
+
+	  For more details see Documentation/kdump/kdump.txt
+
 config XEN_DOM0
 	def_bool y
 	depends on XEN
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 7700c0c..c13d3eb 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -48,6 +48,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
 arm64-obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
 					   cpu-reset.o
+arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
 
 obj-y					+= $(arm64-obj-y) vdso/
 obj-m					+= $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 0000000..2dc54d1
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <asm/memory.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+			 size_t csize, unsigned long offset,
+			 int userbuf)
+{
+	void *vaddr;
+
+	if (!csize)
+		return 0;
+
+	vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
+	if (!vaddr)
+		return -ENOMEM;
+
+	if (userbuf) {
+		if (copy_to_user(buf, vaddr + offset, csize)) {
+			iounmap(vaddr);
+			return -EFAULT;
+		}
+	} else {
+		memcpy(buf, vaddr + offset, csize);
+	}
+
+	iounmap(vaddr);
+
+	return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ * @buf: buffer where the data is placed
+ * @csize: number of bytes to read
+ * @ppos: address in the memory
+ *
+ * This function reads @count bytes from elf core header which exists
+ * on crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+	memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+	return count;
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index d8b296f..ad38b93 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
 #include <linux/kexec.h>
+#include <linux/crash_dump.h>
 
 #include <asm/boot.h>
 #include <asm/fixmap.h>
@@ -189,6 +190,57 @@ static void __init reserve_crashkernel(void)
 }
 #endif /* CONFIG_KEXEC_CORE */
 
+#ifdef CONFIG_CRASH_DUMP
+static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	const __be32 *reg;
+	int len;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
+/*
+ * reserve_elfcorehdr() - reserves memory for elf core header
+ *
+ * This function reserves elf core header given in "elfcorehdr=" kernel
+ * command line parameter. This region contains all the information about
+ * primary kernel's core image and is used by a dump capture kernel to
+ * access the system memory on primary kernel.
+ */
+static void __init reserve_elfcorehdr(void)
+{
+	of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
+
+	if (!elfcorehdr_size)
+		return;
+
+	if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
+		pr_warn("elfcorehdr is overlapped\n");
+		return;
+	}
+
+	memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
+
+	pr_info("Reserving %lldKB of memory@0x%llx for elfcorehdr\n",
+		elfcorehdr_size >> 10, elfcorehdr_addr);
+}
+#else
+static void __init reserve_elfcorehdr(void)
+{
+	;
+}
+#endif /* CONFIG_CRASH_DUMP */
 /*
  * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
  * currently assumes that for memory starting above 4G, 32-bit devices will
@@ -442,6 +494,8 @@ void __init arm64_memblock_init(void)
 
 	reserve_crashkernel();
 
+	reserve_elfcorehdr();
+
 	early_init_fdt_scan_reserved_mem();
 
 	/* 4GB maximum for 32-bit only capable devices */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v22 5/8] arm64: kdump: add VMCOREINFO's for user-space coredump tools
  2016-07-12  5:05 [PATCH v22 0/8] arm64: add kdump support AKASHI Takahiro
                   ` (3 preceding siblings ...)
  2016-07-12  5:05 ` [PATCH v22 4/8] arm64: kdump: add kdump support AKASHI Takahiro
@ 2016-07-12  5:05 ` AKASHI Takahiro
  2016-07-12  5:05 ` [PATCH v22 6/8] arm64: kdump: enable kdump in the arm64 defconfig AKASHI Takahiro
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-12  5:05 UTC (permalink / raw)
  To: linux-arm-kernel

For the current crash utility, we need to know, at least,
  - kimage_voffset
  - PHYS_OFFSET
to handle the contents of core dump file (/proc/vmcore) correctly due to
the introduction of KASLR (CONFIG_RANDOMIZE_BASE) in v4.6.
This patch puts them as VMCOREINFO's into the file.

  - VA_BITS
is also added for makedumpfile command.
More VMCOREINFO's may be added later.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/kernel/machine_kexec.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 8ac9dba8..38b4411 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -17,6 +17,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/cpu_ops.h>
+#include <asm/memory.h>
 #include <asm/mmu_context.h>
 
 #include "cpu-reset.h"
@@ -260,3 +261,13 @@ void machine_crash_shutdown(struct pt_regs *regs)
 
 	pr_info("Starting crashdump kernel...\n");
 }
+
+void arch_crash_save_vmcoreinfo(void)
+{
+	VMCOREINFO_NUMBER(VA_BITS);
+	/* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
+	vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
+						kimage_voffset);
+	vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
+						PHYS_OFFSET);
+}
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v22 6/8] arm64: kdump: enable kdump in the arm64 defconfig
  2016-07-12  5:05 [PATCH v22 0/8] arm64: add kdump support AKASHI Takahiro
                   ` (4 preceding siblings ...)
  2016-07-12  5:05 ` [PATCH v22 5/8] arm64: kdump: add VMCOREINFO's for user-space coredump tools AKASHI Takahiro
@ 2016-07-12  5:05 ` AKASHI Takahiro
  2016-07-12  5:05 ` [PATCH v22 7/8] arm64: kdump: update a kernel doc AKASHI Takahiro
  2016-07-12  5:05 ` [PATCH v22 8/8] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
  7 siblings, 0 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-12  5:05 UTC (permalink / raw)
  To: linux-arm-kernel

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 4ed4756..aa3e1dd 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -71,6 +71,7 @@ CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_CMA=y
 CONFIG_XEN=y
 CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_COMPAT=y
 CONFIG_CPU_IDLE=y
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v22 7/8] arm64: kdump: update a kernel doc
  2016-07-12  5:05 [PATCH v22 0/8] arm64: add kdump support AKASHI Takahiro
                   ` (5 preceding siblings ...)
  2016-07-12  5:05 ` [PATCH v22 6/8] arm64: kdump: enable kdump in the arm64 defconfig AKASHI Takahiro
@ 2016-07-12  5:05 ` AKASHI Takahiro
  2016-07-12  5:05 ` [PATCH v22 8/8] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
  7 siblings, 0 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-12  5:05 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds arch specific descriptions about kdump usage on arm64
to kdump.txt.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
---
 Documentation/kdump/kdump.txt | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 88ff63d..c090531 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
 a remote system.
 
 Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x and arm architectures.
+s390x, arm and arm64 architectures.
 
 When the system kernel boots, it reserves a small section of memory for
 the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)
 
     AUTO_ZRELADDR=y
 
+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+  on non-VHE systems even if it is configured. This is because the CPU
+  cannot be reset to EL2 on panic.
+
 Extended crashkernel syntax
 ===========================
 
@@ -305,6 +312,8 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
+   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
 
 Load the Dump-capture Kernel
 ============================
@@ -327,6 +336,8 @@ For s390x:
 	- Use image or bzImage
 For arm:
 	- Use zImage
+For arm64:
+	- Use vmlinux or Image
 
 If you are using a uncompressed vmlinux image then use following command
 to load dump-capture kernel.
@@ -370,6 +381,9 @@ For s390x:
 For arm:
 	"1 maxcpus=1 reset_devices"
 
+For arm64:
+	"1 maxcpus=1 reset_devices"
+
 Notes on loading the dump-capture kernel:
 
 * By default, the ELF headers are stored in ELF64 format to support
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v22 8/8] Documentation: dt: chosen properties for arm64 kdump
  2016-07-12  5:05 [PATCH v22 0/8] arm64: add kdump support AKASHI Takahiro
                   ` (6 preceding siblings ...)
  2016-07-12  5:05 ` [PATCH v22 7/8] arm64: kdump: update a kernel doc AKASHI Takahiro
@ 2016-07-12  5:05 ` AKASHI Takahiro
  2016-07-12 10:07   ` Mark Rutland
  7 siblings, 1 reply; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-12  5:05 UTC (permalink / raw)
  To: linux-arm-kernel

From: James Morse <james.morse@arm.com>

Add documentation for
	linux,crashkernel-base and crashkernel-size,
	linux,usable-memory-range, and
	linux,elfcorehdr
used by arm64 kexec/kdump to decribe the kdump reserved area, and
the elfcorehdr's location within it.

Signed-off-by: James Morse <james.morse@arm.com>
[takahiro.akashi at linaro.org:
    renamed "usable-memory" to "usable-memory-range",
    added "linux,crashkernel-base" and "-size" ]
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 Documentation/devicetree/bindings/chosen.txt | 45 ++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 6ae9d82..d7a3a86 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -52,3 +52,48 @@ This property is set (currently only on PowerPC, and only needed on
 book3e) by some versions of kexec-tools to tell the new kernel that it
 is being booted by kexec, as the booting environment may differ (e.g.
 a different secondary CPU release mechanism)
+
+linux,crashkernel-base
+linux,crashkernel-size
+----------------------
+These properties are set (on PowerPC and arm64) during kdump to tell
+use-space tools, like kexec-tools, the base address of the crash-dump
+kernel's reserved area of memory and the size. e.g.
+
+/ {
+	chosen {
+		linux,crashkernel-base = <0x9 0xf0000000>;
+		linux,crashkernel-size = <0x0 0x10000000>;
+	};
+};
+
+linux,usable-memory-range
+-------------------------
+
+This property is set (currently only on arm64) during kdump to tell
+the crash-dump kernel the base address of its reserved area of memory,
+and the size. e.g.
+
+/ {
+	chosen {
+		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
+	};
+};
+
+Please note that, if this property is present, any memory regions under
+"memory" nodes will be ignored.
+
+linux,elfcorehdr
+----------------
+
+This property is set (currently only on arm64) during kdump to tell
+the crash-dump kernel the address and size of the elfcorehdr that describes
+the old kernel's memory as an elf file. This memory must reside within
+the area described by 'linux,usable-memory-range'. e.g.
+
+/ {
+	chosen {
+		linux,usable-memory = <0x9 0xf0000000 0x0 0x10000000>;
+		linux,elfcorehdr = <0x9 0xfffff000 0x0 0x800>;
+	};
+};
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v22 8/8] Documentation: dt: chosen properties for arm64 kdump
  2016-07-12  5:05 ` [PATCH v22 8/8] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
@ 2016-07-12 10:07   ` Mark Rutland
  2016-07-13 15:14     ` AKASHI Takahiro
  0 siblings, 1 reply; 35+ messages in thread
From: Mark Rutland @ 2016-07-12 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Apologies for the delay on this.

On Tue, Jul 12, 2016 at 02:05:14PM +0900, AKASHI Takahiro wrote:
> From: James Morse <james.morse@arm.com>
> 
> Add documentation for
> 	linux,crashkernel-base and crashkernel-size,
> 	linux,usable-memory-range, and
> 	linux,elfcorehdr
> used by arm64 kexec/kdump to decribe the kdump reserved area, and
> the elfcorehdr's location within it.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> [takahiro.akashi at linaro.org:
>     renamed "usable-memory" to "usable-memory-range",
>     added "linux,crashkernel-base" and "-size" ]
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> ---
>  Documentation/devicetree/bindings/chosen.txt | 45 ++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> index 6ae9d82..d7a3a86 100644
> --- a/Documentation/devicetree/bindings/chosen.txt
> +++ b/Documentation/devicetree/bindings/chosen.txt
> @@ -52,3 +52,48 @@ This property is set (currently only on PowerPC, and only needed on
>  book3e) by some versions of kexec-tools to tell the new kernel that it
>  is being booted by kexec, as the booting environment may differ (e.g.
>  a different secondary CPU release mechanism)
> +
> +linux,crashkernel-base
> +linux,crashkernel-size
> +----------------------
> +These properties are set (on PowerPC and arm64) during kdump to tell
> +use-space tools, like kexec-tools, the base address of the crash-dump
> +kernel's reserved area of memory and the size. e.g.

No need to mention what consumes this. Just state what it describes,
e.g.

	These properties describe the base and size of the crash-dump
	kernel's reserved area of memory. Valid for PowerPC and arm64.

> +
> +/ {
> +	chosen {
> +		linux,crashkernel-base = <0x9 0xf0000000>;
> +		linux,crashkernel-size = <0x0 0x10000000>;
> +	};
> +};
> +
> +linux,usable-memory-range
> +-------------------------
> +
> +This property is set (currently only on arm64) during kdump to tell
> +the crash-dump kernel the base address of its reserved area of memory,
> +and the size. e.g.

The description sounds like this duplicates linux,crashkernel-*. What is
the difference between the two?

On powerpc it looks like there's a linux,usable-memory property (without
the -range suffix). How do these differ?

> +
> +/ {
> +	chosen {
> +		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
> +	};
> +};
> +
> +Please note that, if this property is present, any memory regions under
> +"memory" nodes will be ignored.

What exactly do you mean by "ignored"?

Do we truly ignore this property, or do we map that memory at some
point, even if not used for general allocation?

> +
> +linux,elfcorehdr
> +----------------
> +
> +This property is set (currently only on arm64) during kdump to tell
> +the crash-dump kernel the address and size of the elfcorehdr that describes
> +the old kernel's memory as an elf file. This memory must reside within
> +the area described by 'linux,usable-memory-range'. e.g.


As with the linux,crashkernel-* properties, just state what this
describes, e.g.

	This property describes the base and size of the ELF core
	header, which describes the old kernel's memory as an ELF file.
	This memory must reside within the range described by
	linux,usable-memory-range.

That said, this falling within usable-memory feels very odd, because
it's not actually usable for general purposes. Why can the kernel not
memremap this, such that it need not be in usable-memory?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-12  5:05 ` [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
@ 2016-07-13  9:12   ` Suzuki K Poulose
  2016-07-13 15:42     ` AKASHI Takahiro
  2016-07-19  9:39   ` Dennis Chen
  1 sibling, 1 reply; 35+ messages in thread
From: Suzuki K Poulose @ 2016-07-13  9:12 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/07/16 06:05, AKASHI Takahiro wrote:
> On the startup of primary kernel, the memory region used by crash dump
> kernel must be specified by "crashkernel=" kernel parameter.
> reserve_crashkernel() will allocate and reserve the region for later use.
>
> User space tools, like kexec-tools, will be able to find that region as
> 	- "Crash kernel" in /proc/iomem, or
> 	- "linux,crashkernel-base" and "linux,crashkernel-size" under
> 	  /sys/firmware/devicetree/base/chosen
>
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Signed-off-by: Mark Salter <msalter@redhat.com>
> Signed-off-by: Pratyush Anand <panand@redhat.com>
> Reviewed-by: James Morse <james.morse@arm.com>

Minor nits below.

> ---
>  arch/arm64/kernel/setup.c |   7 ++-
>  arch/arm64/mm/init.c      | 115 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 121 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index c1509e6..cb5eee0 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -31,7 +31,6 @@
>  #include <linux/screen_info.h>
>  #include <linux/init.h>
>  #include <linux/kexec.h>
> -#include <linux/crash_dump.h>
>  #include <linux/root_dev.h>
>  #include <linux/cpu.h>
>  #include <linux/interrupt.h>
> @@ -222,6 +221,12 @@ static void __init request_standard_resources(void)
>  		    kernel_data.end <= res->end)
>  			request_resource(res, &kernel_data);
>  	}
> +
> +#ifdef CONFIG_KEXEC_CORE
> +	/* User space tools will find "Crash kernel" region in /proc/iomem. */
> +	if (crashk_res.end)
> +		insert_resource(&iomem_resource, &crashk_res);
> +#endif

nit: Could we not do this from reserve_crashkernel() ?

>
>  #include <asm/boot.h>
>  #include <asm/fixmap.h>
> @@ -76,6 +78,117 @@ static int __init early_initrd(char *p)
>  early_param("initrd", early_initrd);
>  #endif
>
> +#ifdef CONFIG_KEXEC_CORE
> +static unsigned long long crash_size, crash_base;
> +static struct property crash_base_prop = {
> +	.name = "linux,crashkernel-base",
> +	.length = sizeof(u64),
> +	.value = &crash_base
> +};
> +static struct property crash_size_prop = {
> +	.name = "linux,crashkernel-size",
> +	.length = sizeof(u64),
> +	.value = &crash_size,
> +};
> +

> +
> +/*
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +static void __init reserve_crashkernel(void)
> +{
> +	int ret;
> +
> +	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> +				&crash_size, &crash_base);
> +	/* no crashkernel= or invalid value specified */
> +	if (ret || !crash_size)
> +		return;
> +
> +	if (crash_base == 0) {

...

> +		memblock_reserve(crash_base, crash_size);
> +
> +	} else {

...

> +
> +		memblock_reserve(crash_base, crash_size);

> +	}

Nit: you could move the memblock_reserve() here to a single place.

> +
> +	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> +		crash_size >> 20, crash_base >> 20);
> +
> +	crashk_res.start = crash_base;
> +	crashk_res.end = crash_base + crash_size - 1;

As mentioned above, may be we could move the insert_resource() here, which
would keep the generic setup.c code cleaner and is a bit more reader friendly.

> +}
> +#else

Suzuki

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 3/8] arm64: kdump: implement machine_crash_shutdown()
  2016-07-12  5:05 ` [PATCH v22 3/8] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
@ 2016-07-13  9:32   ` Suzuki K Poulose
  2016-07-13 16:00     ` AKASHI Takahiro
  0 siblings, 1 reply; 35+ messages in thread
From: Suzuki K Poulose @ 2016-07-13  9:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/07/16 06:05, AKASHI Takahiro wrote:
> Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
> and save registers' status in per-cpu ELF notes before starting crash
> dump kernel. See kernel_kexec().
> Even if not all secondary cpus have shut down, we do kdump anyway.
>
> As we don't have to make non-boot(crashed) cpus offline (to preserve
> correct status of cpus at crash dump) before shutting down, this patch
> also adds a variant of smp_send_stop().
>
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> ---
>  arch/arm64/include/asm/hardirq.h  |  2 +-
>  arch/arm64/include/asm/kexec.h    | 41 ++++++++++++++++++++++++-
>  arch/arm64/include/asm/smp.h      |  2 ++
>  arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
>  arch/arm64/kernel/smp.c           | 64 +++++++++++++++++++++++++++++++++++++++
>  5 files changed, 160 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
> index 8740297..1473fc2 100644
> --- a/arch/arm64/include/asm/hardirq.h
> +++ b/arch/arm64/include/asm/hardirq.h
> @@ -20,7 +20,7 @@
>  #include <linux/threads.h>
>  #include <asm/irq.h>
>
> -#define NR_IPI	6
> +#define NR_IPI	7
>
>  typedef struct {
>  	unsigned int __softirq_pending;
> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> index 04744dc..a908958 100644
> --- a/arch/arm64/include/asm/kexec.h
> +++ b/arch/arm64/include/asm/kexec.h
> @@ -40,7 +40,46 @@
>  static inline void crash_setup_regs(struct pt_regs *newregs,
>  				    struct pt_regs *oldregs)
>  {
> -	/* Empty routine needed to avoid build errors. */
> +	if (oldregs) {
> +		memcpy(newregs, oldregs, sizeof(*newregs));
> +	} else {
> +		u64 tmp1, tmp2;
> +
> +		__asm__ __volatile__ (
> +			"stp	 x0,   x1, [%2, #16 *  0]\n"
> +			"stp	 x2,   x3, [%2, #16 *  1]\n"
> +			"stp	 x4,   x5, [%2, #16 *  2]\n"
> +			"stp	 x6,   x7, [%2, #16 *  3]\n"
> +			"stp	 x8,   x9, [%2, #16 *  4]\n"
> +			"stp	x10,  x11, [%2, #16 *  5]\n"
> +			"stp	x12,  x13, [%2, #16 *  6]\n"
> +			"stp	x14,  x15, [%2, #16 *  7]\n"
> +			"stp	x16,  x17, [%2, #16 *  8]\n"
> +			"stp	x18,  x19, [%2, #16 *  9]\n"
> +			"stp	x20,  x21, [%2, #16 * 10]\n"
> +			"stp	x22,  x23, [%2, #16 * 11]\n"
> +			"stp	x24,  x25, [%2, #16 * 12]\n"
> +			"stp	x26,  x27, [%2, #16 * 13]\n"
> +			"stp	x28,  x29, [%2, #16 * 14]\n"
> +			"mov	 %0,  sp\n"
> +			"stp	x30,  %0,  [%2, #16 * 15]\n"
> +
> +			"/* faked current PSTATE */\n"
> +			"mrs	 %0, CurrentEL\n"
> +			"mrs	 %1, DAIF\n"
> +			"orr	 %0, %0, %1\n"
> +			"mrs	 %1, NZCV\n"
> +			"orr	 %0, %0, %1\n"
> +
> +			/* pc */
> +			"adr	 %1, 1f\n"
> +		"1:\n"
> +			"stp	 %1, %0,   [%2, #16 * 16]\n"
> +			: "=r" (tmp1), "=r" (tmp2), "+r" (newregs)
> +			:
> +			: "memory"
> +		);
> +	}
>  }
>
>  #endif /* __ASSEMBLY__ */
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index 0226447..6b0f2c7 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -136,6 +136,8 @@ static inline void cpu_panic_kernel(void)
>   */
>  bool cpus_are_stuck_in_kernel(void);
>
> +extern void smp_send_crash_stop(void);
> +
>  #endif /* ifndef __ASSEMBLY__ */
>
>  #endif /* ifndef __ASM_SMP_H */
> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index bc96c8a..8ac9dba8 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -9,6 +9,9 @@
>   * published by the Free Software Foundation.
>   */
>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/kernel.h>
>  #include <linux/kexec.h>
>  #include <linux/smp.h>
>
> @@ -22,6 +25,7 @@
>  extern const unsigned char arm64_relocate_new_kernel[];
>  extern const unsigned long arm64_relocate_new_kernel_size;
>
> +bool in_crash_kexec;
>  static unsigned long kimage_start;
>
>  /**
> @@ -148,7 +152,8 @@ void machine_kexec(struct kimage *kimage)
>  	/*
>  	 * New cpus may have become stuck_in_kernel after we loaded the image.
>  	 */
> -	BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
> +	BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
> +			!WARN_ON(in_crash_kexec));
>
>  	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
>  	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
> @@ -200,13 +205,58 @@ void machine_kexec(struct kimage *kimage)
>  	 * relocation is complete.
>  	 */
>
> -	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
> +	cpu_soft_restart(!in_crash_kexec, reboot_code_buffer_phys, kimage->head,
>  		kimage_start, 0);
>
>  	BUG(); /* Should never get here. */
>  }
>
> +static void machine_kexec_mask_interrupts(void)
> +{
> +	unsigned int i;
> +	struct irq_desc *desc;
> +
> +	for_each_irq_desc(i, desc) {
> +		struct irq_chip *chip;
> +		int ret;
> +
> +		chip = irq_desc_get_chip(desc);
> +		if (!chip)
> +			continue;
> +
> +		/*
> +		 * First try to remove the active state. If this
> +		 * fails, try to EOI the interrupt.
> +		 */
> +		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
> +
> +		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
> +		    chip->irq_eoi)
> +			chip->irq_eoi(&desc->irq_data);
> +
> +		if (chip->irq_mask)
> +			chip->irq_mask(&desc->irq_data);
> +
> +		if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
> +			chip->irq_disable(&desc->irq_data);
> +	}
> +}
> +
> +/**
> + * machine_crash_shutdown - shutdown non-crashing cpus and save registers
> + */
>  void machine_crash_shutdown(struct pt_regs *regs)
>  {
> -	/* Empty routine needed to avoid build errors. */
> +	local_irq_disable();
> +
> +	in_crash_kexec = true;
> +
> +	/* shutdown non-crashing cpus */
> +	smp_send_crash_stop();
> +
> +	/* for crashing cpu */
> +	crash_save_cpu(regs, smp_processor_id());
> +	machine_kexec_mask_interrupts();
> +
> +	pr_info("Starting crashdump kernel...\n");
>  }
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 62ff3c0..66923db 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -37,6 +37,7 @@
>  #include <linux/completion.h>
>  #include <linux/of.h>
>  #include <linux/irq_work.h>
> +#include <linux/kexec.h>
>
>  #include <asm/alternative.h>
>  #include <asm/atomic.h>
> @@ -71,6 +72,7 @@ enum ipi_msg_type {
>  	IPI_RESCHEDULE,
>  	IPI_CALL_FUNC,
>  	IPI_CPU_STOP,
> +	IPI_CPU_CRASH_STOP,
>  	IPI_TIMER,
>  	IPI_IRQ_WORK,
>  	IPI_WAKEUP
> @@ -726,6 +728,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
>  	S(IPI_RESCHEDULE, "Rescheduling interrupts"),
>  	S(IPI_CALL_FUNC, "Function call interrupts"),
>  	S(IPI_CPU_STOP, "CPU stop interrupts"),
> +	S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
>  	S(IPI_TIMER, "Timer broadcast interrupts"),
>  	S(IPI_IRQ_WORK, "IRQ work interrupts"),
>  	S(IPI_WAKEUP, "CPU wake-up interrupts"),
> @@ -800,6 +803,30 @@ static void ipi_cpu_stop(unsigned int cpu)
>  		cpu_relax();
>  }
>
> +#ifdef CONFIG_KEXEC_CORE
> +static atomic_t waiting_for_crash_ipi;
> +#endif
> +
> +static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
> +{
> +#ifdef CONFIG_KEXEC_CORE
> +	crash_save_cpu(regs, cpu);
> +
> +	atomic_dec(&waiting_for_crash_ipi);
> +
> +	local_irq_disable();
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> +	if (cpu_ops[cpu]->cpu_die)
> +		cpu_ops[cpu]->cpu_die(cpu);
> +#endif
> +


> +	/* just in case */
> +	while (1)
> +		wfi();

You could use cpu_park_loop() (defined in asm/smp.h) here instead,
which also includes wfe();

Suzuki

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 8/8] Documentation: dt: chosen properties for arm64 kdump
  2016-07-12 10:07   ` Mark Rutland
@ 2016-07-13 15:14     ` AKASHI Takahiro
  0 siblings, 0 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-13 15:14 UTC (permalink / raw)
  To: linux-arm-kernel

Mark,

On Tue, Jul 12, 2016 at 11:07:45AM +0100, Mark Rutland wrote:
> Hi,
> 
> Apologies for the delay on this.
> 
> On Tue, Jul 12, 2016 at 02:05:14PM +0900, AKASHI Takahiro wrote:
> > From: James Morse <james.morse@arm.com>
> > 
> > Add documentation for
> > 	linux,crashkernel-base and crashkernel-size,
> > 	linux,usable-memory-range, and
> > 	linux,elfcorehdr
> > used by arm64 kexec/kdump to decribe the kdump reserved area, and
> > the elfcorehdr's location within it.
> > 
> > Signed-off-by: James Morse <james.morse@arm.com>
> > [takahiro.akashi at linaro.org:
> >     renamed "usable-memory" to "usable-memory-range",
> >     added "linux,crashkernel-base" and "-size" ]
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > ---
> >  Documentation/devicetree/bindings/chosen.txt | 45 ++++++++++++++++++++++++++++
> >  1 file changed, 45 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> > index 6ae9d82..d7a3a86 100644
> > --- a/Documentation/devicetree/bindings/chosen.txt
> > +++ b/Documentation/devicetree/bindings/chosen.txt
> > @@ -52,3 +52,48 @@ This property is set (currently only on PowerPC, and only needed on
> >  book3e) by some versions of kexec-tools to tell the new kernel that it
> >  is being booted by kexec, as the booting environment may differ (e.g.
> >  a different secondary CPU release mechanism)
> > +
> > +linux,crashkernel-base
> > +linux,crashkernel-size
> > +----------------------
> > +These properties are set (on PowerPC and arm64) during kdump to tell
> > +use-space tools, like kexec-tools, the base address of the crash-dump
> > +kernel's reserved area of memory and the size. e.g.
> 
> No need to mention what consumes this. Just state what it describes,
> e.g.
> 
> 	These properties describe the base and size of the crash-dump
> 	kernel's reserved area of memory. Valid for PowerPC and arm64.

Sure.

> > +
> > +/ {
> > +	chosen {
> > +		linux,crashkernel-base = <0x9 0xf0000000>;
> > +		linux,crashkernel-size = <0x0 0x10000000>;
> > +	};
> > +};
> > +
> > +linux,usable-memory-range
> > +-------------------------
> > +
> > +This property is set (currently only on arm64) during kdump to tell
> > +the crash-dump kernel the base address of its reserved area of memory,
> > +and the size. e.g.
> 
> The description sounds like this duplicates linux,crashkernel-*. What is
> the difference between the two?

Simply saying, crashkernel-* are for the first(normal) kernel,
while usable-memory-range is for the second(crash-dump) kernel.

The former appears only under /sys/firmware/devicetree/base/
on kdump-enabled kernel  whenever "crashkernel=" command line parameter
is specified. So user space tools, like kexec-tools, will be able to know
what memory region is reserved for kdump.

The latter is added in device-tree blob which is passed to
crash-dump kernel so the kernel recognize what memory region is usable
for system ram. We will see it in /sys/firmware/devicetree/base
as well as /sys/firmware/fdt on crash dump kernel. 

So both can potentially appear at the same time on crash dump kernel, but
they will have different values in that case.
(So we need different names.)

> On powerpc it looks like there's a linux,usable-memory property (without
> the -range suffix). How do these differ?

Please also see Thiago's comment.
Basically "linux,usable-memory" property imposes further restriction
on memory node's "reg" property.

Currently kdump only supports a single memory region as usable memory
of crash dump kernel, and so adding "linux,usable-memory" to every
memory node in dtb is sort of "doing too much" IMHO.
In addition, there are no memory nodes on UEFI(w/ACPI) systems.

> > +
> > +/ {
> > +	chosen {
> > +		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
> > +	};
> > +};
> > +
> > +Please note that, if this property is present, any memory regions under
> > +"memory" nodes will be ignored.
> 
> What exactly do you mean by "ignored"?
> 
> Do we truly ignore this property, or do we map that memory at some
> point, even if not used for general allocation?

Totally ignored. All the regions are excluded from memblock.
See my patch #1.

> > +
> > +linux,elfcorehdr
> > +----------------
> > +
> > +This property is set (currently only on arm64) during kdump to tell
> > +the crash-dump kernel the address and size of the elfcorehdr that describes
> > +the old kernel's memory as an elf file. This memory must reside within
> > +the area described by 'linux,usable-memory-range'. e.g.
> 
> 
> As with the linux,crashkernel-* properties, just state what this
> describes, e.g.
> 
> 	This property describes the base and size of the ELF core
> 	header, which describes the old kernel's memory as an ELF file.
> 	This memory must reside within the range described by
> 	linux,usable-memory-range.
> 
> That said, this falling within usable-memory feels very odd, because
> it's not actually usable for general purposes. Why can the kernel not
> memremap this, such that it need not be in usable-memory?

Well, James made a similar comment on this before.
By putting elfcorehdr in usable memory of crash dump kernel,
it doesn't even have to call memremap() to access the data.

Initially, I thought that it would also reduce the possibility
of the elfcorehdr region being corrupted by accident on crash, but
there will be no difference even if it is allocated anywhere else.

Thanks,
-Takahiro AKASHI

> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-13  9:12   ` Suzuki K Poulose
@ 2016-07-13 15:42     ` AKASHI Takahiro
  0 siblings, 0 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-13 15:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Wed, Jul 13, 2016 at 10:12:12AM +0100, Suzuki K Poulose wrote:
> On 12/07/16 06:05, AKASHI Takahiro wrote:
> >On the startup of primary kernel, the memory region used by crash dump
> >kernel must be specified by "crashkernel=" kernel parameter.
> >reserve_crashkernel() will allocate and reserve the region for later use.
> >
> >User space tools, like kexec-tools, will be able to find that region as
> >	- "Crash kernel" in /proc/iomem, or
> >	- "linux,crashkernel-base" and "linux,crashkernel-size" under
> >	  /sys/firmware/devicetree/base/chosen
> >
> >Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >Signed-off-by: Mark Salter <msalter@redhat.com>
> >Signed-off-by: Pratyush Anand <panand@redhat.com>
> >Reviewed-by: James Morse <james.morse@arm.com>
> 
> Minor nits below.
> 
> >---
> > arch/arm64/kernel/setup.c |   7 ++-
> > arch/arm64/mm/init.c      | 115 ++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 121 insertions(+), 1 deletion(-)
> >
> >diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> >index c1509e6..cb5eee0 100644
> >--- a/arch/arm64/kernel/setup.c
> >+++ b/arch/arm64/kernel/setup.c
> >@@ -31,7 +31,6 @@
> > #include <linux/screen_info.h>
> > #include <linux/init.h>
> > #include <linux/kexec.h>
> >-#include <linux/crash_dump.h>
> > #include <linux/root_dev.h>
> > #include <linux/cpu.h>
> > #include <linux/interrupt.h>
> >@@ -222,6 +221,12 @@ static void __init request_standard_resources(void)
> > 		    kernel_data.end <= res->end)
> > 			request_resource(res, &kernel_data);
> > 	}
> >+
> >+#ifdef CONFIG_KEXEC_CORE
> >+	/* User space tools will find "Crash kernel" region in /proc/iomem. */
> >+	if (crashk_res.end)
> >+		insert_resource(&iomem_resource, &crashk_res);
> >+#endif
> 
> nit: Could we not do this from reserve_crashkernel() ?

Reserve_crashkernel() should be called before arm64_memblock_init().
If the code above is put in reserve_crashkernel(), the inserted
resource will be gone later by calling request_resource() in
request_standard_resource() because the crashkernel memory is part of
System RAM.

> >
> > #include <asm/boot.h>
> > #include <asm/fixmap.h>
> >@@ -76,6 +78,117 @@ static int __init early_initrd(char *p)
> > early_param("initrd", early_initrd);
> > #endif
> >
> >+#ifdef CONFIG_KEXEC_CORE
> >+static unsigned long long crash_size, crash_base;
> >+static struct property crash_base_prop = {
> >+	.name = "linux,crashkernel-base",
> >+	.length = sizeof(u64),
> >+	.value = &crash_base
> >+};
> >+static struct property crash_size_prop = {
> >+	.name = "linux,crashkernel-size",
> >+	.length = sizeof(u64),
> >+	.value = &crash_size,
> >+};
> >+
> 
> >+
> >+/*
> >+ * reserve_crashkernel() - reserves memory for crash kernel
> >+ *
> >+ * This function reserves memory area given in "crashkernel=" kernel command
> >+ * line parameter. The memory reserved is used by dump capture kernel when
> >+ * primary kernel is crashing.
> >+ */
> >+static void __init reserve_crashkernel(void)
> >+{
> >+	int ret;
> >+
> >+	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> >+				&crash_size, &crash_base);
> >+	/* no crashkernel= or invalid value specified */
> >+	if (ret || !crash_size)
> >+		return;
> >+
> >+	if (crash_base == 0) {
> 
> ...
> 
> >+		memblock_reserve(crash_base, crash_size);
> >+
> >+	} else {
> 
> ...
> 
> >+
> >+		memblock_reserve(crash_base, crash_size);
> 
> >+	}
> 
> Nit: you could move the memblock_reserve() here to a single place.

Sure.

Thanks,
-Takahiro AKASHI

> >+
> >+	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> >+		crash_size >> 20, crash_base >> 20);
> >+
> >+	crashk_res.start = crash_base;
> >+	crashk_res.end = crash_base + crash_size - 1;
> 
> As mentioned above, may be we could move the insert_resource() here, which
> would keep the generic setup.c code cleaner and is a bit more reader friendly.
> 
> >+}
> >+#else
> 
> Suzuki
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 3/8] arm64: kdump: implement machine_crash_shutdown()
  2016-07-13  9:32   ` Suzuki K Poulose
@ 2016-07-13 16:00     ` AKASHI Takahiro
  0 siblings, 0 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-13 16:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 13, 2016 at 10:32:39AM +0100, Suzuki K Poulose wrote:
> On 12/07/16 06:05, AKASHI Takahiro wrote:
> >Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
> >and save registers' status in per-cpu ELF notes before starting crash
> >dump kernel. See kernel_kexec().
> >Even if not all secondary cpus have shut down, we do kdump anyway.
> >
> >As we don't have to make non-boot(crashed) cpus offline (to preserve
> >correct status of cpus at crash dump) before shutting down, this patch
> >also adds a variant of smp_send_stop().
> >
> >Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >---
> > arch/arm64/include/asm/hardirq.h  |  2 +-
> > arch/arm64/include/asm/kexec.h    | 41 ++++++++++++++++++++++++-
> > arch/arm64/include/asm/smp.h      |  2 ++
> > arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
> > arch/arm64/kernel/smp.c           | 64 +++++++++++++++++++++++++++++++++++++++
> > 5 files changed, 160 insertions(+), 5 deletions(-)
> >
> >diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
> >index 8740297..1473fc2 100644
> >--- a/arch/arm64/include/asm/hardirq.h
> >+++ b/arch/arm64/include/asm/hardirq.h
> >@@ -20,7 +20,7 @@
> > #include <linux/threads.h>
> > #include <asm/irq.h>
> >
> >-#define NR_IPI	6
> >+#define NR_IPI	7
> >
> > typedef struct {
> > 	unsigned int __softirq_pending;
> >diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> >index 04744dc..a908958 100644
> >--- a/arch/arm64/include/asm/kexec.h
> >+++ b/arch/arm64/include/asm/kexec.h
> >@@ -40,7 +40,46 @@
> > static inline void crash_setup_regs(struct pt_regs *newregs,
> > 				    struct pt_regs *oldregs)
> > {
> >-	/* Empty routine needed to avoid build errors. */
> >+	if (oldregs) {
> >+		memcpy(newregs, oldregs, sizeof(*newregs));
> >+	} else {
> >+		u64 tmp1, tmp2;
> >+
> >+		__asm__ __volatile__ (
> >+			"stp	 x0,   x1, [%2, #16 *  0]\n"
> >+			"stp	 x2,   x3, [%2, #16 *  1]\n"
> >+			"stp	 x4,   x5, [%2, #16 *  2]\n"
> >+			"stp	 x6,   x7, [%2, #16 *  3]\n"
> >+			"stp	 x8,   x9, [%2, #16 *  4]\n"
> >+			"stp	x10,  x11, [%2, #16 *  5]\n"
> >+			"stp	x12,  x13, [%2, #16 *  6]\n"
> >+			"stp	x14,  x15, [%2, #16 *  7]\n"
> >+			"stp	x16,  x17, [%2, #16 *  8]\n"
> >+			"stp	x18,  x19, [%2, #16 *  9]\n"
> >+			"stp	x20,  x21, [%2, #16 * 10]\n"
> >+			"stp	x22,  x23, [%2, #16 * 11]\n"
> >+			"stp	x24,  x25, [%2, #16 * 12]\n"
> >+			"stp	x26,  x27, [%2, #16 * 13]\n"
> >+			"stp	x28,  x29, [%2, #16 * 14]\n"
> >+			"mov	 %0,  sp\n"
> >+			"stp	x30,  %0,  [%2, #16 * 15]\n"
> >+
> >+			"/* faked current PSTATE */\n"
> >+			"mrs	 %0, CurrentEL\n"
> >+			"mrs	 %1, DAIF\n"
> >+			"orr	 %0, %0, %1\n"
> >+			"mrs	 %1, NZCV\n"
> >+			"orr	 %0, %0, %1\n"
> >+
> >+			/* pc */
> >+			"adr	 %1, 1f\n"
> >+		"1:\n"
> >+			"stp	 %1, %0,   [%2, #16 * 16]\n"
> >+			: "=r" (tmp1), "=r" (tmp2), "+r" (newregs)
> >+			:
> >+			: "memory"
> >+		);
> >+	}
> > }
> >
> > #endif /* __ASSEMBLY__ */
> >diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> >index 0226447..6b0f2c7 100644
> >--- a/arch/arm64/include/asm/smp.h
> >+++ b/arch/arm64/include/asm/smp.h
> >@@ -136,6 +136,8 @@ static inline void cpu_panic_kernel(void)
> >  */
> > bool cpus_are_stuck_in_kernel(void);
> >
> >+extern void smp_send_crash_stop(void);
> >+
> > #endif /* ifndef __ASSEMBLY__ */
> >
> > #endif /* ifndef __ASM_SMP_H */
> >diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> >index bc96c8a..8ac9dba8 100644
> >--- a/arch/arm64/kernel/machine_kexec.c
> >+++ b/arch/arm64/kernel/machine_kexec.c
> >@@ -9,6 +9,9 @@
> >  * published by the Free Software Foundation.
> >  */
> >
> >+#include <linux/interrupt.h>
> >+#include <linux/irq.h>
> >+#include <linux/kernel.h>
> > #include <linux/kexec.h>
> > #include <linux/smp.h>
> >
> >@@ -22,6 +25,7 @@
> > extern const unsigned char arm64_relocate_new_kernel[];
> > extern const unsigned long arm64_relocate_new_kernel_size;
> >
> >+bool in_crash_kexec;
> > static unsigned long kimage_start;
> >
> > /**
> >@@ -148,7 +152,8 @@ void machine_kexec(struct kimage *kimage)
> > 	/*
> > 	 * New cpus may have become stuck_in_kernel after we loaded the image.
> > 	 */
> >-	BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
> >+	BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
> >+			!WARN_ON(in_crash_kexec));
> >
> > 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
> > 	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
> >@@ -200,13 +205,58 @@ void machine_kexec(struct kimage *kimage)
> > 	 * relocation is complete.
> > 	 */
> >
> >-	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
> >+	cpu_soft_restart(!in_crash_kexec, reboot_code_buffer_phys, kimage->head,
> > 		kimage_start, 0);
> >
> > 	BUG(); /* Should never get here. */
> > }
> >
> >+static void machine_kexec_mask_interrupts(void)
> >+{
> >+	unsigned int i;
> >+	struct irq_desc *desc;
> >+
> >+	for_each_irq_desc(i, desc) {
> >+		struct irq_chip *chip;
> >+		int ret;
> >+
> >+		chip = irq_desc_get_chip(desc);
> >+		if (!chip)
> >+			continue;
> >+
> >+		/*
> >+		 * First try to remove the active state. If this
> >+		 * fails, try to EOI the interrupt.
> >+		 */
> >+		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
> >+
> >+		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
> >+		    chip->irq_eoi)
> >+			chip->irq_eoi(&desc->irq_data);
> >+
> >+		if (chip->irq_mask)
> >+			chip->irq_mask(&desc->irq_data);
> >+
> >+		if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
> >+			chip->irq_disable(&desc->irq_data);
> >+	}
> >+}
> >+
> >+/**
> >+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
> >+ */
> > void machine_crash_shutdown(struct pt_regs *regs)
> > {
> >-	/* Empty routine needed to avoid build errors. */
> >+	local_irq_disable();
> >+
> >+	in_crash_kexec = true;
> >+
> >+	/* shutdown non-crashing cpus */
> >+	smp_send_crash_stop();
> >+
> >+	/* for crashing cpu */
> >+	crash_save_cpu(regs, smp_processor_id());
> >+	machine_kexec_mask_interrupts();
> >+
> >+	pr_info("Starting crashdump kernel...\n");
> > }
> >diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> >index 62ff3c0..66923db 100644
> >--- a/arch/arm64/kernel/smp.c
> >+++ b/arch/arm64/kernel/smp.c
> >@@ -37,6 +37,7 @@
> > #include <linux/completion.h>
> > #include <linux/of.h>
> > #include <linux/irq_work.h>
> >+#include <linux/kexec.h>
> >
> > #include <asm/alternative.h>
> > #include <asm/atomic.h>
> >@@ -71,6 +72,7 @@ enum ipi_msg_type {
> > 	IPI_RESCHEDULE,
> > 	IPI_CALL_FUNC,
> > 	IPI_CPU_STOP,
> >+	IPI_CPU_CRASH_STOP,
> > 	IPI_TIMER,
> > 	IPI_IRQ_WORK,
> > 	IPI_WAKEUP
> >@@ -726,6 +728,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
> > 	S(IPI_RESCHEDULE, "Rescheduling interrupts"),
> > 	S(IPI_CALL_FUNC, "Function call interrupts"),
> > 	S(IPI_CPU_STOP, "CPU stop interrupts"),
> >+	S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
> > 	S(IPI_TIMER, "Timer broadcast interrupts"),
> > 	S(IPI_IRQ_WORK, "IRQ work interrupts"),
> > 	S(IPI_WAKEUP, "CPU wake-up interrupts"),
> >@@ -800,6 +803,30 @@ static void ipi_cpu_stop(unsigned int cpu)
> > 		cpu_relax();
> > }
> >
> >+#ifdef CONFIG_KEXEC_CORE
> >+static atomic_t waiting_for_crash_ipi;
> >+#endif
> >+
> >+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
> >+{
> >+#ifdef CONFIG_KEXEC_CORE
> >+	crash_save_cpu(regs, cpu);
> >+
> >+	atomic_dec(&waiting_for_crash_ipi);
> >+
> >+	local_irq_disable();
> >+
> >+#ifdef CONFIG_HOTPLUG_CPU
> >+	if (cpu_ops[cpu]->cpu_die)
> >+		cpu_ops[cpu]->cpu_die(cpu);
> >+#endif
> >+
> 
> 
> >+	/* just in case */
> >+	while (1)
> >+		wfi();
> 
> You could use cpu_park_loop() (defined in asm/smp.h) here instead,
> which also includes wfe();

Yes, sure.

-Takahiro AKASHI

> Suzuki
> 
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-12  5:05 ` [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
@ 2016-07-18 18:04   ` James Morse
  2016-07-19  8:35     ` AKASHI Takahiro
  2016-07-21  0:57     ` AKASHI Takahiro
  0 siblings, 2 replies; 35+ messages in thread
From: James Morse @ 2016-07-18 18:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi!

(CC: Dennis Chen)

On 12/07/16 06:05, AKASHI Takahiro wrote:
> Crash dump kernel will be run with a limited range of memory as System
> RAM.
> 
> On arm64, we will use a device-tree property under /chosen,
>    linux,usable-memory-range = <BASE SIZE>
> in order for primary kernel either on uefi or non-uefi (device tree only)
> system to hand over the information about usable memory region to crash
> dump kernel. This property will supercede entries in uefi memory map table
> and "memory" nodes in a device tree.

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 51b1302..d8b296f 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -300,10 +300,48 @@ static int __init early_mem(char *p)
>  }
>  early_param("mem", early_mem);
>  
> +static int __init early_init_dt_scan_usablemem(unsigned long node,
> +		const char *uname, int depth, void *data)
> +{
> +	struct memblock_region *usablemem = (struct memblock_region *)data;
> +	const __be32 *reg;
> +	int len;
> +
> +	usablemem->size = 0;
> +
> +	if (depth != 1 || strcmp(uname, "chosen") != 0)
> +		return 0;
> +
> +	reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> +	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> +		return 1;
> +
> +	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> +	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> +
> +	return 1;
> +}
> +
> +static void __init fdt_enforce_memory_region(void)
> +{
> +	struct memblock_region reg;
> +
> +	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> +
> +	if (reg.size) {
> +		memblock_remove(0, PAGE_ALIGN(reg.base));
> +		memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
> +				ULLONG_MAX);

I think this is a new way to trip the problem Dennis Chen has been working on
[0]. If I kdump with --reuse-cmdline on a kernel booted with 'acpi=on', I get
the panic below [1]...

It looks like Dennis's fix involves changes in mm/memblock.c, maybe they can be
extended to support a range instead of just a limit?

(It looks like x86 explicitly adds the acpi regions to the crash-kernels memory
map in crash_setup_memmap_entries()).



Is it possible for the kernel text to be outside this range? (a bug in
kexec-tools, or another user of the DT property) If we haven't already failed in
this case, it may be worth printing a warning, or refusing to
restrict-memory/expose-vmcore.



Thanks,

James


[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/443356.html
[1]
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: EFI v2.50 by ARM Juno EFI Nov 24 2015 12:36:35
[    0.000000] efi:  ACPI=0xf95b0000  ACPI 2.0=0xf95b0014  PROP=0xfe8db4d8
[    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
[    0.000000] cma: Reserved 16 MiB at 0x00000009fec00000
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000F95B0014 000024 (v02 ARMLTD)
[    0.000000] ACPI: XSDT 0x00000000F95A00E8 00004C (v01 ARMLTD ARM-JUNO 2014072
7      01000013)
[    0.000000] ACPI: FACP 0x00000000F9500000 00010C (v05 ARMLTD ARM-JUNO 2014072
7 ARM  00000099)
[    0.000000] ACPI: DSDT 0x00000000F94C0000 000396 (v01 ARMLTD ARM-JUNO 2014072
7 INTL 20150619)
[    0.000000] ACPI: GTDT 0x00000000F94F0000 000060 (v02 ARMLTD ARM-JUNO 2014072
7 ARM  00000099)
[    0.000000] ACPI: APIC 0x00000000F94E0000 000224 (v03 ARMLTD ARM-JUNO 2014072
7 ARM  00000099)
[    0.000000] ACPI: SSDT 0x00000000F94D0000 0001E3 (v01 ARMLTD ARM-JUNO 2014072
7 INTL 20150619)
[    0.000000] ACPI: MCFG 0x00000000F94B0000 00003C (v01 ARMLTD ARM-JUNO 2014072
7 ARM  00000099)
...
[    0.737577] Serial: AMBA PL011 UART driver
[    0.786086] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.794203] ACPI: Added _OSI(Module Device)
[    0.798659] ACPI: Added _OSI(Processor Device)
[    0.803190] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.807973] ACPI: Added _OSI(Processor Aggregator Device)
[    0.813653] Unable to handle kernel paging request at virtual address ffff000
00804e027
[    0.821704] pgd = ffff000008cce000
[    0.825155] [ffff00000804e027] *pgd=00000009ffffd003, *pud=00000009ffffc003,
*pmd=00000009ffffb003, *pte=00e80000f94c0707
[    0.836319] Internal error: Oops: 96000021 [#1] PREEMPT SMP
[    0.841972] Modules linked in:
[    0.845073] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S              4.7.0-rc4
+ #4569
[    0.852927] Hardware name: ARM Juno development board (r1) (DT)
[    0.858936] task: ffff80003d898000 ti: ffff80003d894000 task.ti: ffff80003d89
4000
[    0.866537] PC is at acpi_ns_lookup+0x23c/0x378
[    0.871131] LR is at acpi_ds_load1_begin_op+0x88/0x260
[    0.876340] pc : [<ffff0000084061a4>] lr : [<ffff0000083fc08c>] pstate: 60000
045
[    0.883846] sp : ffff80003d8979b0
[    0.887206] x29: ffff80003d8979b0 x28: 0000000000000000
[    0.892596] x27: 000000000000001b x26: ffff000008a80a07
[    0.897986] x25: ffff80003d897a48 x24: 0000000000000001
[    0.903377] x23: 0000000000000001 x22: ffff00000804e027
[    0.908769] x21: 000000000000001b x20: 0000000000000001
[    0.914158] x19: 0000000000000000 x18: ffff00000804efff
[    0.919547] x17: 00000000000038ff x16: 0000000000000002
[    0.924937] x15: ffff00000804efff x14: 0000008000000000
[    0.930326] x13: ffff000008c942b2 x12: ffff00000804efff
[    0.935717] x11: ffff000008bf0000 x10: 00000000ffffff76
[    0.941107] x9 : 0000000000000000 x8 : ffff000008cb6000
[    0.946498] x7 : 0000000000000000 x6 : ffff80003d897aa8
[    0.951891] x5 : ffff80003d028400 x4 : 0000000000000001
[    0.957281] x3 : 0000000000000003 x2 : ffff000008cb6090
[    0.962673] x1 : 000000000000005f x0 : 0000000000000000
[    0.968063]
[    0.969569] Process swapper/0 (pid: 1, stack limit = 0xffff80003d894020)
[    1.387661] Call trace:
...
[    1.473172] [<ffff0000084061a4>] acpi_ns_lookup+0x23c/0x378
[    1.478832] [<ffff0000083fc08c>] acpi_ds_load1_begin_op+0x88/0x260
[    1.485105] [<ffff00000840c0e8>] acpi_ps_build_named_op+0xa8/0x170
[    1.491378] [<ffff00000840c2e0>] acpi_ps_create_op+0x130/0x230
[    1.497299] [<ffff00000840bc28>] acpi_ps_parse_loop+0x168/0x580
[    1.503302] [<ffff00000840cb44>] acpi_ps_parse_aml+0xa0/0x278
[    1.509135] [<ffff0000084081d0>] acpi_ns_one_complete_parse+0x128/0x150
[    1.515852] [<ffff00000840821c>] acpi_ns_parse_table+0x24/0x44
[    1.521775] [<ffff0000084079e8>] acpi_ns_load_table+0x54/0xdc
[    1.527612] [<ffff000008411038>] acpi_tb_load_namespace+0xd0/0x230
[    1.533887] [<ffff000008b2695c>] acpi_load_tables+0x3c/0xa8
[    1.539542] [<ffff000008b25974>] acpi_init+0x88/0x2b0
[    1.544670] [<ffff000008081a08>] do_one_initcall+0x38/0x128
[    1.550325] [<ffff000008b00cc0>] kernel_init_freeable+0x14c/0x1f0
[    1.556517] [<ffff0000087d2088>] kernel_init+0x10/0x100
[    1.561823] [<ffff000008084e10>] ret_from_fork+0x10/0x40
[    1.567216] Code: b9008fbb 2a000318 36380054 32190318 (b94002c0)
[    1.573451] ---[ end trace dec6cecdcba673b7 ]---
[    1.578158] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00
00000b
[    1.578158]
[    1.587428] SMP: stopping secondary CPUs
[    1.591411] ---[ end Kernel panic - not syncing: Attempted to kill init! exit
code=0x0000000b
[    0.969225] Process swapper/0 (pid: 1, stack limit = 0xffff80003d894020)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-18 18:04   ` James Morse
@ 2016-07-19  8:35     ` AKASHI Takahiro
  2016-07-19 10:06       ` Dennis Chen
  2016-07-21  0:57     ` AKASHI Takahiro
  1 sibling, 1 reply; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-19  8:35 UTC (permalink / raw)
  To: linux-arm-kernel

James,

On Mon, Jul 18, 2016 at 07:04:33PM +0100, James Morse wrote:
> Hi!
> 
> (CC: Dennis Chen)
> 
> On 12/07/16 06:05, AKASHI Takahiro wrote:
> > Crash dump kernel will be run with a limited range of memory as System
> > RAM.
> > 
> > On arm64, we will use a device-tree property under /chosen,
> >    linux,usable-memory-range = <BASE SIZE>
> > in order for primary kernel either on uefi or non-uefi (device tree only)
> > system to hand over the information about usable memory region to crash
> > dump kernel. This property will supercede entries in uefi memory map table
> > and "memory" nodes in a device tree.
> 
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 51b1302..d8b296f 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -300,10 +300,48 @@ static int __init early_mem(char *p)
> >  }
> >  early_param("mem", early_mem);
> >  
> > +static int __init early_init_dt_scan_usablemem(unsigned long node,
> > +		const char *uname, int depth, void *data)
> > +{
> > +	struct memblock_region *usablemem = (struct memblock_region *)data;
> > +	const __be32 *reg;
> > +	int len;
> > +
> > +	usablemem->size = 0;
> > +
> > +	if (depth != 1 || strcmp(uname, "chosen") != 0)
> > +		return 0;
> > +
> > +	reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> > +	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > +		return 1;
> > +
> > +	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > +	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > +
> > +	return 1;
> > +}
> > +
> > +static void __init fdt_enforce_memory_region(void)
> > +{
> > +	struct memblock_region reg;
> > +
> > +	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > +
> > +	if (reg.size) {
> > +		memblock_remove(0, PAGE_ALIGN(reg.base));
> > +		memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
> > +				ULLONG_MAX);
> 
> I think this is a new way to trip the problem Dennis Chen has been working on
> [0]. If I kdump with --reuse-cmdline on a kernel booted with 'acpi=on', I get
> the panic below [1]...

Yeah, it can be.

> It looks like Dennis's fix involves changes in mm/memblock.c, maybe they can be
> extended to support a range instead of just a limit?
>
> (It looks like x86 explicitly adds the acpi regions to the crash-kernels memory
> map in crash_setup_memmap_entries()).
> 
> 
> 
> Is it possible for the kernel text to be outside this range? (a bug in
> kexec-tools, or another user of the DT property) If we haven't already failed in
> this case, it may be worth printing a warning, or refusing to
> restrict-memory/expose-vmcore.

In my implementation of kdump, the usable memory for crash dump
kernel will be allocated within memblock.memory after ACPI-related
regions have been mapped. "linux,usable-memory-range" indicates
this exact memory range.
On crash dump kernel, my fdt_enforce_memory_region() in arm64_memblock_init()
will exclude all the other regions from memblock.memory.
So the kernel (with acpi=on) won't recognize ACPI-regions as
normal memory, and map them by ioremap().

I thought that it was safe, but actually not due to unaligned accesses.
As you suggested, we will probably be able to do the same thing of
Chen's solution in fdt_enforce_memory_region().

Thanks,
-Takahiro AKASHI

> 
> 
> Thanks, > > James
> 
> 
> [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/443356.html
> [1]
> [    0.000000] efi: Getting EFI parameters from FDT:
> [    0.000000] efi: EFI v2.50 by ARM Juno EFI Nov 24 2015 12:36:35
> [    0.000000] efi:  ACPI=0xf95b0000  ACPI 2.0=0xf95b0014  PROP=0xfe8db4d8
> [    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
> [    0.000000] cma: Reserved 16 MiB at 0x00000009fec00000
> [    0.000000] ACPI: Early table checksum verification disabled
> [    0.000000] ACPI: RSDP 0x00000000F95B0014 000024 (v02 ARMLTD)
> [    0.000000] ACPI: XSDT 0x00000000F95A00E8 00004C (v01 ARMLTD ARM-JUNO 2014072
> 7      01000013)
> [    0.000000] ACPI: FACP 0x00000000F9500000 00010C (v05 ARMLTD ARM-JUNO 2014072
> 7 ARM  00000099)
> [    0.000000] ACPI: DSDT 0x00000000F94C0000 000396 (v01 ARMLTD ARM-JUNO 2014072
> 7 INTL 20150619)
> [    0.000000] ACPI: GTDT 0x00000000F94F0000 000060 (v02 ARMLTD ARM-JUNO 2014072
> 7 ARM  00000099)
> [    0.000000] ACPI: APIC 0x00000000F94E0000 000224 (v03 ARMLTD ARM-JUNO 2014072
> 7 ARM  00000099)
> [    0.000000] ACPI: SSDT 0x00000000F94D0000 0001E3 (v01 ARMLTD ARM-JUNO 2014072
> 7 INTL 20150619)
> [    0.000000] ACPI: MCFG 0x00000000F94B0000 00003C (v01 ARMLTD ARM-JUNO 2014072
> 7 ARM  00000099)
> ...
> [    0.737577] Serial: AMBA PL011 UART driver
> [    0.786086] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> [    0.794203] ACPI: Added _OSI(Module Device)
> [    0.798659] ACPI: Added _OSI(Processor Device)
> [    0.803190] ACPI: Added _OSI(3.0 _SCP Extensions)
> [    0.807973] ACPI: Added _OSI(Processor Aggregator Device)
> [    0.813653] Unable to handle kernel paging request at virtual address ffff000
> 00804e027
> [    0.821704] pgd = ffff000008cce000
> [    0.825155] [ffff00000804e027] *pgd=00000009ffffd003, *pud=00000009ffffc003,
> *pmd=00000009ffffb003, *pte=00e80000f94c0707
> [    0.836319] Internal error: Oops: 96000021 [#1] PREEMPT SMP
> [    0.841972] Modules linked in:
> [    0.845073] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S              4.7.0-rc4
> + #4569
> [    0.852927] Hardware name: ARM Juno development board (r1) (DT)
> [    0.858936] task: ffff80003d898000 ti: ffff80003d894000 task.ti: ffff80003d89
> 4000
> [    0.866537] PC is at acpi_ns_lookup+0x23c/0x378
> [    0.871131] LR is at acpi_ds_load1_begin_op+0x88/0x260
> [    0.876340] pc : [<ffff0000084061a4>] lr : [<ffff0000083fc08c>] pstate: 60000
> 045
> [    0.883846] sp : ffff80003d8979b0
> [    0.887206] x29: ffff80003d8979b0 x28: 0000000000000000
> [    0.892596] x27: 000000000000001b x26: ffff000008a80a07
> [    0.897986] x25: ffff80003d897a48 x24: 0000000000000001
> [    0.903377] x23: 0000000000000001 x22: ffff00000804e027
> [    0.908769] x21: 000000000000001b x20: 0000000000000001
> [    0.914158] x19: 0000000000000000 x18: ffff00000804efff
> [    0.919547] x17: 00000000000038ff x16: 0000000000000002
> [    0.924937] x15: ffff00000804efff x14: 0000008000000000
> [    0.930326] x13: ffff000008c942b2 x12: ffff00000804efff
> [    0.935717] x11: ffff000008bf0000 x10: 00000000ffffff76
> [    0.941107] x9 : 0000000000000000 x8 : ffff000008cb6000
> [    0.946498] x7 : 0000000000000000 x6 : ffff80003d897aa8
> [    0.951891] x5 : ffff80003d028400 x4 : 0000000000000001
> [    0.957281] x3 : 0000000000000003 x2 : ffff000008cb6090
> [    0.962673] x1 : 000000000000005f x0 : 0000000000000000
> [    0.968063]
> [    0.969569] Process swapper/0 (pid: 1, stack limit = 0xffff80003d894020)
> [    1.387661] Call trace:
> ...
> [    1.473172] [<ffff0000084061a4>] acpi_ns_lookup+0x23c/0x378
> [    1.478832] [<ffff0000083fc08c>] acpi_ds_load1_begin_op+0x88/0x260
> [    1.485105] [<ffff00000840c0e8>] acpi_ps_build_named_op+0xa8/0x170
> [    1.491378] [<ffff00000840c2e0>] acpi_ps_create_op+0x130/0x230
> [    1.497299] [<ffff00000840bc28>] acpi_ps_parse_loop+0x168/0x580
> [    1.503302] [<ffff00000840cb44>] acpi_ps_parse_aml+0xa0/0x278
> [    1.509135] [<ffff0000084081d0>] acpi_ns_one_complete_parse+0x128/0x150
> [    1.515852] [<ffff00000840821c>] acpi_ns_parse_table+0x24/0x44
> [    1.521775] [<ffff0000084079e8>] acpi_ns_load_table+0x54/0xdc
> [    1.527612] [<ffff000008411038>] acpi_tb_load_namespace+0xd0/0x230
> [    1.533887] [<ffff000008b2695c>] acpi_load_tables+0x3c/0xa8
> [    1.539542] [<ffff000008b25974>] acpi_init+0x88/0x2b0
> [    1.544670] [<ffff000008081a08>] do_one_initcall+0x38/0x128
> [    1.550325] [<ffff000008b00cc0>] kernel_init_freeable+0x14c/0x1f0
> [    1.556517] [<ffff0000087d2088>] kernel_init+0x10/0x100
> [    1.561823] [<ffff000008084e10>] ret_from_fork+0x10/0x40
> [    1.567216] Code: b9008fbb 2a000318 36380054 32190318 (b94002c0)
> [    1.573451] ---[ end trace dec6cecdcba673b7 ]---
> [    1.578158] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00
> 00000b
> [    1.578158]
> [    1.587428] SMP: stopping secondary CPUs
> [    1.591411] ---[ end Kernel panic - not syncing: Attempted to kill init! exit
> code=0x0000000b
> [    0.969225] Process swapper/0 (pid: 1, stack limit = 0xffff80003d894020)
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-12  5:05 ` [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
  2016-07-13  9:12   ` Suzuki K Poulose
@ 2016-07-19  9:39   ` Dennis Chen
  2016-07-19 10:28     ` AKASHI Takahiro
  1 sibling, 1 reply; 35+ messages in thread
From: Dennis Chen @ 2016-07-19  9:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hello AKASHI,

On Tue, Jul 12, 2016 at 02:05:07PM +0900, AKASHI Takahiro wrote:
> On the startup of primary kernel, the memory region used by crash dump
> kernel must be specified by "crashkernel=" kernel parameter.
> reserve_crashkernel() will allocate and reserve the region for later use.
>
> User space tools, like kexec-tools, will be able to find that region as
>       - "Crash kernel" in /proc/iomem, or
>       - "linux,crashkernel-base" and "linux,crashkernel-size" under
>         /sys/firmware/devicetree/base/chosen
>
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Signed-off-by: Mark Salter <msalter@redhat.com>
> Signed-off-by: Pratyush Anand <panand@redhat.com>
> Reviewed-by: James Morse <james.morse@arm.com>
> ---
>  arch/arm64/kernel/setup.c |   7 ++-
>  arch/arm64/mm/init.c      | 115 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 121 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index c1509e6..cb5eee0 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -31,7 +31,6 @@
>  #include <linux/screen_info.h>
>  #include <linux/init.h>
>  #include <linux/kexec.h>
> -#include <linux/crash_dump.h>
>  #include <linux/root_dev.h>
>  #include <linux/cpu.h>
>  #include <linux/interrupt.h>
> @@ -222,6 +221,12 @@ static void __init request_standard_resources(void)
>                   kernel_data.end <= res->end)
>                       request_resource(res, &kernel_data);
>       }
> +
> +#ifdef CONFIG_KEXEC_CORE
> +     /* User space tools will find "Crash kernel" region in /proc/iomem. */
> +     if (crashk_res.end)
> +             insert_resource(&iomem_resource, &crashk_res);
> +#endif
>  }
>
>  u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 2ade7a6..51b1302 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -29,11 +29,13 @@
>  #include <linux/gfp.h>
>  #include <linux/memblock.h>
>  #include <linux/sort.h>
> +#include <linux/of.h>
>  #include <linux/of_fdt.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/dma-contiguous.h>
>  #include <linux/efi.h>
>  #include <linux/swiotlb.h>
> +#include <linux/kexec.h>
>
>  #include <asm/boot.h>
>  #include <asm/fixmap.h>
> @@ -76,6 +78,117 @@ static int __init early_initrd(char *p)
>  early_param("initrd", early_initrd);
>  #endif
>
> +#ifdef CONFIG_KEXEC_CORE
> +static unsigned long long crash_size, crash_base;
> +static struct property crash_base_prop = {
> +     .name = "linux,crashkernel-base",
> +     .length = sizeof(u64),
> +     .value = &crash_base
> +};
> +static struct property crash_size_prop = {
> +     .name = "linux,crashkernel-size",
> +     .length = sizeof(u64),
> +     .value = &crash_size,
> +};
> +
> +static int __init export_crashkernel(void)
> +{
> +     struct device_node *node;
> +     int ret;
> +
> +     if (!crashk_res.end)
> +             return 0;
> +
> +     crash_base = cpu_to_be64(crashk_res.start);
> +     crash_size = cpu_to_be64(crashk_res.end - crashk_res.start + 1);
> +
> +     /* Add /chosen/linux,crashkernel-* properties */
> +     node = of_find_node_by_path("/chosen");
> +     if (!node)
> +             return -ENOENT;
> +
> +     /*
> +      * There might be existing crash kernel properties, but we can't
> +      * be sure what's in them, so remove them.
> +      */
> +     of_remove_property(node, of_find_property(node,
> +                             "linux,crashkernel-base", NULL));
> +     of_remove_property(node, of_find_property(node,
> +                             "linux,crashkernel-size", NULL));
> +
> +     ret = of_add_property(node, &crash_base_prop);
> +     if (ret)
> +             goto ret_err;
> +
> +     ret = of_add_property(node, &crash_size_prop);
> +     if (ret)
> +             goto ret_err;
> +
> +     return 0;
> +
> +ret_err:
> +     pr_warn("Exporting crashkernel region to device tree failed\n");
> +     return ret;
> +}
> +late_initcall(export_crashkernel);
> +
> +/*
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +static void __init reserve_crashkernel(void)
> +{
> +     int ret;
> +
> +     ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> +                             &crash_size, &crash_base);
> +     /* no crashkernel= or invalid value specified */
> +     if (ret || !crash_size)
> +             return;
> +
> +     if (crash_base == 0) {
> +             /* Current arm64 boot protocol requires 2MB alignment */
> +             crash_base = memblock_find_in_range(0,
> +                             MEMBLOCK_ALLOC_ACCESSIBLE, crash_size, SZ_2M);
> +             if (crash_base == 0) {
> +                     pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> +                             crash_size);
> +                     return;
> +             }
> +             memblock_reserve(crash_base, crash_size);
>
I am not pretty sure the context here, but
can we use below code piece instead of the above lines?
        if (crash_base == 0)
                memblock_alloc(crash_size, SZ_2M);

Thanks,
Dennis

> +
> +     } else {
> +             /* User specifies base address explicitly. */
> +             if (!memblock_is_region_memory(crash_base, crash_size) ||
> +                     memblock_is_region_reserved(crash_base, crash_size)) {
> +                     pr_warn("crashkernel has wrong address or size\n");
> +                     return;
> +             }
> +
> +             if (!IS_ALIGNED(crash_base, SZ_2M)) {
> +                     pr_warn("crashkernel base address is not 2MB aligned\n");
> +                     return;
> +             }
> +
> +             memblock_reserve(crash_base, crash_size);
> +     }
> +
> +     pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> +             crash_size >> 20, crash_base >> 20);
> +
> +     crashk_res.start = crash_base;
> +     crashk_res.end = crash_base + crash_size - 1;
> +}
> +#else
> +static void __init reserve_crashkernel(void)
> +{
> +     ;
> +}
> +#endif /* CONFIG_KEXEC_CORE */
> +
>  /*
>   * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
>   * currently assumes that for memory starting above 4G, 32-bit devices will
> @@ -289,6 +402,8 @@ void __init arm64_memblock_init(void)
>       }
>  #endif
>
> +     reserve_crashkernel();
> +
>       early_init_fdt_scan_reserved_mem();
>
>       /* 4GB maximum for 32-bit only capable devices */
> --
> 2.9.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-19  8:35     ` AKASHI Takahiro
@ 2016-07-19 10:06       ` Dennis Chen
  2016-07-19 11:01         ` AKASHI Takahiro
  0 siblings, 1 reply; 35+ messages in thread
From: Dennis Chen @ 2016-07-19 10:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hello AKASHI,

On Tue, Jul 19, 2016 at 05:35:55PM +0900, AKASHI Takahiro wrote:
> James,
>
> On Mon, Jul 18, 2016 at 07:04:33PM +0100, James Morse wrote:
> > Hi!
> >
> > (CC: Dennis Chen)
> >
> > On 12/07/16 06:05, AKASHI Takahiro wrote:
> > > Crash dump kernel will be run with a limited range of memory as System
> > > RAM.
> > >
> > > On arm64, we will use a device-tree property under /chosen,
> > >    linux,usable-memory-range = <BASE SIZE>
> > > in order for primary kernel either on uefi or non-uefi (device tree only)
> > > system to hand over the information about usable memory region to crash
> > > dump kernel. This property will supercede entries in uefi memory map table
> > > and "memory" nodes in a device tree.
> >
> > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > index 51b1302..d8b296f 100644
> > > --- a/arch/arm64/mm/init.c
> > > +++ b/arch/arm64/mm/init.c
> > > @@ -300,10 +300,48 @@ static int __init early_mem(char *p)
> > >  }
> > >  early_param("mem", early_mem);
> > >
> > > +static int __init early_init_dt_scan_usablemem(unsigned long node,
> > > +         const char *uname, int depth, void *data)
> > > +{
> > > + struct memblock_region *usablemem = (struct memblock_region *)data;
> > > + const __be32 *reg;
> > > + int len;
> > > +
> > > + usablemem->size = 0;
> > > +
> > > + if (depth != 1 || strcmp(uname, "chosen") != 0)
> > > +         return 0;
> > > +
> > > + reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> > > + if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > > +         return 1;
> > > +
> > > + usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > > + usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > > +
> > > + return 1;
> > > +}
> > > +
> > > +static void __init fdt_enforce_memory_region(void)
> > > +{
> > > + struct memblock_region reg;
> > > +
> > > + of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > > +
> > > + if (reg.size) {
> > > +         memblock_remove(0, PAGE_ALIGN(reg.base));
> > > +         memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
> > > +                         ULLONG_MAX);
> >

According to the panic message from James, I guess the ACPI regions are out of the range
[reg.base, reg.base + reg.size] and removed by your above codes. On ARM64, those ACPI
regions have been added into memblock and marked as NOMAP, so I think it should be
easy to adapt my fix to retain the NOMAP regions here

Thanks,
Dennis

> > I think this is a new way to trip the problem Dennis Chen has been working on
> > [0]. If I kdump with --reuse-cmdline on a kernel booted with 'acpi=on', I get
> > the panic below [1]...
>
> Yeah, it can be.
>
> > It looks like Dennis's fix involves changes in mm/memblock.c, maybe they can be
> > extended to support a range instead of just a limit?
> >
> > (It looks like x86 explicitly adds the acpi regions to the crash-kernels memory
> > map in crash_setup_memmap_entries()).
> >
> >
> >
> > Is it possible for the kernel text to be outside this range? (a bug in
> > kexec-tools, or another user of the DT property) If we haven't already failed in
> > this case, it may be worth printing a warning, or refusing to
> > restrict-memory/expose-vmcore.
>
> In my implementation of kdump, the usable memory for crash dump
> kernel will be allocated within memblock.memory after ACPI-related
> regions have been mapped. "linux,usable-memory-range" indicates
> this exact memory range.
> On crash dump kernel, my fdt_enforce_memory_region() in arm64_memblock_init()
> will exclude all the other regions from memblock.memory.
> So the kernel (with acpi=on) won't recognize ACPI-regions as
> normal memory, and map them by ioremap().
>
> I thought that it was safe, but actually not due to unaligned accesses.
> As you suggested, we will probably be able to do the same thing of
> Chen's solution in fdt_enforce_memory_region().
>
>
> Thanks,
> -Takahiro AKASHI
>
> >
> >
> > Thanks, > > James
> >
> >
> > [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/443356.html
> > [1]
> > [    0.000000] efi: Getting EFI parameters from FDT:
> > [    0.000000] efi: EFI v2.50 by ARM Juno EFI Nov 24 2015 12:36:35
> > [    0.000000] efi:  ACPI=0xf95b0000  ACPI 2.0=0xf95b0014  PROP=0xfe8db4d8
> > [    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
> > [    0.000000] cma: Reserved 16 MiB at 0x00000009fec00000
> > [    0.000000] ACPI: Early table checksum verification disabled
> > [    0.000000] ACPI: RSDP 0x00000000F95B0014 000024 (v02 ARMLTD)
> > [    0.000000] ACPI: XSDT 0x00000000F95A00E8 00004C (v01 ARMLTD ARM-JUNO 2014072
> > 7      01000013)
> > [    0.000000] ACPI: FACP 0x00000000F9500000 00010C (v05 ARMLTD ARM-JUNO 2014072
> > 7 ARM  00000099)
> > [    0.000000] ACPI: DSDT 0x00000000F94C0000 000396 (v01 ARMLTD ARM-JUNO 2014072
> > 7 INTL 20150619)
> > [    0.000000] ACPI: GTDT 0x00000000F94F0000 000060 (v02 ARMLTD ARM-JUNO 2014072
> > 7 ARM  00000099)
> > [    0.000000] ACPI: APIC 0x00000000F94E0000 000224 (v03 ARMLTD ARM-JUNO 2014072
> > 7 ARM  00000099)
> > [    0.000000] ACPI: SSDT 0x00000000F94D0000 0001E3 (v01 ARMLTD ARM-JUNO 2014072
> > 7 INTL 20150619)
> > [    0.000000] ACPI: MCFG 0x00000000F94B0000 00003C (v01 ARMLTD ARM-JUNO 2014072
> > 7 ARM  00000099)
> > ...
> > [    0.737577] Serial: AMBA PL011 UART driver
> > [    0.786086] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> > [    0.794203] ACPI: Added _OSI(Module Device)
> > [    0.798659] ACPI: Added _OSI(Processor Device)
> > [    0.803190] ACPI: Added _OSI(3.0 _SCP Extensions)
> > [    0.807973] ACPI: Added _OSI(Processor Aggregator Device)
> > [    0.813653] Unable to handle kernel paging request at virtual address ffff000
> > 00804e027
> > [    0.821704] pgd = ffff000008cce000
> > [    0.825155] [ffff00000804e027] *pgd=00000009ffffd003, *pud=00000009ffffc003,
> > *pmd=00000009ffffb003, *pte=00e80000f94c0707
> > [    0.836319] Internal error: Oops: 96000021 [#1] PREEMPT SMP
> > [    0.841972] Modules linked in:
> > [    0.845073] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S              4.7.0-rc4
> > + #4569
> > [    0.852927] Hardware name: ARM Juno development board (r1) (DT)
> > [    0.858936] task: ffff80003d898000 ti: ffff80003d894000 task.ti: ffff80003d89
> > 4000
> > [    0.866537] PC is at acpi_ns_lookup+0x23c/0x378
> > [    0.871131] LR is at acpi_ds_load1_begin_op+0x88/0x260
> > [    0.876340] pc : [<ffff0000084061a4>] lr : [<ffff0000083fc08c>] pstate: 60000
> > 045
> > [    0.883846] sp : ffff80003d8979b0
> > [    0.887206] x29: ffff80003d8979b0 x28: 0000000000000000
> > [    0.892596] x27: 000000000000001b x26: ffff000008a80a07
> > [    0.897986] x25: ffff80003d897a48 x24: 0000000000000001
> > [    0.903377] x23: 0000000000000001 x22: ffff00000804e027
> > [    0.908769] x21: 000000000000001b x20: 0000000000000001
> > [    0.914158] x19: 0000000000000000 x18: ffff00000804efff
> > [    0.919547] x17: 00000000000038ff x16: 0000000000000002
> > [    0.924937] x15: ffff00000804efff x14: 0000008000000000
> > [    0.930326] x13: ffff000008c942b2 x12: ffff00000804efff
> > [    0.935717] x11: ffff000008bf0000 x10: 00000000ffffff76
> > [    0.941107] x9 : 0000000000000000 x8 : ffff000008cb6000
> > [    0.946498] x7 : 0000000000000000 x6 : ffff80003d897aa8
> > [    0.951891] x5 : ffff80003d028400 x4 : 0000000000000001
> > [    0.957281] x3 : 0000000000000003 x2 : ffff000008cb6090
> > [    0.962673] x1 : 000000000000005f x0 : 0000000000000000
> > [    0.968063]
> > [    0.969569] Process swapper/0 (pid: 1, stack limit = 0xffff80003d894020)
> > [    1.387661] Call trace:
> > ...
> > [    1.473172] [<ffff0000084061a4>] acpi_ns_lookup+0x23c/0x378
> > [    1.478832] [<ffff0000083fc08c>] acpi_ds_load1_begin_op+0x88/0x260
> > [    1.485105] [<ffff00000840c0e8>] acpi_ps_build_named_op+0xa8/0x170
> > [    1.491378] [<ffff00000840c2e0>] acpi_ps_create_op+0x130/0x230
> > [    1.497299] [<ffff00000840bc28>] acpi_ps_parse_loop+0x168/0x580
> > [    1.503302] [<ffff00000840cb44>] acpi_ps_parse_aml+0xa0/0x278
> > [    1.509135] [<ffff0000084081d0>] acpi_ns_one_complete_parse+0x128/0x150
> > [    1.515852] [<ffff00000840821c>] acpi_ns_parse_table+0x24/0x44
> > [    1.521775] [<ffff0000084079e8>] acpi_ns_load_table+0x54/0xdc
> > [    1.527612] [<ffff000008411038>] acpi_tb_load_namespace+0xd0/0x230
> > [    1.533887] [<ffff000008b2695c>] acpi_load_tables+0x3c/0xa8
> > [    1.539542] [<ffff000008b25974>] acpi_init+0x88/0x2b0
> > [    1.544670] [<ffff000008081a08>] do_one_initcall+0x38/0x128
> > [    1.550325] [<ffff000008b00cc0>] kernel_init_freeable+0x14c/0x1f0
> > [    1.556517] [<ffff0000087d2088>] kernel_init+0x10/0x100
> > [    1.561823] [<ffff000008084e10>] ret_from_fork+0x10/0x40
> > [    1.567216] Code: b9008fbb 2a000318 36380054 32190318 (b94002c0)
> > [    1.573451] ---[ end trace dec6cecdcba673b7 ]---
> > [    1.578158] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00
> > 00000b
> > [    1.578158]
> > [    1.587428] SMP: stopping secondary CPUs
> > [    1.591411] ---[ end Kernel panic - not syncing: Attempted to kill init! exit
> > code=0x0000000b
> > [    0.969225] Process swapper/0 (pid: 1, stack limit = 0xffff80003d894020)
> >
> >
>
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-19  9:39   ` Dennis Chen
@ 2016-07-19 10:28     ` AKASHI Takahiro
  2016-07-19 10:41       ` Dennis Chen
  0 siblings, 1 reply; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-19 10:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 19, 2016 at 05:39:07PM +0800, Dennis Chen wrote:
> Hello AKASHI,
> 
> On Tue, Jul 12, 2016 at 02:05:07PM +0900, AKASHI Takahiro wrote:
> > On the startup of primary kernel, the memory region used by crash dump
> > kernel must be specified by "crashkernel=" kernel parameter.
> > reserve_crashkernel() will allocate and reserve the region for later use.
> >
> > User space tools, like kexec-tools, will be able to find that region as
> >       - "Crash kernel" in /proc/iomem, or
> >       - "linux,crashkernel-base" and "linux,crashkernel-size" under
> >         /sys/firmware/devicetree/base/chosen
> >
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Signed-off-by: Mark Salter <msalter@redhat.com>
> > Signed-off-by: Pratyush Anand <panand@redhat.com>
> > Reviewed-by: James Morse <james.morse@arm.com>
> > ---
> >  arch/arm64/kernel/setup.c |   7 ++-
> >  arch/arm64/mm/init.c      | 115 ++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 121 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index c1509e6..cb5eee0 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -31,7 +31,6 @@
> >  #include <linux/screen_info.h>
> >  #include <linux/init.h>
> >  #include <linux/kexec.h>
> > -#include <linux/crash_dump.h>
> >  #include <linux/root_dev.h>
> >  #include <linux/cpu.h>
> >  #include <linux/interrupt.h>
> > @@ -222,6 +221,12 @@ static void __init request_standard_resources(void)
> >                   kernel_data.end <= res->end)
> >                       request_resource(res, &kernel_data);
> >       }
> > +
> > +#ifdef CONFIG_KEXEC_CORE
> > +     /* User space tools will find "Crash kernel" region in /proc/iomem. */
> > +     if (crashk_res.end)
> > +             insert_resource(&iomem_resource, &crashk_res);
> > +#endif
> >  }
> >
> >  u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 2ade7a6..51b1302 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -29,11 +29,13 @@
> >  #include <linux/gfp.h>
> >  #include <linux/memblock.h>
> >  #include <linux/sort.h>
> > +#include <linux/of.h>
> >  #include <linux/of_fdt.h>
> >  #include <linux/dma-mapping.h>
> >  #include <linux/dma-contiguous.h>
> >  #include <linux/efi.h>
> >  #include <linux/swiotlb.h>
> > +#include <linux/kexec.h>
> >
> >  #include <asm/boot.h>
> >  #include <asm/fixmap.h>
> > @@ -76,6 +78,117 @@ static int __init early_initrd(char *p)
> >  early_param("initrd", early_initrd);
> >  #endif
> >
> > +#ifdef CONFIG_KEXEC_CORE
> > +static unsigned long long crash_size, crash_base;
> > +static struct property crash_base_prop = {
> > +     .name = "linux,crashkernel-base",
> > +     .length = sizeof(u64),
> > +     .value = &crash_base
> > +};
> > +static struct property crash_size_prop = {
> > +     .name = "linux,crashkernel-size",
> > +     .length = sizeof(u64),
> > +     .value = &crash_size,
> > +};
> > +
> > +static int __init export_crashkernel(void)
> > +{
> > +     struct device_node *node;
> > +     int ret;
> > +
> > +     if (!crashk_res.end)
> > +             return 0;
> > +
> > +     crash_base = cpu_to_be64(crashk_res.start);
> > +     crash_size = cpu_to_be64(crashk_res.end - crashk_res.start + 1);
> > +
> > +     /* Add /chosen/linux,crashkernel-* properties */
> > +     node = of_find_node_by_path("/chosen");
> > +     if (!node)
> > +             return -ENOENT;
> > +
> > +     /*
> > +      * There might be existing crash kernel properties, but we can't
> > +      * be sure what's in them, so remove them.
> > +      */
> > +     of_remove_property(node, of_find_property(node,
> > +                             "linux,crashkernel-base", NULL));
> > +     of_remove_property(node, of_find_property(node,
> > +                             "linux,crashkernel-size", NULL));
> > +
> > +     ret = of_add_property(node, &crash_base_prop);
> > +     if (ret)
> > +             goto ret_err;
> > +
> > +     ret = of_add_property(node, &crash_size_prop);
> > +     if (ret)
> > +             goto ret_err;
> > +
> > +     return 0;
> > +
> > +ret_err:
> > +     pr_warn("Exporting crashkernel region to device tree failed\n");
> > +     return ret;
> > +}
> > +late_initcall(export_crashkernel);
> > +
> > +/*
> > + * reserve_crashkernel() - reserves memory for crash kernel
> > + *
> > + * This function reserves memory area given in "crashkernel=" kernel command
> > + * line parameter. The memory reserved is used by dump capture kernel when
> > + * primary kernel is crashing.
> > + */
> > +static void __init reserve_crashkernel(void)
> > +{
> > +     int ret;
> > +
> > +     ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > +                             &crash_size, &crash_base);
> > +     /* no crashkernel= or invalid value specified */
> > +     if (ret || !crash_size)
> > +             return;
> > +
> > +     if (crash_base == 0) {
> > +             /* Current arm64 boot protocol requires 2MB alignment */
> > +             crash_base = memblock_find_in_range(0,
> > +                             MEMBLOCK_ALLOC_ACCESSIBLE, crash_size, SZ_2M);
> > +             if (crash_base == 0) {
> > +                     pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> > +                             crash_size);
> > +                     return;
> > +             }
> > +             memblock_reserve(crash_base, crash_size);
> >
> I am not pretty sure the context here, but
> can we use below code piece instead of the above lines?
>         if (crash_base == 0)
>                 memblock_alloc(crash_size, SZ_2M);

Either would be fine here.

Thanks,
-Takahiro AKASHI

> Thanks,
> Dennis
> 
> > +
> > +     } else {
> > +             /* User specifies base address explicitly. */
> > +             if (!memblock_is_region_memory(crash_base, crash_size) ||
> > +                     memblock_is_region_reserved(crash_base, crash_size)) {
> > +                     pr_warn("crashkernel has wrong address or size\n");
> > +                     return;
> > +             }
> > +
> > +             if (!IS_ALIGNED(crash_base, SZ_2M)) {
> > +                     pr_warn("crashkernel base address is not 2MB aligned\n");
> > +                     return;
> > +             }
> > +
> > +             memblock_reserve(crash_base, crash_size);
> > +     }
> > +
> > +     pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> > +             crash_size >> 20, crash_base >> 20);
> > +
> > +     crashk_res.start = crash_base;
> > +     crashk_res.end = crash_base + crash_size - 1;
> > +}
> > +#else
> > +static void __init reserve_crashkernel(void)
> > +{
> > +     ;
> > +}
> > +#endif /* CONFIG_KEXEC_CORE */
> > +
> >  /*
> >   * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
> >   * currently assumes that for memory starting above 4G, 32-bit devices will
> > @@ -289,6 +402,8 @@ void __init arm64_memblock_init(void)
> >       }
> >  #endif
> >
> > +     reserve_crashkernel();
> > +
> >       early_init_fdt_scan_reserved_mem();
> >
> >       /* 4GB maximum for 32-bit only capable devices */
> > --
> > 2.9.0
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-19 10:28     ` AKASHI Takahiro
@ 2016-07-19 10:41       ` Dennis Chen
  2016-07-19 12:48         ` Mark Salter
  0 siblings, 1 reply; 35+ messages in thread
From: Dennis Chen @ 2016-07-19 10:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 19, 2016 at 07:28:16PM +0900, AKASHI Takahiro wrote:
> On Tue, Jul 19, 2016 at 05:39:07PM +0800, Dennis Chen wrote:
> > Hello AKASHI,
> > 
> > On Tue, Jul 12, 2016 at 02:05:07PM +0900, AKASHI Takahiro wrote:
> > > On the startup of primary kernel, the memory region used by crash dump
> > > kernel must be specified by "crashkernel=" kernel parameter.
> > > reserve_crashkernel() will allocate and reserve the region for later use.
> > >
> > > User space tools, like kexec-tools, will be able to find that region as
> > >       - "Crash kernel" in /proc/iomem, or
> > >       - "linux,crashkernel-base" and "linux,crashkernel-size" under
> > >         /sys/firmware/devicetree/base/chosen
> > >
> > > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > Signed-off-by: Mark Salter <msalter@redhat.com>
> > > Signed-off-by: Pratyush Anand <panand@redhat.com>
> > > Reviewed-by: James Morse <james.morse@arm.com>
> > > ---
> > >  arch/arm64/kernel/setup.c |   7 ++-
> > >  arch/arm64/mm/init.c      | 115 ++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 121 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > > index c1509e6..cb5eee0 100644
> > > --- a/arch/arm64/kernel/setup.c
> > > +++ b/arch/arm64/kernel/setup.c
> > > @@ -31,7 +31,6 @@
> > >  #include <linux/screen_info.h>
> > >  #include <linux/init.h>
> > >  #include <linux/kexec.h>
> > > -#include <linux/crash_dump.h>
> > >  #include <linux/root_dev.h>
> > >  #include <linux/cpu.h>
> > >  #include <linux/interrupt.h>
> > > @@ -222,6 +221,12 @@ static void __init request_standard_resources(void)
> > >                   kernel_data.end <= res->end)
> > >                       request_resource(res, &kernel_data);
> > >       }
> > > +
> > > +#ifdef CONFIG_KEXEC_CORE
> > > +     /* User space tools will find "Crash kernel" region in /proc/iomem. */
> > > +     if (crashk_res.end)
> > > +             insert_resource(&iomem_resource, &crashk_res);
> > > +#endif
> > >  }
> > >
> > >  u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
> > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > index 2ade7a6..51b1302 100644
> > > --- a/arch/arm64/mm/init.c
> > > +++ b/arch/arm64/mm/init.c
> > > @@ -29,11 +29,13 @@
> > >  #include <linux/gfp.h>
> > >  #include <linux/memblock.h>
> > >  #include <linux/sort.h>
> > > +#include <linux/of.h>
> > >  #include <linux/of_fdt.h>
> > >  #include <linux/dma-mapping.h>
> > >  #include <linux/dma-contiguous.h>
> > >  #include <linux/efi.h>
> > >  #include <linux/swiotlb.h>
> > > +#include <linux/kexec.h>
> > >
> > >  #include <asm/boot.h>
> > >  #include <asm/fixmap.h>
> > > @@ -76,6 +78,117 @@ static int __init early_initrd(char *p)
> > >  early_param("initrd", early_initrd);
> > >  #endif
> > >
> > > +#ifdef CONFIG_KEXEC_CORE
> > > +static unsigned long long crash_size, crash_base;
> > > +static struct property crash_base_prop = {
> > > +     .name = "linux,crashkernel-base",
> > > +     .length = sizeof(u64),
> > > +     .value = &crash_base
> > > +};
> > > +static struct property crash_size_prop = {
> > > +     .name = "linux,crashkernel-size",
> > > +     .length = sizeof(u64),
> > > +     .value = &crash_size,
> > > +};
> > > +
> > > +static int __init export_crashkernel(void)
> > > +{
> > > +     struct device_node *node;
> > > +     int ret;
> > > +
> > > +     if (!crashk_res.end)
> > > +             return 0;
> > > +
> > > +     crash_base = cpu_to_be64(crashk_res.start);
> > > +     crash_size = cpu_to_be64(crashk_res.end - crashk_res.start + 1);
> > > +
> > > +     /* Add /chosen/linux,crashkernel-* properties */
> > > +     node = of_find_node_by_path("/chosen");
> > > +     if (!node)
> > > +             return -ENOENT;
> > > +
> > > +     /*
> > > +      * There might be existing crash kernel properties, but we can't
> > > +      * be sure what's in them, so remove them.
> > > +      */
> > > +     of_remove_property(node, of_find_property(node,
> > > +                             "linux,crashkernel-base", NULL));
> > > +     of_remove_property(node, of_find_property(node,
> > > +                             "linux,crashkernel-size", NULL));
> > > +
> > > +     ret = of_add_property(node, &crash_base_prop);
> > > +     if (ret)
> > > +             goto ret_err;
> > > +
> > > +     ret = of_add_property(node, &crash_size_prop);
> > > +     if (ret)
> > > +             goto ret_err;
> > > +
> > > +     return 0;
> > > +
> > > +ret_err:
> > > +     pr_warn("Exporting crashkernel region to device tree failed\n");
> > > +     return ret;
> > > +}
> > > +late_initcall(export_crashkernel);
> > > +
> > > +/*
> > > + * reserve_crashkernel() - reserves memory for crash kernel
> > > + *
> > > + * This function reserves memory area given in "crashkernel=" kernel command
> > > + * line parameter. The memory reserved is used by dump capture kernel when
> > > + * primary kernel is crashing.
> > > + */
> > > +static void __init reserve_crashkernel(void)
> > > +{
> > > +     int ret;
> > > +
> > > +     ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > > +                             &crash_size, &crash_base);
> > > +     /* no crashkernel= or invalid value specified */
> > > +     if (ret || !crash_size)
> > > +             return;
> > > +
> > > +     if (crash_base == 0) {
> > > +             /* Current arm64 boot protocol requires 2MB alignment */
> > > +             crash_base = memblock_find_in_range(0,
> > > +                             MEMBLOCK_ALLOC_ACCESSIBLE, crash_size, SZ_2M);
> > > +             if (crash_base == 0) {
> > > +                     pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> > > +                             crash_size);
> > > +                     return;
> > > +             }
> > > +             memblock_reserve(crash_base, crash_size);
> > >
> > I am not pretty sure the context here, but
> > can we use below code piece instead of the above lines?
> >         if (crash_base == 0)
> >                 memblock_alloc(crash_size, SZ_2M);
> 
> Either would be fine here.
>
Hello AKASHI, maybe you can succeed to find the base with memblock_find_in_range(), 
but that doesn't mean you will also succeed to reserve them with memblock_reserve followed.

Thanks,
Dennis
> 
> Thanks,
> -Takahiro AKASHI
> 
> > Thanks,
> > Dennis
> > 
> > > +
> > > +     } else {
> > > +             /* User specifies base address explicitly. */
> > > +             if (!memblock_is_region_memory(crash_base, crash_size) ||
> > > +                     memblock_is_region_reserved(crash_base, crash_size)) {
> > > +                     pr_warn("crashkernel has wrong address or size\n");
> > > +                     return;
> > > +             }
> > > +
> > > +             if (!IS_ALIGNED(crash_base, SZ_2M)) {
> > > +                     pr_warn("crashkernel base address is not 2MB aligned\n");
> > > +                     return;
> > > +             }
> > > +
> > > +             memblock_reserve(crash_base, crash_size);
> > > +     }
> > > +
> > > +     pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> > > +             crash_size >> 20, crash_base >> 20);
> > > +
> > > +     crashk_res.start = crash_base;
> > > +     crashk_res.end = crash_base + crash_size - 1;
> > > +}
> > > +#else
> > > +static void __init reserve_crashkernel(void)
> > > +{
> > > +     ;
> > > +}
> > > +#endif /* CONFIG_KEXEC_CORE */
> > > +
> > >  /*
> > >   * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
> > >   * currently assumes that for memory starting above 4G, 32-bit devices will
> > > @@ -289,6 +402,8 @@ void __init arm64_memblock_init(void)
> > >       }
> > >  #endif
> > >
> > > +     reserve_crashkernel();
> > > +
> > >       early_init_fdt_scan_reserved_mem();
> > >
> > >       /* 4GB maximum for 32-bit only capable devices */
> > > --
> > > 2.9.0
> > >
> > >
> > > _______________________________________________
> > > linux-arm-kernel mailing list
> > > linux-arm-kernel at lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > >
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-19 10:06       ` Dennis Chen
@ 2016-07-19 11:01         ` AKASHI Takahiro
  2016-07-20  3:39           ` Dennis Chen
  0 siblings, 1 reply; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-19 11:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 19, 2016 at 06:06:18PM +0800, Dennis Chen wrote:
> Hello AKASHI,
> 
> On Tue, Jul 19, 2016 at 05:35:55PM +0900, AKASHI Takahiro wrote:
> > James,
> >
> > On Mon, Jul 18, 2016 at 07:04:33PM +0100, James Morse wrote:
> > > Hi!
> > >
> > > (CC: Dennis Chen)
> > >
> > > On 12/07/16 06:05, AKASHI Takahiro wrote:
> > > > Crash dump kernel will be run with a limited range of memory as System
> > > > RAM.
> > > >
> > > > On arm64, we will use a device-tree property under /chosen,
> > > >    linux,usable-memory-range = <BASE SIZE>
> > > > in order for primary kernel either on uefi or non-uefi (device tree only)
> > > > system to hand over the information about usable memory region to crash
> > > > dump kernel. This property will supercede entries in uefi memory map table
> > > > and "memory" nodes in a device tree.
> > >
> > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > index 51b1302..d8b296f 100644
> > > > --- a/arch/arm64/mm/init.c
> > > > +++ b/arch/arm64/mm/init.c
> > > > @@ -300,10 +300,48 @@ static int __init early_mem(char *p)
> > > >  }
> > > >  early_param("mem", early_mem);
> > > >
> > > > +static int __init early_init_dt_scan_usablemem(unsigned long node,
> > > > +         const char *uname, int depth, void *data)
> > > > +{
> > > > + struct memblock_region *usablemem = (struct memblock_region *)data;
> > > > + const __be32 *reg;
> > > > + int len;
> > > > +
> > > > + usablemem->size = 0;
> > > > +
> > > > + if (depth != 1 || strcmp(uname, "chosen") != 0)
> > > > +         return 0;
> > > > +
> > > > + reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> > > > + if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > > > +         return 1;
> > > > +
> > > > + usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > > > + usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > > > +
> > > > + return 1;
> > > > +}
> > > > +
> > > > +static void __init fdt_enforce_memory_region(void)
> > > > +{
> > > > + struct memblock_region reg;
> > > > +
> > > > + of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > > > +
> > > > + if (reg.size) {
> > > > +         memblock_remove(0, PAGE_ALIGN(reg.base));
> > > > +         memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
> > > > +                         ULLONG_MAX);
> > >
> 
> According to the panic message from James, I guess the ACPI regions are out of the range
> [reg.base, reg.base + reg.size] and removed by your above codes. On ARM64, those ACPI
> regions have been added into memblock and marked as NOMAP, so I think it should be
> easy to adapt my fix to retain the NOMAP regions here

See below.

> Thanks,
> Dennis
> 
> > > I think this is a new way to trip the problem Dennis Chen has been working on
> > > [0]. If I kdump with --reuse-cmdline on a kernel booted with 'acpi=on', I get
> > > the panic below [1]...
> >
> > Yeah, it can be.
> >
> > > It looks like Dennis's fix involves changes in mm/memblock.c, maybe they can be
> > > extended to support a range instead of just a limit?
> > >
> > > (It looks like x86 explicitly adds the acpi regions to the crash-kernels memory
> > > map in crash_setup_memmap_entries()).
> > >
> > >
> > >
> > > Is it possible for the kernel text to be outside this range? (a bug in
> > > kexec-tools, or another user of the DT property) If we haven't already failed in
> > > this case, it may be worth printing a warning, or refusing to
> > > restrict-memory/expose-vmcore.
> >
> > In my implementation of kdump, the usable memory for crash dump
> > kernel will be allocated within memblock.memory after ACPI-related
> > regions have been mapped. "linux,usable-memory-range" indicates
> > this exact memory range.
> > On crash dump kernel, my fdt_enforce_memory_region() in arm64_memblock_init()
> > will exclude all the other regions from memblock.memory.
> > So the kernel (with acpi=on) won't recognize ACPI-regions as
> > normal memory, and map them by ioremap().
> >
> > I thought that it was safe, but actually not due to unaligned accesses.
> > As you suggested, we will probably be able to do the same thing of
> > Chen's solution in fdt_enforce_memory_region().

memblock_isolate_range() and memblock_remove_range() are not exported.
So we'd better implement an unified interface like:
    memblock_cap_memory_range(phys_addr_t base, size_t size);

If base == NULL, it would behave in the exact same way as your
memblock_mem_limit_remove_map().

Thanks,
-Takahiro AKASHI

> >
> >
> > Thanks,
> > -Takahiro AKASHI
> >
> > >
> > >
> > > Thanks, > > James
> > >
> > >
> > > [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/443356.html
> > > [1]
> > > [    0.000000] efi: Getting EFI parameters from FDT:
> > > [    0.000000] efi: EFI v2.50 by ARM Juno EFI Nov 24 2015 12:36:35
> > > [    0.000000] efi:  ACPI=0xf95b0000  ACPI 2.0=0xf95b0014  PROP=0xfe8db4d8
> > > [    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
> > > [    0.000000] cma: Reserved 16 MiB at 0x00000009fec00000
> > > [    0.000000] ACPI: Early table checksum verification disabled
> > > [    0.000000] ACPI: RSDP 0x00000000F95B0014 000024 (v02 ARMLTD)
> > > [    0.000000] ACPI: XSDT 0x00000000F95A00E8 00004C (v01 ARMLTD ARM-JUNO 2014072
> > > 7      01000013)
> > > [    0.000000] ACPI: FACP 0x00000000F9500000 00010C (v05 ARMLTD ARM-JUNO 2014072
> > > 7 ARM  00000099)
> > > [    0.000000] ACPI: DSDT 0x00000000F94C0000 000396 (v01 ARMLTD ARM-JUNO 2014072
> > > 7 INTL 20150619)
> > > [    0.000000] ACPI: GTDT 0x00000000F94F0000 000060 (v02 ARMLTD ARM-JUNO 2014072
> > > 7 ARM  00000099)
> > > [    0.000000] ACPI: APIC 0x00000000F94E0000 000224 (v03 ARMLTD ARM-JUNO 2014072
> > > 7 ARM  00000099)
> > > [    0.000000] ACPI: SSDT 0x00000000F94D0000 0001E3 (v01 ARMLTD ARM-JUNO 2014072
> > > 7 INTL 20150619)
> > > [    0.000000] ACPI: MCFG 0x00000000F94B0000 00003C (v01 ARMLTD ARM-JUNO 2014072
> > > 7 ARM  00000099)
> > > ...
> > > [    0.737577] Serial: AMBA PL011 UART driver
> > > [    0.786086] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> > > [    0.794203] ACPI: Added _OSI(Module Device)
> > > [    0.798659] ACPI: Added _OSI(Processor Device)
> > > [    0.803190] ACPI: Added _OSI(3.0 _SCP Extensions)
> > > [    0.807973] ACPI: Added _OSI(Processor Aggregator Device)
> > > [    0.813653] Unable to handle kernel paging request at virtual address ffff000
> > > 00804e027
> > > [    0.821704] pgd = ffff000008cce000
> > > [    0.825155] [ffff00000804e027] *pgd=00000009ffffd003, *pud=00000009ffffc003,
> > > *pmd=00000009ffffb003, *pte=00e80000f94c0707
> > > [    0.836319] Internal error: Oops: 96000021 [#1] PREEMPT SMP
> > > [    0.841972] Modules linked in:
> > > [    0.845073] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S              4.7.0-rc4
> > > + #4569
> > > [    0.852927] Hardware name: ARM Juno development board (r1) (DT)
> > > [    0.858936] task: ffff80003d898000 ti: ffff80003d894000 task.ti: ffff80003d89
> > > 4000
> > > [    0.866537] PC is at acpi_ns_lookup+0x23c/0x378
> > > [    0.871131] LR is at acpi_ds_load1_begin_op+0x88/0x260
> > > [    0.876340] pc : [<ffff0000084061a4>] lr : [<ffff0000083fc08c>] pstate: 60000
> > > 045
> > > [    0.883846] sp : ffff80003d8979b0
> > > [    0.887206] x29: ffff80003d8979b0 x28: 0000000000000000
> > > [    0.892596] x27: 000000000000001b x26: ffff000008a80a07
> > > [    0.897986] x25: ffff80003d897a48 x24: 0000000000000001
> > > [    0.903377] x23: 0000000000000001 x22: ffff00000804e027
> > > [    0.908769] x21: 000000000000001b x20: 0000000000000001
> > > [    0.914158] x19: 0000000000000000 x18: ffff00000804efff
> > > [    0.919547] x17: 00000000000038ff x16: 0000000000000002
> > > [    0.924937] x15: ffff00000804efff x14: 0000008000000000
> > > [    0.930326] x13: ffff000008c942b2 x12: ffff00000804efff
> > > [    0.935717] x11: ffff000008bf0000 x10: 00000000ffffff76
> > > [    0.941107] x9 : 0000000000000000 x8 : ffff000008cb6000
> > > [    0.946498] x7 : 0000000000000000 x6 : ffff80003d897aa8
> > > [    0.951891] x5 : ffff80003d028400 x4 : 0000000000000001
> > > [    0.957281] x3 : 0000000000000003 x2 : ffff000008cb6090
> > > [    0.962673] x1 : 000000000000005f x0 : 0000000000000000
> > > [    0.968063]
> > > [    0.969569] Process swapper/0 (pid: 1, stack limit = 0xffff80003d894020)
> > > [    1.387661] Call trace:
> > > ...
> > > [    1.473172] [<ffff0000084061a4>] acpi_ns_lookup+0x23c/0x378
> > > [    1.478832] [<ffff0000083fc08c>] acpi_ds_load1_begin_op+0x88/0x260
> > > [    1.485105] [<ffff00000840c0e8>] acpi_ps_build_named_op+0xa8/0x170
> > > [    1.491378] [<ffff00000840c2e0>] acpi_ps_create_op+0x130/0x230
> > > [    1.497299] [<ffff00000840bc28>] acpi_ps_parse_loop+0x168/0x580
> > > [    1.503302] [<ffff00000840cb44>] acpi_ps_parse_aml+0xa0/0x278
> > > [    1.509135] [<ffff0000084081d0>] acpi_ns_one_complete_parse+0x128/0x150
> > > [    1.515852] [<ffff00000840821c>] acpi_ns_parse_table+0x24/0x44
> > > [    1.521775] [<ffff0000084079e8>] acpi_ns_load_table+0x54/0xdc
> > > [    1.527612] [<ffff000008411038>] acpi_tb_load_namespace+0xd0/0x230
> > > [    1.533887] [<ffff000008b2695c>] acpi_load_tables+0x3c/0xa8
> > > [    1.539542] [<ffff000008b25974>] acpi_init+0x88/0x2b0
> > > [    1.544670] [<ffff000008081a08>] do_one_initcall+0x38/0x128
> > > [    1.550325] [<ffff000008b00cc0>] kernel_init_freeable+0x14c/0x1f0
> > > [    1.556517] [<ffff0000087d2088>] kernel_init+0x10/0x100
> > > [    1.561823] [<ffff000008084e10>] ret_from_fork+0x10/0x40
> > > [    1.567216] Code: b9008fbb 2a000318 36380054 32190318 (b94002c0)
> > > [    1.573451] ---[ end trace dec6cecdcba673b7 ]---
> > > [    1.578158] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00
> > > 00000b
> > > [    1.578158]
> > > [    1.587428] SMP: stopping secondary CPUs
> > > [    1.591411] ---[ end Kernel panic - not syncing: Attempted to kill init! exit
> > > code=0x0000000b
> > > [    0.969225] Process swapper/0 (pid: 1, stack limit = 0xffff80003d894020)
> > >
> > >
> >
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-19 10:41       ` Dennis Chen
@ 2016-07-19 12:48         ` Mark Salter
  2016-07-19 13:27           ` Mark Rutland
  2016-07-19 23:34           ` AKASHI Takahiro
  0 siblings, 2 replies; 35+ messages in thread
From: Mark Salter @ 2016-07-19 12:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2016-07-19 at 18:41 +0800, Dennis Chen wrote:
> On Tue, Jul 19, 2016 at 07:28:16PM +0900, AKASHI Takahiro wrote:
> > 
> > On Tue, Jul 19, 2016 at 05:39:07PM +0800, Dennis Chen wrote:
> > > 
> > > Hello AKASHI,
> > > 
> > > On Tue, Jul 12, 2016 at 02:05:07PM +0900, AKASHI Takahiro wrote:
> > > > 
> > > > On the startup of primary kernel, the memory region used by crash dump
> > > > kernel must be specified by "crashkernel=" kernel parameter.
> > > > reserve_crashkernel() will allocate and reserve the region for later use.
> > > > 
> > > > User space tools, like kexec-tools, will be able to find that region as
> > > > ??????- "Crash kernel" in /proc/iomem, or
> > > > ??????- "linux,crashkernel-base" and "linux,crashkernel-size" under
> > > > ????????/sys/firmware/devicetree/base/chosen
> > > > 
> > > > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > > Signed-off-by: Mark Salter <msalter@redhat.com>
> > > > Signed-off-by: Pratyush Anand <panand@redhat.com>
> > > > Reviewed-by: James Morse <james.morse@arm.com>
> > > > ---
> > > > ?arch/arm64/kernel/setup.c |???7 ++-
> > > > ?arch/arm64/mm/init.c??????| 115 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > ?2 files changed, 121 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > > > index c1509e6..cb5eee0 100644
> > > > --- a/arch/arm64/kernel/setup.c
> > > > +++ b/arch/arm64/kernel/setup.c
> > > > @@ -31,7 +31,6 @@
> > > > ?#include <linux/screen_info.h>
> > > > ?#include <linux/init.h>
> > > > ?#include <linux/kexec.h>
> > > > -#include <linux/crash_dump.h>
> > > > ?#include <linux/root_dev.h>
> > > > ?#include <linux/cpu.h>
> > > > ?#include <linux/interrupt.h>
> > > > @@ -222,6 +221,12 @@ static void __init request_standard_resources(void)
> > > > ??????????????????kernel_data.end <= res->end)
> > > > ??????????????????????request_resource(res, &kernel_data);
> > > > ??????}
> > > > +
> > > > +#ifdef CONFIG_KEXEC_CORE
> > > > +?????/* User space tools will find "Crash kernel" region in /proc/iomem. */
> > > > +?????if (crashk_res.end)
> > > > +?????????????insert_resource(&iomem_resource, &crashk_res);
> > > > +#endif
> > > > ?}
> > > > 
> > > > ?u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
> > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > index 2ade7a6..51b1302 100644
> > > > --- a/arch/arm64/mm/init.c
> > > > +++ b/arch/arm64/mm/init.c
> > > > @@ -29,11 +29,13 @@
> > > > ?#include <linux/gfp.h>
> > > > ?#include <linux/memblock.h>
> > > > ?#include <linux/sort.h>
> > > > +#include <linux/of.h>
> > > > ?#include <linux/of_fdt.h>
> > > > ?#include <linux/dma-mapping.h>
> > > > ?#include <linux/dma-contiguous.h>
> > > > ?#include <linux/efi.h>
> > > > ?#include <linux/swiotlb.h>
> > > > +#include <linux/kexec.h>
> > > > 
> > > > ?#include <asm/boot.h>
> > > > ?#include <asm/fixmap.h>
> > > > @@ -76,6 +78,117 @@ static int __init early_initrd(char *p)
> > > > ?early_param("initrd", early_initrd);
> > > > ?#endif
> > > > 
> > > > +#ifdef CONFIG_KEXEC_CORE
> > > > +static unsigned long long crash_size, crash_base;
> > > > +static struct property crash_base_prop = {
> > > > +?????.name = "linux,crashkernel-base",
> > > > +?????.length = sizeof(u64),
> > > > +?????.value = &crash_base
> > > > +};
> > > > +static struct property crash_size_prop = {
> > > > +?????.name = "linux,crashkernel-size",
> > > > +?????.length = sizeof(u64),
> > > > +?????.value = &crash_size,
> > > > +};
> > > > +
> > > > +static int __init export_crashkernel(void)
> > > > +{
> > > > +?????struct device_node *node;
> > > > +?????int ret;
> > > > +
> > > > +?????if (!crashk_res.end)
> > > > +?????????????return 0;
> > > > +
> > > > +?????crash_base = cpu_to_be64(crashk_res.start);
> > > > +?????crash_size = cpu_to_be64(crashk_res.end - crashk_res.start + 1);
> > > > +
> > > > +?????/* Add /chosen/linux,crashkernel-* properties */
> > > > +?????node = of_find_node_by_path("/chosen");
> > > > +?????if (!node)
> > > > +?????????????return -ENOENT;
> > > > +
> > > > +?????/*
> > > > +??????* There might be existing crash kernel properties, but we can't
> > > > +??????* be sure what's in them, so remove them.
> > > > +??????*/
> > > > +?????of_remove_property(node, of_find_property(node,
> > > > +?????????????????????????????"linux,crashkernel-base", NULL));
> > > > +?????of_remove_property(node, of_find_property(node,
> > > > +?????????????????????????????"linux,crashkernel-size", NULL));
> > > > +
> > > > +?????ret = of_add_property(node, &crash_base_prop);
> > > > +?????if (ret)
> > > > +?????????????goto ret_err;
> > > > +
> > > > +?????ret = of_add_property(node, &crash_size_prop);
> > > > +?????if (ret)
> > > > +?????????????goto ret_err;
> > > > +
> > > > +?????return 0;
> > > > +
> > > > +ret_err:
> > > > +?????pr_warn("Exporting crashkernel region to device tree failed\n");
> > > > +?????return ret;
> > > > +}
> > > > +late_initcall(export_crashkernel);
> > > > +
> > > > +/*
> > > > + * reserve_crashkernel() - reserves memory for crash kernel
> > > > + *
> > > > + * This function reserves memory area given in "crashkernel=" kernel command
> > > > + * line parameter. The memory reserved is used by dump capture kernel when
> > > > + * primary kernel is crashing.
> > > > + */
> > > > +static void __init reserve_crashkernel(void)
> > > > +{
> > > > +?????int ret;
> > > > +
> > > > +?????ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > > > +?????????????????????????????&crash_size, &crash_base);
> > > > +?????/* no crashkernel= or invalid value specified */
> > > > +?????if (ret || !crash_size)
> > > > +?????????????return;
> > > > +
> > > > +?????if (crash_base == 0) {
> > > > +?????????????/* Current arm64 boot protocol requires 2MB alignment */
> > > > +?????????????crash_base = memblock_find_in_range(0,
> > > > +?????????????????????????????MEMBLOCK_ALLOC_ACCESSIBLE, crash_size, SZ_2M);
> > > > +?????????????if (crash_base == 0) {
> > > > +?????????????????????pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> > > > +?????????????????????????????crash_size);
> > > > +?????????????????????return;
> > > > +?????????????}
> > > > +?????????????memblock_reserve(crash_base, crash_size);
> > > > 
> > > I am not pretty sure the context here, but
> > > can we use below code piece instead of the above lines?
> > > ????????if (crash_base == 0)
> > > ????????????????memblock_alloc(crash_size, SZ_2M);
> > Either would be fine here.
> > 
> Hello AKASHI, maybe you can succeed to find the base with memblock_find_in_range(),?
> but that doesn't mean you will also succeed to reserve them with memblock_reserve followed.

We avoid memblock_alloc() here because it panics on failure. This could happen
if user asks for an unusually large crashkernel size. Better to print a message
and keep booting. Checking the return value of memblock_reserve() seems like a
good thing to do though.


> 
> Thanks,
> Dennis
> > 
> > 
> > Thanks,
> > -Takahiro AKASHI
> > 
> > > 
> > > Thanks,
> > > Dennis
> > > 
> > > > 
> > > > +
> > > > +?????} else {
> > > > +?????????????/* User specifies base address explicitly. */
> > > > +?????????????if (!memblock_is_region_memory(crash_base, crash_size) ||
> > > > +?????????????????????memblock_is_region_reserved(crash_base, crash_size)) {
> > > > +?????????????????????pr_warn("crashkernel has wrong address or size\n");
> > > > +?????????????????????return;
> > > > +?????????????}
> > > > +
> > > > +?????????????if (!IS_ALIGNED(crash_base, SZ_2M)) {
> > > > +?????????????????????pr_warn("crashkernel base address is not 2MB aligned\n");
> > > > +?????????????????????return;
> > > > +?????????????}
> > > > +
> > > > +?????????????memblock_reserve(crash_base, crash_size);
> > > > +?????}
> > > > +
> > > > +?????pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> > > > +?????????????crash_size >> 20, crash_base >> 20);
> > > > +
> > > > +?????crashk_res.start = crash_base;
> > > > +?????crashk_res.end = crash_base + crash_size - 1;
> > > > +}
> > > > +#else
> > > > +static void __init reserve_crashkernel(void)
> > > > +{
> > > > +?????;
> > > > +}
> > > > +#endif /* CONFIG_KEXEC_CORE */
> > > > +
> > > > ?/*
> > > > ? * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
> > > > ? * currently assumes that for memory starting above 4G, 32-bit devices will
> > > > @@ -289,6 +402,8 @@ void __init arm64_memblock_init(void)
> > > > ??????}
> > > > ?#endif
> > > > 
> > > > +?????reserve_crashkernel();
> > > > +
> > > > ??????early_init_fdt_scan_reserved_mem();
> > > > 
> > > > ??????/* 4GB maximum for 32-bit only capable devices */
> > > > --
> > > > 2.9.0
> > > > 
> > > > 
> > > > _______________________________________________
> > > > linux-arm-kernel mailing list
> > > > linux-arm-kernel at lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > > > 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-19 12:48         ` Mark Salter
@ 2016-07-19 13:27           ` Mark Rutland
  2016-07-20  2:17             ` AKASHI Takahiro
  2016-07-20  3:48             ` Dennis Chen
  2016-07-19 23:34           ` AKASHI Takahiro
  1 sibling, 2 replies; 35+ messages in thread
From: Mark Rutland @ 2016-07-19 13:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 19, 2016 at 08:48:57AM -0400, Mark Salter wrote:
> On Tue, 2016-07-19 at 18:41 +0800, Dennis Chen wrote:
> > On Tue, Jul 19, 2016 at 07:28:16PM +0900, AKASHI Takahiro wrote:
> > > On Tue, Jul 19, 2016 at 05:39:07PM +0800, Dennis Chen wrote:
> > > > On Tue, Jul 12, 2016 at 02:05:07PM +0900, AKASHI Takahiro wrote:
> > > > > +/*
> > > > > + * reserve_crashkernel() - reserves memory for crash kernel
> > > > > + *
> > > > > + * This function reserves memory area given in "crashkernel=" kernel command
> > > > > + * line parameter. The memory reserved is used by dump capture kernel when
> > > > > + * primary kernel is crashing.
> > > > > + */
> > > > > +static void __init reserve_crashkernel(void)
> > > > > +{
> > > > > +?????int ret;
> > > > > +
> > > > > +?????ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > > > > +?????????????????????????????&crash_size, &crash_base);
> > > > > +?????/* no crashkernel= or invalid value specified */
> > > > > +?????if (ret || !crash_size)
> > > > > +?????????????return;
> > > > > +
> > > > > +?????if (crash_base == 0) {
> > > > > +?????????????/* Current arm64 boot protocol requires 2MB alignment */
> > > > > +?????????????crash_base = memblock_find_in_range(0,
> > > > > +?????????????????????????????MEMBLOCK_ALLOC_ACCESSIBLE, crash_size, SZ_2M);
> > > > > +?????????????if (crash_base == 0) {
> > > > > +?????????????????????pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> > > > > +?????????????????????????????crash_size);
> > > > > +?????????????????????return;
> > > > > +?????????????}
> > > > > +?????????????memblock_reserve(crash_base, crash_size);
> > > > > 
> > > > I am not pretty sure the context here, but
> > > > can we use below code piece instead of the above lines?
> > > > ????????if (crash_base == 0)
> > > > ????????????????memblock_alloc(crash_size, SZ_2M);
> > > Either would be fine here.
> > > 
> > Hello AKASHI, maybe you can succeed to find the base with memblock_find_in_range(),?
> > but that doesn't mean you will also succeed to reserve them with memblock_reserve followed.
> 
> We avoid memblock_alloc() here because it panics on failure. This could happen
> if user asks for an unusually large crashkernel size. Better to print a message
> and keep booting. Checking the return value of memblock_reserve() seems like a
> good thing to do though.

Another option would be to add a memblock_try_alloc() function to the
memblock API, which in case of failure returns 0 rather than triggering
a panic(). We'd still have to check the return value, but all the
memblock manipulation would be in one place.

It looks like adding that is fairly simple:

phys_addr_t __init memblock_try_alloc(phys_addr_t size, phys_addr_t align)
{
	return __memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
}

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-19 12:48         ` Mark Salter
  2016-07-19 13:27           ` Mark Rutland
@ 2016-07-19 23:34           ` AKASHI Takahiro
  1 sibling, 0 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-19 23:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 19, 2016 at 08:48:57AM -0400, Mark Salter wrote:
> On Tue, 2016-07-19 at 18:41 +0800, Dennis Chen wrote:
> > On Tue, Jul 19, 2016 at 07:28:16PM +0900, AKASHI Takahiro wrote:
> > > 
> > > On Tue, Jul 19, 2016 at 05:39:07PM +0800, Dennis Chen wrote:
> > > > 
> > > > Hello AKASHI,
> > > > 
> > > > On Tue, Jul 12, 2016 at 02:05:07PM +0900, AKASHI Takahiro wrote:
> > > > > 
> > > > > On the startup of primary kernel, the memory region used by crash dump
> > > > > kernel must be specified by "crashkernel=" kernel parameter.
> > > > > reserve_crashkernel() will allocate and reserve the region for later use.
> > > > > 
> > > > > User space tools, like kexec-tools, will be able to find that region as
> > > > > ????????????- "Crash kernel" in /proc/iomem, or
> > > > > ????????????- "linux,crashkernel-base" and "linux,crashkernel-size" under
> > > > > ????????????????/sys/firmware/devicetree/base/chosen
> > > > > 
> > > > > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > > > Signed-off-by: Mark Salter <msalter@redhat.com>
> > > > > Signed-off-by: Pratyush Anand <panand@redhat.com>
> > > > > Reviewed-by: James Morse <james.morse@arm.com>
> > > > > ---
> > > > > ??arch/arm64/kernel/setup.c |??????7 ++-
> > > > > ??arch/arm64/mm/init.c????????????| 115 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > > ??2 files changed, 121 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > > > > index c1509e6..cb5eee0 100644
> > > > > --- a/arch/arm64/kernel/setup.c
> > > > > +++ b/arch/arm64/kernel/setup.c
> > > > > @@ -31,7 +31,6 @@
> > > > > ??#include <linux/screen_info.h>
> > > > > ??#include <linux/init.h>
> > > > > ??#include <linux/kexec.h>
> > > > > -#include <linux/crash_dump.h>
> > > > > ??#include <linux/root_dev.h>
> > > > > ??#include <linux/cpu.h>
> > > > > ??#include <linux/interrupt.h>
> > > > > @@ -222,6 +221,12 @@ static void __init request_standard_resources(void)
> > > > > ????????????????????????????????????kernel_data.end <= res->end)
> > > > > ????????????????????????????????????????????request_resource(res, &kernel_data);
> > > > > ????????????}
> > > > > +
> > > > > +#ifdef CONFIG_KEXEC_CORE
> > > > > +??????????/* User space tools will find "Crash kernel" region in /proc/iomem. */
> > > > > +??????????if (crashk_res.end)
> > > > > +??????????????????????????insert_resource(&iomem_resource, &crashk_res);
> > > > > +#endif
> > > > > ??}
> > > > > 
> > > > > ??u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
> > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > > index 2ade7a6..51b1302 100644
> > > > > --- a/arch/arm64/mm/init.c
> > > > > +++ b/arch/arm64/mm/init.c
> > > > > @@ -29,11 +29,13 @@
> > > > > ??#include <linux/gfp.h>
> > > > > ??#include <linux/memblock.h>
> > > > > ??#include <linux/sort.h>
> > > > > +#include <linux/of.h>
> > > > > ??#include <linux/of_fdt.h>
> > > > > ??#include <linux/dma-mapping.h>
> > > > > ??#include <linux/dma-contiguous.h>
> > > > > ??#include <linux/efi.h>
> > > > > ??#include <linux/swiotlb.h>
> > > > > +#include <linux/kexec.h>
> > > > > 
> > > > > ??#include <asm/boot.h>
> > > > > ??#include <asm/fixmap.h>
> > > > > @@ -76,6 +78,117 @@ static int __init early_initrd(char *p)
> > > > > ??early_param("initrd", early_initrd);
> > > > > ??#endif
> > > > > 
> > > > > +#ifdef CONFIG_KEXEC_CORE
> > > > > +static unsigned long long crash_size, crash_base;
> > > > > +static struct property crash_base_prop = {
> > > > > +??????????.name = "linux,crashkernel-base",
> > > > > +??????????.length = sizeof(u64),
> > > > > +??????????.value = &crash_base
> > > > > +};
> > > > > +static struct property crash_size_prop = {
> > > > > +??????????.name = "linux,crashkernel-size",
> > > > > +??????????.length = sizeof(u64),
> > > > > +??????????.value = &crash_size,
> > > > > +};
> > > > > +
> > > > > +static int __init export_crashkernel(void)
> > > > > +{
> > > > > +??????????struct device_node *node;
> > > > > +??????????int ret;
> > > > > +
> > > > > +??????????if (!crashk_res.end)
> > > > > +??????????????????????????return 0;
> > > > > +
> > > > > +??????????crash_base = cpu_to_be64(crashk_res.start);
> > > > > +??????????crash_size = cpu_to_be64(crashk_res.end - crashk_res.start + 1);
> > > > > +
> > > > > +??????????/* Add /chosen/linux,crashkernel-* properties */
> > > > > +??????????node = of_find_node_by_path("/chosen");
> > > > > +??????????if (!node)
> > > > > +??????????????????????????return -ENOENT;
> > > > > +
> > > > > +??????????/*
> > > > > +????????????* There might be existing crash kernel properties, but we can't
> > > > > +????????????* be sure what's in them, so remove them.
> > > > > +????????????*/
> > > > > +??????????of_remove_property(node, of_find_property(node,
> > > > > +??????????????????????????????????????????????????????????"linux,crashkernel-base", NULL));
> > > > > +??????????of_remove_property(node, of_find_property(node,
> > > > > +??????????????????????????????????????????????????????????"linux,crashkernel-size", NULL));
> > > > > +
> > > > > +??????????ret = of_add_property(node, &crash_base_prop);
> > > > > +??????????if (ret)
> > > > > +??????????????????????????goto ret_err;
> > > > > +
> > > > > +??????????ret = of_add_property(node, &crash_size_prop);
> > > > > +??????????if (ret)
> > > > > +??????????????????????????goto ret_err;
> > > > > +
> > > > > +??????????return 0;
> > > > > +
> > > > > +ret_err:
> > > > > +??????????pr_warn("Exporting crashkernel region to device tree failed\n");
> > > > > +??????????return ret;
> > > > > +}
> > > > > +late_initcall(export_crashkernel);
> > > > > +
> > > > > +/*
> > > > > + * reserve_crashkernel() - reserves memory for crash kernel
> > > > > + *
> > > > > + * This function reserves memory area given in "crashkernel=" kernel command
> > > > > + * line parameter. The memory reserved is used by dump capture kernel when
> > > > > + * primary kernel is crashing.
> > > > > + */
> > > > > +static void __init reserve_crashkernel(void)
> > > > > +{
> > > > > +??????????int ret;
> > > > > +
> > > > > +??????????ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > > > > +??????????????????????????????????????????????????????????&crash_size, &crash_base);
> > > > > +??????????/* no crashkernel= or invalid value specified */
> > > > > +??????????if (ret || !crash_size)
> > > > > +??????????????????????????return;
> > > > > +
> > > > > +??????????if (crash_base == 0) {
> > > > > +??????????????????????????/* Current arm64 boot protocol requires 2MB alignment */
> > > > > +??????????????????????????crash_base = memblock_find_in_range(0,
> > > > > +??????????????????????????????????????????????????????????MEMBLOCK_ALLOC_ACCESSIBLE, crash_size, SZ_2M);
> > > > > +??????????????????????????if (crash_base == 0) {
> > > > > +??????????????????????????????????????????pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> > > > > +??????????????????????????????????????????????????????????crash_size);
> > > > > +??????????????????????????????????????????return;
> > > > > +??????????????????????????}
> > > > > +??????????????????????????memblock_reserve(crash_base, crash_size);
> > > > > 
> > > > I am not pretty sure the context here, but
> > > > can we use below code piece instead of the above lines?
> > > > ????????????????if (crash_base == 0)
> > > > ????????????????????????????????memblock_alloc(crash_size, SZ_2M);
> > > Either would be fine here.
> > > 
> > Hello AKASHI, maybe you can succeed to find the base with memblock_find_in_range(),??
> > but that doesn't mean you will also succeed to reserve them with memblock_reserve followed.
> 
> We avoid memblock_alloc() here because it panics on failure. This could happen

Thank you for pointing this out. I forgot it.

> if user asks for an unusually large crashkernel size. Better to print a message
> and keep booting. Checking the return value of memblock_reserve() seems like a
> good thing to do though.

Yeah, but it is unlikely that the function fails in this context (just
after find_in_range at early boot time) and nobody in arm64/mm/*.c checks
a return value of memblock_reserve() as well.
So I want to keep the code unchanged.

Thanks,
-Takahiro AKASHI

> 
> > 
> > Thanks,
> > Dennis
> > > 
> > > 
> > > Thanks,
> > > -Takahiro AKASHI
> > > 
> > > > 
> > > > Thanks,
> > > > Dennis
> > > > 
> > > > > 
> > > > > +
> > > > > +??????????} else {
> > > > > +??????????????????????????/* User specifies base address explicitly. */
> > > > > +??????????????????????????if (!memblock_is_region_memory(crash_base, crash_size) ||
> > > > > +??????????????????????????????????????????memblock_is_region_reserved(crash_base, crash_size)) {
> > > > > +??????????????????????????????????????????pr_warn("crashkernel has wrong address or size\n");
> > > > > +??????????????????????????????????????????return;
> > > > > +??????????????????????????}
> > > > > +
> > > > > +??????????????????????????if (!IS_ALIGNED(crash_base, SZ_2M)) {
> > > > > +??????????????????????????????????????????pr_warn("crashkernel base address is not 2MB aligned\n");
> > > > > +??????????????????????????????????????????return;
> > > > > +??????????????????????????}
> > > > > +
> > > > > +??????????????????????????memblock_reserve(crash_base, crash_size);
> > > > > +??????????}
> > > > > +
> > > > > +??????????pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> > > > > +??????????????????????????crash_size >> 20, crash_base >> 20);
> > > > > +
> > > > > +??????????crashk_res.start = crash_base;
> > > > > +??????????crashk_res.end = crash_base + crash_size - 1;
> > > > > +}
> > > > > +#else
> > > > > +static void __init reserve_crashkernel(void)
> > > > > +{
> > > > > +??????????;
> > > > > +}
> > > > > +#endif /* CONFIG_KEXEC_CORE */
> > > > > +
> > > > > ??/*
> > > > > ?? * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
> > > > > ?? * currently assumes that for memory starting above 4G, 32-bit devices will
> > > > > @@ -289,6 +402,8 @@ void __init arm64_memblock_init(void)
> > > > > ????????????}
> > > > > ??#endif
> > > > > 
> > > > > +??????????reserve_crashkernel();
> > > > > +
> > > > > ????????????early_init_fdt_scan_reserved_mem();
> > > > > 
> > > > > ????????????/* 4GB maximum for 32-bit only capable devices */
> > > > > --
> > > > > 2.9.0
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > linux-arm-kernel mailing list
> > > > > linux-arm-kernel at lists.infradead.org
> > > > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > > > > 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-19 13:27           ` Mark Rutland
@ 2016-07-20  2:17             ` AKASHI Takahiro
  2016-07-20  3:48             ` Dennis Chen
  1 sibling, 0 replies; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-20  2:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 19, 2016 at 02:27:54PM +0100, Mark Rutland wrote:
> On Tue, Jul 19, 2016 at 08:48:57AM -0400, Mark Salter wrote:
> > On Tue, 2016-07-19 at 18:41 +0800, Dennis Chen wrote:
> > > On Tue, Jul 19, 2016 at 07:28:16PM +0900, AKASHI Takahiro wrote:
> > > > On Tue, Jul 19, 2016 at 05:39:07PM +0800, Dennis Chen wrote:
> > > > > On Tue, Jul 12, 2016 at 02:05:07PM +0900, AKASHI Takahiro wrote:
> > > > > > +/*
> > > > > > + * reserve_crashkernel() - reserves memory for crash kernel
> > > > > > + *
> > > > > > + * This function reserves memory area given in "crashkernel=" kernel command
> > > > > > + * line parameter. The memory reserved is used by dump capture kernel when
> > > > > > + * primary kernel is crashing.
> > > > > > + */
> > > > > > +static void __init reserve_crashkernel(void)
> > > > > > +{
> > > > > > +??????????int ret;
> > > > > > +
> > > > > > +??????????ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > > > > > +??????????????????????????????????????????????????????????&crash_size, &crash_base);
> > > > > > +??????????/* no crashkernel= or invalid value specified */
> > > > > > +??????????if (ret || !crash_size)
> > > > > > +??????????????????????????return;
> > > > > > +
> > > > > > +??????????if (crash_base == 0) {
> > > > > > +??????????????????????????/* Current arm64 boot protocol requires 2MB alignment */
> > > > > > +??????????????????????????crash_base = memblock_find_in_range(0,
> > > > > > +??????????????????????????????????????????????????????????MEMBLOCK_ALLOC_ACCESSIBLE, crash_size, SZ_2M);
> > > > > > +??????????????????????????if (crash_base == 0) {
> > > > > > +??????????????????????????????????????????pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> > > > > > +??????????????????????????????????????????????????????????crash_size);
> > > > > > +??????????????????????????????????????????return;
> > > > > > +??????????????????????????}
> > > > > > +??????????????????????????memblock_reserve(crash_base, crash_size);
> > > > > > 
> > > > > I am not pretty sure the context here, but
> > > > > can we use below code piece instead of the above lines?
> > > > > ????????????????if (crash_base == 0)
> > > > > ????????????????????????????????memblock_alloc(crash_size, SZ_2M);
> > > > Either would be fine here.
> > > > 
> > > Hello AKASHI, maybe you can succeed to find the base with memblock_find_in_range(),??
> > > but that doesn't mean you will also succeed to reserve them with memblock_reserve followed.
> > 
> > We avoid memblock_alloc() here because it panics on failure. This could happen
> > if user asks for an unusually large crashkernel size. Better to print a message
> > and keep booting. Checking the return value of memblock_reserve() seems like a
> > good thing to do though.
> 
> Another option would be to add a memblock_try_alloc() function to the
> memblock API, which in case of failure returns 0 rather than triggering
> a panic(). We'd still have to check the return value, but all the
> memblock manipulation would be in one place.
> 
> It looks like adding that is fairly simple:
> 
> phys_addr_t __init memblock_try_alloc(phys_addr_t size, phys_addr_t align)
> {
> 	return __memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
> }

Apart form this issue, I found that the kernel can panic, if the memory
for crash dump kernel is allocated above 32-bits, due to a failure
of alloc_bootmem_low() in request_standard_resources().
(In addition, dma_contiguous_reserve() will also fail.)

So I'd like to change the code to:
        if (crash_base == 0) {
                crash_base = memblock_find_in_range(0,
                                IS_ENABLED(CONFIG_ZONE_DMA) ?
                                        ARCH_LOW_ADDRESS_LIMIT :
                                        MEMBLOCK_ALLOC_ACCESSIBLE,
                                crash_size, SZ_2M);
        ...

Does this make sense?

Thanks,
-Takahiro AKASHI

> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-19 11:01         ` AKASHI Takahiro
@ 2016-07-20  3:39           ` Dennis Chen
  2016-07-20  4:22             ` AKASHI Takahiro
  0 siblings, 1 reply; 35+ messages in thread
From: Dennis Chen @ 2016-07-20  3:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 19, 2016 at 08:01:21PM +0900, AKASHI Takahiro wrote:
> On Tue, Jul 19, 2016 at 06:06:18PM +0800, Dennis Chen wrote:
> > Hello AKASHI,
> > 
> > On Tue, Jul 19, 2016 at 05:35:55PM +0900, AKASHI Takahiro wrote:
> > > James,
> > >
> > > On Mon, Jul 18, 2016 at 07:04:33PM +0100, James Morse wrote:
> > > > Hi!
> > > >
> > > > (CC: Dennis Chen)
> > > >
> > > > On 12/07/16 06:05, AKASHI Takahiro wrote:
> > > > > Crash dump kernel will be run with a limited range of memory as System
> > > > > RAM.
> > > > >
> > > > > On arm64, we will use a device-tree property under /chosen,
> > > > >    linux,usable-memory-range = <BASE SIZE>
> > > > > in order for primary kernel either on uefi or non-uefi (device tree only)
> > > > > system to hand over the information about usable memory region to crash
> > > > > dump kernel. This property will supercede entries in uefi memory map table
> > > > > and "memory" nodes in a device tree.
> > > >
> > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > > index 51b1302..d8b296f 100644
> > > > > --- a/arch/arm64/mm/init.c
> > > > > +++ b/arch/arm64/mm/init.c
> > > > > @@ -300,10 +300,48 @@ static int __init early_mem(char *p)
> > > > >  }
> > > > >  early_param("mem", early_mem);
> > > > >
> > > > > +static int __init early_init_dt_scan_usablemem(unsigned long node,
> > > > > +         const char *uname, int depth, void *data)
> > > > > +{
> > > > > + struct memblock_region *usablemem = (struct memblock_region *)data;
> > > > > + const __be32 *reg;
> > > > > + int len;
> > > > > +
> > > > > + usablemem->size = 0;
> > > > > +
> > > > > + if (depth != 1 || strcmp(uname, "chosen") != 0)
> > > > > +         return 0;
> > > > > +
> > > > > + reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> > > > > + if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > > > > +         return 1;
> > > > > +
> > > > > + usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > > > > + usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > > > > +
> > > > > + return 1;
> > > > > +}
> > > > > +
> > > > > +static void __init fdt_enforce_memory_region(void)
> > > > > +{
> > > > > + struct memblock_region reg;
> > > > > +
> > > > > + of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > > > > +
> > > > > + if (reg.size) {
> > > > > +         memblock_remove(0, PAGE_ALIGN(reg.base));
> > > > > +         memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
> > > > > +                         ULLONG_MAX);
> > > >
> > 
> > According to the panic message from James, I guess the ACPI regions are out of the range
> > [reg.base, reg.base + reg.size] and removed by your above codes. On ARM64, those ACPI
> > regions have been added into memblock and marked as NOMAP, so I think it should be
> > easy to adapt my fix to retain the NOMAP regions here
> 
> See below.
> 
> > Thanks,
> > Dennis
> > 
> > > > I think this is a new way to trip the problem Dennis Chen has been working on
> > > > [0]. If I kdump with --reuse-cmdline on a kernel booted with 'acpi=on', I get
> > > > the panic below [1]...
> > >
> > > Yeah, it can be.
> > >
> > > > It looks like Dennis's fix involves changes in mm/memblock.c, maybe they can be
> > > > extended to support a range instead of just a limit?
> > > >
> > > > (It looks like x86 explicitly adds the acpi regions to the crash-kernels memory
> > > > map in crash_setup_memmap_entries()).
> > > >
> > > >
> > > >
> > > > Is it possible for the kernel text to be outside this range? (a bug in
> > > > kexec-tools, or another user of the DT property) If we haven't already failed in
> > > > this case, it may be worth printing a warning, or refusing to
> > > > restrict-memory/expose-vmcore.
> > >
> > > In my implementation of kdump, the usable memory for crash dump
> > > kernel will be allocated within memblock.memory after ACPI-related
> > > regions have been mapped. "linux,usable-memory-range" indicates
> > > this exact memory range.
> > > On crash dump kernel, my fdt_enforce_memory_region() in arm64_memblock_init()
> > > will exclude all the other regions from memblock.memory.
> > > So the kernel (with acpi=on) won't recognize ACPI-regions as
> > > normal memory, and map them by ioremap().
> > >
> > > I thought that it was safe, but actually not due to unaligned accesses.
> > > As you suggested, we will probably be able to do the same thing of
> > > Chen's solution in fdt_enforce_memory_region().
> 
> memblock_isolate_range() and memblock_remove_range() are not exported.
> So we'd better implement an unified interface like:
>     memblock_cap_memory_range(phys_addr_t base, size_t size);
> 
> If base == NULL, it would behave in the exact same way as your
> memblock_mem_limit_remove_map().
>
Hello AKASHI, it's not reasonable to change the prototype of an existing memblock API
which will be used by other components as we did with memblock_enforce_memory_limit. 
Moreover the @size in you case is to specify a memory range of the memblock, which is
different from the @limit as an indicator of the total size of memblocks being limited. 
But I can be help to post an new memblock API patch to cater for your case.

Thanks,
Dennis
  
>
> Thanks,
> -Takahiro AKASHI
> 
> > >
> > >
> > > Thanks,
> > > -Takahiro AKASHI
> > >

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel
  2016-07-19 13:27           ` Mark Rutland
  2016-07-20  2:17             ` AKASHI Takahiro
@ 2016-07-20  3:48             ` Dennis Chen
  1 sibling, 0 replies; 35+ messages in thread
From: Dennis Chen @ 2016-07-20  3:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 19, 2016 at 02:27:54PM +0100, Mark Rutland wrote:
> On Tue, Jul 19, 2016 at 08:48:57AM -0400, Mark Salter wrote:
> > On Tue, 2016-07-19 at 18:41 +0800, Dennis Chen wrote:
> > > On Tue, Jul 19, 2016 at 07:28:16PM +0900, AKASHI Takahiro wrote:
> > > > On Tue, Jul 19, 2016 at 05:39:07PM +0800, Dennis Chen wrote:
> > > > > On Tue, Jul 12, 2016 at 02:05:07PM +0900, AKASHI Takahiro wrote:
> > > > > > +/*
> > > > > > + * reserve_crashkernel() - reserves memory for crash kernel
> > > > > > + *
> > > > > > + * This function reserves memory area given in "crashkernel=" kernel command
> > > > > > + * line parameter. The memory reserved is used by dump capture kernel when
> > > > > > + * primary kernel is crashing.
> > > > > > + */
> > > > > > +static void __init reserve_crashkernel(void)
> > > > > > +{
> > > > > > +?????int ret;
> > > > > > +
> > > > > > +?????ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > > > > > +?????????????????????????????&crash_size, &crash_base);
> > > > > > +?????/* no crashkernel= or invalid value specified */
> > > > > > +?????if (ret || !crash_size)
> > > > > > +?????????????return;
> > > > > > +
> > > > > > +?????if (crash_base == 0) {
> > > > > > +?????????????/* Current arm64 boot protocol requires 2MB alignment */
> > > > > > +?????????????crash_base = memblock_find_in_range(0,
> > > > > > +?????????????????????????????MEMBLOCK_ALLOC_ACCESSIBLE, crash_size, SZ_2M);
> > > > > > +?????????????if (crash_base == 0) {
> > > > > > +?????????????????????pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> > > > > > +?????????????????????????????crash_size);
> > > > > > +?????????????????????return;
> > > > > > +?????????????}
> > > > > > +?????????????memblock_reserve(crash_base, crash_size);
> > > > > > 
> > > > > I am not pretty sure the context here, but
> > > > > can we use below code piece instead of the above lines?
> > > > > ????????if (crash_base == 0)
> > > > > ????????????????memblock_alloc(crash_size, SZ_2M);
> > > > Either would be fine here.
> > > > 
> > > Hello AKASHI, maybe you can succeed to find the base with memblock_find_in_range(),?
> > > but that doesn't mean you will also succeed to reserve them with memblock_reserve followed.
> > 
> > We avoid memblock_alloc() here because it panics on failure. This could happen
> > if user asks for an unusually large crashkernel size. Better to print a message
> > and keep booting. Checking the return value of memblock_reserve() seems like a
> > good thing to do though.
> 
> Another option would be to add a memblock_try_alloc() function to the
> memblock API, which in case of failure returns 0 rather than triggering
> a panic(). We'd still have to check the return value, but all the
> memblock manipulation would be in one place.
> 
> It looks like adding that is fairly simple:
> 
> phys_addr_t __init memblock_try_alloc(phys_addr_t size, phys_addr_t align)
> {
> 	return __memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
> }
>
Hi Mark, memblock_alloc() should remove the panic just as we did in kmalloc IMO, as a memory allocation
API, it should handover the failure case to the user and the latter should make the final decision.
Just take a look at the usage case of the memblock_alloc() in the kernel tree, codes have been messed
up by -- some codes check the return value of memblock_alloc, while others not.

Thanks,
Dennis 
> 
> Thanks,
> Mark.
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-20  3:39           ` Dennis Chen
@ 2016-07-20  4:22             ` AKASHI Takahiro
  2016-07-20  4:36               ` Dennis Chen
  0 siblings, 1 reply; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-20  4:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 20, 2016 at 11:39:11AM +0800, Dennis Chen wrote:
> On Tue, Jul 19, 2016 at 08:01:21PM +0900, AKASHI Takahiro wrote:
> > On Tue, Jul 19, 2016 at 06:06:18PM +0800, Dennis Chen wrote:
> > > Hello AKASHI,
> > > 
> > > On Tue, Jul 19, 2016 at 05:35:55PM +0900, AKASHI Takahiro wrote:
> > > > James,
> > > >
> > > > On Mon, Jul 18, 2016 at 07:04:33PM +0100, James Morse wrote:
> > > > > Hi!
> > > > >
> > > > > (CC: Dennis Chen)
> > > > >
> > > > > On 12/07/16 06:05, AKASHI Takahiro wrote:
> > > > > > Crash dump kernel will be run with a limited range of memory as System
> > > > > > RAM.
> > > > > >
> > > > > > On arm64, we will use a device-tree property under /chosen,
> > > > > >    linux,usable-memory-range = <BASE SIZE>
> > > > > > in order for primary kernel either on uefi or non-uefi (device tree only)
> > > > > > system to hand over the information about usable memory region to crash
> > > > > > dump kernel. This property will supercede entries in uefi memory map table
> > > > > > and "memory" nodes in a device tree.
> > > > >
> > > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > > > index 51b1302..d8b296f 100644
> > > > > > --- a/arch/arm64/mm/init.c
> > > > > > +++ b/arch/arm64/mm/init.c
> > > > > > @@ -300,10 +300,48 @@ static int __init early_mem(char *p)
> > > > > >  }
> > > > > >  early_param("mem", early_mem);
> > > > > >
> > > > > > +static int __init early_init_dt_scan_usablemem(unsigned long node,
> > > > > > +         const char *uname, int depth, void *data)
> > > > > > +{
> > > > > > + struct memblock_region *usablemem = (struct memblock_region *)data;
> > > > > > + const __be32 *reg;
> > > > > > + int len;
> > > > > > +
> > > > > > + usablemem->size = 0;
> > > > > > +
> > > > > > + if (depth != 1 || strcmp(uname, "chosen") != 0)
> > > > > > +         return 0;
> > > > > > +
> > > > > > + reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> > > > > > + if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > > > > > +         return 1;
> > > > > > +
> > > > > > + usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > > > > > + usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > > > > > +
> > > > > > + return 1;
> > > > > > +}
> > > > > > +
> > > > > > +static void __init fdt_enforce_memory_region(void)
> > > > > > +{
> > > > > > + struct memblock_region reg;
> > > > > > +
> > > > > > + of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > > > > > +
> > > > > > + if (reg.size) {
> > > > > > +         memblock_remove(0, PAGE_ALIGN(reg.base));
> > > > > > +         memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
> > > > > > +                         ULLONG_MAX);
> > > > >
> > > 
> > > According to the panic message from James, I guess the ACPI regions are out of the range
> > > [reg.base, reg.base + reg.size] and removed by your above codes. On ARM64, those ACPI
> > > regions have been added into memblock and marked as NOMAP, so I think it should be
> > > easy to adapt my fix to retain the NOMAP regions here
> > 
> > See below.
> > 
> > > Thanks,
> > > Dennis
> > > 
> > > > > I think this is a new way to trip the problem Dennis Chen has been working on
> > > > > [0]. If I kdump with --reuse-cmdline on a kernel booted with 'acpi=on', I get
> > > > > the panic below [1]...
> > > >
> > > > Yeah, it can be.
> > > >
> > > > > It looks like Dennis's fix involves changes in mm/memblock.c, maybe they can be
> > > > > extended to support a range instead of just a limit?
> > > > >
> > > > > (It looks like x86 explicitly adds the acpi regions to the crash-kernels memory
> > > > > map in crash_setup_memmap_entries()).
> > > > >
> > > > >
> > > > >
> > > > > Is it possible for the kernel text to be outside this range? (a bug in
> > > > > kexec-tools, or another user of the DT property) If we haven't already failed in
> > > > > this case, it may be worth printing a warning, or refusing to
> > > > > restrict-memory/expose-vmcore.
> > > >
> > > > In my implementation of kdump, the usable memory for crash dump
> > > > kernel will be allocated within memblock.memory after ACPI-related
> > > > regions have been mapped. "linux,usable-memory-range" indicates
> > > > this exact memory range.
> > > > On crash dump kernel, my fdt_enforce_memory_region() in arm64_memblock_init()
> > > > will exclude all the other regions from memblock.memory.
> > > > So the kernel (with acpi=on) won't recognize ACPI-regions as
> > > > normal memory, and map them by ioremap().
> > > >
> > > > I thought that it was safe, but actually not due to unaligned accesses.
> > > > As you suggested, we will probably be able to do the same thing of
> > > > Chen's solution in fdt_enforce_memory_region().
> > 
> > memblock_isolate_range() and memblock_remove_range() are not exported.
> > So we'd better implement an unified interface like:
> >     memblock_cap_memory_range(phys_addr_t base, size_t size);
> > 
> > If base == NULL, it would behave in the exact same way as your
> > memblock_mem_limit_remove_map().
> >
> Hello AKASHI, it's not reasonable to change the prototype of an existing memblock API

I didn't say memblock_enforce_memory_limit(), but
*your* memblock_memblock_remove_limit().

> which will be used by other components as we did with memblock_enforce_memory_limit. 
> Moreover the @size in you case is to specify a memory range of the memblock, which is
> different from the @limit as an indicator of the total size of memblocks being limited. 
> But I can be help to post an new memblock API patch to cater for your case.

Thanks, but I have already prototyped the function.
If you don't agree to my opinion, I will have to submit
a patch for a quite similar but different function.
I think that nobody will accept this.

-Takahiro AKASHI

> Thanks,
> Dennis
>   
> >
> > Thanks,
> > -Takahiro AKASHI
> > 
> > > >
> > > >
> > > > Thanks,
> > > > -Takahiro AKASHI
> > > >
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-20  4:22             ` AKASHI Takahiro
@ 2016-07-20  4:36               ` Dennis Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Dennis Chen @ 2016-07-20  4:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 20, 2016 at 01:22:05PM +0900, AKASHI Takahiro wrote:
> On Wed, Jul 20, 2016 at 11:39:11AM +0800, Dennis Chen wrote:
> > On Tue, Jul 19, 2016 at 08:01:21PM +0900, AKASHI Takahiro wrote:
> > > On Tue, Jul 19, 2016 at 06:06:18PM +0800, Dennis Chen wrote:
> > > > Hello AKASHI,
> > > >
> > > > On Tue, Jul 19, 2016 at 05:35:55PM +0900, AKASHI Takahiro wrote:
> > > > > James,
> > > > >
> > > > > On Mon, Jul 18, 2016 at 07:04:33PM +0100, James Morse wrote:
> > > > > > Hi!
> > > > > >
> > > > > > (CC: Dennis Chen)
> > > > > >
> > > > > > On 12/07/16 06:05, AKASHI Takahiro wrote:
> > > > > > > Crash dump kernel will be run with a limited range of memory as System
> > > > > > > RAM.
> > > > > > >
> > > > > > > On arm64, we will use a device-tree property under /chosen,
> > > > > > >    linux,usable-memory-range = <BASE SIZE>
> > > > > > > in order for primary kernel either on uefi or non-uefi (device tree only)
> > > > > > > system to hand over the information about usable memory region to crash
> > > > > > > dump kernel. This property will supercede entries in uefi memory map table
> > > > > > > and "memory" nodes in a device tree.
> > > > > >
> > > > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > > > > index 51b1302..d8b296f 100644
> > > > > > > --- a/arch/arm64/mm/init.c
> > > > > > > +++ b/arch/arm64/mm/init.c
> > > > > > > @@ -300,10 +300,48 @@ static int __init early_mem(char *p)
> > > > > > >  }
> > > > > > >  early_param("mem", early_mem);
> > > > > > >
> > > > > > > +static int __init early_init_dt_scan_usablemem(unsigned long node,
> > > > > > > +         const char *uname, int depth, void *data)
> > > > > > > +{
> > > > > > > + struct memblock_region *usablemem = (struct memblock_region *)data;
> > > > > > > + const __be32 *reg;
> > > > > > > + int len;
> > > > > > > +
> > > > > > > + usablemem->size = 0;
> > > > > > > +
> > > > > > > + if (depth != 1 || strcmp(uname, "chosen") != 0)
> > > > > > > +         return 0;
> > > > > > > +
> > > > > > > + reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> > > > > > > + if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > > > > > > +         return 1;
> > > > > > > +
> > > > > > > + usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > > > > > > + usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > > > > > > +
> > > > > > > + return 1;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void __init fdt_enforce_memory_region(void)
> > > > > > > +{
> > > > > > > + struct memblock_region reg;
> > > > > > > +
> > > > > > > + of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > > > > > > +
> > > > > > > + if (reg.size) {
> > > > > > > +         memblock_remove(0, PAGE_ALIGN(reg.base));
> > > > > > > +         memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
> > > > > > > +                         ULLONG_MAX);
> > > > > >
> > > >
> > > > According to the panic message from James, I guess the ACPI regions are out of the range
> > > > [reg.base, reg.base + reg.size] and removed by your above codes. On ARM64, those ACPI
> > > > regions have been added into memblock and marked as NOMAP, so I think it should be
> > > > easy to adapt my fix to retain the NOMAP regions here
> > >
> > > See below.
> > >
> > > > Thanks,
> > > > Dennis
> > > >
> > > > > > I think this is a new way to trip the problem Dennis Chen has been working on
> > > > > > [0]. If I kdump with --reuse-cmdline on a kernel booted with 'acpi=on', I get
> > > > > > the panic below [1]...
> > > > >
> > > > > Yeah, it can be.
> > > > >
> > > > > > It looks like Dennis's fix involves changes in mm/memblock.c, maybe they can be
> > > > > > extended to support a range instead of just a limit?
> > > > > >
> > > > > > (It looks like x86 explicitly adds the acpi regions to the crash-kernels memory
> > > > > > map in crash_setup_memmap_entries()).
> > > > > >
> > > > > >
> > > > > >
> > > > > > Is it possible for the kernel text to be outside this range? (a bug in
> > > > > > kexec-tools, or another user of the DT property) If we haven't already failed in
> > > > > > this case, it may be worth printing a warning, or refusing to
> > > > > > restrict-memory/expose-vmcore.
> > > > >
> > > > > In my implementation of kdump, the usable memory for crash dump
> > > > > kernel will be allocated within memblock.memory after ACPI-related
> > > > > regions have been mapped. "linux,usable-memory-range" indicates
> > > > > this exact memory range.
> > > > > On crash dump kernel, my fdt_enforce_memory_region() in arm64_memblock_init()
> > > > > will exclude all the other regions from memblock.memory.
> > > > > So the kernel (with acpi=on) won't recognize ACPI-regions as
> > > > > normal memory, and map them by ioremap().
> > > > >
> > > > > I thought that it was safe, but actually not due to unaligned accesses.
> > > > > As you suggested, we will probably be able to do the same thing of
> > > > > Chen's solution in fdt_enforce_memory_region().
> > >
> > > memblock_isolate_range() and memblock_remove_range() are not exported.
> > > So we'd better implement an unified interface like:
> > >     memblock_cap_memory_range(phys_addr_t base, size_t size);
> > >
> > > If base == NULL, it would behave in the exact same way as your
> > > memblock_mem_limit_remove_map().
> > >
> > Hello AKASHI, it's not reasonable to change the prototype of an existing memblock API
>
> I didn't say memblock_enforce_memory_limit(), but
> *your* memblock_memblock_remove_limit().
>
> > which will be used by other components as we did with memblock_enforce_memory_limit.
> > Moreover the @size in you case is to specify a memory range of the memblock, which is
> > different from the @limit as an indicator of the total size of memblocks being limited.
> > But I can be help to post an new memblock API patch to cater for your case.
>
> Thanks, but I have already prototyped the function.
> If you don't agree to my opinion, I will have to submit
> a patch for a quite similar but different function.
> I think that nobody will accept this.
>
No problem.

Thanks,
Dennis
>
> -Takahiro AKASHI
>
> > Thanks,
> > Dennis
> >
> > >
> > > Thanks,
> > > -Takahiro AKASHI
> > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > > -Takahiro AKASHI
> > > > >
> >
>
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-18 18:04   ` James Morse
  2016-07-19  8:35     ` AKASHI Takahiro
@ 2016-07-21  0:57     ` AKASHI Takahiro
  2016-07-22 13:55       ` James Morse
  1 sibling, 1 reply; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-21  0:57 UTC (permalink / raw)
  To: linux-arm-kernel

James,

On Mon, Jul 18, 2016 at 07:04:33PM +0100, James Morse wrote:
> Hi!
> 
> (CC: Dennis Chen)
> 
> On 12/07/16 06:05, AKASHI Takahiro wrote:
> > Crash dump kernel will be run with a limited range of memory as System
> > RAM.
> > 
> > On arm64, we will use a device-tree property under /chosen,
> >    linux,usable-memory-range = <BASE SIZE>
> > in order for primary kernel either on uefi or non-uefi (device tree only)
> > system to hand over the information about usable memory region to crash
> > dump kernel. This property will supercede entries in uefi memory map table
> > and "memory" nodes in a device tree.
> 
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 51b1302..d8b296f 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -300,10 +300,48 @@ static int __init early_mem(char *p)
> >  }
> >  early_param("mem", early_mem);
> >  
> > +static int __init early_init_dt_scan_usablemem(unsigned long node,
> > +		const char *uname, int depth, void *data)
> > +{
> > +	struct memblock_region *usablemem = (struct memblock_region *)data;
> > +	const __be32 *reg;
> > +	int len;
> > +
> > +	usablemem->size = 0;
> > +
> > +	if (depth != 1 || strcmp(uname, "chosen") != 0)
> > +		return 0;
> > +
> > +	reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> > +	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > +		return 1;
> > +
> > +	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > +	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > +
> > +	return 1;
> > +}
> > +
> > +static void __init fdt_enforce_memory_region(void)
> > +{
> > +	struct memblock_region reg;
> > +
> > +	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > +
> > +	if (reg.size) {
> > +		memblock_remove(0, PAGE_ALIGN(reg.base));
> > +		memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
> > +				ULLONG_MAX);
> 
> I think this is a new way to trip the problem Dennis Chen has been working on
> [0]. If I kdump with --reuse-cmdline on a kernel booted with 'acpi=on', I get
> the panic below [1]...
> 
> It looks like Dennis's fix involves changes in mm/memblock.c, maybe they can be
> extended to support a range instead of just a limit?

Could you please apply the diff attached below and confirm that
kdump works in your environment?
I can't test it by myself since my hikey board seems to be broken now.

Thanks,
-Takahiro AKASHI

> (It looks like x86 explicitly adds the acpi regions to the crash-kernels memory
> map in crash_setup_memmap_entries()).
> 
> 
> 
> Is it possible for the kernel text to be outside this range? (a bug in
> kexec-tools, or another user of the DT property) If we haven't already failed in
> this case, it may be worth printing a warning, or refusing to
> restrict-memory/expose-vmcore.
> 
> 
> 
> Thanks,
> 
> James
> 
> 
> [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/443356.html
> [1]
> [    0.000000] efi: Getting EFI parameters from FDT:
> [    0.000000] efi: EFI v2.50 by ARM Juno EFI Nov 24 2015 12:36:35
> [    0.000000] efi:  ACPI=0xf95b0000  ACPI 2.0=0xf95b0014  PROP=0xfe8db4d8
> [    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
> [    0.000000] cma: Reserved 16 MiB at 0x00000009fec00000
> [    0.000000] ACPI: Early table checksum verification disabled
> [    0.000000] ACPI: RSDP 0x00000000F95B0014 000024 (v02 ARMLTD)
> [    0.000000] ACPI: XSDT 0x00000000F95A00E8 00004C (v01 ARMLTD ARM-JUNO 2014072
> 7      01000013)
> [    0.000000] ACPI: FACP 0x00000000F9500000 00010C (v05 ARMLTD ARM-JUNO 2014072
> 7 ARM  00000099)
> [    0.000000] ACPI: DSDT 0x00000000F94C0000 000396 (v01 ARMLTD ARM-JUNO 2014072
> 7 INTL 20150619)
> [    0.000000] ACPI: GTDT 0x00000000F94F0000 000060 (v02 ARMLTD ARM-JUNO 2014072
> 7 ARM  00000099)
> [    0.000000] ACPI: APIC 0x00000000F94E0000 000224 (v03 ARMLTD ARM-JUNO 2014072
> 7 ARM  00000099)
> [    0.000000] ACPI: SSDT 0x00000000F94D0000 0001E3 (v01 ARMLTD ARM-JUNO 2014072
> 7 INTL 20150619)
> [    0.000000] ACPI: MCFG 0x00000000F94B0000 00003C (v01 ARMLTD ARM-JUNO 2014072
> 7 ARM  00000099)
> ...
> [    0.737577] Serial: AMBA PL011 UART driver
> [    0.786086] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> [    0.794203] ACPI: Added _OSI(Module Device)
> [    0.798659] ACPI: Added _OSI(Processor Device)
> [    0.803190] ACPI: Added _OSI(3.0 _SCP Extensions)
> [    0.807973] ACPI: Added _OSI(Processor Aggregator Device)
> [    0.813653] Unable to handle kernel paging request at virtual address ffff000
> 00804e027
> [    0.821704] pgd = ffff000008cce000
> [    0.825155] [ffff00000804e027] *pgd=00000009ffffd003, *pud=00000009ffffc003,
> *pmd=00000009ffffb003, *pte=00e80000f94c0707
> [    0.836319] Internal error: Oops: 96000021 [#1] PREEMPT SMP
> [    0.841972] Modules linked in:
> [    0.845073] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S              4.7.0-rc4
> + #4569
> [    0.852927] Hardware name: ARM Juno development board (r1) (DT)
> [    0.858936] task: ffff80003d898000 ti: ffff80003d894000 task.ti: ffff80003d89
> 4000
> [    0.866537] PC is at acpi_ns_lookup+0x23c/0x378
> [    0.871131] LR is at acpi_ds_load1_begin_op+0x88/0x260
> [    0.876340] pc : [<ffff0000084061a4>] lr : [<ffff0000083fc08c>] pstate: 60000
> 045
> [    0.883846] sp : ffff80003d8979b0
> [    0.887206] x29: ffff80003d8979b0 x28: 0000000000000000
> [    0.892596] x27: 000000000000001b x26: ffff000008a80a07
> [    0.897986] x25: ffff80003d897a48 x24: 0000000000000001
> [    0.903377] x23: 0000000000000001 x22: ffff00000804e027
> [    0.908769] x21: 000000000000001b x20: 0000000000000001
> [    0.914158] x19: 0000000000000000 x18: ffff00000804efff
> [    0.919547] x17: 00000000000038ff x16: 0000000000000002
> [    0.924937] x15: ffff00000804efff x14: 0000008000000000
> [    0.930326] x13: ffff000008c942b2 x12: ffff00000804efff
> [    0.935717] x11: ffff000008bf0000 x10: 00000000ffffff76
> [    0.941107] x9 : 0000000000000000 x8 : ffff000008cb6000
> [    0.946498] x7 : 0000000000000000 x6 : ffff80003d897aa8
> [    0.951891] x5 : ffff80003d028400 x4 : 0000000000000001
> [    0.957281] x3 : 0000000000000003 x2 : ffff000008cb6090
> [    0.962673] x1 : 000000000000005f x0 : 0000000000000000
> [    0.968063]
> [    0.969569] Process swapper/0 (pid: 1, stack limit = 0xffff80003d894020)
> [    1.387661] Call trace:
> ...
> [    1.473172] [<ffff0000084061a4>] acpi_ns_lookup+0x23c/0x378
> [    1.478832] [<ffff0000083fc08c>] acpi_ds_load1_begin_op+0x88/0x260
> [    1.485105] [<ffff00000840c0e8>] acpi_ps_build_named_op+0xa8/0x170
> [    1.491378] [<ffff00000840c2e0>] acpi_ps_create_op+0x130/0x230
> [    1.497299] [<ffff00000840bc28>] acpi_ps_parse_loop+0x168/0x580
> [    1.503302] [<ffff00000840cb44>] acpi_ps_parse_aml+0xa0/0x278
> [    1.509135] [<ffff0000084081d0>] acpi_ns_one_complete_parse+0x128/0x150
> [    1.515852] [<ffff00000840821c>] acpi_ns_parse_table+0x24/0x44
> [    1.521775] [<ffff0000084079e8>] acpi_ns_load_table+0x54/0xdc
> [    1.527612] [<ffff000008411038>] acpi_tb_load_namespace+0xd0/0x230
> [    1.533887] [<ffff000008b2695c>] acpi_load_tables+0x3c/0xa8
> [    1.539542] [<ffff000008b25974>] acpi_init+0x88/0x2b0
> [    1.544670] [<ffff000008081a08>] do_one_initcall+0x38/0x128
> [    1.550325] [<ffff000008b00cc0>] kernel_init_freeable+0x14c/0x1f0
> [    1.556517] [<ffff0000087d2088>] kernel_init+0x10/0x100
> [    1.561823] [<ffff000008084e10>] ret_from_fork+0x10/0x40
> [    1.567216] Code: b9008fbb 2a000318 36380054 32190318 (b94002c0)
> [    1.573451] ---[ end trace dec6cecdcba673b7 ]---
> [    1.578158] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00
> 00000b
> [    1.578158]
> [    1.587428] SMP: stopping secondary CPUs
> [    1.591411] ---[ end Kernel panic - not syncing: Attempted to kill init! exit
> code=0x0000000b
> [    0.969225] Process swapper/0 (pid: 1, stack limit = 0xffff80003d894020)
> 
> 
===8<===
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 4bd55cd..c027275 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -380,11 +380,8 @@ static void __init fdt_enforce_memory_region(void)
 
 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
 
-	if (reg.size) {
-		memblock_remove(0, PAGE_ALIGN(reg.base));
-		memblock_remove(round_down(reg.base + reg.size, PAGE_SIZE),
-				ULLONG_MAX);
-	}
+	if (reg.size)
+		memblock_cap_memory_range(reg.base, reg.size);
 }
 
 void __init arm64_memblock_init(void)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 3106ac1..9ab17a9 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -333,6 +333,7 @@ phys_addr_t memblock_mem_size(unsigned long limit_pfn);
 phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
 void memblock_enforce_memory_limit(phys_addr_t memory_limit);
+void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
 bool memblock_is_memory(phys_addr_t addr);
 int memblock_is_map_memory(phys_addr_t addr);
 int memblock_is_region_memory(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index ac12489..30badf1 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1486,6 +1486,34 @@ void __init memblock_enforce_memory_limit(phys_addr_t limit)
 			      (phys_addr_t)ULLONG_MAX);
 }
 
+void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
+{
+	int start_rgn, end_rgn;
+	int i, ret;
+
+	if (!size)
+		return;
+
+	ret = memblock_isolate_range(&memblock.memory, base, size,
+						&start_rgn, &end_rgn);
+	if (ret)
+		return;
+
+	/* remove all the MAP regions */
+	for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
+		if (!memblock_is_nomap(&memblock.memory.regions[i]))
+			memblock_remove_region(&memblock.memory, i);
+
+	for (i = start_rgn - 1; i >= 0; i--)
+		if (!memblock_is_nomap(&memblock.memory.regions[i]))
+			memblock_remove_region(&memblock.memory, i);
+
+	/* truncate the reserved regions */
+	memblock_remove_range(&memblock.reserved, 0, base);
+	memblock_remove_range(&memblock.reserved,
+			base + size, (phys_addr_t)ULLONG_MAX);
+}
+
 static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
 {
 	unsigned int left = 0, right = type->cnt;

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-21  0:57     ` AKASHI Takahiro
@ 2016-07-22 13:55       ` James Morse
  2016-07-25  5:27         ` AKASHI Takahiro
  0 siblings, 1 reply; 35+ messages in thread
From: James Morse @ 2016-07-22 13:55 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 21/07/16 01:57, AKASHI Takahiro wrote:
> Could you please apply the diff attached below and confirm that
> kdump works in your environment?
> I can't test it by myself since my hikey board seems to be broken now.

With this I get a failure even earlier, even with 'acpi=off'.
With this patch, on boot of the kdump kernel I get:
[    0.000000] Linux version 4.7.0-rc4+ (morse at melchizedek) (gcc version 5.2.1 6
[    0.000000] Boot CPU: AArch64 Processor [411fd071]
[    0.000000] earlycon: pl11 at MMIO 0x000000007ff80000 (options '')
[    0.000000] bootconsole [pl11] enabled
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: EFI v2.50 by ARM Juno EFI Nov 24 2015 12:36:35
[    0.000000] efi:  ACPI=0xf95b0000  ACPI 2.0=0xf95b0014  PROP=0xfe8db4d8
[    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
[    0.000000] cma: Failed to reserve 16 MiB
[    0.000000] On node 0 totalpages: 132160
[    0.000000]   DMA zone: 17 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 1088 pages, LIFO batch:0
[    0.000000]   Normal zone: 2048 pages used for memmap
[    0.000000]   Normal zone: 131072 pages, LIFO batch:31
[    0.000000] bootmem alloc of 64 bytes failed!
[    0.000000] Kernel panic - not syncing: Out of memory
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc4+ #4600
[    0.000000] Hardware name: ARM Juno development board (r1) (DT)
[    0.000000] Call trace:
[    0.000000] [<ffff0000080889a8>] dump_backtrace+0x0/0x1e8
[    0.000000] [<ffff000008088ba4>] show_stack+0x14/0x20
[    0.000000] [<ffff000008363274>] dump_stack+0x94/0xb8
[    0.000000] [<ffff00000816192c>] panic+0x10c/0x278
[    0.000000] [<ffff000008b14980>] free_bootmem_late+0x0/0xa0
[    0.000000] [<ffff000008b14e68>] __alloc_bootmem_low+0x2c/0x38
[    0.000000] [<ffff000008b02954>] setup_arch+0x2e4/0x5ac
[    0.000000] [<ffff000008b00854>] start_kernel+0x70/0x390
[    0.000000] [<ffff000008b001bc>] __primary_switched+0x30/0x6c
[    0.000000] ---[ end Kernel panic - not syncing: Out of memory


Trying again, booted with:
> console=ttyAMA0,115200 earlycon=pl011,0x7ff80000 root=/dev/nfs
> nfsroot=10.X.X.X:/ubuntu-15.04-server-arm64,v3 resume=/dev/sda2 no_console_suspend
> ip=dhcp rw crashkernel=512M stacktrace ignore_loglevel=1 acpi=off efi=debug
memblock=debug

kexec-tools invoked as:
> kexec -x --reuse-cmdline --dtb=/sys/firmware/fdt -p /aarch64-Image

cat /proc/iomem | grep Crash:
> 9e0000000-9ffffffff : Crash kernel

crash kernel memblock before call to memblock_cap_memory_range():
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x1fef10000 reserved size = 0x4f3b
[    0.000000]  memory.cnt  = 0x9
[    0.000000]  memory[0x0]     [0x00000080000000-0x000000dfffffff], 0x60000000
bytes flags: 0x0
[    0.000000]  memory[0x1]     [0x000000e00f0000-0x000000f949ffff], 0x193b0000
bytes flags: 0x0
(uefi runtime code|data|acpi:)
[    0.000000]  memory[0x2]     [0x000000f94a0000-0x000000f984ffff], 0x3b0000
bytes flags: 0x4
[    0.000000]  memory[0x3]     [0x000000f9850000-0x000000fe81ffff], 0x4fd0000
bytes flags: 0x0
(uefi runtime data:)
[    0.000000]  memory[0x4]     [0x000000fe820000-0x000000fe85ffff], 0x40000
bytes flags: 0x4
[    0.000000]  memory[0x5]     [0x000000fe860000-0x000000fe86ffff], 0x10000
bytes flags: 0x0
(uefi runtime data:)
[    0.000000]  memory[0x6]     [0x000000fe870000-0x000000fe8bffff], 0x50000
bytes flags: 0x4
[    0.000000]  memory[0x7]     [0x000000fe8c0000-0x000000feffffff], 0x740000
bytes flags: 0x0
[    0.000000]  memory[0x8]     [0x00000880000000-0x000009ffffffff], 0x180000000
bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x2
(uefi memory map:)
[    0.000000]  reserved[0x0]   [0x000000f985a000-0x000000f985afff], 0x1000
bytes flags: 0x0
(device tree:)
[    0.000000]  reserved[0x1]   [0x000009e0ce0000-0x000009e0ce3f3a], 0x3f3b
bytes flags: 0x0

crash kernel memblock after call to memblock_cap_memory_range():
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20440000 reserved size = 0x3f3b
[    0.000000]  memory.cnt  = 0x4
[    0.000000]  memory[0x0]     [0x000000f94a0000-0x000000f984ffff], 0x3b0000
bytes flags: 0x4
[    0.000000]  memory[0x1]     [0x000000fe820000-0x000000fe85ffff], 0x40000
bytes flags: 0x4
[    0.000000]  memory[0x2]     [0x000000fe870000-0x000000fe8bffff], 0x50000
bytes flags: 0x4
[    0.000000]  memory[0x3]     [0x000009e0000000-0x000009ffffffff], 0x20000000
bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x1
[    0.000000]  reserved[0x0]   [0x000009e0ce0000-0x000009e0ce3f3a], 0x3f3b
bytes flags: 0x0
(reserve kernel text:)
[    0.000000] memblock_reserve: [0x000009e0080000-0x000009e0cd0fff] flags 0x0
arm64_memblock_init+0x1e0/0x40c
(reserve elfcorehdr:)
[    0.000000] memblock_reserve: [0x000009fffff000-0x000009fffff3ff] flags 0x0
arm64_memblock_init+0x37c/0x40c
[    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
[    0.000000] cma: Failed to reserve 16 MiB
...

It looks like memblock_cap_memory_range() did exactly what you intended

This first fault seems to be because memblock_start_of_DRAM() is being dragged
down by the preserved nomap areas, this means there isn't any usable memory
within ~4G of memblock_start_of_DRAM()...

I will dig into the bootmem alloc failure next week...

(let me know if you want the full boot logs... they are more noise than signal)


Thanks,

James

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-22 13:55       ` James Morse
@ 2016-07-25  5:27         ` AKASHI Takahiro
  2016-08-04  6:21           ` AKASHI Takahiro
  0 siblings, 1 reply; 35+ messages in thread
From: AKASHI Takahiro @ 2016-07-25  5:27 UTC (permalink / raw)
  To: linux-arm-kernel

James,

On Fri, Jul 22, 2016 at 02:55:02PM +0100, James Morse wrote:
> Hi,
> 
> On 21/07/16 01:57, AKASHI Takahiro wrote:
> > Could you please apply the diff attached below and confirm that
> > kdump works in your environment?
> > I can't test it by myself since my hikey board seems to be broken now.
> 
> With this I get a failure even earlier, even with 'acpi=off'.
> With this patch, on boot of the kdump kernel I get:

I think you ran into the problem that I mentioned in:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/444500.html

You may want to
- apply the change stated above, or
  (Given that we don't know about the configuration of crash dump kernel,
  checking for CONFIG_ZONE_DMA doesn't make sense. Instead, we'd better
  always enforce ARCH_LOW_ADDRESS_LIMIT.)
- explicitly specify the start address (Y) below ARCH_LOW_ADDRESS_LIMIT
  at "crashkernel=X at Y"
Either would work.

Thanks,
-Takahiro AKASHI

> [    0.000000] Linux version 4.7.0-rc4+ (morse at melchizedek) (gcc version 5.2.1 6
> [    0.000000] Boot CPU: AArch64 Processor [411fd071]
> [    0.000000] earlycon: pl11 at MMIO 0x000000007ff80000 (options '')
> [    0.000000] bootconsole [pl11] enabled
> [    0.000000] debug: ignoring loglevel setting.
> [    0.000000] efi: Getting EFI parameters from FDT:
> [    0.000000] efi: EFI v2.50 by ARM Juno EFI Nov 24 2015 12:36:35
> [    0.000000] efi:  ACPI=0xf95b0000  ACPI 2.0=0xf95b0014  PROP=0xfe8db4d8
> [    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
> [    0.000000] cma: Failed to reserve 16 MiB
> [    0.000000] On node 0 totalpages: 132160
> [    0.000000]   DMA zone: 17 pages used for memmap
> [    0.000000]   DMA zone: 0 pages reserved
> [    0.000000]   DMA zone: 1088 pages, LIFO batch:0
> [    0.000000]   Normal zone: 2048 pages used for memmap
> [    0.000000]   Normal zone: 131072 pages, LIFO batch:31
> [    0.000000] bootmem alloc of 64 bytes failed!
> [    0.000000] Kernel panic - not syncing: Out of memory
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc4+ #4600
> [    0.000000] Hardware name: ARM Juno development board (r1) (DT)
> [    0.000000] Call trace:
> [    0.000000] [<ffff0000080889a8>] dump_backtrace+0x0/0x1e8
> [    0.000000] [<ffff000008088ba4>] show_stack+0x14/0x20
> [    0.000000] [<ffff000008363274>] dump_stack+0x94/0xb8
> [    0.000000] [<ffff00000816192c>] panic+0x10c/0x278
> [    0.000000] [<ffff000008b14980>] free_bootmem_late+0x0/0xa0
> [    0.000000] [<ffff000008b14e68>] __alloc_bootmem_low+0x2c/0x38
> [    0.000000] [<ffff000008b02954>] setup_arch+0x2e4/0x5ac
> [    0.000000] [<ffff000008b00854>] start_kernel+0x70/0x390
> [    0.000000] [<ffff000008b001bc>] __primary_switched+0x30/0x6c
> [    0.000000] ---[ end Kernel panic - not syncing: Out of memory
> 
> 
> Trying again, booted with:
> > console=ttyAMA0,115200 earlycon=pl011,0x7ff80000 root=/dev/nfs
> > nfsroot=10.X.X.X:/ubuntu-15.04-server-arm64,v3 resume=/dev/sda2 no_console_suspend
> > ip=dhcp rw crashkernel=512M stacktrace ignore_loglevel=1 acpi=off efi=debug
> memblock=debug
> 
> kexec-tools invoked as:
> > kexec -x --reuse-cmdline --dtb=/sys/firmware/fdt -p /aarch64-Image
> 
> cat /proc/iomem | grep Crash:
> > 9e0000000-9ffffffff : Crash kernel
> 
> crash kernel memblock before call to memblock_cap_memory_range():
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x1fef10000 reserved size = 0x4f3b
> [    0.000000]  memory.cnt  = 0x9
> [    0.000000]  memory[0x0]     [0x00000080000000-0x000000dfffffff], 0x60000000
> bytes flags: 0x0
> [    0.000000]  memory[0x1]     [0x000000e00f0000-0x000000f949ffff], 0x193b0000
> bytes flags: 0x0
> (uefi runtime code|data|acpi:)
> [    0.000000]  memory[0x2]     [0x000000f94a0000-0x000000f984ffff], 0x3b0000
> bytes flags: 0x4
> [    0.000000]  memory[0x3]     [0x000000f9850000-0x000000fe81ffff], 0x4fd0000
> bytes flags: 0x0
> (uefi runtime data:)
> [    0.000000]  memory[0x4]     [0x000000fe820000-0x000000fe85ffff], 0x40000
> bytes flags: 0x4
> [    0.000000]  memory[0x5]     [0x000000fe860000-0x000000fe86ffff], 0x10000
> bytes flags: 0x0
> (uefi runtime data:)
> [    0.000000]  memory[0x6]     [0x000000fe870000-0x000000fe8bffff], 0x50000
> bytes flags: 0x4
> [    0.000000]  memory[0x7]     [0x000000fe8c0000-0x000000feffffff], 0x740000
> bytes flags: 0x0
> [    0.000000]  memory[0x8]     [0x00000880000000-0x000009ffffffff], 0x180000000
> bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0x2
> (uefi memory map:)
> [    0.000000]  reserved[0x0]   [0x000000f985a000-0x000000f985afff], 0x1000
> bytes flags: 0x0
> (device tree:)
> [    0.000000]  reserved[0x1]   [0x000009e0ce0000-0x000009e0ce3f3a], 0x3f3b
> bytes flags: 0x0
> 
> crash kernel memblock after call to memblock_cap_memory_range():
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x20440000 reserved size = 0x3f3b
> [    0.000000]  memory.cnt  = 0x4
> [    0.000000]  memory[0x0]     [0x000000f94a0000-0x000000f984ffff], 0x3b0000
> bytes flags: 0x4
> [    0.000000]  memory[0x1]     [0x000000fe820000-0x000000fe85ffff], 0x40000
> bytes flags: 0x4
> [    0.000000]  memory[0x2]     [0x000000fe870000-0x000000fe8bffff], 0x50000
> bytes flags: 0x4
> [    0.000000]  memory[0x3]     [0x000009e0000000-0x000009ffffffff], 0x20000000
> bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0x1
> [    0.000000]  reserved[0x0]   [0x000009e0ce0000-0x000009e0ce3f3a], 0x3f3b
> bytes flags: 0x0
> (reserve kernel text:)
> [    0.000000] memblock_reserve: [0x000009e0080000-0x000009e0cd0fff] flags 0x0
> arm64_memblock_init+0x1e0/0x40c
> (reserve elfcorehdr:)
> [    0.000000] memblock_reserve: [0x000009fffff000-0x000009fffff3ff] flags 0x0
> arm64_memblock_init+0x37c/0x40c
> [    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
> [    0.000000] cma: Failed to reserve 16 MiB
> ...
> 
> It looks like memblock_cap_memory_range() did exactly what you intended
> 
> This first fault seems to be because memblock_start_of_DRAM() is being dragged
> down by the preserved nomap areas, this means there isn't any usable memory
> within ~4G of memblock_start_of_DRAM()...
> 
> I will dig into the bootmem alloc failure next week...
> 
> (let me know if you want the full boot logs... they are more noise than signal)
> 
> 
> Thanks,
> 
> James
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-07-25  5:27         ` AKASHI Takahiro
@ 2016-08-04  6:21           ` AKASHI Takahiro
  2016-08-09 16:22             ` James Morse
  0 siblings, 1 reply; 35+ messages in thread
From: AKASHI Takahiro @ 2016-08-04  6:21 UTC (permalink / raw)
  To: linux-arm-kernel

James,

On Mon, Jul 25, 2016 at 02:27:55PM +0900, AKASHI Takahiro wrote:
> James,
> 
> On Fri, Jul 22, 2016 at 02:55:02PM +0100, James Morse wrote:
> > Hi,
> > 
> > On 21/07/16 01:57, AKASHI Takahiro wrote:
> > > Could you please apply the diff attached below and confirm that
> > > kdump works in your environment?
> > > I can't test it by myself since my hikey board seems to be broken now.
> > 
> > With this I get a failure even earlier, even with 'acpi=off'.
> > With this patch, on boot of the kdump kernel I get:
> 
> I think you ran into the problem that I mentioned in:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/444500.html
> 
> You may want to
> - apply the change stated above, or
>   (Given that we don't know about the configuration of crash dump kernel,
>   checking for CONFIG_ZONE_DMA doesn't make sense. Instead, we'd better
>   always enforce ARCH_LOW_ADDRESS_LIMIT.)
> - explicitly specify the start address (Y) below ARCH_LOW_ADDRESS_LIMIT
>   at "crashkernel=X at Y"
> Either would work.

Have these methods fixed your problem in the end?

-Takahiro AKASHI

> Thanks,
> -Takahiro AKASHI
> 
> > [    0.000000] Linux version 4.7.0-rc4+ (morse at melchizedek) (gcc version 5.2.1 6
> > [    0.000000] Boot CPU: AArch64 Processor [411fd071]
> > [    0.000000] earlycon: pl11 at MMIO 0x000000007ff80000 (options '')
> > [    0.000000] bootconsole [pl11] enabled
> > [    0.000000] debug: ignoring loglevel setting.
> > [    0.000000] efi: Getting EFI parameters from FDT:
> > [    0.000000] efi: EFI v2.50 by ARM Juno EFI Nov 24 2015 12:36:35
> > [    0.000000] efi:  ACPI=0xf95b0000  ACPI 2.0=0xf95b0014  PROP=0xfe8db4d8
> > [    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
> > [    0.000000] cma: Failed to reserve 16 MiB
> > [    0.000000] On node 0 totalpages: 132160
> > [    0.000000]   DMA zone: 17 pages used for memmap
> > [    0.000000]   DMA zone: 0 pages reserved
> > [    0.000000]   DMA zone: 1088 pages, LIFO batch:0
> > [    0.000000]   Normal zone: 2048 pages used for memmap
> > [    0.000000]   Normal zone: 131072 pages, LIFO batch:31
> > [    0.000000] bootmem alloc of 64 bytes failed!
> > [    0.000000] Kernel panic - not syncing: Out of memory
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc4+ #4600
> > [    0.000000] Hardware name: ARM Juno development board (r1) (DT)
> > [    0.000000] Call trace:
> > [    0.000000] [<ffff0000080889a8>] dump_backtrace+0x0/0x1e8
> > [    0.000000] [<ffff000008088ba4>] show_stack+0x14/0x20
> > [    0.000000] [<ffff000008363274>] dump_stack+0x94/0xb8
> > [    0.000000] [<ffff00000816192c>] panic+0x10c/0x278
> > [    0.000000] [<ffff000008b14980>] free_bootmem_late+0x0/0xa0
> > [    0.000000] [<ffff000008b14e68>] __alloc_bootmem_low+0x2c/0x38
> > [    0.000000] [<ffff000008b02954>] setup_arch+0x2e4/0x5ac
> > [    0.000000] [<ffff000008b00854>] start_kernel+0x70/0x390
> > [    0.000000] [<ffff000008b001bc>] __primary_switched+0x30/0x6c
> > [    0.000000] ---[ end Kernel panic - not syncing: Out of memory
> > 
> > 
> > Trying again, booted with:
> > > console=ttyAMA0,115200 earlycon=pl011,0x7ff80000 root=/dev/nfs
> > > nfsroot=10.X.X.X:/ubuntu-15.04-server-arm64,v3 resume=/dev/sda2 no_console_suspend
> > > ip=dhcp rw crashkernel=512M stacktrace ignore_loglevel=1 acpi=off efi=debug
> > memblock=debug
> > 
> > kexec-tools invoked as:
> > > kexec -x --reuse-cmdline --dtb=/sys/firmware/fdt -p /aarch64-Image
> > 
> > cat /proc/iomem | grep Crash:
> > > 9e0000000-9ffffffff : Crash kernel
> > 
> > crash kernel memblock before call to memblock_cap_memory_range():
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x1fef10000 reserved size = 0x4f3b
> > [    0.000000]  memory.cnt  = 0x9
> > [    0.000000]  memory[0x0]     [0x00000080000000-0x000000dfffffff], 0x60000000
> > bytes flags: 0x0
> > [    0.000000]  memory[0x1]     [0x000000e00f0000-0x000000f949ffff], 0x193b0000
> > bytes flags: 0x0
> > (uefi runtime code|data|acpi:)
> > [    0.000000]  memory[0x2]     [0x000000f94a0000-0x000000f984ffff], 0x3b0000
> > bytes flags: 0x4
> > [    0.000000]  memory[0x3]     [0x000000f9850000-0x000000fe81ffff], 0x4fd0000
> > bytes flags: 0x0
> > (uefi runtime data:)
> > [    0.000000]  memory[0x4]     [0x000000fe820000-0x000000fe85ffff], 0x40000
> > bytes flags: 0x4
> > [    0.000000]  memory[0x5]     [0x000000fe860000-0x000000fe86ffff], 0x10000
> > bytes flags: 0x0
> > (uefi runtime data:)
> > [    0.000000]  memory[0x6]     [0x000000fe870000-0x000000fe8bffff], 0x50000
> > bytes flags: 0x4
> > [    0.000000]  memory[0x7]     [0x000000fe8c0000-0x000000feffffff], 0x740000
> > bytes flags: 0x0
> > [    0.000000]  memory[0x8]     [0x00000880000000-0x000009ffffffff], 0x180000000
> > bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x2
> > (uefi memory map:)
> > [    0.000000]  reserved[0x0]   [0x000000f985a000-0x000000f985afff], 0x1000
> > bytes flags: 0x0
> > (device tree:)
> > [    0.000000]  reserved[0x1]   [0x000009e0ce0000-0x000009e0ce3f3a], 0x3f3b
> > bytes flags: 0x0
> > 
> > crash kernel memblock after call to memblock_cap_memory_range():
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20440000 reserved size = 0x3f3b
> > [    0.000000]  memory.cnt  = 0x4
> > [    0.000000]  memory[0x0]     [0x000000f94a0000-0x000000f984ffff], 0x3b0000
> > bytes flags: 0x4
> > [    0.000000]  memory[0x1]     [0x000000fe820000-0x000000fe85ffff], 0x40000
> > bytes flags: 0x4
> > [    0.000000]  memory[0x2]     [0x000000fe870000-0x000000fe8bffff], 0x50000
> > bytes flags: 0x4
> > [    0.000000]  memory[0x3]     [0x000009e0000000-0x000009ffffffff], 0x20000000
> > bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x1
> > [    0.000000]  reserved[0x0]   [0x000009e0ce0000-0x000009e0ce3f3a], 0x3f3b
> > bytes flags: 0x0
> > (reserve kernel text:)
> > [    0.000000] memblock_reserve: [0x000009e0080000-0x000009e0cd0fff] flags 0x0
> > arm64_memblock_init+0x1e0/0x40c
> > (reserve elfcorehdr:)
> > [    0.000000] memblock_reserve: [0x000009fffff000-0x000009fffff3ff] flags 0x0
> > arm64_memblock_init+0x37c/0x40c
> > [    0.000000] Reserving 1KB of memory at 0x9fffff000 for elfcorehdr
> > [    0.000000] cma: Failed to reserve 16 MiB
> > ...
> > 
> > It looks like memblock_cap_memory_range() did exactly what you intended
> > 
> > This first fault seems to be because memblock_start_of_DRAM() is being dragged
> > down by the preserved nomap areas, this means there isn't any usable memory
> > within ~4G of memblock_start_of_DRAM()...
> > 
> > I will dig into the bootmem alloc failure next week...
> > 
> > (let me know if you want the full boot logs... they are more noise than signal)
> > 
> > 
> > Thanks,
> > 
> > James
> > 
> > 
> > 
> > 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range
  2016-08-04  6:21           ` AKASHI Takahiro
@ 2016-08-09 16:22             ` James Morse
  0 siblings, 0 replies; 35+ messages in thread
From: James Morse @ 2016-08-09 16:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Akashi,

Sorry for the late response, (I was on holiday last week...)

On 04/08/16 07:21, AKASHI Takahiro wrote:
> On Mon, Jul 25, 2016 at 02:27:55PM +0900, AKASHI Takahiro wrote:
>> On Fri, Jul 22, 2016 at 02:55:02PM +0100, James Morse wrote:
>>> On 21/07/16 01:57, AKASHI Takahiro wrote:
>>>> Could you please apply the diff attached below and confirm that
>>>> kdump works in your environment?
>>>> I can't test it by myself since my hikey board seems to be broken now.
>>>
>>> With this I get a failure even earlier, even with 'acpi=off'.
>>> With this patch, on boot of the kdump kernel I get:
>>
>> I think you ran into the problem that I mentioned in:
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/444500.html
>>
>> You may want to
>> - apply the change stated above, or
>>   (Given that we don't know about the configuration of crash dump kernel,
>>   checking for CONFIG_ZONE_DMA doesn't make sense. Instead, we'd better
>>   always enforce ARCH_LOW_ADDRESS_LIMIT.)
>> - explicitly specify the start address (Y) below ARCH_LOW_ADDRESS_LIMIT
>>   at "crashkernel=X at Y"
>> Either would work.
> 
> Have these methods fixed your problem in the end?

The change you made in v23 to use ARCH_LOW_ADDRESS_LIMIT fixed the problem with
juno and acpi. I will retest it with v24 and using Seattle too...

Thanks,

James

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-08-09 16:22 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-12  5:05 [PATCH v22 0/8] arm64: add kdump support AKASHI Takahiro
2016-07-12  5:05 ` [PATCH v22 1/8] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
2016-07-13  9:12   ` Suzuki K Poulose
2016-07-13 15:42     ` AKASHI Takahiro
2016-07-19  9:39   ` Dennis Chen
2016-07-19 10:28     ` AKASHI Takahiro
2016-07-19 10:41       ` Dennis Chen
2016-07-19 12:48         ` Mark Salter
2016-07-19 13:27           ` Mark Rutland
2016-07-20  2:17             ` AKASHI Takahiro
2016-07-20  3:48             ` Dennis Chen
2016-07-19 23:34           ` AKASHI Takahiro
2016-07-12  5:05 ` [PATCH v22 2/8] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
2016-07-18 18:04   ` James Morse
2016-07-19  8:35     ` AKASHI Takahiro
2016-07-19 10:06       ` Dennis Chen
2016-07-19 11:01         ` AKASHI Takahiro
2016-07-20  3:39           ` Dennis Chen
2016-07-20  4:22             ` AKASHI Takahiro
2016-07-20  4:36               ` Dennis Chen
2016-07-21  0:57     ` AKASHI Takahiro
2016-07-22 13:55       ` James Morse
2016-07-25  5:27         ` AKASHI Takahiro
2016-08-04  6:21           ` AKASHI Takahiro
2016-08-09 16:22             ` James Morse
2016-07-12  5:05 ` [PATCH v22 3/8] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
2016-07-13  9:32   ` Suzuki K Poulose
2016-07-13 16:00     ` AKASHI Takahiro
2016-07-12  5:05 ` [PATCH v22 4/8] arm64: kdump: add kdump support AKASHI Takahiro
2016-07-12  5:05 ` [PATCH v22 5/8] arm64: kdump: add VMCOREINFO's for user-space coredump tools AKASHI Takahiro
2016-07-12  5:05 ` [PATCH v22 6/8] arm64: kdump: enable kdump in the arm64 defconfig AKASHI Takahiro
2016-07-12  5:05 ` [PATCH v22 7/8] arm64: kdump: update a kernel doc AKASHI Takahiro
2016-07-12  5:05 ` [PATCH v22 8/8] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
2016-07-12 10:07   ` Mark Rutland
2016-07-13 15:14     ` AKASHI Takahiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).