linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v24 0/9] arm64: add kdump support
@ 2016-08-09  1:52 AKASHI Takahiro
  2016-08-09  1:52 ` [PATCH v24 1/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
                   ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  1:52 UTC (permalink / raw)
  To: linux-arm-kernel

This patch series adds kdump support on arm64.

To load a crash-dump kernel to the systems, a series of patches to
kexec-tools, which have not yet been merged upstream, are needed.
Please use my kdump patches [1].

To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
  - crash utility (coming v7.1.6 or later) [2]
    (Necessary patches have already been queued in the master.)

[1] T.B.D.
[2] https://github.com/crash-utility/crash.git

Changes for v24 (Aug 9, 2016):
  o Rebase to Linux-4.8-rc1
  o Update descriptions about newly added DT proerties

Changes for v23 (July 26, 2016):

  o Move memblock_reserve() to a single place in reserve_crashkernel()
  o Use  cpu_park_loop() in ipi_cpu_crash_stop()
  o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
  o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
    (for ACPI) from usable memory at crash kernel

Changes for v22 (July 12, 2016):

  o Export "crashkernel-base" and "crashkernel-size" via device-tree,
    and add some descriptions about them in chosen.txt
  o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
    with powerpc's "usable-memory"
  o Make cosmetic changes regarding "ifdef" usage
  o Correct some wordings in kdump.txt

Changes for v21 (July 6, 2016):

  o Remove kexec patches.
  o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
  o Clarify the description about kvm in kdump.txt.

See the following link [3] for older changes:
[3]  http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html

AKASHI Takahiro (8):
  arm64: kdump: reserve memory for crash dump kernel
  memblock: add memblock_cap_memory_range()
  arm64: limit memory regions based on DT property, usable-memory-range
  arm64: kdump: implement machine_crash_shutdown()
  arm64: kdump: add kdump support
  arm64: kdump: add VMCOREINFO's for user-space coredump tools
  arm64: kdump: enable kdump in the arm64 defconfig
  arm64: kdump: update a kernel doc

James Morse (1):
  Documentation: dt: chosen properties for arm64 kdump

 Documentation/devicetree/bindings/chosen.txt |  45 ++++++
 Documentation/kdump/kdump.txt                |  16 ++-
 arch/arm64/Kconfig                           |  11 ++
 arch/arm64/configs/defconfig                 |   1 +
 arch/arm64/include/asm/hardirq.h             |   2 +-
 arch/arm64/include/asm/kexec.h               |  41 +++++-
 arch/arm64/include/asm/smp.h                 |   2 +
 arch/arm64/kernel/Makefile                   |   1 +
 arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
 arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
 arch/arm64/kernel/setup.c                    |   7 +-
 arch/arm64/kernel/smp.c                      |  63 +++++++++
 arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
 include/linux/memblock.h                     |   1 +
 mm/memblock.c                                |  28 ++++
 15 files changed, 551 insertions(+), 7 deletions(-)
 create mode 100644 arch/arm64/kernel/crash_dump.c

-- 
2.9.0


AKASHI Takahiro (8):
  arm64: kdump: reserve memory for crash dump kernel
  memblock: add memblock_cap_memory_range()
  arm64: limit memory regions based on DT property, usable-memory-range
  arm64: kdump: implement machine_crash_shutdown()
  arm64: kdump: add kdump support
  arm64: kdump: add VMCOREINFO's for user-space coredump tools
  arm64: kdump: enable kdump in the arm64 defconfig
  arm64: kdump: update a kernel doc

James Morse (1):
  Documentation: dt: chosen properties for arm64 kdump

 Documentation/devicetree/bindings/chosen.txt |  50 +++++++
 Documentation/kdump/kdump.txt                |  16 ++-
 arch/arm64/Kconfig                           |  11 ++
 arch/arm64/configs/defconfig                 |   1 +
 arch/arm64/include/asm/hardirq.h             |   2 +-
 arch/arm64/include/asm/kexec.h               |  41 +++++-
 arch/arm64/include/asm/smp.h                 |   2 +
 arch/arm64/kernel/Makefile                   |   1 +
 arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
 arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
 arch/arm64/kernel/setup.c                    |   7 +-
 arch/arm64/kernel/smp.c                      |  63 +++++++++
 arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
 include/linux/memblock.h                     |   1 +
 mm/memblock.c                                |  28 ++++
 15 files changed, 556 insertions(+), 7 deletions(-)
 create mode 100644 arch/arm64/kernel/crash_dump.c

-- 
2.9.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 1/9] arm64: kdump: reserve memory for crash dump kernel
  2016-08-09  1:52 [PATCH v24 0/9] arm64: add kdump support AKASHI Takahiro
@ 2016-08-09  1:52 ` AKASHI Takahiro
  2016-08-09  1:55   ` [PATCH v24 2/9] memblock: add memblock_cap_memory_range() AKASHI Takahiro
                     ` (2 more replies)
  2016-08-09  2:04 ` [PATCH v24 0/9] arm64: add kdump support AKASHI Takahiro
  2016-08-31  3:41 ` Manish Jaggi
  2 siblings, 3 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  1:52 UTC (permalink / raw)
  To: linux-arm-kernel

On the startup of primary kernel, the memory region used by crash dump
kernel must be specified by "crashkernel=" kernel parameter.
reserve_crashkernel() will allocate and reserve the region for later use.

User space tools, like kexec-tools, will be able to find that region as
	- "Crash kernel" in /proc/iomem, or
	- "linux,crashkernel-base" and "linux,crashkernel-size" under
	  /sys/firmware/devicetree/base/chosen

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Pratyush Anand <panand@redhat.com>
Reviewed-by: James Morse <james.morse@arm.com>
---
 arch/arm64/kernel/setup.c |   7 ++-
 arch/arm64/mm/init.c      | 113 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 536dce2..38eda13 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
 #include <linux/screen_info.h>
 #include <linux/init.h>
 #include <linux/kexec.h>
-#include <linux/crash_dump.h>
 #include <linux/root_dev.h>
 #include <linux/cpu.h>
 #include <linux/interrupt.h>
@@ -220,6 +219,12 @@ static void __init request_standard_resources(void)
 		    kernel_data.end <= res->end)
 			request_resource(res, &kernel_data);
 	}
+
+#ifdef CONFIG_KEXEC_CORE
+	/* User space tools will find "Crash kernel" region in /proc/iomem. */
+	if (crashk_res.end)
+		insert_resource(&iomem_resource, &crashk_res);
+#endif
 }
 
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index bbb7ee7..dd273ec 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,11 +29,13 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
+#include <linux/kexec.h>
 
 #include <asm/boot.h>
 #include <asm/fixmap.h>
@@ -76,6 +78,114 @@ static int __init early_initrd(char *p)
 early_param("initrd", early_initrd);
 #endif
 
+#ifdef CONFIG_KEXEC_CORE
+static unsigned long long crash_size, crash_base;
+static struct property crash_base_prop = {
+	.name = "linux,crashkernel-base",
+	.length = sizeof(u64),
+	.value = &crash_base
+};
+static struct property crash_size_prop = {
+	.name = "linux,crashkernel-size",
+	.length = sizeof(u64),
+	.value = &crash_size,
+};
+
+static int __init export_crashkernel(void)
+{
+	struct device_node *node;
+	int ret;
+
+	if (!crashk_res.end)
+		return 0;
+
+	crash_base = cpu_to_be64(crashk_res.start);
+	crash_size = cpu_to_be64(crashk_res.end - crashk_res.start + 1);
+
+	/* Add /chosen/linux,crashkernel-* properties */
+	node = of_find_node_by_path("/chosen");
+	if (!node)
+		return -ENOENT;
+
+	/*
+	 * There might be existing crash kernel properties, but we can't
+	 * be sure what's in them, so remove them.
+	 */
+	of_remove_property(node, of_find_property(node,
+				"linux,crashkernel-base", NULL));
+	of_remove_property(node, of_find_property(node,
+				"linux,crashkernel-size", NULL));
+
+	ret = of_add_property(node, &crash_base_prop);
+	if (ret)
+		goto ret_err;
+
+	ret = of_add_property(node, &crash_size_prop);
+	if (ret)
+		goto ret_err;
+
+	return 0;
+
+ret_err:
+	pr_warn("Exporting crashkernel region to device tree failed\n");
+	return ret;
+}
+late_initcall(export_crashkernel);
+
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+	int ret;
+
+	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+				&crash_size, &crash_base);
+	/* no crashkernel= or invalid value specified */
+	if (ret || !crash_size)
+		return;
+
+	if (crash_base == 0) {
+		/* Current arm64 boot protocol requires 2MB alignment */
+		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+				crash_size, SZ_2M);
+		if (crash_base == 0) {
+			pr_warn("Unable to allocate crashkernel (size:%llx)\n",
+				crash_size);
+			return;
+		}
+	} else {
+		/* User specifies base address explicitly. */
+		if (!memblock_is_region_memory(crash_base, crash_size) ||
+			memblock_is_region_reserved(crash_base, crash_size)) {
+			pr_warn("crashkernel has wrong address or size\n");
+			return;
+		}
+
+		if (!IS_ALIGNED(crash_base, SZ_2M)) {
+			pr_warn("crashkernel base address is not 2MB aligned\n");
+			return;
+		}
+	}
+	memblock_reserve(crash_base, crash_size);
+
+	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
+		crash_size >> 20, crash_base >> 20);
+
+	crashk_res.start = crash_base;
+	crashk_res.end = crash_base + crash_size - 1;
+}
+#else
+static void __init reserve_crashkernel(void)
+{
+	;
+}
+#endif /* CONFIG_KEXEC_CORE */
+
 /*
  * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
  * currently assumes that for memory starting above 4G, 32-bit devices will
@@ -296,6 +406,9 @@ void __init arm64_memblock_init(void)
 		arm64_dma_phys_limit = max_zone_dma_phys();
 	else
 		arm64_dma_phys_limit = PHYS_MASK + 1;
+
+	reserve_crashkernel();
+
 	dma_contiguous_reserve(arm64_dma_phys_limit);
 
 	memblock_allow_resize();
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 2/9] memblock: add memblock_cap_memory_range()
  2016-08-09  1:52 ` [PATCH v24 1/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
@ 2016-08-09  1:55   ` AKASHI Takahiro
  2016-08-10 16:26     ` James Morse
  2016-08-09  1:56   ` [PATCH v24 3/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
  2016-08-09  1:57   ` [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
  2 siblings, 1 reply; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  1:55 UTC (permalink / raw)
  To: linux-arm-kernel

Crash dump kernel uses only a limited range of memory as System RAM.
On arm64 implementation, a new device tree property,
"linux,usable-memory-range," is used to notify crash dump kernel of
this range.[1]
But simply excluding all the other regions, whatever their memory types
are, doesn't work, especially, on the systems with ACPI. Since some of
such regions will be later mapped as "device memory" by ioremap()/
acpi_os_ioremap(), it can cause errors like unalignment accesses.[2]
This issue is akin to the one reported in [3].

So this patch follows Chen's approach, and implements a new function,
memblock_cap_memory_range(), which will exclude only the memory regions
that are not marked "NOMAP" from memblock.memory.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/442817.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/444165.html
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/443356.html

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 include/linux/memblock.h |  1 +
 mm/memblock.c            | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 2925da2..8002f98 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -333,6 +333,7 @@ phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
 void memblock_enforce_memory_limit(phys_addr_t memory_limit);
 void memblock_mem_limit_remove_map(phys_addr_t limit);
+void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
 bool memblock_is_memory(phys_addr_t addr);
 int memblock_is_map_memory(phys_addr_t addr);
 int memblock_is_region_memory(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index 483197e..3eae109 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1539,6 +1539,34 @@ void __init memblock_mem_limit_remove_map(phys_addr_t limit)
 			      (phys_addr_t)ULLONG_MAX);
 }
 
+void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
+{
+	int start_rgn, end_rgn;
+	int i, ret;
+
+	if (!size)
+		return;
+
+	ret = memblock_isolate_range(&memblock.memory, base, size,
+						&start_rgn, &end_rgn);
+	if (ret)
+		return;
+
+	/* remove all the MAP regions */
+	for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
+		if (!memblock_is_nomap(&memblock.memory.regions[i]))
+			memblock_remove_region(&memblock.memory, i);
+
+	for (i = start_rgn - 1; i >= 0; i--)
+		if (!memblock_is_nomap(&memblock.memory.regions[i]))
+			memblock_remove_region(&memblock.memory, i);
+
+	/* truncate the reserved regions */
+	memblock_remove_range(&memblock.reserved, 0, base);
+	memblock_remove_range(&memblock.reserved,
+			base + size, (phys_addr_t)ULLONG_MAX);
+}
+
 static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
 {
 	unsigned int left = 0, right = type->cnt;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 3/9] arm64: limit memory regions based on DT property, usable-memory-range
  2016-08-09  1:52 ` [PATCH v24 1/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
  2016-08-09  1:55   ` [PATCH v24 2/9] memblock: add memblock_cap_memory_range() AKASHI Takahiro
@ 2016-08-09  1:56   ` AKASHI Takahiro
  2016-08-09  1:56     ` [PATCH v24 4/9] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
                       ` (4 more replies)
  2016-08-09  1:57   ` [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
  2 siblings, 5 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  1:56 UTC (permalink / raw)
  To: linux-arm-kernel

Crash dump kernel will be run with a limited range of memory as System
RAM.

On arm64, we will use a device-tree property under /chosen,
   linux,usable-memory-range = <BASE SIZE>
in order for primary kernel either on uefi or non-uefi (device tree only)
system to hand over the information about usable memory region to crash
dump kernel. This property will supercede entries in uefi memory map table
and "memory" nodes in a device tree.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Geoff Levand <geoff@infradead.org>
---
 arch/arm64/mm/init.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index dd273ec..e3771c4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -297,10 +297,45 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+static int __init early_init_dt_scan_usablemem(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	struct memblock_region *usablemem = (struct memblock_region *)data;
+	const __be32 *reg;
+	int len;
+
+	usablemem->size = 0;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
+static void __init fdt_enforce_memory_region(void)
+{
+	struct memblock_region reg;
+
+	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
+
+	if (reg.size)
+		memblock_cap_memory_range(reg.base, reg.size);
+}
+
 void __init arm64_memblock_init(void)
 {
 	const s64 linear_region_size = -(s64)PAGE_OFFSET;
 
+	/* Handle linux,usable-memory-range property */
+	fdt_enforce_memory_region();
+
 	/*
 	 * Ensure that the linear region takes up exactly half of the kernel
 	 * virtual address space. This way, we can distinguish a linear address
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 4/9] arm64: kdump: implement machine_crash_shutdown()
  2016-08-09  1:56   ` [PATCH v24 3/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
@ 2016-08-09  1:56     ` AKASHI Takahiro
  2016-08-09  1:56     ` [PATCH v24 5/9] arm64: kdump: add kdump support AKASHI Takahiro
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  1:56 UTC (permalink / raw)
  To: linux-arm-kernel

Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
and save registers' status in per-cpu ELF notes before starting crash
dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we do kdump anyway.

As we don't have to make non-boot(crashed) cpus offline (to preserve
correct status of cpus at crash dump) before shutting down, this patch
also adds a variant of smp_send_stop().

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/include/asm/hardirq.h  |  2 +-
 arch/arm64/include/asm/kexec.h    | 41 ++++++++++++++++++++++++-
 arch/arm64/include/asm/smp.h      |  2 ++
 arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
 arch/arm64/kernel/smp.c           | 63 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 159 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 8740297..1473fc2 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
 #include <linux/threads.h>
 #include <asm/irq.h>
 
-#define NR_IPI	6
+#define NR_IPI	7
 
 typedef struct {
 	unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc..a908958 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,46 @@
 static inline void crash_setup_regs(struct pt_regs *newregs,
 				    struct pt_regs *oldregs)
 {
-	/* Empty routine needed to avoid build errors. */
+	if (oldregs) {
+		memcpy(newregs, oldregs, sizeof(*newregs));
+	} else {
+		u64 tmp1, tmp2;
+
+		__asm__ __volatile__ (
+			"stp	 x0,   x1, [%2, #16 *  0]\n"
+			"stp	 x2,   x3, [%2, #16 *  1]\n"
+			"stp	 x4,   x5, [%2, #16 *  2]\n"
+			"stp	 x6,   x7, [%2, #16 *  3]\n"
+			"stp	 x8,   x9, [%2, #16 *  4]\n"
+			"stp	x10,  x11, [%2, #16 *  5]\n"
+			"stp	x12,  x13, [%2, #16 *  6]\n"
+			"stp	x14,  x15, [%2, #16 *  7]\n"
+			"stp	x16,  x17, [%2, #16 *  8]\n"
+			"stp	x18,  x19, [%2, #16 *  9]\n"
+			"stp	x20,  x21, [%2, #16 * 10]\n"
+			"stp	x22,  x23, [%2, #16 * 11]\n"
+			"stp	x24,  x25, [%2, #16 * 12]\n"
+			"stp	x26,  x27, [%2, #16 * 13]\n"
+			"stp	x28,  x29, [%2, #16 * 14]\n"
+			"mov	 %0,  sp\n"
+			"stp	x30,  %0,  [%2, #16 * 15]\n"
+
+			"/* faked current PSTATE */\n"
+			"mrs	 %0, CurrentEL\n"
+			"mrs	 %1, DAIF\n"
+			"orr	 %0, %0, %1\n"
+			"mrs	 %1, NZCV\n"
+			"orr	 %0, %0, %1\n"
+
+			/* pc */
+			"adr	 %1, 1f\n"
+		"1:\n"
+			"stp	 %1, %0,   [%2, #16 * 16]\n"
+			: "=r" (tmp1), "=r" (tmp2), "+r" (newregs)
+			:
+			: "memory"
+		);
+	}
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 0226447..6b0f2c7 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -136,6 +136,8 @@ static inline void cpu_panic_kernel(void)
  */
 bool cpus_are_stuck_in_kernel(void);
 
+extern void smp_send_crash_stop(void);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a..8ac9dba8 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,9 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
 #include <linux/kexec.h>
 #include <linux/smp.h>
 
@@ -22,6 +25,7 @@
 extern const unsigned char arm64_relocate_new_kernel[];
 extern const unsigned long arm64_relocate_new_kernel_size;
 
+bool in_crash_kexec;
 static unsigned long kimage_start;
 
 /**
@@ -148,7 +152,8 @@ void machine_kexec(struct kimage *kimage)
 	/*
 	 * New cpus may have become stuck_in_kernel after we loaded the image.
 	 */
-	BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
+	BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
+			!WARN_ON(in_crash_kexec));
 
 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
 	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
@@ -200,13 +205,58 @@ void machine_kexec(struct kimage *kimage)
 	 * relocation is complete.
 	 */
 
-	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
+	cpu_soft_restart(!in_crash_kexec, reboot_code_buffer_phys, kimage->head,
 		kimage_start, 0);
 
 	BUG(); /* Should never get here. */
 }
 
+static void machine_kexec_mask_interrupts(void)
+{
+	unsigned int i;
+	struct irq_desc *desc;
+
+	for_each_irq_desc(i, desc) {
+		struct irq_chip *chip;
+		int ret;
+
+		chip = irq_desc_get_chip(desc);
+		if (!chip)
+			continue;
+
+		/*
+		 * First try to remove the active state. If this
+		 * fails, try to EOI the interrupt.
+		 */
+		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+		    chip->irq_eoi)
+			chip->irq_eoi(&desc->irq_data);
+
+		if (chip->irq_mask)
+			chip->irq_mask(&desc->irq_data);
+
+		if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+			chip->irq_disable(&desc->irq_data);
+	}
+}
+
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	/* Empty routine needed to avoid build errors. */
+	local_irq_disable();
+
+	in_crash_kexec = true;
+
+	/* shutdown non-crashing cpus */
+	smp_send_crash_stop();
+
+	/* for crashing cpu */
+	crash_save_cpu(regs, smp_processor_id());
+	machine_kexec_mask_interrupts();
+
+	pr_info("Starting crashdump kernel...\n");
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 76a6d92..2d29030 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
 #include <linux/completion.h>
 #include <linux/of.h>
 #include <linux/irq_work.h>
+#include <linux/kexec.h>
 
 #include <asm/alternative.h>
 #include <asm/atomic.h>
@@ -71,6 +72,7 @@ enum ipi_msg_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_CPU_CRASH_STOP,
 	IPI_TIMER,
 	IPI_IRQ_WORK,
 	IPI_WAKEUP
@@ -734,6 +736,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
 	S(IPI_RESCHEDULE, "Rescheduling interrupts"),
 	S(IPI_CALL_FUNC, "Function call interrupts"),
 	S(IPI_CPU_STOP, "CPU stop interrupts"),
+	S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
 	S(IPI_TIMER, "Timer broadcast interrupts"),
 	S(IPI_IRQ_WORK, "IRQ work interrupts"),
 	S(IPI_WAKEUP, "CPU wake-up interrupts"),
@@ -808,6 +811,29 @@ static void ipi_cpu_stop(unsigned int cpu)
 		cpu_relax();
 }
 
+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi;
+#endif
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+	crash_save_cpu(regs, cpu);
+
+	atomic_dec(&waiting_for_crash_ipi);
+
+	local_irq_disable();
+
+#ifdef CONFIG_HOTPLUG_CPU
+	if (cpu_ops[cpu]->cpu_die)
+		cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+	/* just in case */
+	cpu_park_loop();
+#endif
+}
+
 /*
  * Main handler for inter-processor interrupts
  */
@@ -838,6 +864,15 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 		irq_exit();
 		break;
 
+	case IPI_CPU_CRASH_STOP:
+		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+			irq_enter();
+			ipi_cpu_crash_stop(cpu, regs);
+
+			unreachable();
+		}
+		break;
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 	case IPI_TIMER:
 		irq_enter();
@@ -910,6 +945,34 @@ void smp_send_stop(void)
 			   cpumask_pr_args(cpu_online_mask));
 }
 
+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+	cpumask_t mask;
+	unsigned long timeout;
+
+	if (num_online_cpus() == 1)
+		return;
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &mask);
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+	pr_crit("SMP: stopping secondary CPUs\n");
+	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+	/* Wait up to one second for other CPUs to stop */
+	timeout = USEC_PER_SEC;
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+		udelay(1);
+
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+			   cpumask_pr_args(cpu_online_mask));
+}
+#endif
+
 /*
  * not supported here
  */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-09  1:56   ` [PATCH v24 3/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
  2016-08-09  1:56     ` [PATCH v24 4/9] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
@ 2016-08-09  1:56     ` AKASHI Takahiro
  2016-08-10 16:38       ` James Morse
  2016-08-24 14:44       ` Ard Biesheuvel
  2016-08-09  1:56     ` [PATCH v24 6/9] arm64: kdump: add VMCOREINFO's for user-space coredump tools AKASHI Takahiro
                       ` (2 subsequent siblings)
  4 siblings, 2 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  1:56 UTC (permalink / raw)
  To: linux-arm-kernel

On crash dump kernel, all the information about primary kernel's system
memory (core image) is available in elf core header.
The primary kernel will set aside this header with reserve_elfcorehdr()
at boot time and inform crash dump kernel of its location via a new
device-tree property, "linux,elfcorehdr".

Please note that all other architectures use traditional "elfcorehdr="
kernel parameter for this purpose.

Then crash dump kernel will access the primary kernel's memory with
copy_oldmem_page(), which reads one page by ioremap'ing it since it does
not reside in linear mapping on crash dump kernel.

We also need our own elfcorehdr_read() here since the header is placed
within crash dump kernel's usable memory.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/Kconfig             | 11 +++++++
 arch/arm64/kernel/Makefile     |  1 +
 arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
 arch/arm64/mm/init.c           | 54 ++++++++++++++++++++++++++++++++
 4 files changed, 137 insertions(+)
 create mode 100644 arch/arm64/kernel/crash_dump.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 69c8787..9ede54b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -682,6 +682,17 @@ config KEXEC
 	  but it is independent of the system firmware.   And like a reboot
 	  you can start any kernel with it, not just Linux.
 
+config CRASH_DUMP
+	bool "Build kdump crash kernel"
+	help
+	  Generate crash dump after being started by kexec. This should
+	  be normally only set in special crash dump kernels which are
+	  loaded in the main kernel with kexec-tools into a specially
+	  reserved region and then later executed after a crash by
+	  kdump/kexec.
+
+	  For more details see Documentation/kdump/kdump.txt
+
 config XEN_DOM0
 	def_bool y
 	depends on XEN
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 14f7b65..f1cbfc8 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -48,6 +48,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
 arm64-obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
 					   cpu-reset.o
+arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
 
 obj-y					+= $(arm64-obj-y) vdso/ probes/
 obj-m					+= $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 0000000..2dc54d1
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <asm/memory.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+			 size_t csize, unsigned long offset,
+			 int userbuf)
+{
+	void *vaddr;
+
+	if (!csize)
+		return 0;
+
+	vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
+	if (!vaddr)
+		return -ENOMEM;
+
+	if (userbuf) {
+		if (copy_to_user(buf, vaddr + offset, csize)) {
+			iounmap(vaddr);
+			return -EFAULT;
+		}
+	} else {
+		memcpy(buf, vaddr + offset, csize);
+	}
+
+	iounmap(vaddr);
+
+	return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ * @buf: buffer where the data is placed
+ * @csize: number of bytes to read
+ * @ppos: address in the memory
+ *
+ * This function reads @count bytes from elf core header which exists
+ * on crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+	memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+	return count;
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index e3771c4..bba1e39 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
 #include <linux/kexec.h>
+#include <linux/crash_dump.h>
 
 #include <asm/boot.h>
 #include <asm/fixmap.h>
@@ -186,6 +187,57 @@ static void __init reserve_crashkernel(void)
 }
 #endif /* CONFIG_KEXEC_CORE */
 
+#ifdef CONFIG_CRASH_DUMP
+static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	const __be32 *reg;
+	int len;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
+/*
+ * reserve_elfcorehdr() - reserves memory for elf core header
+ *
+ * This function reserves elf core header given in "elfcorehdr=" kernel
+ * command line parameter. This region contains all the information about
+ * primary kernel's core image and is used by a dump capture kernel to
+ * access the system memory on primary kernel.
+ */
+static void __init reserve_elfcorehdr(void)
+{
+	of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
+
+	if (!elfcorehdr_size)
+		return;
+
+	if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
+		pr_warn("elfcorehdr is overlapped\n");
+		return;
+	}
+
+	memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
+
+	pr_info("Reserving %lldKB of memory@0x%llx for elfcorehdr\n",
+		elfcorehdr_size >> 10, elfcorehdr_addr);
+}
+#else
+static void __init reserve_elfcorehdr(void)
+{
+	;
+}
+#endif /* CONFIG_CRASH_DUMP */
 /*
  * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
  * currently assumes that for memory starting above 4G, 32-bit devices will
@@ -444,6 +496,8 @@ void __init arm64_memblock_init(void)
 
 	reserve_crashkernel();
 
+	reserve_elfcorehdr();
+
 	dma_contiguous_reserve(arm64_dma_phys_limit);
 
 	memblock_allow_resize();
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 6/9] arm64: kdump: add VMCOREINFO's for user-space coredump tools
  2016-08-09  1:56   ` [PATCH v24 3/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
  2016-08-09  1:56     ` [PATCH v24 4/9] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
  2016-08-09  1:56     ` [PATCH v24 5/9] arm64: kdump: add kdump support AKASHI Takahiro
@ 2016-08-09  1:56     ` AKASHI Takahiro
  2016-08-09  1:56     ` [PATCH v24 7/9] arm64: kdump: enable kdump in the arm64 defconfig AKASHI Takahiro
  2016-08-09  1:56     ` [PATCH v24 8/9] arm64: kdump: update a kernel doc AKASHI Takahiro
  4 siblings, 0 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  1:56 UTC (permalink / raw)
  To: linux-arm-kernel

For the current crash utility, we need to know, at least,
  - kimage_voffset
  - PHYS_OFFSET
to handle the contents of core dump file (/proc/vmcore) correctly due to
the introduction of KASLR (CONFIG_RANDOMIZE_BASE) in v4.6.
This patch puts them as VMCOREINFO's into the file.

  - VA_BITS
is also added for makedumpfile command.
More VMCOREINFO's may be added later.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/kernel/machine_kexec.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 8ac9dba8..38b4411 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -17,6 +17,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/cpu_ops.h>
+#include <asm/memory.h>
 #include <asm/mmu_context.h>
 
 #include "cpu-reset.h"
@@ -260,3 +261,13 @@ void machine_crash_shutdown(struct pt_regs *regs)
 
 	pr_info("Starting crashdump kernel...\n");
 }
+
+void arch_crash_save_vmcoreinfo(void)
+{
+	VMCOREINFO_NUMBER(VA_BITS);
+	/* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
+	vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
+						kimage_voffset);
+	vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
+						PHYS_OFFSET);
+}
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 7/9] arm64: kdump: enable kdump in the arm64 defconfig
  2016-08-09  1:56   ` [PATCH v24 3/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
                       ` (2 preceding siblings ...)
  2016-08-09  1:56     ` [PATCH v24 6/9] arm64: kdump: add VMCOREINFO's for user-space coredump tools AKASHI Takahiro
@ 2016-08-09  1:56     ` AKASHI Takahiro
  2016-08-09  1:56     ` [PATCH v24 8/9] arm64: kdump: update a kernel doc AKASHI Takahiro
  4 siblings, 0 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  1:56 UTC (permalink / raw)
  To: linux-arm-kernel

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 0555b7c..4c4d7b5 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -73,6 +73,7 @@ CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_CMA=y
 CONFIG_XEN=y
 CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_COMPAT=y
 CONFIG_CPU_IDLE=y
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 8/9] arm64: kdump: update a kernel doc
  2016-08-09  1:56   ` [PATCH v24 3/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
                       ` (3 preceding siblings ...)
  2016-08-09  1:56     ` [PATCH v24 7/9] arm64: kdump: enable kdump in the arm64 defconfig AKASHI Takahiro
@ 2016-08-09  1:56     ` AKASHI Takahiro
  4 siblings, 0 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  1:56 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds arch specific descriptions about kdump usage on arm64
to kdump.txt.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
---
 Documentation/kdump/kdump.txt | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 88ff63d..c090531 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
 a remote system.
 
 Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x and arm architectures.
+s390x, arm and arm64 architectures.
 
 When the system kernel boots, it reserves a small section of memory for
 the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)
 
     AUTO_ZRELADDR=y
 
+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+  on non-VHE systems even if it is configured. This is because the CPU
+  cannot be reset to EL2 on panic.
+
 Extended crashkernel syntax
 ===========================
 
@@ -305,6 +312,8 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
+   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
 
 Load the Dump-capture Kernel
 ============================
@@ -327,6 +336,8 @@ For s390x:
 	- Use image or bzImage
 For arm:
 	- Use zImage
+For arm64:
+	- Use vmlinux or Image
 
 If you are using a uncompressed vmlinux image then use following command
 to load dump-capture kernel.
@@ -370,6 +381,9 @@ For s390x:
 For arm:
 	"1 maxcpus=1 reset_devices"
 
+For arm64:
+	"1 maxcpus=1 reset_devices"
+
 Notes on loading the dump-capture kernel:
 
 * By default, the ELF headers are stored in ELF64 format to support
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-08-09  1:52 ` [PATCH v24 1/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
  2016-08-09  1:55   ` [PATCH v24 2/9] memblock: add memblock_cap_memory_range() AKASHI Takahiro
  2016-08-09  1:56   ` [PATCH v24 3/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
@ 2016-08-09  1:57   ` AKASHI Takahiro
  2016-08-19 13:26     ` Rob Herring
  2 siblings, 1 reply; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  1:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: James Morse <james.morse@arm.com>

Add documentation for
	linux,crashkernel-base and crashkernel-size,
	linux,usable-memory-range, and
	linux,elfcorehdr
used by arm64 kexec/kdump to decribe the kdump reserved area, and
the elfcorehdr's location within it.

Signed-off-by: James Morse <james.morse@arm.com>
[takahiro.akashi at linaro.org:
    renamed "usable-memory" to "usable-memory-range",
    added "linux,crashkernel-base" and "-size" ]
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 6ae9d82..236188a 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -52,3 +52,53 @@ This property is set (currently only on PowerPC, and only needed on
 book3e) by some versions of kexec-tools to tell the new kernel that it
 is being booted by kexec, as the booting environment may differ (e.g.
 a different secondary CPU release mechanism)
+
+linux,crashkernel-base
+linux,crashkernel-size
+----------------------
+These properties (currently used on PowerPC and arm64) indicates
+the base address and the size, respectively, of the reserved memory
+range for crash dump kernel.
+e.g.
+
+/ {
+	chosen {
+		linux,crashkernel-base = <0x9 0xf0000000>;
+		linux,crashkernel-size = <0x0 0x10000000>;
+	};
+};
+
+linux,usable-memory-range
+-------------------------
+
+This property (currently used only on arm64) holds the memory range,
+the base address and the size, which can be used as system ram on
+the *current* kernel. Note that, if this property is present, any memory
+regions under "memory" nodes in DT blob or ones marked as "conventional
+memory" in EFI memory map should be ignored.
+e.g.
+
+/ {
+	chosen {
+		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
+	};
+};
+
+The main usage is for crash dump kernel to identify its own usable
+memory and exclude, at its boot time, any other memory areas that are
+part of the panicked kernel's memory.
+
+linux,elfcorehdr
+----------------
+
+This property (currently used only on arm64) holds the memory range,
+the address and the size, of the elf core header which mainly describes
+the panicked kernel's memory layout in elf format.
+e.g.
+
+/ {
+	chosen {
+		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
+		linux,elfcorehdr = <0x9 0xfffff000 0x0 0x800>;
+	};
+};
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 0/9] arm64: add kdump support
  2016-08-09  1:52 [PATCH v24 0/9] arm64: add kdump support AKASHI Takahiro
  2016-08-09  1:52 ` [PATCH v24 1/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
@ 2016-08-09  2:04 ` AKASHI Takahiro
  2016-08-31  3:41 ` Manish Jaggi
  2 siblings, 0 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-09  2:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Aug 09, 2016 at 10:52:47AM +0900, AKASHI Takahiro wrote:
> This patch series adds kdump support on arm64.
> 
> To load a crash-dump kernel to the systems, a series of patches to
> kexec-tools, which have not yet been merged upstream, are needed.
> Please use my kdump patches [1].
> 
> To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
>   - crash utility (coming v7.1.6 or later) [2]
>     (Necessary patches have already been queued in the master.)
> 
> [1] T.B.D.

http://lists.infradead.org/pipermail/kexec/2016-August/016779.html

-Takahiro AKASHI

> [2] https://github.com/crash-utility/crash.git
> 
> Changes for v24 (Aug 9, 2016):
>   o Rebase to Linux-4.8-rc1
>   o Update descriptions about newly added DT proerties
> 
> Changes for v23 (July 26, 2016):
> 
>   o Move memblock_reserve() to a single place in reserve_crashkernel()
>   o Use  cpu_park_loop() in ipi_cpu_crash_stop()
>   o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
>   o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
>     (for ACPI) from usable memory at crash kernel
> 
> Changes for v22 (July 12, 2016):
> 
>   o Export "crashkernel-base" and "crashkernel-size" via device-tree,
>     and add some descriptions about them in chosen.txt
>   o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
>     with powerpc's "usable-memory"
>   o Make cosmetic changes regarding "ifdef" usage
>   o Correct some wordings in kdump.txt
> 
> Changes for v21 (July 6, 2016):
> 
>   o Remove kexec patches.
>   o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
>   o Clarify the description about kvm in kdump.txt.
> 
> See the following link [3] for older changes:
> [3]  http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html
> 
> AKASHI Takahiro (8):
>   arm64: kdump: reserve memory for crash dump kernel
>   memblock: add memblock_cap_memory_range()
>   arm64: limit memory regions based on DT property, usable-memory-range
>   arm64: kdump: implement machine_crash_shutdown()
>   arm64: kdump: add kdump support
>   arm64: kdump: add VMCOREINFO's for user-space coredump tools
>   arm64: kdump: enable kdump in the arm64 defconfig
>   arm64: kdump: update a kernel doc
> 
> James Morse (1):
>   Documentation: dt: chosen properties for arm64 kdump
> 
>  Documentation/devicetree/bindings/chosen.txt |  45 ++++++
>  Documentation/kdump/kdump.txt                |  16 ++-
>  arch/arm64/Kconfig                           |  11 ++
>  arch/arm64/configs/defconfig                 |   1 +
>  arch/arm64/include/asm/hardirq.h             |   2 +-
>  arch/arm64/include/asm/kexec.h               |  41 +++++-
>  arch/arm64/include/asm/smp.h                 |   2 +
>  arch/arm64/kernel/Makefile                   |   1 +
>  arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
>  arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
>  arch/arm64/kernel/setup.c                    |   7 +-
>  arch/arm64/kernel/smp.c                      |  63 +++++++++
>  arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
>  include/linux/memblock.h                     |   1 +
>  mm/memblock.c                                |  28 ++++
>  15 files changed, 551 insertions(+), 7 deletions(-)
>  create mode 100644 arch/arm64/kernel/crash_dump.c
> 
> -- 
> 2.9.0
> 
> 
> AKASHI Takahiro (8):
>   arm64: kdump: reserve memory for crash dump kernel
>   memblock: add memblock_cap_memory_range()
>   arm64: limit memory regions based on DT property, usable-memory-range
>   arm64: kdump: implement machine_crash_shutdown()
>   arm64: kdump: add kdump support
>   arm64: kdump: add VMCOREINFO's for user-space coredump tools
>   arm64: kdump: enable kdump in the arm64 defconfig
>   arm64: kdump: update a kernel doc
> 
> James Morse (1):
>   Documentation: dt: chosen properties for arm64 kdump
> 
>  Documentation/devicetree/bindings/chosen.txt |  50 +++++++
>  Documentation/kdump/kdump.txt                |  16 ++-
>  arch/arm64/Kconfig                           |  11 ++
>  arch/arm64/configs/defconfig                 |   1 +
>  arch/arm64/include/asm/hardirq.h             |   2 +-
>  arch/arm64/include/asm/kexec.h               |  41 +++++-
>  arch/arm64/include/asm/smp.h                 |   2 +
>  arch/arm64/kernel/Makefile                   |   1 +
>  arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
>  arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
>  arch/arm64/kernel/setup.c                    |   7 +-
>  arch/arm64/kernel/smp.c                      |  63 +++++++++
>  arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
>  include/linux/memblock.h                     |   1 +
>  mm/memblock.c                                |  28 ++++
>  15 files changed, 556 insertions(+), 7 deletions(-)
>  create mode 100644 arch/arm64/kernel/crash_dump.c
> 
> -- 
> 2.9.0
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 2/9] memblock: add memblock_cap_memory_range()
  2016-08-09  1:55   ` [PATCH v24 2/9] memblock: add memblock_cap_memory_range() AKASHI Takahiro
@ 2016-08-10 16:26     ` James Morse
  0 siblings, 0 replies; 52+ messages in thread
From: James Morse @ 2016-08-10 16:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Akashi,

On 09/08/16 02:55, AKASHI Takahiro wrote:
> Crash dump kernel uses only a limited range of memory as System RAM.
> On arm64 implementation, a new device tree property,
> "linux,usable-memory-range," is used to notify crash dump kernel of
> this range.[1]
> But simply excluding all the other regions, whatever their memory types
> are, doesn't work, especially, on the systems with ACPI. Since some of
> such regions will be later mapped as "device memory" by ioremap()/
> acpi_os_ioremap(), it can cause errors like unalignment accesses.[2]
> This issue is akin to the one reported in [3].
> 
> So this patch follows Chen's approach, and implements a new function,
> memblock_cap_memory_range(), which will exclude only the memory regions
> that are not marked "NOMAP" from memblock.memory.

This (and the next patch) fixes the acpi related unaligned access problem I had.
I've tested it on a Juno r1 and Seattle B0.

Tested-by: James Morse <james.morse@arm.com>


Thanks,

James

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-09  1:56     ` [PATCH v24 5/9] arm64: kdump: add kdump support AKASHI Takahiro
@ 2016-08-10 16:38       ` James Morse
  2016-08-10 18:18         ` Pratyush Anand
  2016-08-18  7:15         ` [PATCH v24 5/9] " AKASHI Takahiro
  2016-08-24 14:44       ` Ard Biesheuvel
  1 sibling, 2 replies; 52+ messages in thread
From: James Morse @ 2016-08-10 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Akashi,

On 09/08/16 02:56, AKASHI Takahiro wrote:
> On crash dump kernel, all the information about primary kernel's system
> memory (core image) is available in elf core header.
> The primary kernel will set aside this header with reserve_elfcorehdr()
> at boot time and inform crash dump kernel of its location via a new
> device-tree property, "linux,elfcorehdr".
> 
> Please note that all other architectures use traditional "elfcorehdr="
> kernel parameter for this purpose.
> 
> Then crash dump kernel will access the primary kernel's memory with
> copy_oldmem_page(), which reads one page by ioremap'ing it since it does
> not reside in linear mapping on crash dump kernel.
> 
> We also need our own elfcorehdr_read() here since the header is placed
> within crash dump kernel's usable memory.

On Seattle when I panic and boot the kdump kernel, I am unable to read the
/proc/vmcore file. Instead I get:
nanook at frikadeller:~$ sudo cp /proc/vmcore /
[  174.393875] Unhandled fault: synchronous external abort (0x96000210) at
0xffffff80096b6000
[  174.402158] Internal error: : 96000210 [#1] PREEMPT SMP
[  174.407370] Modules linked in:
[  174.410417] CPU: 6 PID: 2059 Comm: cp Tainted: G S      W I     4.8.0-rc1+ #4708
[  174.417799] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS
ROD1002C 04/08/2016
[  174.426396] task: ffffffc0fdec5780 task.stack: ffffffc0f34bc000
[  174.432313] PC is at __arch_copy_to_user+0x180/0x280
[  174.437274] LR is at copy_oldmem_page+0xac/0xf0
[  174.441791] pc : [<ffffff800835e080>] lr : [<ffffff8008095b9c>] pstate: 20000145
[  174.449173] sp : ffffffc0f34bfc90
[  174.452474] x29: ffffffc0f34bfc90 x28: 0000000000000000
[  174.457776] x27: 0000000008000000 x26: 000000000000d000
[  174.463077] x25: 0000000000000001 x24: ffffff8008eb5000
[  174.468378] x23: 0000000000000000 x22: ffffff80096b6000
[  174.473679] x21: 0000000000000001 x20: 0000000030127000
[  174.478979] x19: 0000000000001000 x18: 0000007ff7085d60
[  174.484279] x17: 0000000000429358 x16: ffffff80081d9e88
[  174.489579] x15: 0000007fae377590 x14: 0000000000000000
[  174.494880] x13: 0000000000000000 x12: ffffff8008dd1000
[  174.500180] x11: ffffff80096b6fff x10: ffffff80096b6fff
[  174.505480] x9 : 0000000040000000 x8 : ffffff8008db6000
[  174.510781] x7 : ffffff80096b7000 x6 : 0000000030127000
[  174.516082] x5 : 0000000030128000 x4 : 0000000000000000
[  174.521382] x3 : 00e8000000000713 x2 : 0000000000000f80
[  174.526682] x1 : ffffff80096b6000 x0 : 0000000030127000
[  174.531982]
[  174.533461] Process cp (pid: 2059, stack limit = 0xffffffc0f34bc020)

[  174.848448] [<ffffff800835e080>] __arch_copy_to_user+0x180/0x280
[  174.854448] [<ffffff8008245f34>] read_from_oldmem.part.4+0xb4/0xf4
[  174.860615] [<ffffff8008246074>] read_vmcore+0x100/0x22c
[  174.865919] [<ffffff8008239378>] proc_reg_read+0x64/0x90
[  174.871223] [<ffffff80081d7da8>] __vfs_read+0x28/0x108
[  174.876348] [<ffffff80081d8ae4>] vfs_read+0x84/0x144
[  174.881301] [<ffffff80081d9ecc>] SyS_read+0x44/0xa0
[  174.886167] [<ffffff8008082ef0>] el0_svc_naked+0x24/0x28
[  174.891466] Code: 00000000 00000000 00000000 00000000 (a8c12027)
[  174.897562] ---[ end trace 00801b2e35b0cd1f ]---


The offending call is:
> copy_oldmem_page(0x8000000, 0x00000000385f8000, 0x1000, 0, 1)

This is trying to access the bottom page of memory. From the efi memory map:
> efi:   0x008000000000-0x008001e7ffff [Runtime Data       |RUN|  |WB|WT|WC|UC]*
> efi:   0x008001e80000-0x008001ffffff [Conventional Memory|   |  |WB|WT|WC|UC]

This page is 'Runtime Data', and marked as nomap by both the original and kdump
kernels, but copy_oldmem_page() doesn't know this.

In this case because we have already parsed the efi memory map again in the
kdump kernel and re-marked these regions as nomap, the below hunk fixes the
problem for me:
=========================%<=========================
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
index 2dc54d129be1..784d4c30b534 100644
--- a/arch/arm64/kernel/crash_dump.c
+++ b/arch/arm64/kernel/crash_dump.c
@@ -37,6 +37,11 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
        if (!csize)
                return 0;

+       if (memblock_is_memory(pfn << PAGE_SHIFT) &&
+           !memblock_is_map_memory(pfn << PAGE_SHIFT))
+               /* skip this nomap memory region, reserved by firmware */
+               return 0;
+
        vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
        if (!vaddr)
                return -ENOMEM;
=========================%<=========================

With this I can copy the vmcore file, and feed it to crash to read dmesg, task
list etc...

This could be a deeper/wider issue, but I can't see any other users of
memblock_mark_nomap().
Do you think depending on this this 're-learning' is robust enough, or should
the nomap ranges be described in the vmcoreinfo elf notes?


Thanks,

James


> diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> new file mode 100644
> index 0000000..2dc54d1
> --- /dev/null
> +++ b/arch/arm64/kernel/crash_dump.c
> @@ -0,0 +1,71 @@
> +/*
> + * Routines for doing kexec-based kdump
> + *
> + * Copyright (C) 2014 Linaro Limited
> + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/crash_dump.h>
> +#include <linux/errno.h>
> +#include <linux/io.h>
> +#include <linux/memblock.h>
> +#include <linux/uaccess.h>
> +#include <asm/memory.h>
> +
> +/**
> + * copy_oldmem_page() - copy one page from old kernel memory
> + * @pfn: page frame number to be copied
> + * @buf: buffer where the copied page is placed
> + * @csize: number of bytes to copy
> + * @offset: offset in bytes into the page
> + * @userbuf: if set, @buf is in a user address space
> + *
> + * This function copies one page from old kernel memory into buffer pointed by
> + * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
> + * copied or negative error in case of failure.
> + */
> +ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> +			 size_t csize, unsigned long offset,
> +			 int userbuf)
> +{
> +	void *vaddr;
> +
> +	if (!csize)
> +		return 0;
> +
> +	vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
> +	if (!vaddr)
> +		return -ENOMEM;
> +
> +	if (userbuf) {
> +		if (copy_to_user(buf, vaddr + offset, csize)) {
> +			iounmap(vaddr);
> +			return -EFAULT;
> +		}
> +	} else {
> +		memcpy(buf, vaddr + offset, csize);
> +	}
> +
> +	iounmap(vaddr);
> +
> +	return csize;
> +}

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-10 16:38       ` James Morse
@ 2016-08-10 18:18         ` Pratyush Anand
  2016-08-11 10:03           ` Pratyush Anand
  2016-08-18  7:15         ` [PATCH v24 5/9] " AKASHI Takahiro
  1 sibling, 1 reply; 52+ messages in thread
From: Pratyush Anand @ 2016-08-10 18:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/08/2016:05:38:05 PM, James Morse wrote:
> Hi Akashi,
> 
> On 09/08/16 02:56, AKASHI Takahiro wrote:
> > On crash dump kernel, all the information about primary kernel's system
> > memory (core image) is available in elf core header.
> > The primary kernel will set aside this header with reserve_elfcorehdr()
> > at boot time and inform crash dump kernel of its location via a new
> > device-tree property, "linux,elfcorehdr".
> > 
> > Please note that all other architectures use traditional "elfcorehdr="
> > kernel parameter for this purpose.
> > 
> > Then crash dump kernel will access the primary kernel's memory with
> > copy_oldmem_page(), which reads one page by ioremap'ing it since it does
> > not reside in linear mapping on crash dump kernel.
> > 
> > We also need our own elfcorehdr_read() here since the header is placed
> > within crash dump kernel's usable memory.
> 
> On Seattle when I panic and boot the kdump kernel, I am unable to read the
> /proc/vmcore file. Instead I get:
> nanook at frikadeller:~$ sudo cp /proc/vmcore /
> [  174.393875] Unhandled fault: synchronous external abort (0x96000210) at
> 0xffffff80096b6000

Yes, I see the same while executing vmcore-dmesg or copying vmcore.

> [  174.402158] Internal error: : 96000210 [#1] PREEMPT SMP
> [  174.407370] Modules linked in:
> [  174.410417] CPU: 6 PID: 2059 Comm: cp Tainted: G S      W I     4.8.0-rc1+ #4708
> [  174.417799] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS
> ROD1002C 04/08/2016
> [  174.426396] task: ffffffc0fdec5780 task.stack: ffffffc0f34bc000
> [  174.432313] PC is at __arch_copy_to_user+0x180/0x280
> [  174.437274] LR is at copy_oldmem_page+0xac/0xf0
> [  174.441791] pc : [<ffffff800835e080>] lr : [<ffffff8008095b9c>] pstate: 20000145
> [  174.449173] sp : ffffffc0f34bfc90
> [  174.452474] x29: ffffffc0f34bfc90 x28: 0000000000000000
> [  174.457776] x27: 0000000008000000 x26: 000000000000d000
> [  174.463077] x25: 0000000000000001 x24: ffffff8008eb5000
> [  174.468378] x23: 0000000000000000 x22: ffffff80096b6000
> [  174.473679] x21: 0000000000000001 x20: 0000000030127000
> [  174.478979] x19: 0000000000001000 x18: 0000007ff7085d60
> [  174.484279] x17: 0000000000429358 x16: ffffff80081d9e88
> [  174.489579] x15: 0000007fae377590 x14: 0000000000000000
> [  174.494880] x13: 0000000000000000 x12: ffffff8008dd1000
> [  174.500180] x11: ffffff80096b6fff x10: ffffff80096b6fff
> [  174.505480] x9 : 0000000040000000 x8 : ffffff8008db6000
> [  174.510781] x7 : ffffff80096b7000 x6 : 0000000030127000
> [  174.516082] x5 : 0000000030128000 x4 : 0000000000000000
> [  174.521382] x3 : 00e8000000000713 x2 : 0000000000000f80
> [  174.526682] x1 : ffffff80096b6000 x0 : 0000000030127000
> [  174.531982]
> [  174.533461] Process cp (pid: 2059, stack limit = 0xffffffc0f34bc020)
> 
> [  174.848448] [<ffffff800835e080>] __arch_copy_to_user+0x180/0x280
> [  174.854448] [<ffffff8008245f34>] read_from_oldmem.part.4+0xb4/0xf4
> [  174.860615] [<ffffff8008246074>] read_vmcore+0x100/0x22c
> [  174.865919] [<ffffff8008239378>] proc_reg_read+0x64/0x90
> [  174.871223] [<ffffff80081d7da8>] __vfs_read+0x28/0x108
> [  174.876348] [<ffffff80081d8ae4>] vfs_read+0x84/0x144
> [  174.881301] [<ffffff80081d9ecc>] SyS_read+0x44/0xa0
> [  174.886167] [<ffffff8008082ef0>] el0_svc_naked+0x24/0x28
> [  174.891466] Code: 00000000 00000000 00000000 00000000 (a8c12027)
> [  174.897562] ---[ end trace 00801b2e35b0cd1f ]---
> 
> 
> The offending call is:
> > copy_oldmem_page(0x8000000, 0x00000000385f8000, 0x1000, 0, 1)
> 
> This is trying to access the bottom page of memory. From the efi memory map:
> > efi:   0x008000000000-0x008001e7ffff [Runtime Data       |RUN|  |WB|WT|WC|UC]*
> > efi:   0x008001e80000-0x008001ffffff [Conventional Memory|   |  |WB|WT|WC|UC]
> 
> This page is 'Runtime Data', and marked as nomap by both the original and kdump
> kernels, but copy_oldmem_page() doesn't know this.
> 
> In this case because we have already parsed the efi memory map again in the
> kdump kernel and re-marked these regions as nomap, the below hunk fixes the
> problem for me:
> =========================%<=========================
> diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> index 2dc54d129be1..784d4c30b534 100644
> --- a/arch/arm64/kernel/crash_dump.c
> +++ b/arch/arm64/kernel/crash_dump.c
> @@ -37,6 +37,11 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
>         if (!csize)
>                 return 0;
> 
> +       if (memblock_is_memory(pfn << PAGE_SHIFT) &&
> +           !memblock_is_map_memory(pfn << PAGE_SHIFT))
> +               /* skip this nomap memory region, reserved by firmware */
> +               return 0;
> +
>         vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
>         if (!vaddr)
>                 return -ENOMEM;
> =========================%<=========================

In any case kernel must not panic, so I think we must have above hunk. However,
we also need to look into kexec-tools that why it is asking kernel to copy those
unneeded chunks.

I will test tomorrow with above hunk.

~Pratyush

> 
> With this I can copy the vmcore file, and feed it to crash to read dmesg, task
> list etc...
> 
> This could be a deeper/wider issue, but I can't see any other users of
> memblock_mark_nomap().
> Do you think depending on this this 're-learning' is robust enough, or should
> the nomap ranges be described in the vmcoreinfo elf notes?

> 
> 
> Thanks,
> 
> James
> 
> 
> > diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> > new file mode 100644
> > index 0000000..2dc54d1
> > --- /dev/null
> > +++ b/arch/arm64/kernel/crash_dump.c
> > @@ -0,0 +1,71 @@
> > +/*
> > + * Routines for doing kexec-based kdump
> > + *
> > + * Copyright (C) 2014 Linaro Limited
> > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#include <linux/crash_dump.h>
> > +#include <linux/errno.h>
> > +#include <linux/io.h>
> > +#include <linux/memblock.h>
> > +#include <linux/uaccess.h>
> > +#include <asm/memory.h>
> > +
> > +/**
> > + * copy_oldmem_page() - copy one page from old kernel memory
> > + * @pfn: page frame number to be copied
> > + * @buf: buffer where the copied page is placed
> > + * @csize: number of bytes to copy
> > + * @offset: offset in bytes into the page
> > + * @userbuf: if set, @buf is in a user address space
> > + *
> > + * This function copies one page from old kernel memory into buffer pointed by
> > + * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
> > + * copied or negative error in case of failure.
> > + */
> > +ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> > +			 size_t csize, unsigned long offset,
> > +			 int userbuf)
> > +{
> > +	void *vaddr;
> > +
> > +	if (!csize)
> > +		return 0;
> > +
> > +	vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
> > +	if (!vaddr)
> > +		return -ENOMEM;
> > +
> > +	if (userbuf) {
> > +		if (copy_to_user(buf, vaddr + offset, csize)) {
> > +			iounmap(vaddr);
> > +			return -EFAULT;
> > +		}
> > +	} else {
> > +		memcpy(buf, vaddr + offset, csize);
> > +	}
> > +
> > +	iounmap(vaddr);
> > +
> > +	return csize;
> > +}
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-10 18:18         ` Pratyush Anand
@ 2016-08-11 10:03           ` Pratyush Anand
  2016-08-16 10:13             ` James Morse
  0 siblings, 1 reply; 52+ messages in thread
From: Pratyush Anand @ 2016-08-11 10:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/08/2016:11:48:27 PM, Pratyush Anand wrote:
> On 10/08/2016:05:38:05 PM, James Morse wrote:
> > =========================%<=========================
> > diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> > index 2dc54d129be1..784d4c30b534 100644
> > --- a/arch/arm64/kernel/crash_dump.c
> > +++ b/arch/arm64/kernel/crash_dump.c
> > @@ -37,6 +37,11 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> >         if (!csize)
> >                 return 0;
> > 
> > +       if (memblock_is_memory(pfn << PAGE_SHIFT) &&
> > +           !memblock_is_map_memory(pfn << PAGE_SHIFT))
> > +               /* skip this nomap memory region, reserved by firmware */
> > +               return 0;

This should return 0 or -EINVAL? because, its caller does not care properly
about 0 return value (when csize is non-zero). So either we need to return
-EINVAL or we need to fix it's caller so that pread() would know that required
number of data were not read.

> > +
> >         vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
> >         if (!vaddr)
> >                 return -ENOMEM;
> > =========================%<=========================
> 
> In any case kernel must not panic, so I think we must have above hunk. However,
> we also need to look into kexec-tools that why it is asking kernel to copy those
> unneeded chunks.
> 
> I will test tomorrow with above hunk.

After that hunk it did not crash but vmcore-dmesg fails with following message:
"No program header covering vaddr 0x401ff0found kexec bug?"

It happened because vmcore-dmesg is sending wrong offset to the pread(), and so
it did not crash after the above kernel hunk but it still read garbage wrong
log_buf virtual address pointer.

vmcore-dmesg is sending wrong offset because page_offset(vp_offset) calculation
is not perfect for my case, explained here [1].

So, if I correct page_offset(vp_offset) (as arm64_mem.page_offset = ehdr.e_entry
- "kernel Code Start PA" + phys_offset), then vmcore-dmesg and vmcore copy
worked fine, however if I use makedumpfile to copy(compressed) data from
/proc/vmcore then it still generates "synchronous external abort". I think, it
generated because it would have found garbage data in EFI memory region. My
/proc/iomem shows following:

8000000000-8001e7ffff : System RAM
8001e80000-83ff17ffff : System RAM
  8002080000-8002b3ffff : Kernel code
  8002c40000-800348ffff : Kernel data
  807fe00000-80ffdfffff : Crash kernel
83ff180000-83ff1cffff : System RAM
83ff1d0000-83ff21ffff : System RAM
83ff220000-83ffe4ffff : System RAM
83ffe50000-83ffffffff : System RAM

If I clip all the region before "kernel code" and provide that clipped
input to kexec-tools then everything works fine.

~Pratyush

[1] http://lists.infradead.org/pipermail/kexec/2016-August/016834.html

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-11 10:03           ` Pratyush Anand
@ 2016-08-16 10:13             ` James Morse
  2016-08-17 15:33               ` [PATCH] fixup! " James Morse
  0 siblings, 1 reply; 52+ messages in thread
From: James Morse @ 2016-08-16 10:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Pratyush,

On 11/08/16 11:03, Pratyush Anand wrote:
> On 10/08/2016:11:48:27 PM, Pratyush Anand wrote:
>> On 10/08/2016:05:38:05 PM, James Morse wrote:
>>> =========================%<=========================
>>> diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
>>> index 2dc54d129be1..784d4c30b534 100644
>>> --- a/arch/arm64/kernel/crash_dump.c
>>> +++ b/arch/arm64/kernel/crash_dump.c
>>> @@ -37,6 +37,11 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
>>>         if (!csize)
>>>                 return 0;
>>>
>>> +       if (memblock_is_memory(pfn << PAGE_SHIFT) &&
>>> +           !memblock_is_map_memory(pfn << PAGE_SHIFT))
>>> +               /* skip this nomap memory region, reserved by firmware */
>>> +               return 0;
> 
> This should return 0 or -EINVAL? because, its caller does not care properly
> about 0 return value (when csize is non-zero). So either we need to return
> -EINVAL or we need to fix it's caller so that pread() would know that required
> number of data were not read.

I blindly followed 'number of bytes copied' -> 0. It worked for me, but may not
be correct.

remap_oldmem_pfn_checked() looks like it substitutes the zero page in this (or
at least a similar) case, maybe we should do the same for nomap pages.


> 
>>> +
>>>         vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
>>>         if (!vaddr)
>>>                 return -ENOMEM;
>>> =========================%<=========================
>>
>> In any case kernel must not panic, so I think we must have above hunk. However,
>> we also need to look into kexec-tools that why it is asking kernel to copy those
>> unneeded chunks.
>>
>> I will test tomorrow with above hunk.
> 
> After that hunk it did not crash but vmcore-dmesg fails with following message:
> "No program header covering vaddr 0x401ff0found kexec bug?"
> 
> It happened because vmcore-dmesg is sending wrong offset to the pread(), and so
> it did not crash after the above kernel hunk but it still read garbage wrong
> log_buf virtual address pointer.
> 
> vmcore-dmesg is sending wrong offset because page_offset(vp_offset) calculation
> is not perfect for my case, explained here [1].
> 
> So, if I correct page_offset(vp_offset) (as arm64_mem.page_offset = ehdr.e_entry
> - "kernel Code Start PA" + phys_offset), then vmcore-dmesg and vmcore copy
> worked fine, however if I use makedumpfile to copy(compressed) data from
> /proc/vmcore then it still generates "synchronous external abort". I think, it

At a guess makedumpfile is mmap()ing /proc/vmcore so it can use multiple
threads to read (then compress) the data. This bypasses the check added to
copy_oldmem_page(). We probably need to provide a remap_oldmem_pfn_range() that
checks whether the range contains nomap pages.

I will try and send a fixup patch to do this later this week, (unless someone
beats me to it!)


> generated because it would have found garbage data in EFI memory region.

If it was marked as belonging to efi in the efi memory map, the kernel shouldn't
be touching it. If you add 'efi=debug' to your kernel cmdline you get a table of
the addresses and properties.


Thanks,

James

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] fixup! arm64: kdump: add kdump support
  2016-08-16 10:13             ` James Morse
@ 2016-08-17 15:33               ` James Morse
  2016-08-18  7:32                 ` AKASHI Takahiro
  2016-08-19  8:00                 ` Pratyush Anand
  0 siblings, 2 replies; 52+ messages in thread
From: James Morse @ 2016-08-17 15:33 UTC (permalink / raw)
  To: linux-arm-kernel

copy_oldmem_page() and mmap_vmcore() provide two ways for userspace to read
from /proc/vmcore. Neither of these check with memblock to see if the page
they are accessing is nomap. On Seattle this causes:

[  174.393875] Unhandled fault: synchronous external abort (0x96000210) at
0xffffff80096b6000
[  174.402158] Internal error: : 96000210 [#1] PREEMPT SMP
[  174.407370] Modules linked in:
[  174.410417] CPU: 6 PID: 2059 Comm: cp Tainted: G S      W I     4.8.0-rc1+ #4708
[  174.417799] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS
ROD1002C 04/08/2016
[  174.426396] task: ffffffc0fdec5780 task.stack: ffffffc0f34bc000
[  174.432313] PC is at __arch_copy_to_user+0x180/0x280
[  174.437274] LR is at copy_oldmem_page+0xac/0xf0
[  174.441791] pc : [<ffffff800835e080>] lr : [<ffffff8008095b9c>] pstate: 20000145
[  174.449173] sp : ffffffc0f34bfc90
[  174.452474] x29: ffffffc0f34bfc90 x28: 0000000000000000
[  174.457776] x27: 0000000008000000 x26: 000000000000d000
[  174.463077] x25: 0000000000000001 x24: ffffff8008eb5000
[  174.468378] x23: 0000000000000000 x22: ffffff80096b6000
[  174.473679] x21: 0000000000000001 x20: 0000000030127000
[  174.478979] x19: 0000000000001000 x18: 0000007ff7085d60
[  174.484279] x17: 0000000000429358 x16: ffffff80081d9e88
[  174.489579] x15: 0000007fae377590 x14: 0000000000000000

[  174.494880] x13: 0000000000000000 x12: ffffff8008dd1000
[  174.500180] x11: ffffff80096b6fff x10: ffffff80096b6fff
[  174.505480] x9 : 0000000040000000 x8 : ffffff8008db6000
[  174.510781] x7 : ffffff80096b7000 x6 : 0000000030127000
[  174.516082] x5 : 0000000030128000 x4 : 0000000000000000
[  174.521382] x3 : 00e8000000000713 x2 : 0000000000000f80
[  174.526682] x1 : ffffff80096b6000 x0 : 0000000030127000
[  174.531982]
[  174.533461] Process cp (pid: 2059, stack limit = 0xffffffc0f34bc020)

[  174.848448] [<ffffff800835e080>] __arch_copy_to_user+0x180/0x280
[  174.854448] [<ffffff8008245f34>] read_from_oldmem.part.4+0xb4/0xf4
[  174.860615] [<ffffff8008246074>] read_vmcore+0x100/0x22c
[  174.865919] [<ffffff8008239378>] proc_reg_read+0x64/0x90
[  174.871223] [<ffffff80081d7da8>] __vfs_read+0x28/0x108
[  174.876348] [<ffffff80081d8ae4>] vfs_read+0x84/0x144
[  174.881301] [<ffffff80081d9ecc>] SyS_read+0x44/0xa0
[  174.886167] [<ffffff8008082ef0>] el0_svc_naked+0x24/0x28
[  174.891466] Code: 00000000 00000000 00000000 00000000 (a8c12027)
[  174.897562] ---[ end trace 00801b2e35b0cd1f ]---

When reading /proc/vmcore with cat/cp or or mmap()ing it with makedumpfile.

The fs/proc/vmcore.c code provides a hook to indicate whether oldmem pages
are ram or not. Use this to look for our earlier handiwork in memblock.

Signed-off-by: James Morse <james.morse@arm.com>
---

Hi Pratyush,

I couldn't get makedumpfile to build, or rather it depends on elfutils which
wouldn't build for autotools reasons. Does implementing this hook solve your
makedumpfile issue?

With this patch I can extract a usable vmcore file using read or mmap,
avoiding the earlier splat.

Akashi, if you agree this is the right thing to do, please consider folding
this into patch 5. (no need to keep the commit mesage or anything).

Thanks,

James

 arch/arm64/kernel/crash_dump.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
index 2dc54d129be1..76c71ab42994 100644
--- a/arch/arm64/kernel/crash_dump.c
+++ b/arch/arm64/kernel/crash_dump.c
@@ -12,10 +12,38 @@
 #include <linux/crash_dump.h>
 #include <linux/errno.h>
 #include <linux/io.h>
+#include <linux/init.h>
 #include <linux/memblock.h>
 #include <linux/uaccess.h>
 #include <asm/memory.h>
 
+#ifdef CONFIG_PROC_VMCORE
+static int oldmem_pfn_is_ram(unsigned long pfn)
+{
+	unsigned long addr = pfn << PAGE_SHIFT;
+
+	/*
+	 * We removed oldmem from memblock.memory, then re-added some regions
+	 * which are reserved by/for firmware as memory and nomap.
+	 * If an address exists as memory, but is marked nomap, return false.
+	 */
+	if (memblock_is_memory(addr) && !memblock_is_map_memory(addr))
+		return 0;
+
+	return 1;
+}
+
+static int __init do_register_oldmem_pfn_is_ram(void)
+{
+	return register_oldmem_pfn_is_ram(&oldmem_pfn_is_ram);
+}
+/*
+ * vmcore_init() is called via fs_initcall, ensure we register
+ * oldmem_pfn_is_ram() before then.
+ */
+arch_initcall(do_register_oldmem_pfn_is_ram);
+#endif
+
 /**
  * copy_oldmem_page() - copy one page from old kernel memory
  * @pfn: page frame number to be copied
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-10 16:38       ` James Morse
  2016-08-10 18:18         ` Pratyush Anand
@ 2016-08-18  7:15         ` AKASHI Takahiro
  2016-08-18  7:19           ` Dave Young
  2016-08-19  1:26           ` AKASHI Takahiro
  1 sibling, 2 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-18  7:15 UTC (permalink / raw)
  To: linux-arm-kernel

Hi James, Pratyush,

Thank you for your testing and reporting an issue.
I've been on vacation until yesterday.

On Wed, Aug 10, 2016 at 05:38:05PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 09/08/16 02:56, AKASHI Takahiro wrote:
> > On crash dump kernel, all the information about primary kernel's system
> > memory (core image) is available in elf core header.
> > The primary kernel will set aside this header with reserve_elfcorehdr()
> > at boot time and inform crash dump kernel of its location via a new
> > device-tree property, "linux,elfcorehdr".
> > 
> > Please note that all other architectures use traditional "elfcorehdr="
> > kernel parameter for this purpose.
> > 
> > Then crash dump kernel will access the primary kernel's memory with
> > copy_oldmem_page(), which reads one page by ioremap'ing it since it does
> > not reside in linear mapping on crash dump kernel.
> > 
> > We also need our own elfcorehdr_read() here since the header is placed
> > within crash dump kernel's usable memory.
> 
> On Seattle when I panic and boot the kdump kernel, I am unable to read the
> /proc/vmcore file. Instead I get:
> nanook at frikadeller:~$ sudo cp /proc/vmcore /
> [  174.393875] Unhandled fault: synchronous external abort (0x96000210) at
> 0xffffff80096b6000
> [  174.402158] Internal error: : 96000210 [#1] PREEMPT SMP
> [  174.407370] Modules linked in:
> [  174.410417] CPU: 6 PID: 2059 Comm: cp Tainted: G S      W I     4.8.0-rc1+ #4708
> [  174.417799] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS
> ROD1002C 04/08/2016
> [  174.426396] task: ffffffc0fdec5780 task.stack: ffffffc0f34bc000
> [  174.432313] PC is at __arch_copy_to_user+0x180/0x280
> [  174.437274] LR is at copy_oldmem_page+0xac/0xf0
> [  174.441791] pc : [<ffffff800835e080>] lr : [<ffffff8008095b9c>] pstate: 20000145
> [  174.449173] sp : ffffffc0f34bfc90
> [  174.452474] x29: ffffffc0f34bfc90 x28: 0000000000000000
> [  174.457776] x27: 0000000008000000 x26: 000000000000d000
> [  174.463077] x25: 0000000000000001 x24: ffffff8008eb5000
> [  174.468378] x23: 0000000000000000 x22: ffffff80096b6000
> [  174.473679] x21: 0000000000000001 x20: 0000000030127000
> [  174.478979] x19: 0000000000001000 x18: 0000007ff7085d60
> [  174.484279] x17: 0000000000429358 x16: ffffff80081d9e88
> [  174.489579] x15: 0000007fae377590 x14: 0000000000000000
> [  174.494880] x13: 0000000000000000 x12: ffffff8008dd1000
> [  174.500180] x11: ffffff80096b6fff x10: ffffff80096b6fff
> [  174.505480] x9 : 0000000040000000 x8 : ffffff8008db6000
> [  174.510781] x7 : ffffff80096b7000 x6 : 0000000030127000
> [  174.516082] x5 : 0000000030128000 x4 : 0000000000000000
> [  174.521382] x3 : 00e8000000000713 x2 : 0000000000000f80
> [  174.526682] x1 : ffffff80096b6000 x0 : 0000000030127000
> [  174.531982]
> [  174.533461] Process cp (pid: 2059, stack limit = 0xffffffc0f34bc020)
> 
> [  174.848448] [<ffffff800835e080>] __arch_copy_to_user+0x180/0x280
> [  174.854448] [<ffffff8008245f34>] read_from_oldmem.part.4+0xb4/0xf4
> [  174.860615] [<ffffff8008246074>] read_vmcore+0x100/0x22c
> [  174.865919] [<ffffff8008239378>] proc_reg_read+0x64/0x90
> [  174.871223] [<ffffff80081d7da8>] __vfs_read+0x28/0x108
> [  174.876348] [<ffffff80081d8ae4>] vfs_read+0x84/0x144
> [  174.881301] [<ffffff80081d9ecc>] SyS_read+0x44/0xa0
> [  174.886167] [<ffffff8008082ef0>] el0_svc_naked+0x24/0x28
> [  174.891466] Code: 00000000 00000000 00000000 00000000 (a8c12027)
> [  174.897562] ---[ end trace 00801b2e35b0cd1f ]---
> 
> 
> The offending call is:
> > copy_oldmem_page(0x8000000, 0x00000000385f8000, 0x1000, 0, 1)
> 
> This is trying to access the bottom page of memory. From the efi memory map:
> > efi:   0x008000000000-0x008001e7ffff [Runtime Data       |RUN|  |WB|WT|WC|UC]*
> > efi:   0x008001e80000-0x008001ffffff [Conventional Memory|   |  |WB|WT|WC|UC]
> 
> This page is 'Runtime Data', and marked as nomap by both the original and kdump
> kernels, but copy_oldmem_page() doesn't know this.
> 
> In this case because we have already parsed the efi memory map again in the
> kdump kernel and re-marked these regions as nomap, the below hunk fixes the
> problem for me:
> =========================%<=========================
> diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> index 2dc54d129be1..784d4c30b534 100644
> --- a/arch/arm64/kernel/crash_dump.c
> +++ b/arch/arm64/kernel/crash_dump.c
> @@ -37,6 +37,11 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
>         if (!csize)
>                 return 0;
> 
> +       if (memblock_is_memory(pfn << PAGE_SHIFT) &&
> +           !memblock_is_map_memory(pfn << PAGE_SHIFT))
> +               /* skip this nomap memory region, reserved by firmware */
> +               return 0;
> +
>         vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);

Here I'm wandering why my original code doesn't work.
If !memblock_is_map_memory(), ioremap_cache() would call __ioremap_caller()
and return a valid virtual address mapped in vmalloc area.

>         if (!vaddr)
>                 return -ENOMEM;
> =========================%<=========================
> 
> With this I can copy the vmcore file, and feed it to crash to read dmesg, task
> list etc...
> 
> This could be a deeper/wider issue, but I can't see any other users of
> memblock_mark_nomap().
> Do you think depending on this this 're-learning' is robust enough, or should
> the nomap ranges be described in the vmcoreinfo elf notes?

The current kexec-tools identifies all the memory regions from
/proc/iomem and there is no way for user space tools to distinguish
"EFI runtime data," or any other nomap memory, from normal "System RAM"
because all those resources are currently marked as "System RAM."

So I think that such regions should be marked as, say, "reserved,"
so that we can exclude those memories from a crush dump file.

(I don't know whether this change may have a backward-compatibility
problem.)

-Takahiro AKASHI

> 
> Thanks,
> 
> James
> 
> 
> > diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> > new file mode 100644
> > index 0000000..2dc54d1
> > --- /dev/null
> > +++ b/arch/arm64/kernel/crash_dump.c
> > @@ -0,0 +1,71 @@
> > +/*
> > + * Routines for doing kexec-based kdump
> > + *
> > + * Copyright (C) 2014 Linaro Limited
> > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#include <linux/crash_dump.h>
> > +#include <linux/errno.h>
> > +#include <linux/io.h>
> > +#include <linux/memblock.h>
> > +#include <linux/uaccess.h>
> > +#include <asm/memory.h>
> > +
> > +/**
> > + * copy_oldmem_page() - copy one page from old kernel memory
> > + * @pfn: page frame number to be copied
> > + * @buf: buffer where the copied page is placed
> > + * @csize: number of bytes to copy
> > + * @offset: offset in bytes into the page
> > + * @userbuf: if set, @buf is in a user address space
> > + *
> > + * This function copies one page from old kernel memory into buffer pointed by
> > + * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
> > + * copied or negative error in case of failure.
> > + */
> > +ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> > +			 size_t csize, unsigned long offset,
> > +			 int userbuf)
> > +{
> > +	void *vaddr;
> > +
> > +	if (!csize)
> > +		return 0;
> > +
> > +	vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
> > +	if (!vaddr)
> > +		return -ENOMEM;
> > +
> > +	if (userbuf) {
> > +		if (copy_to_user(buf, vaddr + offset, csize)) {
> > +			iounmap(vaddr);
> > +			return -EFAULT;
> > +		}
> > +	} else {
> > +		memcpy(buf, vaddr + offset, csize);
> > +	}
> > +
> > +	iounmap(vaddr);
> > +
> > +	return csize;
> > +}
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-18  7:15         ` [PATCH v24 5/9] " AKASHI Takahiro
@ 2016-08-18  7:19           ` Dave Young
  2016-08-19  1:26           ` AKASHI Takahiro
  1 sibling, 0 replies; 52+ messages in thread
From: Dave Young @ 2016-08-18  7:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/18/16 at 04:15pm, AKASHI Takahiro wrote:
> Hi James, Pratyush,
> 
> Thank you for your testing and reporting an issue.
> I've been on vacation until yesterday.
> 
> On Wed, Aug 10, 2016 at 05:38:05PM +0100, James Morse wrote:
> > Hi Akashi,
> > 
> > On 09/08/16 02:56, AKASHI Takahiro wrote:
> > > On crash dump kernel, all the information about primary kernel's system
> > > memory (core image) is available in elf core header.
> > > The primary kernel will set aside this header with reserve_elfcorehdr()
> > > at boot time and inform crash dump kernel of its location via a new
> > > device-tree property, "linux,elfcorehdr".
> > > 
> > > Please note that all other architectures use traditional "elfcorehdr="
> > > kernel parameter for this purpose.
> > > 
> > > Then crash dump kernel will access the primary kernel's memory with
> > > copy_oldmem_page(), which reads one page by ioremap'ing it since it does
> > > not reside in linear mapping on crash dump kernel.
> > > 
> > > We also need our own elfcorehdr_read() here since the header is placed
> > > within crash dump kernel's usable memory.
> > 
> > On Seattle when I panic and boot the kdump kernel, I am unable to read the
> > /proc/vmcore file. Instead I get:
> > nanook at frikadeller:~$ sudo cp /proc/vmcore /
> > [  174.393875] Unhandled fault: synchronous external abort (0x96000210) at
> > 0xffffff80096b6000
> > [  174.402158] Internal error: : 96000210 [#1] PREEMPT SMP
> > [  174.407370] Modules linked in:
> > [  174.410417] CPU: 6 PID: 2059 Comm: cp Tainted: G S      W I     4.8.0-rc1+ #4708
> > [  174.417799] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS
> > ROD1002C 04/08/2016
> > [  174.426396] task: ffffffc0fdec5780 task.stack: ffffffc0f34bc000
> > [  174.432313] PC is at __arch_copy_to_user+0x180/0x280
> > [  174.437274] LR is at copy_oldmem_page+0xac/0xf0
> > [  174.441791] pc : [<ffffff800835e080>] lr : [<ffffff8008095b9c>] pstate: 20000145
> > [  174.449173] sp : ffffffc0f34bfc90
> > [  174.452474] x29: ffffffc0f34bfc90 x28: 0000000000000000
> > [  174.457776] x27: 0000000008000000 x26: 000000000000d000
> > [  174.463077] x25: 0000000000000001 x24: ffffff8008eb5000
> > [  174.468378] x23: 0000000000000000 x22: ffffff80096b6000
> > [  174.473679] x21: 0000000000000001 x20: 0000000030127000
> > [  174.478979] x19: 0000000000001000 x18: 0000007ff7085d60
> > [  174.484279] x17: 0000000000429358 x16: ffffff80081d9e88
> > [  174.489579] x15: 0000007fae377590 x14: 0000000000000000
> > [  174.494880] x13: 0000000000000000 x12: ffffff8008dd1000
> > [  174.500180] x11: ffffff80096b6fff x10: ffffff80096b6fff
> > [  174.505480] x9 : 0000000040000000 x8 : ffffff8008db6000
> > [  174.510781] x7 : ffffff80096b7000 x6 : 0000000030127000
> > [  174.516082] x5 : 0000000030128000 x4 : 0000000000000000
> > [  174.521382] x3 : 00e8000000000713 x2 : 0000000000000f80
> > [  174.526682] x1 : ffffff80096b6000 x0 : 0000000030127000
> > [  174.531982]
> > [  174.533461] Process cp (pid: 2059, stack limit = 0xffffffc0f34bc020)
> > 
> > [  174.848448] [<ffffff800835e080>] __arch_copy_to_user+0x180/0x280
> > [  174.854448] [<ffffff8008245f34>] read_from_oldmem.part.4+0xb4/0xf4
> > [  174.860615] [<ffffff8008246074>] read_vmcore+0x100/0x22c
> > [  174.865919] [<ffffff8008239378>] proc_reg_read+0x64/0x90
> > [  174.871223] [<ffffff80081d7da8>] __vfs_read+0x28/0x108
> > [  174.876348] [<ffffff80081d8ae4>] vfs_read+0x84/0x144
> > [  174.881301] [<ffffff80081d9ecc>] SyS_read+0x44/0xa0
> > [  174.886167] [<ffffff8008082ef0>] el0_svc_naked+0x24/0x28
> > [  174.891466] Code: 00000000 00000000 00000000 00000000 (a8c12027)
> > [  174.897562] ---[ end trace 00801b2e35b0cd1f ]---
> > 
> > 
> > The offending call is:
> > > copy_oldmem_page(0x8000000, 0x00000000385f8000, 0x1000, 0, 1)
> > 
> > This is trying to access the bottom page of memory. From the efi memory map:
> > > efi:   0x008000000000-0x008001e7ffff [Runtime Data       |RUN|  |WB|WT|WC|UC]*
> > > efi:   0x008001e80000-0x008001ffffff [Conventional Memory|   |  |WB|WT|WC|UC]
> > 
> > This page is 'Runtime Data', and marked as nomap by both the original and kdump
> > kernels, but copy_oldmem_page() doesn't know this.
> > 
> > In this case because we have already parsed the efi memory map again in the
> > kdump kernel and re-marked these regions as nomap, the below hunk fixes the
> > problem for me:
> > =========================%<=========================
> > diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> > index 2dc54d129be1..784d4c30b534 100644
> > --- a/arch/arm64/kernel/crash_dump.c
> > +++ b/arch/arm64/kernel/crash_dump.c
> > @@ -37,6 +37,11 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> >         if (!csize)
> >                 return 0;
> > 
> > +       if (memblock_is_memory(pfn << PAGE_SHIFT) &&
> > +           !memblock_is_map_memory(pfn << PAGE_SHIFT))
> > +               /* skip this nomap memory region, reserved by firmware */
> > +               return 0;
> > +
> >         vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
> 
> Here I'm wandering why my original code doesn't work.
> If !memblock_is_map_memory(), ioremap_cache() would call __ioremap_caller()
> and return a valid virtual address mapped in vmalloc area.
> 
> >         if (!vaddr)
> >                 return -ENOMEM;
> > =========================%<=========================
> > 
> > With this I can copy the vmcore file, and feed it to crash to read dmesg, task
> > list etc...
> > 
> > This could be a deeper/wider issue, but I can't see any other users of
> > memblock_mark_nomap().
> > Do you think depending on this this 're-learning' is robust enough, or should
> > the nomap ranges be described in the vmcoreinfo elf notes?
> 
> The current kexec-tools identifies all the memory regions from
> /proc/iomem and there is no way for user space tools to distinguish
> "EFI runtime data," or any other nomap memory, from normal "System RAM"
> because all those resources are currently marked as "System RAM."
> 
> So I think that such regions should be marked as, say, "reserved,"
> so that we can exclude those memories from a crush dump file.

Agreed.

EFI runtime memory is not system ram, in X86 they are "Reserved" ranges,
it sounds a better way to mark them ask reserved as well in arm64.

> 
> (I don't know whether this change may have a backward-compatibility
> problem.)
> 
> -Takahiro AKASHI
> 
> > 
> > Thanks,
> > 
> > James
> > 
> > 
> > > diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> > > new file mode 100644
> > > index 0000000..2dc54d1
> > > --- /dev/null
> > > +++ b/arch/arm64/kernel/crash_dump.c
> > > @@ -0,0 +1,71 @@
> > > +/*
> > > + * Routines for doing kexec-based kdump
> > > + *
> > > + * Copyright (C) 2014 Linaro Limited
> > > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License version 2 as
> > > + * published by the Free Software Foundation.
> > > + */
> > > +
> > > +#include <linux/crash_dump.h>
> > > +#include <linux/errno.h>
> > > +#include <linux/io.h>
> > > +#include <linux/memblock.h>
> > > +#include <linux/uaccess.h>
> > > +#include <asm/memory.h>
> > > +
> > > +/**
> > > + * copy_oldmem_page() - copy one page from old kernel memory
> > > + * @pfn: page frame number to be copied
> > > + * @buf: buffer where the copied page is placed
> > > + * @csize: number of bytes to copy
> > > + * @offset: offset in bytes into the page
> > > + * @userbuf: if set, @buf is in a user address space
> > > + *
> > > + * This function copies one page from old kernel memory into buffer pointed by
> > > + * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
> > > + * copied or negative error in case of failure.
> > > + */
> > > +ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> > > +			 size_t csize, unsigned long offset,
> > > +			 int userbuf)
> > > +{
> > > +	void *vaddr;
> > > +
> > > +	if (!csize)
> > > +		return 0;
> > > +
> > > +	vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
> > > +	if (!vaddr)
> > > +		return -ENOMEM;
> > > +
> > > +	if (userbuf) {
> > > +		if (copy_to_user(buf, vaddr + offset, csize)) {
> > > +			iounmap(vaddr);
> > > +			return -EFAULT;
> > > +		}
> > > +	} else {
> > > +		memcpy(buf, vaddr + offset, csize);
> > > +	}
> > > +
> > > +	iounmap(vaddr);
> > > +
> > > +	return csize;
> > > +}
> > 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] fixup! arm64: kdump: add kdump support
  2016-08-17 15:33               ` [PATCH] fixup! " James Morse
@ 2016-08-18  7:32                 ` AKASHI Takahiro
  2016-08-19  8:00                 ` Pratyush Anand
  1 sibling, 0 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-18  7:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 17, 2016 at 04:33:31PM +0100, James Morse wrote:
> copy_oldmem_page() and mmap_vmcore() provide two ways for userspace to read
> from /proc/vmcore. Neither of these check with memblock to see if the page
> they are accessing is nomap. On Seattle this causes:
> 
> [  174.393875] Unhandled fault: synchronous external abort (0x96000210) at
> 0xffffff80096b6000
> [  174.402158] Internal error: : 96000210 [#1] PREEMPT SMP
> [  174.407370] Modules linked in:
> [  174.410417] CPU: 6 PID: 2059 Comm: cp Tainted: G S      W I     4.8.0-rc1+ #4708
> [  174.417799] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS
> ROD1002C 04/08/2016
> [  174.426396] task: ffffffc0fdec5780 task.stack: ffffffc0f34bc000
> [  174.432313] PC is at __arch_copy_to_user+0x180/0x280
> [  174.437274] LR is at copy_oldmem_page+0xac/0xf0
> [  174.441791] pc : [<ffffff800835e080>] lr : [<ffffff8008095b9c>] pstate: 20000145
> [  174.449173] sp : ffffffc0f34bfc90
> [  174.452474] x29: ffffffc0f34bfc90 x28: 0000000000000000
> [  174.457776] x27: 0000000008000000 x26: 000000000000d000
> [  174.463077] x25: 0000000000000001 x24: ffffff8008eb5000
> [  174.468378] x23: 0000000000000000 x22: ffffff80096b6000
> [  174.473679] x21: 0000000000000001 x20: 0000000030127000
> [  174.478979] x19: 0000000000001000 x18: 0000007ff7085d60
> [  174.484279] x17: 0000000000429358 x16: ffffff80081d9e88
> [  174.489579] x15: 0000007fae377590 x14: 0000000000000000
> 
> [  174.494880] x13: 0000000000000000 x12: ffffff8008dd1000
> [  174.500180] x11: ffffff80096b6fff x10: ffffff80096b6fff
> [  174.505480] x9 : 0000000040000000 x8 : ffffff8008db6000
> [  174.510781] x7 : ffffff80096b7000 x6 : 0000000030127000
> [  174.516082] x5 : 0000000030128000 x4 : 0000000000000000
> [  174.521382] x3 : 00e8000000000713 x2 : 0000000000000f80
> [  174.526682] x1 : ffffff80096b6000 x0 : 0000000030127000
> [  174.531982]
> [  174.533461] Process cp (pid: 2059, stack limit = 0xffffffc0f34bc020)
> 
> [  174.848448] [<ffffff800835e080>] __arch_copy_to_user+0x180/0x280
> [  174.854448] [<ffffff8008245f34>] read_from_oldmem.part.4+0xb4/0xf4
> [  174.860615] [<ffffff8008246074>] read_vmcore+0x100/0x22c
> [  174.865919] [<ffffff8008239378>] proc_reg_read+0x64/0x90
> [  174.871223] [<ffffff80081d7da8>] __vfs_read+0x28/0x108
> [  174.876348] [<ffffff80081d8ae4>] vfs_read+0x84/0x144
> [  174.881301] [<ffffff80081d9ecc>] SyS_read+0x44/0xa0
> [  174.886167] [<ffffff8008082ef0>] el0_svc_naked+0x24/0x28
> [  174.891466] Code: 00000000 00000000 00000000 00000000 (a8c12027)
> [  174.897562] ---[ end trace 00801b2e35b0cd1f ]---
> 
> When reading /proc/vmcore with cat/cp or or mmap()ing it with makedumpfile.
> 
> The fs/proc/vmcore.c code provides a hook to indicate whether oldmem pages
> are ram or not. Use this to look for our earlier handiwork in memblock.

I'm not quite sure about the background that oldmem_pfn_is_ram() was
originally introduced on x86, but I think that this feature be deserved
for fixing an issue on Xen.
See:
commit 997c136
Author: Olaf Hering <olaf@aepfle.de>
Date:   Thu May 26 16:25:54 2011 -0700

    fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages

Thanks,
-Takahiro AKASHI

> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> 
> Hi Pratyush,
> 
> I couldn't get makedumpfile to build, or rather it depends on elfutils which
> wouldn't build for autotools reasons. Does implementing this hook solve your
> makedumpfile issue?
> 
> With this patch I can extract a usable vmcore file using read or mmap,
> avoiding the earlier splat.
> 
> Akashi, if you agree this is the right thing to do, please consider folding
> this into patch 5. (no need to keep the commit mesage or anything).
> 
> Thanks,
> 
> James
> 
>  arch/arm64/kernel/crash_dump.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> index 2dc54d129be1..76c71ab42994 100644
> --- a/arch/arm64/kernel/crash_dump.c
> +++ b/arch/arm64/kernel/crash_dump.c
> @@ -12,10 +12,38 @@
>  #include <linux/crash_dump.h>
>  #include <linux/errno.h>
>  #include <linux/io.h>
> +#include <linux/init.h>
>  #include <linux/memblock.h>
>  #include <linux/uaccess.h>
>  #include <asm/memory.h>
>  
> +#ifdef CONFIG_PROC_VMCORE
> +static int oldmem_pfn_is_ram(unsigned long pfn)
> +{
> +	unsigned long addr = pfn << PAGE_SHIFT;
> +
> +	/*
> +	 * We removed oldmem from memblock.memory, then re-added some regions
> +	 * which are reserved by/for firmware as memory and nomap.
> +	 * If an address exists as memory, but is marked nomap, return false.
> +	 */
> +	if (memblock_is_memory(addr) && !memblock_is_map_memory(addr))
> +		return 0;
> +
> +	return 1;
> +}
> +
> +static int __init do_register_oldmem_pfn_is_ram(void)
> +{
> +	return register_oldmem_pfn_is_ram(&oldmem_pfn_is_ram);
> +}
> +/*
> + * vmcore_init() is called via fs_initcall, ensure we register
> + * oldmem_pfn_is_ram() before then.
> + */
> +arch_initcall(do_register_oldmem_pfn_is_ram);
> +#endif
> +
>  /**
>   * copy_oldmem_page() - copy one page from old kernel memory
>   * @pfn: page frame number to be copied
> -- 
> 2.8.0.rc3
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-18  7:15         ` [PATCH v24 5/9] " AKASHI Takahiro
  2016-08-18  7:19           ` Dave Young
@ 2016-08-19  1:26           ` AKASHI Takahiro
  2016-08-19 11:22             ` Pratyush Anand
  2016-08-19 13:28             ` James Morse
  1 sibling, 2 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-19  1:26 UTC (permalink / raw)
  To: linux-arm-kernel

James,

On Thu, Aug 18, 2016 at 04:15:48PM +0900, AKASHI Takahiro wrote:
> Hi James, Pratyush,
> 
> Thank you for your testing and reporting an issue.
> I've been on vacation until yesterday.
> 
> On Wed, Aug 10, 2016 at 05:38:05PM +0100, James Morse wrote:
> > Hi Akashi,
> > 
> > On 09/08/16 02:56, AKASHI Takahiro wrote:
> > > On crash dump kernel, all the information about primary kernel's system
> > > memory (core image) is available in elf core header.
> > > The primary kernel will set aside this header with reserve_elfcorehdr()
> > > at boot time and inform crash dump kernel of its location via a new
> > > device-tree property, "linux,elfcorehdr".
> > > 
> > > Please note that all other architectures use traditional "elfcorehdr="
> > > kernel parameter for this purpose.
> > > 
> > > Then crash dump kernel will access the primary kernel's memory with
> > > copy_oldmem_page(), which reads one page by ioremap'ing it since it does
> > > not reside in linear mapping on crash dump kernel.
> > > 
> > > We also need our own elfcorehdr_read() here since the header is placed
> > > within crash dump kernel's usable memory.
> > 
> > On Seattle when I panic and boot the kdump kernel, I am unable to read the
> > /proc/vmcore file. Instead I get:
> > nanook at frikadeller:~$ sudo cp /proc/vmcore /
> > [  174.393875] Unhandled fault: synchronous external abort (0x96000210) at
> > 0xffffff80096b6000
> > [  174.402158] Internal error: : 96000210 [#1] PREEMPT SMP
> > [  174.407370] Modules linked in:
> > [  174.410417] CPU: 6 PID: 2059 Comm: cp Tainted: G S      W I     4.8.0-rc1+ #4708
> > [  174.417799] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS
> > ROD1002C 04/08/2016
> > [  174.426396] task: ffffffc0fdec5780 task.stack: ffffffc0f34bc000
> > [  174.432313] PC is at __arch_copy_to_user+0x180/0x280
> > [  174.437274] LR is at copy_oldmem_page+0xac/0xf0
> > [  174.441791] pc : [<ffffff800835e080>] lr : [<ffffff8008095b9c>] pstate: 20000145
> > [  174.449173] sp : ffffffc0f34bfc90
> > [  174.452474] x29: ffffffc0f34bfc90 x28: 0000000000000000
> > [  174.457776] x27: 0000000008000000 x26: 000000000000d000
> > [  174.463077] x25: 0000000000000001 x24: ffffff8008eb5000
> > [  174.468378] x23: 0000000000000000 x22: ffffff80096b6000
> > [  174.473679] x21: 0000000000000001 x20: 0000000030127000
> > [  174.478979] x19: 0000000000001000 x18: 0000007ff7085d60
> > [  174.484279] x17: 0000000000429358 x16: ffffff80081d9e88
> > [  174.489579] x15: 0000007fae377590 x14: 0000000000000000
> > [  174.494880] x13: 0000000000000000 x12: ffffff8008dd1000
> > [  174.500180] x11: ffffff80096b6fff x10: ffffff80096b6fff
> > [  174.505480] x9 : 0000000040000000 x8 : ffffff8008db6000
> > [  174.510781] x7 : ffffff80096b7000 x6 : 0000000030127000
> > [  174.516082] x5 : 0000000030128000 x4 : 0000000000000000
> > [  174.521382] x3 : 00e8000000000713 x2 : 0000000000000f80
> > [  174.526682] x1 : ffffff80096b6000 x0 : 0000000030127000
> > [  174.531982]
> > [  174.533461] Process cp (pid: 2059, stack limit = 0xffffffc0f34bc020)
> > 
> > [  174.848448] [<ffffff800835e080>] __arch_copy_to_user+0x180/0x280
> > [  174.854448] [<ffffff8008245f34>] read_from_oldmem.part.4+0xb4/0xf4
> > [  174.860615] [<ffffff8008246074>] read_vmcore+0x100/0x22c
> > [  174.865919] [<ffffff8008239378>] proc_reg_read+0x64/0x90
> > [  174.871223] [<ffffff80081d7da8>] __vfs_read+0x28/0x108
> > [  174.876348] [<ffffff80081d8ae4>] vfs_read+0x84/0x144
> > [  174.881301] [<ffffff80081d9ecc>] SyS_read+0x44/0xa0
> > [  174.886167] [<ffffff8008082ef0>] el0_svc_naked+0x24/0x28
> > [  174.891466] Code: 00000000 00000000 00000000 00000000 (a8c12027)
> > [  174.897562] ---[ end trace 00801b2e35b0cd1f ]---
> > 
> > 
> > The offending call is:
> > > copy_oldmem_page(0x8000000, 0x00000000385f8000, 0x1000, 0, 1)
> > 
> > This is trying to access the bottom page of memory. From the efi memory map:
> > > efi:   0x008000000000-0x008001e7ffff [Runtime Data       |RUN|  |WB|WT|WC|UC]*
> > > efi:   0x008001e80000-0x008001ffffff [Conventional Memory|   |  |WB|WT|WC|UC]
> > 
> > This page is 'Runtime Data', and marked as nomap by both the original and kdump
> > kernels, but copy_oldmem_page() doesn't know this.
> > 
> > In this case because we have already parsed the efi memory map again in the
> > kdump kernel and re-marked these regions as nomap, the below hunk fixes the
> > problem for me:
> > =========================%<=========================
> > diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> > index 2dc54d129be1..784d4c30b534 100644
> > --- a/arch/arm64/kernel/crash_dump.c
> > +++ b/arch/arm64/kernel/crash_dump.c
> > @@ -37,6 +37,11 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> >         if (!csize)
> >                 return 0;
> > 
> > +       if (memblock_is_memory(pfn << PAGE_SHIFT) &&
> > +           !memblock_is_map_memory(pfn << PAGE_SHIFT))
> > +               /* skip this nomap memory region, reserved by firmware */
> > +               return 0;
> > +
> >         vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
> 
> Here I'm wandering why my original code doesn't work.
> If !memblock_is_map_memory(), ioremap_cache() would call __ioremap_caller()
> and return a valid virtual address mapped in vmalloc area.
> 
> >         if (!vaddr)
> >                 return -ENOMEM;
> > =========================%<=========================
> > 
> > With this I can copy the vmcore file, and feed it to crash to read dmesg, task
> > list etc...
> > 
> > This could be a deeper/wider issue, but I can't see any other users of
> > memblock_mark_nomap().
> > Do you think depending on this this 're-learning' is robust enough, or should
> > the nomap ranges be described in the vmcoreinfo elf notes?
> 
> The current kexec-tools identifies all the memory regions from
> /proc/iomem and there is no way for user space tools to distinguish
> "EFI runtime data," or any other nomap memory, from normal "System RAM"
> because all those resources are currently marked as "System RAM."
> 
> So I think that such regions should be marked as, say, "reserved,"
> so that we can exclude those memories from a crush dump file.

Can you try the following change?
If it fixes your problem, I will post it as a patch.

Thanks,
-Takahiro AKASHI

===8<===
>From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
From: AKASHI Takahiro <takahiro.akashi@linaro.org>
Date: Fri, 19 Aug 2016 09:57:52 +0900
Subject: [PATCH] arm64: mark reserved memblock regions explicitly

---
 arch/arm64/kernel/setup.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 38eda13..38589b5 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
 
 	for_each_memblock(memory, region) {
 		res = alloc_bootmem_low(sizeof(*res));
-		res->name  = "System RAM";
+		if (memblock_is_nomap(region)) {
+			res->name  = "reserved";
+			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+		} else {
+			res->name  = "System RAM";
+			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+		}
 		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
 		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
-		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 
 		request_resource(&iomem_resource, res);
 
-- 
2.9.0

===>8===

> (I don't know whether this change may have a backward-compatibility
> problem.)
> 
> -Takahiro AKASHI
> 
> > 
> > Thanks,
> > 
> > James
> > 
> > 
> > > diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> > > new file mode 100644
> > > index 0000000..2dc54d1
> > > --- /dev/null
> > > +++ b/arch/arm64/kernel/crash_dump.c
> > > @@ -0,0 +1,71 @@
> > > +/*
> > > + * Routines for doing kexec-based kdump
> > > + *
> > > + * Copyright (C) 2014 Linaro Limited
> > > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License version 2 as
> > > + * published by the Free Software Foundation.
> > > + */
> > > +
> > > +#include <linux/crash_dump.h>
> > > +#include <linux/errno.h>
> > > +#include <linux/io.h>
> > > +#include <linux/memblock.h>
> > > +#include <linux/uaccess.h>
> > > +#include <asm/memory.h>
> > > +
> > > +/**
> > > + * copy_oldmem_page() - copy one page from old kernel memory
> > > + * @pfn: page frame number to be copied
> > > + * @buf: buffer where the copied page is placed
> > > + * @csize: number of bytes to copy
> > > + * @offset: offset in bytes into the page
> > > + * @userbuf: if set, @buf is in a user address space
> > > + *
> > > + * This function copies one page from old kernel memory into buffer pointed by
> > > + * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
> > > + * copied or negative error in case of failure.
> > > + */
> > > +ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> > > +			 size_t csize, unsigned long offset,
> > > +			 int userbuf)
> > > +{
> > > +	void *vaddr;
> > > +
> > > +	if (!csize)
> > > +		return 0;
> > > +
> > > +	vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
> > > +	if (!vaddr)
> > > +		return -ENOMEM;
> > > +
> > > +	if (userbuf) {
> > > +		if (copy_to_user(buf, vaddr + offset, csize)) {
> > > +			iounmap(vaddr);
> > > +			return -EFAULT;
> > > +		}
> > > +	} else {
> > > +		memcpy(buf, vaddr + offset, csize);
> > > +	}
> > > +
> > > +	iounmap(vaddr);
> > > +
> > > +	return csize;
> > > +}
> > 

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH] fixup! arm64: kdump: add kdump support
  2016-08-17 15:33               ` [PATCH] fixup! " James Morse
  2016-08-18  7:32                 ` AKASHI Takahiro
@ 2016-08-19  8:00                 ` Pratyush Anand
  2016-08-19 13:34                   ` James Morse
  1 sibling, 1 reply; 52+ messages in thread
From: Pratyush Anand @ 2016-08-19  8:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi James,

On 17/08/2016:04:33:31 PM, James Morse wrote:
> copy_oldmem_page() and mmap_vmcore() provide two ways for userspace to read
> from /proc/vmcore. Neither of these check with memblock to see if the page
> they are accessing is nomap. On Seattle this causes:

Thanks for the patch.It did resolve the kernel crash issue with makedumpfile,
however neither there was any data in vmcore-dmesg nor crash utility was able to
work the saved vmcore.

This happened, because we still do not have correct page_offset (or vp_offset as
per new patches) calculation in kexec-tools. I still need following fixup in
kexec-tools.

https://github.com/pratyushanand/kexec-tools/commit/2358de3ec614d8282a565b8d031a1a91ebc55475

~Pratyush
> 
> [  174.393875] Unhandled fault: synchronous external abort (0x96000210) at
> 0xffffff80096b6000
> [  174.402158] Internal error: : 96000210 [#1] PREEMPT SMP
> [  174.407370] Modules linked in:
> [  174.410417] CPU: 6 PID: 2059 Comm: cp Tainted: G S      W I     4.8.0-rc1+ #4708
> [  174.417799] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS
> ROD1002C 04/08/2016
> [  174.426396] task: ffffffc0fdec5780 task.stack: ffffffc0f34bc000
> [  174.432313] PC is at __arch_copy_to_user+0x180/0x280
> [  174.437274] LR is at copy_oldmem_page+0xac/0xf0
> [  174.441791] pc : [<ffffff800835e080>] lr : [<ffffff8008095b9c>] pstate: 20000145
> [  174.449173] sp : ffffffc0f34bfc90
> [  174.452474] x29: ffffffc0f34bfc90 x28: 0000000000000000
> [  174.457776] x27: 0000000008000000 x26: 000000000000d000
> [  174.463077] x25: 0000000000000001 x24: ffffff8008eb5000
> [  174.468378] x23: 0000000000000000 x22: ffffff80096b6000
> [  174.473679] x21: 0000000000000001 x20: 0000000030127000
> [  174.478979] x19: 0000000000001000 x18: 0000007ff7085d60
> [  174.484279] x17: 0000000000429358 x16: ffffff80081d9e88
> [  174.489579] x15: 0000007fae377590 x14: 0000000000000000
> 
> [  174.494880] x13: 0000000000000000 x12: ffffff8008dd1000
> [  174.500180] x11: ffffff80096b6fff x10: ffffff80096b6fff
> [  174.505480] x9 : 0000000040000000 x8 : ffffff8008db6000
> [  174.510781] x7 : ffffff80096b7000 x6 : 0000000030127000
> [  174.516082] x5 : 0000000030128000 x4 : 0000000000000000
> [  174.521382] x3 : 00e8000000000713 x2 : 0000000000000f80
> [  174.526682] x1 : ffffff80096b6000 x0 : 0000000030127000
> [  174.531982]
> [  174.533461] Process cp (pid: 2059, stack limit = 0xffffffc0f34bc020)
> 
> [  174.848448] [<ffffff800835e080>] __arch_copy_to_user+0x180/0x280
> [  174.854448] [<ffffff8008245f34>] read_from_oldmem.part.4+0xb4/0xf4
> [  174.860615] [<ffffff8008246074>] read_vmcore+0x100/0x22c
> [  174.865919] [<ffffff8008239378>] proc_reg_read+0x64/0x90
> [  174.871223] [<ffffff80081d7da8>] __vfs_read+0x28/0x108
> [  174.876348] [<ffffff80081d8ae4>] vfs_read+0x84/0x144
> [  174.881301] [<ffffff80081d9ecc>] SyS_read+0x44/0xa0
> [  174.886167] [<ffffff8008082ef0>] el0_svc_naked+0x24/0x28
> [  174.891466] Code: 00000000 00000000 00000000 00000000 (a8c12027)
> [  174.897562] ---[ end trace 00801b2e35b0cd1f ]---
> 
> When reading /proc/vmcore with cat/cp or or mmap()ing it with makedumpfile.
> 
> The fs/proc/vmcore.c code provides a hook to indicate whether oldmem pages
> are ram or not. Use this to look for our earlier handiwork in memblock.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> 
> Hi Pratyush,
> 
> I couldn't get makedumpfile to build, or rather it depends on elfutils which
> wouldn't build for autotools reasons. Does implementing this hook solve your
> makedumpfile issue?
> 
> With this patch I can extract a usable vmcore file using read or mmap,
> avoiding the earlier splat.
> 
> Akashi, if you agree this is the right thing to do, please consider folding
> this into patch 5. (no need to keep the commit mesage or anything).
> 
> Thanks,
> 
> James
> 
>  arch/arm64/kernel/crash_dump.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> index 2dc54d129be1..76c71ab42994 100644
> --- a/arch/arm64/kernel/crash_dump.c
> +++ b/arch/arm64/kernel/crash_dump.c
> @@ -12,10 +12,38 @@
>  #include <linux/crash_dump.h>
>  #include <linux/errno.h>
>  #include <linux/io.h>
> +#include <linux/init.h>
>  #include <linux/memblock.h>
>  #include <linux/uaccess.h>
>  #include <asm/memory.h>
>  
> +#ifdef CONFIG_PROC_VMCORE
> +static int oldmem_pfn_is_ram(unsigned long pfn)
> +{
> +	unsigned long addr = pfn << PAGE_SHIFT;
> +
> +	/*
> +	 * We removed oldmem from memblock.memory, then re-added some regions
> +	 * which are reserved by/for firmware as memory and nomap.
> +	 * If an address exists as memory, but is marked nomap, return false.
> +	 */
> +	if (memblock_is_memory(addr) && !memblock_is_map_memory(addr))
> +		return 0;
> +
> +	return 1;
> +}
> +
> +static int __init do_register_oldmem_pfn_is_ram(void)
> +{
> +	return register_oldmem_pfn_is_ram(&oldmem_pfn_is_ram);
> +}
> +/*
> + * vmcore_init() is called via fs_initcall, ensure we register
> + * oldmem_pfn_is_ram() before then.
> + */
> +arch_initcall(do_register_oldmem_pfn_is_ram);
> +#endif
> +
>  /**
>   * copy_oldmem_page() - copy one page from old kernel memory
>   * @pfn: page frame number to be copied
> -- 
> 2.8.0.rc3

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-19  1:26           ` AKASHI Takahiro
@ 2016-08-19 11:22             ` Pratyush Anand
  2016-08-22  1:29               ` AKASHI Takahiro
  2016-08-19 13:28             ` James Morse
  1 sibling, 1 reply; 52+ messages in thread
From: Pratyush Anand @ 2016-08-19 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 19/08/2016:10:26:52 AM, AKASHI Takahiro wrote:
> >From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Date: Fri, 19 Aug 2016 09:57:52 +0900
> Subject: [PATCH] arm64: mark reserved memblock regions explicitly
> 
> ---
>  arch/arm64/kernel/setup.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 38eda13..38589b5 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
>  
>  	for_each_memblock(memory, region) {
>  		res = alloc_bootmem_low(sizeof(*res));
> -		res->name  = "System RAM";
> +		if (memblock_is_nomap(region)) {
> +			res->name  = "reserved";
> +			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> +		} else {
> +			res->name  = "System RAM";
> +			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +		}
>  		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
>  		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> -		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>  
>  		request_resource(&iomem_resource, res);


It will help kexec-tools to prevent copying  of any unnecessary data. I
think, then you also need to change phys_offset calculation in kexec-tools. That
should be start of either of first "reserved" or "System RAM" block.

~Pratyush

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-08-09  1:57   ` [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
@ 2016-08-19 13:26     ` Rob Herring
  2016-08-22  4:28       ` AKASHI Takahiro
  2016-09-27 23:39       ` Mark Rutland
  0 siblings, 2 replies; 52+ messages in thread
From: Rob Herring @ 2016-08-19 13:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Aug 09, 2016 at 10:57:47AM +0900, AKASHI Takahiro wrote:
> From: James Morse <james.morse@arm.com>
> 
> Add documentation for
> 	linux,crashkernel-base and crashkernel-size,
> 	linux,usable-memory-range, and
> 	linux,elfcorehdr
> used by arm64 kexec/kdump to decribe the kdump reserved area, and
> the elfcorehdr's location within it.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> [takahiro.akashi at linaro.org:
>     renamed "usable-memory" to "usable-memory-range",
>     added "linux,crashkernel-base" and "-size" ]
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> ---
>  Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> index 6ae9d82..236188a 100644
> --- a/Documentation/devicetree/bindings/chosen.txt
> +++ b/Documentation/devicetree/bindings/chosen.txt
> @@ -52,3 +52,53 @@ This property is set (currently only on PowerPC, and only needed on
>  book3e) by some versions of kexec-tools to tell the new kernel that it
>  is being booted by kexec, as the booting environment may differ (e.g.
>  a different secondary CPU release mechanism)
> +
> +linux,crashkernel-base
> +linux,crashkernel-size
> +----------------------
> +These properties (currently used on PowerPC and arm64) indicates
> +the base address and the size, respectively, of the reserved memory
> +range for crash dump kernel.
> +e.g.
> +
> +/ {
> +	chosen {
> +		linux,crashkernel-base = <0x9 0xf0000000>;
> +		linux,crashkernel-size = <0x0 0x10000000>;
> +	};
> +};
> +
> +linux,usable-memory-range
> +-------------------------
> +
> +This property (currently used only on arm64) holds the memory range,
> +the base address and the size, which can be used as system ram on
> +the *current* kernel. Note that, if this property is present, any memory
> +regions under "memory" nodes in DT blob or ones marked as "conventional
> +memory" in EFI memory map should be ignored.
> +e.g.
> +
> +/ {
> +	chosen {
> +		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
> +	};
> +};

I've read the discussion on this. I think you should use the existing 
linux,usable-memory property in the memory nodes. If UEFI systems don't 
have memory nodes, then you can find an UEFI way to describe this. DT is 
not the dumping ground for what doesn't fit in UEFI. How do x86 systems 
work?

Rob

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-19  1:26           ` AKASHI Takahiro
  2016-08-19 11:22             ` Pratyush Anand
@ 2016-08-19 13:28             ` James Morse
  2016-08-22  1:23               ` AKASHI Takahiro
  1 sibling, 1 reply; 52+ messages in thread
From: James Morse @ 2016-08-19 13:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 19/08/16 02:26, AKASHI Takahiro wrote:
> Can you try the following change?
> If it fixes your problem, I will post it as a patch.

Almost! This causes booting with acpi=on to fail for the familiar
alignment-fault reasons[2],
details and a suggested fix below.

I think we should have this change as it matches x86's use of acpi, and means we
don't rely on re-parsing the efi memory map to learn which areas of memory
shouldn't be in the vmcore.


> ===8<===
> From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Date: Fri, 19 Aug 2016 09:57:52 +0900
> Subject: [PATCH] arm64: mark reserved memblock regions explicitly
> 
> ---
>  arch/arm64/kernel/setup.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 38eda13..38589b5 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
>  
>  	for_each_memblock(memory, region) {
>  		res = alloc_bootmem_low(sizeof(*res));
> -		res->name  = "System RAM";
> +		if (memblock_is_nomap(region)) {
> +			res->name  = "reserved";

> +			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;

This causes acpica to choke.  arch/arm64/include/asm/acpi.h:acpi_os_ioremap()
calls page_is_ram(), which expects IORESOURCE_SYSTEM_RAM. From kernel/resource.c:
> /*
>  * This generic page_is_ram() returns true if specified address is
>  * registered as System RAM in iomem_resource list.
>  */
> int __weak page_is_ram(unsigned long pfn)

We are trying to infer information about the EFI memory map by looking through
iomem_resource list generated from memblock.
drivers/firmware/efi/arm-init.c:reserve_regions() adds memory with the WB
attribute to memblock via early_init_dt_add_memory_arch(), so changing
page_is_ram() for memblock_is_memory() is one step closer to checking the
attributes in the efi memory map (which turns out to tricky).

With your v24 on v4.8-rc1 and 'mark reserved memblock regions explicitly', and
this extra hack [0], I can boot, kdump, extract the vmcore (with read() and
mmap()), and pull things out of it with crash.


Thanks,

James


> +		} else {
> +			res->name  = "System RAM";
> +			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +		}
>  		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
>  		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> -		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>  
>  		request_resource(&iomem_resource, res);

[0] whitespace damaged hack for acpi_os_ioremap().
------------------------------------%<------------------------------------
+++ b/arch/arm64/include/asm/acpi.h
@@ -12,6 +12,7 @@
 #ifndef _ASM_ACPI_H
 #define _ASM_ACPI_H

+#include <linux/memblock.h>
 #include <linux/mm.h>
 #include <linux/psci.h>

@@ -32,7 +33,11 @@
 static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
                                            acpi_size size)
 {
-       if (!page_is_ram(phys >> PAGE_SHIFT))
+       /*
+        * EFI's reserve_regions() call adds memory with the WB attribute
+        * to memblock via early_init_dt_add_memory_arch().
+        */
+       if (!memblock_is_memory(phys))
                return ioremap(phys, size);

        return ioremap_cache(phys, size);
------------------------------------%<------------------------------------

[1] cat /proc/iomem | tail -n 9
8000000000-8001e7ffff : reserved
8001e80000-83ff17ffff : System RAM
  8002080000-8002b0ffff : Kernel code
  8002d90000-8002f6ffff : Kernel data
  80dfe00000-80ffdfffff : Crash kernel
83ff180000-83ff1cffff : reserved
83ff1d0000-83ff21ffff : System RAM
83ff220000-83ffe4ffff : reserved
83ffe50000-83ffffffff : System RAM

[2] Panic on boot without [0]
[    0.037098] ACPI: Core revision 20160422
[    0.041174] Unable to handle kernel paging request at virtual address
fffffc0008f831e7
[    0.049155] pgd = fffffc0008f50000
[    0.052576] [fffffc0008f831e7] *pgd=00000083fffe0003, *pud=00000083fffe0003,
*pmd=00000083fffe0003, *pte=00e80083ff1c0707
[    0.063632] Internal error: Oops: 96000021 [#1] PREEMPT SMP
[    0.069245] Modules linked in:
[    0.072317] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1+ #4892
[    0.078891] Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive)
(DT)
[    0.086518] task: fffffc0008dc8a80 task.stack: fffffc0008d90000
[    0.092486] PC is at acpi_ns_lookup+0x238/0x378
[    0.097049] LR is@acpi_ds_load1_begin_op+0x88/0x260
[    0.102290] pc : [<fffffc000840fed8>] lr : [<fffffc0008405dcc>] pstate: 60000045
[    0.109740] sp : fffffc0008d93bb0
[    0.113071] x29: fffffc0008d93bb0 x28: 0000000000000001
[    0.118420] x27: 0000000000000000 x26: 000000000000001b
[    0.123768] x25: fffffc0008a813b7 x24: 000000000000001b
[    0.129119] x23: 000000000000001b x22: 0000000000000001
[    0.134466] x21: fffffc0008d93cb8 x20: 0000000000000001
[    0.139814] x19: fffffc0008f831e7 x18: 0000000000000023
[    0.145163] x17: ffffffffffffffff x16: 0000000000000000
[    0.150511] x15: 000000000000038e x14: 0000000000007fff
[    0.155859] x13: ffffffffffffff00 x12: 0000000000000008
[    0.161208] x11: 0000000000000028 x10: 00000000ffffff76
[    0.166556] x9 : 0000000000000000 x8 : fffffe03c00b0140
[    0.171907] x7 : 0000000000000003 x6 : fffffc0008d93cb8
[    0.177255] x5 : fffffe03c0060800 x4 : fffffc0008ee7fe8
[    0.182603] x3 : fffffc0008ee8000 x2 : fffffc0008ee7fe8
[    0.187951] x1 : fffffc0008f831e7 x0 : 0000000000000000
[    0.193296]
[    0.194789] Process swapper/0 (pid: 0, stack limit = 0xfffffc0008d90020)
[    0.483605] Call trace:
[    0.486062] Exception stack(0xfffffc0008d939e0 to 0xfffffc0008d93b10)
[    0.568522] [<fffffc000840fed8>] acpi_ns_lookup+0x238/0x378
[    0.574134] [<fffffc0008405dcc>] acpi_ds_load1_begin_op+0x88/0x260
[    0.580358] [<fffffc0008415e3c>] acpi_ps_build_named_op+0xa8/0x170
[    0.586582] [<fffffc0008416034>] acpi_ps_create_op+0x130/0x230
[    0.592455] [<fffffc0008415988>] acpi_ps_parse_loop+0x174/0x580
[    0.598419] [<fffffc00084168b4>] acpi_ps_parse_aml+0xa8/0x278
[    0.604208] [<fffffc0008411efc>] acpi_ns_one_complete_parse+0x130/0x158
[    0.610871] [<fffffc0008411f48>] acpi_ns_parse_table+0x24/0x44
[    0.616745] [<fffffc00084116fc>] acpi_ns_load_table+0x50/0xe0
[    0.622535] [<fffffc000841ad80>] acpi_tb_load_namespace+0xdc/0x25c
[    0.628764] [<fffffc0008b383f4>] acpi_load_tables+0x3c/0xa8
[    0.634377] [<fffffc0008b37548>] acpi_early_init+0x88/0xbc
[    0.639903] [<fffffc0008b10b30>] start_kernel+0x330/0x394
[    0.645340] [<fffffc0008b10210>] __primary_switched+0x7c/0x8c
[    0.651127] Code: 2a1a03f7 2a0002d6 36380054 321902d6 (b9400260)
[    0.657270] ---[ end trace 0000000000000000 ]---
[    0.661922] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.668673] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] fixup! arm64: kdump: add kdump support
  2016-08-19  8:00                 ` Pratyush Anand
@ 2016-08-19 13:34                   ` James Morse
  2016-08-19 15:19                     ` Pratyush Anand
  0 siblings, 1 reply; 52+ messages in thread
From: James Morse @ 2016-08-19 13:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 19/08/16 09:00, Pratyush Anand wrote:
> On 17/08/2016:04:33:31 PM, James Morse wrote:
>> copy_oldmem_page() and mmap_vmcore() provide two ways for userspace to read
>> from /proc/vmcore. Neither of these check with memblock to see if the page
>> they are accessing is nomap. On Seattle this causes:
> 
> Thanks for the patch.It did resolve the kernel crash issue with makedumpfile,
> however neither there was any data in vmcore-dmesg nor crash utility was able to
> work the saved vmcore.

vmcore-dmesg doesn't work for me, but crash did once I'd rebuilt it from the
most recent source.

The most recent commit I have is:
b349598bb755 ("Fix for the ARM64 "bt -R <symbol>" option if the only reference")

Are you using an older version?


Thanks,

James


[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/447597.html

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] fixup! arm64: kdump: add kdump support
  2016-08-19 13:34                   ` James Morse
@ 2016-08-19 15:19                     ` Pratyush Anand
  0 siblings, 0 replies; 52+ messages in thread
From: Pratyush Anand @ 2016-08-19 15:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 19/08/2016:02:34:48 PM, James Morse wrote:
> On 19/08/16 09:00, Pratyush Anand wrote:
> > On 17/08/2016:04:33:31 PM, James Morse wrote:
> >> copy_oldmem_page() and mmap_vmcore() provide two ways for userspace to read
> >> from /proc/vmcore. Neither of these check with memblock to see if the page
> >> they are accessing is nomap. On Seattle this causes:
> > 
> > Thanks for the patch.It did resolve the kernel crash issue with makedumpfile,
> > however neither there was any data in vmcore-dmesg nor crash utility was able to
> > work the saved vmcore.
> 
> vmcore-dmesg doesn't work for me, but crash did once I'd rebuilt it from the
> most recent source.

Yes, saved vmcore  worked with latest crash. However, we will need to correct
phys_offset and page_offset in kexec-tools to get meaningful output from vmcore-dmesg.

~Pratyush
> 
> The most recent commit I have is:
> b349598bb755 ("Fix for the ARM64 "bt -R <symbol>" option if the only reference")
> 
> Are you using an older version?
> 
> 
> Thanks,
> 
> James
> 
> 
> [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/447597.html

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-19 13:28             ` James Morse
@ 2016-08-22  1:23               ` AKASHI Takahiro
  0 siblings, 0 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-22  1:23 UTC (permalink / raw)
  To: linux-arm-kernel

James,

On Fri, Aug 19, 2016 at 02:28:06PM +0100, James Morse wrote:
> On 19/08/16 02:26, AKASHI Takahiro wrote:
> > Can you try the following change?
> > If it fixes your problem, I will post it as a patch.
> 
> Almost! This causes booting with acpi=on to fail for the familiar
> alignment-fault reasons[2],
> details and a suggested fix below.

Thank you for the fix.
I will merge your hunk to the patch.

-Takahiro AKASHI

> I think we should have this change as it matches x86's use of acpi, and means we
> don't rely on re-parsing the efi memory map to learn which areas of memory
> shouldn't be in the vmcore.
> 
> 
> > ===8<===
> > From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Date: Fri, 19 Aug 2016 09:57:52 +0900
> > Subject: [PATCH] arm64: mark reserved memblock regions explicitly
> > 
> > ---
> >  arch/arm64/kernel/setup.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 38eda13..38589b5 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
> >  
> >  	for_each_memblock(memory, region) {
> >  		res = alloc_bootmem_low(sizeof(*res));
> > -		res->name  = "System RAM";
> > +		if (memblock_is_nomap(region)) {
> > +			res->name  = "reserved";
> 
> > +			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> 
> This causes acpica to choke.  arch/arm64/include/asm/acpi.h:acpi_os_ioremap()
> calls page_is_ram(), which expects IORESOURCE_SYSTEM_RAM. From kernel/resource.c:
> > /*
> >  * This generic page_is_ram() returns true if specified address is
> >  * registered as System RAM in iomem_resource list.
> >  */
> > int __weak page_is_ram(unsigned long pfn)
> 
> We are trying to infer information about the EFI memory map by looking through
> iomem_resource list generated from memblock.
> drivers/firmware/efi/arm-init.c:reserve_regions() adds memory with the WB
> attribute to memblock via early_init_dt_add_memory_arch(), so changing
> page_is_ram() for memblock_is_memory() is one step closer to checking the
> attributes in the efi memory map (which turns out to tricky).
> 
> With your v24 on v4.8-rc1 and 'mark reserved memblock regions explicitly', and
> this extra hack [0], I can boot, kdump, extract the vmcore (with read() and
> mmap()), and pull things out of it with crash.
> 
> 
> Thanks,
> 
> James
> 
> 
> > +		} else {
> > +			res->name  = "System RAM";
> > +			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > +		}
> >  		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
> >  		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> > -		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >  
> >  		request_resource(&iomem_resource, res);
> 
> [0] whitespace damaged hack for acpi_os_ioremap().
> ------------------------------------%<------------------------------------
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -12,6 +12,7 @@
>  #ifndef _ASM_ACPI_H
>  #define _ASM_ACPI_H
> 
> +#include <linux/memblock.h>
>  #include <linux/mm.h>
>  #include <linux/psci.h>
> 
> @@ -32,7 +33,11 @@
>  static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
>                                             acpi_size size)
>  {
> -       if (!page_is_ram(phys >> PAGE_SHIFT))
> +       /*
> +        * EFI's reserve_regions() call adds memory with the WB attribute
> +        * to memblock via early_init_dt_add_memory_arch().
> +        */
> +       if (!memblock_is_memory(phys))
>                 return ioremap(phys, size);
> 
>         return ioremap_cache(phys, size);
> ------------------------------------%<------------------------------------
> 
> [1] cat /proc/iomem | tail -n 9
> 8000000000-8001e7ffff : reserved
> 8001e80000-83ff17ffff : System RAM
>   8002080000-8002b0ffff : Kernel code
>   8002d90000-8002f6ffff : Kernel data
>   80dfe00000-80ffdfffff : Crash kernel
> 83ff180000-83ff1cffff : reserved
> 83ff1d0000-83ff21ffff : System RAM
> 83ff220000-83ffe4ffff : reserved
> 83ffe50000-83ffffffff : System RAM
> 
> [2] Panic on boot without [0]
> [    0.037098] ACPI: Core revision 20160422
> [    0.041174] Unable to handle kernel paging request at virtual address
> fffffc0008f831e7
> [    0.049155] pgd = fffffc0008f50000
> [    0.052576] [fffffc0008f831e7] *pgd=00000083fffe0003, *pud=00000083fffe0003,
> *pmd=00000083fffe0003, *pte=00e80083ff1c0707
> [    0.063632] Internal error: Oops: 96000021 [#1] PREEMPT SMP
> [    0.069245] Modules linked in:
> [    0.072317] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1+ #4892
> [    0.078891] Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive)
> (DT)
> [    0.086518] task: fffffc0008dc8a80 task.stack: fffffc0008d90000
> [    0.092486] PC is at acpi_ns_lookup+0x238/0x378
> [    0.097049] LR is at acpi_ds_load1_begin_op+0x88/0x260
> [    0.102290] pc : [<fffffc000840fed8>] lr : [<fffffc0008405dcc>] pstate: 60000045
> [    0.109740] sp : fffffc0008d93bb0
> [    0.113071] x29: fffffc0008d93bb0 x28: 0000000000000001
> [    0.118420] x27: 0000000000000000 x26: 000000000000001b
> [    0.123768] x25: fffffc0008a813b7 x24: 000000000000001b
> [    0.129119] x23: 000000000000001b x22: 0000000000000001
> [    0.134466] x21: fffffc0008d93cb8 x20: 0000000000000001
> [    0.139814] x19: fffffc0008f831e7 x18: 0000000000000023
> [    0.145163] x17: ffffffffffffffff x16: 0000000000000000
> [    0.150511] x15: 000000000000038e x14: 0000000000007fff
> [    0.155859] x13: ffffffffffffff00 x12: 0000000000000008
> [    0.161208] x11: 0000000000000028 x10: 00000000ffffff76
> [    0.166556] x9 : 0000000000000000 x8 : fffffe03c00b0140
> [    0.171907] x7 : 0000000000000003 x6 : fffffc0008d93cb8
> [    0.177255] x5 : fffffe03c0060800 x4 : fffffc0008ee7fe8
> [    0.182603] x3 : fffffc0008ee8000 x2 : fffffc0008ee7fe8
> [    0.187951] x1 : fffffc0008f831e7 x0 : 0000000000000000
> [    0.193296]
> [    0.194789] Process swapper/0 (pid: 0, stack limit = 0xfffffc0008d90020)
> [    0.483605] Call trace:
> [    0.486062] Exception stack(0xfffffc0008d939e0 to 0xfffffc0008d93b10)
> [    0.568522] [<fffffc000840fed8>] acpi_ns_lookup+0x238/0x378
> [    0.574134] [<fffffc0008405dcc>] acpi_ds_load1_begin_op+0x88/0x260
> [    0.580358] [<fffffc0008415e3c>] acpi_ps_build_named_op+0xa8/0x170
> [    0.586582] [<fffffc0008416034>] acpi_ps_create_op+0x130/0x230
> [    0.592455] [<fffffc0008415988>] acpi_ps_parse_loop+0x174/0x580
> [    0.598419] [<fffffc00084168b4>] acpi_ps_parse_aml+0xa8/0x278
> [    0.604208] [<fffffc0008411efc>] acpi_ns_one_complete_parse+0x130/0x158
> [    0.610871] [<fffffc0008411f48>] acpi_ns_parse_table+0x24/0x44
> [    0.616745] [<fffffc00084116fc>] acpi_ns_load_table+0x50/0xe0
> [    0.622535] [<fffffc000841ad80>] acpi_tb_load_namespace+0xdc/0x25c
> [    0.628764] [<fffffc0008b383f4>] acpi_load_tables+0x3c/0xa8
> [    0.634377] [<fffffc0008b37548>] acpi_early_init+0x88/0xbc
> [    0.639903] [<fffffc0008b10b30>] start_kernel+0x330/0x394
> [    0.645340] [<fffffc0008b10210>] __primary_switched+0x7c/0x8c
> [    0.651127] Code: 2a1a03f7 2a0002d6 36380054 321902d6 (b9400260)
> [    0.657270] ---[ end trace 0000000000000000 ]---
> [    0.661922] Kernel panic - not syncing: Attempted to kill the idle task!
> [    0.668673] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-19 11:22             ` Pratyush Anand
@ 2016-08-22  1:29               ` AKASHI Takahiro
  2016-08-22  7:07                 ` Pratyush Anand
  2016-08-22 13:47                 ` James Morse
  0 siblings, 2 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-22  1:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> On 19/08/2016:10:26:52 AM, AKASHI Takahiro wrote:
> > >From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Date: Fri, 19 Aug 2016 09:57:52 +0900
> > Subject: [PATCH] arm64: mark reserved memblock regions explicitly
> > 
> > ---
> >  arch/arm64/kernel/setup.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 38eda13..38589b5 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
> >  
> >  	for_each_memblock(memory, region) {
> >  		res = alloc_bootmem_low(sizeof(*res));
> > -		res->name  = "System RAM";
> > +		if (memblock_is_nomap(region)) {
> > +			res->name  = "reserved";
> > +			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> > +		} else {
> > +			res->name  = "System RAM";
> > +			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > +		}
> >  		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
> >  		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> > -		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >  
> >  		request_resource(&iomem_resource, res);
> 
> 
> It will help kexec-tools to prevent copying  of any unnecessary data. I
> think, then you also need to change phys_offset calculation in kexec-tools. That
> should be start of either of first "reserved" or "System RAM" block.

Good point, but I'm not sure this is always true.
Is there any system whose ACPI memory is *not* part of DRAM
(so not part of linear mapping)?

Thanks,
-Takahiro AKASHI

> ~Pratyush

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-08-19 13:26     ` Rob Herring
@ 2016-08-22  4:28       ` AKASHI Takahiro
  2016-08-30 16:34         ` Rob Herring
  2016-09-27 23:39       ` Mark Rutland
  1 sibling, 1 reply; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-22  4:28 UTC (permalink / raw)
  To: linux-arm-kernel

Rob,
[Cc: Mark]

On Fri, Aug 19, 2016 at 08:26:41AM -0500, Rob Herring wrote:
> On Tue, Aug 09, 2016 at 10:57:47AM +0900, AKASHI Takahiro wrote:
> > From: James Morse <james.morse@arm.com>
> > 
> > Add documentation for
> > 	linux,crashkernel-base and crashkernel-size,
> > 	linux,usable-memory-range, and
> > 	linux,elfcorehdr
> > used by arm64 kexec/kdump to decribe the kdump reserved area, and
> > the elfcorehdr's location within it.
> > 
> > Signed-off-by: James Morse <james.morse@arm.com>
> > [takahiro.akashi at linaro.org:
> >     renamed "usable-memory" to "usable-memory-range",
> >     added "linux,crashkernel-base" and "-size" ]
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > ---
> >  Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
> >  1 file changed, 50 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> > index 6ae9d82..236188a 100644
> > --- a/Documentation/devicetree/bindings/chosen.txt
> > +++ b/Documentation/devicetree/bindings/chosen.txt
> > @@ -52,3 +52,53 @@ This property is set (currently only on PowerPC, and only needed on
> >  book3e) by some versions of kexec-tools to tell the new kernel that it
> >  is being booted by kexec, as the booting environment may differ (e.g.
> >  a different secondary CPU release mechanism)
> > +
> > +linux,crashkernel-base
> > +linux,crashkernel-size
> > +----------------------
> > +These properties (currently used on PowerPC and arm64) indicates
> > +the base address and the size, respectively, of the reserved memory
> > +range for crash dump kernel.
> > +e.g.
> > +
> > +/ {
> > +	chosen {
> > +		linux,crashkernel-base = <0x9 0xf0000000>;
> > +		linux,crashkernel-size = <0x0 0x10000000>;
> > +	};
> > +};
> > +
> > +linux,usable-memory-range
> > +-------------------------
> > +
> > +This property (currently used only on arm64) holds the memory range,
> > +the base address and the size, which can be used as system ram on
> > +the *current* kernel. Note that, if this property is present, any memory
> > +regions under "memory" nodes in DT blob or ones marked as "conventional
> > +memory" in EFI memory map should be ignored.
> > +e.g.
> > +
> > +/ {
> > +	chosen {
> > +		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
> > +	};
> > +};
> 
> I've read the discussion on this. I think you should use the existing 
> linux,usable-memory property in the memory nodes. If UEFI systems don't 
> have memory nodes, then you can find an UEFI way to describe this. DT is 
> not the dumping ground for what doesn't fit in UEFI. How do x86 systems 
> work?

Kdump for x86 passes the range of usable memory to crash dump kernel
either via:
(a) "memmap=nn at ss" command line parameter, or
(b) faked BIOS e820 map (a sort of memory map table)

(a) was rejected by Mark [1], and (b) is x86-specific.

I believe that my approach is better because it works in the same way
either on UEFI systems or DT-based systems.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/400122.html

Thanks,
-Takahiro AKASHI

> Rob

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-22  1:29               ` AKASHI Takahiro
@ 2016-08-22  7:07                 ` Pratyush Anand
  2016-08-22 13:47                 ` James Morse
  1 sibling, 0 replies; 52+ messages in thread
From: Pratyush Anand @ 2016-08-22  7:07 UTC (permalink / raw)
  To: linux-arm-kernel

On 22/08/2016:10:29:20 AM, AKASHI Takahiro wrote:
> On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> > On 19/08/2016:10:26:52 AM, AKASHI Takahiro wrote:
> > > >From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
> > > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > Date: Fri, 19 Aug 2016 09:57:52 +0900
> > > Subject: [PATCH] arm64: mark reserved memblock regions explicitly
> > > 
> > > ---
> > >  arch/arm64/kernel/setup.c | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > > index 38eda13..38589b5 100644
> > > --- a/arch/arm64/kernel/setup.c
> > > +++ b/arch/arm64/kernel/setup.c
> > > @@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
> > >  
> > >  	for_each_memblock(memory, region) {
> > >  		res = alloc_bootmem_low(sizeof(*res));
> > > -		res->name  = "System RAM";
> > > +		if (memblock_is_nomap(region)) {
> > > +			res->name  = "reserved";
> > > +			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> > > +		} else {
> > > +			res->name  = "System RAM";
> > > +			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > > +		}
> > >  		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
> > >  		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> > > -		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > >  
> > >  		request_resource(&iomem_resource, res);
> > 
> > 
> > It will help kexec-tools to prevent copying  of any unnecessary data. I
> > think, then you also need to change phys_offset calculation in kexec-tools. That
> > should be start of either of first "reserved" or "System RAM" block.
> 
> Good point, but I'm not sure this is always true.
> Is there any system whose ACPI memory is *not* part of DRAM
> (so not part of linear mapping)?
> 

Looking into kernel/resource.c:reserve_setup(), it seems that there could be
some none-DRAM area as well, which could be marked as "reserved". So, I think if
we mark nomap region as "reserved" then applications like kexec-tools may not
always identify start of DRAM correctly. Probably, we should give an unique name
to reserved system ram area.

~Pratyush

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-22  1:29               ` AKASHI Takahiro
  2016-08-22  7:07                 ` Pratyush Anand
@ 2016-08-22 13:47                 ` James Morse
  2016-08-23  0:38                   ` AKASHI Takahiro
  1 sibling, 1 reply; 52+ messages in thread
From: James Morse @ 2016-08-22 13:47 UTC (permalink / raw)
  To: linux-arm-kernel

On 22/08/16 02:29, AKASHI Takahiro wrote:
> On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
>> It will help kexec-tools to prevent copying  of any unnecessary data. I
>> think, then you also need to change phys_offset calculation in kexec-tools. That
>> should be start of either of first "reserved" or "System RAM" block.
> 
> Good point, but I'm not sure this is always true.

> Is there any system whose ACPI memory is *not* part of DRAM

>From the spec, it looks like this is allowed.

What do you mean by 'DRAM'? Any ACPI region will be in the UEFI memory map, so
the question is what is its type and memory attributes?
The UEFI spec[0] says ACPI regions can have a type of EfiACPIReclaimMemory or
EfiACPIMemoryNVS, the memory attributes aren't specified, so are chosen by the
firmware.

It is possible these regions have to be mapped non-cacheable, page 40 has a
couple of:
> If no information about the table location exists in the UEFI memory map or
ACPI memory
> descriptors, the table is assumed to be non-cached.

reserve_regions() in drivers/firmware/efi/arm-init.c will add any entry in the
memory map that has a 'WB' attribute to the memblock.memory list (via
early_init_dt_add_memory_arch()), it will also mark as no-map regions that have
this attribute and aren't in the is_reserve_region() list.

If these ACPI regions have the 'WB' attribute, we add them as memory and mark
them nomap. These show up as either a hole, or 'reserved' in /proc/iomem.
If they don't have the 'WB' attribute, then then they are left out of memblock
and aren't part of DRAM, I don't think these will show up in /proc/iomem at all.


Thanks,

James

[0] '2.3.6 AArch64 Platforms' of version 2.6 of the UEFI spec at
http://uefi.org/specifications

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-22 13:47                 ` James Morse
@ 2016-08-23  0:38                   ` AKASHI Takahiro
  2016-08-23 11:23                     ` Pratyush Anand
  0 siblings, 1 reply; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-23  0:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 22, 2016 at 02:47:30PM +0100, James Morse wrote:
> On 22/08/16 02:29, AKASHI Takahiro wrote:
> > On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> >> It will help kexec-tools to prevent copying  of any unnecessary data. I
> >> think, then you also need to change phys_offset calculation in kexec-tools. That
> >> should be start of either of first "reserved" or "System RAM" block.
> > 
> > Good point, but I'm not sure this is always true.
> 
> > Is there any system whose ACPI memory is *not* part of DRAM
> 
> From the spec, it looks like this is allowed.
> 
> What do you mean by 'DRAM'? Any ACPI region will be in the UEFI memory map, so
> the question is what is its type and memory attributes?

Yes.

> The UEFI spec[0] says ACPI regions can have a type of EfiACPIReclaimMemory or
> EfiACPIMemoryNVS, the memory attributes aren't specified, so are chosen by the
> firmware.
> 
> It is possible these regions have to be mapped non-cacheable, page 40 has a
> couple of:
> > If no information about the table location exists in the UEFI memory map or
> ACPI memory
> > descriptors, the table is assumed to be non-cached.
> 
> reserve_regions() in drivers/firmware/efi/arm-init.c will add any entry in the
> memory map that has a 'WB' attribute to the memblock.memory list (via
> early_init_dt_add_memory_arch()), it will also mark as no-map regions that have
> this attribute and aren't in the is_reserve_region() list.
> 
> If these ACPI regions have the 'WB' attribute, we add them as memory and mark
> them nomap. These show up as either a hole, or 'reserved' in /proc/iomem.
> If they don't have the 'WB' attribute, then then they are left out of memblock
> and aren't part of DRAM, I don't think these will show up in /proc/iomem at all.

Let's say,
0x1000-0x1fff: reserved (SRAM for UEFI, WB)
0x80000000-0xffffffff: System RAM (DRAM)

If, as Pratyush suggested, "reserved" resources are added to phys_offset
calculation, the kernel linear mapping area starts at PAGE_OFFSET, but
there is no actual mapping around PAGE_OFFSET.
It won't hurt anything, but looks funny.
So we'd better not include "reserved" in phys_offset calculation anyway.
        -> Pratyush

Thanks,
-Takahiro AKASHI

> 
> 
> Thanks,
> 
> James
> 
> [0] '2.3.6 AArch64 Platforms' of version 2.6 of the UEFI spec at
> http://uefi.org/specifications

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-23  0:38                   ` AKASHI Takahiro
@ 2016-08-23 11:23                     ` Pratyush Anand
  2016-08-24  8:04                       ` Dave Young
  0 siblings, 1 reply; 52+ messages in thread
From: Pratyush Anand @ 2016-08-23 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 23/08/2016:09:38:16 AM, AKASHI Takahiro wrote:
> On Mon, Aug 22, 2016 at 02:47:30PM +0100, James Morse wrote:
> > On 22/08/16 02:29, AKASHI Takahiro wrote:
> > > On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> > >> It will help kexec-tools to prevent copying  of any unnecessary data. I
> > >> think, then you also need to change phys_offset calculation in kexec-tools. That
> > >> should be start of either of first "reserved" or "System RAM" block.
> > > 
> > > Good point, but I'm not sure this is always true.
> > 
> > > Is there any system whose ACPI memory is *not* part of DRAM
> > 
> > From the spec, it looks like this is allowed.
> > 
> > What do you mean by 'DRAM'? Any ACPI region will be in the UEFI memory map, so
> > the question is what is its type and memory attributes?
> 
> Yes.
> 
> > The UEFI spec[0] says ACPI regions can have a type of EfiACPIReclaimMemory or
> > EfiACPIMemoryNVS, the memory attributes aren't specified, so are chosen by the
> > firmware.
> > 
> > It is possible these regions have to be mapped non-cacheable, page 40 has a
> > couple of:
> > > If no information about the table location exists in the UEFI memory map or
> > ACPI memory
> > > descriptors, the table is assumed to be non-cached.
> > 
> > reserve_regions() in drivers/firmware/efi/arm-init.c will add any entry in the
> > memory map that has a 'WB' attribute to the memblock.memory list (via
> > early_init_dt_add_memory_arch()), it will also mark as no-map regions that have
> > this attribute and aren't in the is_reserve_region() list.
> > 
> > If these ACPI regions have the 'WB' attribute, we add them as memory and mark
> > them nomap. These show up as either a hole, or 'reserved' in /proc/iomem.
> > If they don't have the 'WB' attribute, then then they are left out of memblock
> > and aren't part of DRAM, I don't think these will show up in /proc/iomem at all.
> 
> Let's say,
> 0x1000-0x1fff: reserved (SRAM for UEFI, WB)
> 0x80000000-0xffffffff: System RAM (DRAM)

May be slightly more complicated:
0x80000000-0x80001fff: System RAM (DRAM) for UEFI, WB
0x80002000-0xffffffff: System RAM (DRAM)

Kernel will have phys_offset 0x80000000, however kexec-tools will calculate it
as 0x80002000.

> 
> If, as Pratyush suggested, "reserved" resources are added to phys_offset
> calculation, the kernel linear mapping area starts at PAGE_OFFSET, but
> there is no actual mapping around PAGE_OFFSET.
> It won't hurt anything, but looks funny.
> So we'd better not include "reserved" in phys_offset calculation anyway.
>         -> Pratyush

My only concern is that, then we will have different values of phys_offset in
kernel and kexec-tools, which might lead to further confusion.

~Pratyush

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-23 11:23                     ` Pratyush Anand
@ 2016-08-24  8:04                       ` Dave Young
  2016-08-24 10:25                         ` James Morse
  0 siblings, 1 reply; 52+ messages in thread
From: Dave Young @ 2016-08-24  8:04 UTC (permalink / raw)
  To: linux-arm-kernel

Ccing uefi people.

On 08/23/16 at 04:53pm, Pratyush Anand wrote:
> On 23/08/2016:09:38:16 AM, AKASHI Takahiro wrote:
> > On Mon, Aug 22, 2016 at 02:47:30PM +0100, James Morse wrote:
> > > On 22/08/16 02:29, AKASHI Takahiro wrote:
> > > > On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> > > >> It will help kexec-tools to prevent copying  of any unnecessary data. I
> > > >> think, then you also need to change phys_offset calculation in kexec-tools. That
> > > >> should be start of either of first "reserved" or "System RAM" block.
> > > > 
> > > > Good point, but I'm not sure this is always true.
> > > 
> > > > Is there any system whose ACPI memory is *not* part of DRAM
> > > 
> > > From the spec, it looks like this is allowed.
> > > 
> > > What do you mean by 'DRAM'? Any ACPI region will be in the UEFI memory map, so
> > > the question is what is its type and memory attributes?
> > 
> > Yes.
> > 
> > > The UEFI spec[0] says ACPI regions can have a type of EfiACPIReclaimMemory or
> > > EfiACPIMemoryNVS, the memory attributes aren't specified, so are chosen by the
> > > firmware.
> > > 
> > > It is possible these regions have to be mapped non-cacheable, page 40 has a
> > > couple of:
> > > > If no information about the table location exists in the UEFI memory map or
> > > ACPI memory
> > > > descriptors, the table is assumed to be non-cached.
> > > 
> > > reserve_regions() in drivers/firmware/efi/arm-init.c will add any entry in the
> > > memory map that has a 'WB' attribute to the memblock.memory list (via
> > > early_init_dt_add_memory_arch()), it will also mark as no-map regions that have
> > > this attribute and aren't in the is_reserve_region() list.

Looking the arm-init.c, I suspect it missed the some efi ranges as
reserved ranges like runtime code and runtime data etc. But I might be
wrong.

Below is the is_reserve_region, it will regard any regions which is not
in the EFI_* below as normal memory, it does not include the runtime
ranges and other types.

static __init int is_reserve_region(efi_memory_desc_t *md)
{
        switch (md->type) {
        case EFI_LOADER_CODE:
        case EFI_LOADER_DATA:
        case EFI_BOOT_SERVICES_CODE:
        case EFI_BOOT_SERVICES_DATA:
        case EFI_CONVENTIONAL_MEMORY:
        case EFI_PERSISTENT_MEMORY:
                return 0;
        default:
                break;
        }
        return is_normal_ram(md);
}

Let's see the x86 do_add_efi_mem_map, the default case set all other
types as reserved. Shouldn't this be same in all arches though there's
no e820 in arm(64)?

static void __init do_add_efi_memmap(void)
{

[snip]
                switch (md->type) {
                case EFI_LOADER_CODE:
                case EFI_LOADER_DATA:
                case EFI_BOOT_SERVICES_CODE:
                case EFI_BOOT_SERVICES_DATA:
                case EFI_CONVENTIONAL_MEMORY:
                        if (md->attribute & EFI_MEMORY_WB)
                                e820_type = E820_RAM;
                        else
                                e820_type = E820_RESERVED;
                        break;
[snip]
                default:
                        /*
                         * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
                         * EFI_RUNTIME_SERVICES_DATA
                         * EFI_MEMORY_MAPPED_IO
                         * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
                         */
                        e820_type = E820_RESERVED;
                        break;
                }
[snip]
}

> > > 
> > > If these ACPI regions have the 'WB' attribute, we add them as memory and mark
> > > them nomap. These show up as either a hole, or 'reserved' in /proc/iomem.
> > > If they don't have the 'WB' attribute, then then they are left out of memblock
> > > and aren't part of DRAM, I don't think these will show up in /proc/iomem at all.
> > 
> > Let's say,
> > 0x1000-0x1fff: reserved (SRAM for UEFI, WB)
> > 0x80000000-0xffffffff: System RAM (DRAM)
> 
> May be slightly more complicated:
> 0x80000000-0x80001fff: System RAM (DRAM) for UEFI, WB
> 0x80002000-0xffffffff: System RAM (DRAM)
> 
> Kernel will have phys_offset 0x80000000, however kexec-tools will calculate it
> as 0x80002000.
> 
> > 
> > If, as Pratyush suggested, "reserved" resources are added to phys_offset
> > calculation, the kernel linear mapping area starts at PAGE_OFFSET, but
> > there is no actual mapping around PAGE_OFFSET.
> > It won't hurt anything, but looks funny.
> > So we'd better not include "reserved" in phys_offset calculation anyway.
> >         -> Pratyush
> 
> My only concern is that, then we will have different values of phys_offset in
> kernel and kexec-tools, which might lead to further confusion.
> 
> ~Pratyush

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-24  8:04                       ` Dave Young
@ 2016-08-24 10:25                         ` James Morse
  2016-08-25  1:04                           ` Dave Young
  0 siblings, 1 reply; 52+ messages in thread
From: James Morse @ 2016-08-24 10:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Dave,

On 24/08/16 09:04, Dave Young wrote:
> Looking the arm-init.c, I suspect it missed the some efi ranges as
> reserved ranges like runtime code and runtime data etc. But I might be
> wrong.

This had me confused for too... I think I get it, my understanding is:


> static __init int is_reserve_region(efi_memory_desc_t *md)
> {
>         switch (md->type) {
>         case EFI_LOADER_CODE:
>         case EFI_LOADER_DATA:
>         case EFI_BOOT_SERVICES_CODE:
>         case EFI_BOOT_SERVICES_DATA:
>         case EFI_CONVENTIONAL_MEMORY:
>         case EFI_PERSISTENT_MEMORY:
>                 return 0;

return false - this is the list of region-types to never reserve, regardless of
memory attributes.


>         default:
>                 break;
>         }
>         return is_normal_ram(md);

If its not in the 'never reserve' list above, then we check if the region is
'normal' ram. If it is then it will end up in memblock.memory so we return true,
causing it to be marked nomap too.

reserve_regions() in that same file calls is_normal_ram() directly before adding
all regions with the WB attribute to memblock.memory via
early_init_dt_add_memory_arch().

A runtime region with the WB attribute will be caught by is_reserve_region(),
and is_normal_ram(), so it ends up in memblock.memory and memblock.nomap.


> }
> 
> Let's see the x86 do_add_efi_mem_map, the default case set all other
> types as reserved. Shouldn't this be same in all arches though there's
> no e820 in arm(64)?

> static void __init do_add_efi_memmap(void)
> {
> 
> [snip]
>                 switch (md->type) {
>                 case EFI_LOADER_CODE:
>                 case EFI_LOADER_DATA:
>                 case EFI_BOOT_SERVICES_CODE:
>                 case EFI_BOOT_SERVICES_DATA:
>                 case EFI_CONVENTIONAL_MEMORY:
>                         if (md->attribute & EFI_MEMORY_WB)
>                                 e820_type = E820_RAM;

In this case reserve_regions() will add the memory to memblock.memory because it
has the WB attribute, and not reserve it.


>                         else
>                                 e820_type = E820_RESERVED;

Without the WB attribute, these regions are in neither memblock.memory nor
memblock.nomap.


>                         break;
> [snip]
>                 default:
>                         /*
>                          * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
>                          * EFI_RUNTIME_SERVICES_DATA
>                          * EFI_MEMORY_MAPPED_IO
>                          * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
>                          */
>                         e820_type = E820_RESERVED;
>                         break;

If any other regions has the WB attribute, it will be added to memblock.memory
and memblock.nomap. If it doesn't, it will appear in neither.


>                 }
> [snip]
> }

Does this help at all?


Thanks,

James

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-09  1:56     ` [PATCH v24 5/9] arm64: kdump: add kdump support AKASHI Takahiro
  2016-08-10 16:38       ` James Morse
@ 2016-08-24 14:44       ` Ard Biesheuvel
  2016-08-26  6:22         ` AKASHI Takahiro
  1 sibling, 1 reply; 52+ messages in thread
From: Ard Biesheuvel @ 2016-08-24 14:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 9 August 2016 at 03:56, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:
> On crash dump kernel, all the information about primary kernel's system
> memory (core image) is available in elf core header.
> The primary kernel will set aside this header with reserve_elfcorehdr()
> at boot time and inform crash dump kernel of its location via a new
> device-tree property, "linux,elfcorehdr".
>
> Please note that all other architectures use traditional "elfcorehdr="
> kernel parameter for this purpose.
>
> Then crash dump kernel will access the primary kernel's memory with
> copy_oldmem_page(), which reads one page by ioremap'ing it since it does
> not reside in linear mapping on crash dump kernel.
>
> We also need our own elfcorehdr_read() here since the header is placed
> within crash dump kernel's usable memory.
>
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> ---
>  arch/arm64/Kconfig             | 11 +++++++
>  arch/arm64/kernel/Makefile     |  1 +
>  arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
>  arch/arm64/mm/init.c           | 54 ++++++++++++++++++++++++++++++++
>  4 files changed, 137 insertions(+)
>  create mode 100644 arch/arm64/kernel/crash_dump.c
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 69c8787..9ede54b 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -682,6 +682,17 @@ config KEXEC
>           but it is independent of the system firmware.   And like a reboot
>           you can start any kernel with it, not just Linux.
>
> +config CRASH_DUMP
> +       bool "Build kdump crash kernel"
> +       help
> +         Generate crash dump after being started by kexec. This should
> +         be normally only set in special crash dump kernels which are
> +         loaded in the main kernel with kexec-tools into a specially
> +         reserved region and then later executed after a crash by
> +         kdump/kexec.
> +
> +         For more details see Documentation/kdump/kdump.txt
> +
>  config XEN_DOM0
>         def_bool y
>         depends on XEN
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 14f7b65..f1cbfc8 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -48,6 +48,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE)    += kaslr.o
>  arm64-obj-$(CONFIG_HIBERNATION)                += hibernate.o hibernate-asm.o
>  arm64-obj-$(CONFIG_KEXEC)              += machine_kexec.o relocate_kernel.o    \
>                                            cpu-reset.o
> +arm64-obj-$(CONFIG_CRASH_DUMP)         += crash_dump.o
>
>  obj-y                                  += $(arm64-obj-y) vdso/ probes/
>  obj-m                                  += $(arm64-obj-m)
> diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> new file mode 100644
> index 0000000..2dc54d1
> --- /dev/null
> +++ b/arch/arm64/kernel/crash_dump.c
> @@ -0,0 +1,71 @@
> +/*
> + * Routines for doing kexec-based kdump
> + *
> + * Copyright (C) 2014 Linaro Limited
> + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/crash_dump.h>
> +#include <linux/errno.h>
> +#include <linux/io.h>
> +#include <linux/memblock.h>
> +#include <linux/uaccess.h>
> +#include <asm/memory.h>
> +
> +/**
> + * copy_oldmem_page() - copy one page from old kernel memory
> + * @pfn: page frame number to be copied
> + * @buf: buffer where the copied page is placed
> + * @csize: number of bytes to copy
> + * @offset: offset in bytes into the page
> + * @userbuf: if set, @buf is in a user address space
> + *
> + * This function copies one page from old kernel memory into buffer pointed by
> + * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
> + * copied or negative error in case of failure.
> + */
> +ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> +                        size_t csize, unsigned long offset,
> +                        int userbuf)
> +{
> +       void *vaddr;
> +
> +       if (!csize)
> +               return 0;
> +
> +       vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);

Could we please use memremap() here?

> +       if (!vaddr)
> +               return -ENOMEM;
> +
> +       if (userbuf) {
> +               if (copy_to_user(buf, vaddr + offset, csize)) {
> +                       iounmap(vaddr);
> +                       return -EFAULT;
> +               }
> +       } else {
> +               memcpy(buf, vaddr + offset, csize);
> +       }
> +
> +       iounmap(vaddr);

... and memunmap here?

ioremap_cache() is not very well defined, and memremap() has been
introduced specifically to replace it, so I think we should use it in
new code.

Thanks,
Ard.


> +
> +       return csize;
> +}
> +
> +/**
> + * elfcorehdr_read - read from ELF core header
> + * @buf: buffer where the data is placed
> + * @csize: number of bytes to read
> + * @ppos: address in the memory
> + *
> + * This function reads @count bytes from elf core header which exists
> + * on crash dump kernel's memory.
> + */
> +ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> +{
> +       memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
> +       return count;
> +}
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index e3771c4..bba1e39 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -36,6 +36,7 @@
>  #include <linux/efi.h>
>  #include <linux/swiotlb.h>
>  #include <linux/kexec.h>
> +#include <linux/crash_dump.h>
>
>  #include <asm/boot.h>
>  #include <asm/fixmap.h>
> @@ -186,6 +187,57 @@ static void __init reserve_crashkernel(void)
>  }
>  #endif /* CONFIG_KEXEC_CORE */
>
> +#ifdef CONFIG_CRASH_DUMP
> +static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
> +               const char *uname, int depth, void *data)
> +{
> +       const __be32 *reg;
> +       int len;
> +
> +       if (depth != 1 || strcmp(uname, "chosen") != 0)
> +               return 0;
> +
> +       reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
> +       if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> +               return 1;
> +
> +       elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
> +       elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
> +
> +       return 1;
> +}
> +
> +/*
> + * reserve_elfcorehdr() - reserves memory for elf core header
> + *
> + * This function reserves elf core header given in "elfcorehdr=" kernel
> + * command line parameter. This region contains all the information about
> + * primary kernel's core image and is used by a dump capture kernel to
> + * access the system memory on primary kernel.
> + */
> +static void __init reserve_elfcorehdr(void)
> +{
> +       of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
> +
> +       if (!elfcorehdr_size)
> +               return;
> +
> +       if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
> +               pr_warn("elfcorehdr is overlapped\n");
> +               return;
> +       }
> +
> +       memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
> +
> +       pr_info("Reserving %lldKB of memory at 0x%llx for elfcorehdr\n",
> +               elfcorehdr_size >> 10, elfcorehdr_addr);
> +}
> +#else
> +static void __init reserve_elfcorehdr(void)
> +{
> +       ;
> +}
> +#endif /* CONFIG_CRASH_DUMP */
>  /*
>   * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
>   * currently assumes that for memory starting above 4G, 32-bit devices will
> @@ -444,6 +496,8 @@ void __init arm64_memblock_init(void)
>
>         reserve_crashkernel();
>
> +       reserve_elfcorehdr();
> +
>         dma_contiguous_reserve(arm64_dma_phys_limit);
>
>         memblock_allow_resize();
> --
> 2.9.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-24 10:25                         ` James Morse
@ 2016-08-25  1:04                           ` Dave Young
  0 siblings, 0 replies; 52+ messages in thread
From: Dave Young @ 2016-08-25  1:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/24/16 at 11:25am, James Morse wrote:
> Hi Dave,
> 
> On 24/08/16 09:04, Dave Young wrote:
> > Looking the arm-init.c, I suspect it missed the some efi ranges as
> > reserved ranges like runtime code and runtime data etc. But I might be
> > wrong.
> 
> This had me confused for too... I think I get it, my understanding is:
> 
James, thanks for your clarification. 
> 
> > static __init int is_reserve_region(efi_memory_desc_t *md)
> > {
> >         switch (md->type) {
> >         case EFI_LOADER_CODE:
> >         case EFI_LOADER_DATA:
> >         case EFI_BOOT_SERVICES_CODE:
> >         case EFI_BOOT_SERVICES_DATA:
> >         case EFI_CONVENTIONAL_MEMORY:
> >         case EFI_PERSISTENT_MEMORY:
> >                 return 0;
> 
> return false - this is the list of region-types to never reserve, regardless of
> memory attributes.
> 
> 
> >         default:
> >                 break;
> >         }
> >         return is_normal_ram(md);
> 
> If its not in the 'never reserve' list above, then we check if the region is
> 'normal' ram. If it is then it will end up in memblock.memory so we return true,
> causing it to be marked nomap too.
> 
> reserve_regions() in that same file calls is_normal_ram() directly before adding
> all regions with the WB attribute to memblock.memory via
> early_init_dt_add_memory_arch().
> 
> A runtime region with the WB attribute will be caught by is_reserve_region(),
> and is_normal_ram(), so it ends up in memblock.memory and memblock.nomap.

Hmm, It is not straitforward like the do_add_efi_memmap. I got it.

BTW, I believe there is same problem in arm as well as arm64, it also
need mark the runtime ranges as "reserved" /proc/iomem. 

> 
> 
> > }
> > 
> > Let's see the x86 do_add_efi_mem_map, the default case set all other
> > types as reserved. Shouldn't this be same in all arches though there's
> > no e820 in arm(64)?
> 
> > static void __init do_add_efi_memmap(void)
> > {
> > 
> > [snip]
> >                 switch (md->type) {
> >                 case EFI_LOADER_CODE:
> >                 case EFI_LOADER_DATA:
> >                 case EFI_BOOT_SERVICES_CODE:
> >                 case EFI_BOOT_SERVICES_DATA:
> >                 case EFI_CONVENTIONAL_MEMORY:
> >                         if (md->attribute & EFI_MEMORY_WB)
> >                                 e820_type = E820_RAM;
> 
> In this case reserve_regions() will add the memory to memblock.memory because it
> has the WB attribute, and not reserve it.
> 
> 
> >                         else
> >                                 e820_type = E820_RESERVED;
> 
> Without the WB attribute, these regions are in neither memblock.memory nor
> memblock.nomap.
> 
> 
> >                         break;
> > [snip]
> >                 default:
> >                         /*
> >                          * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
> >                          * EFI_RUNTIME_SERVICES_DATA
> >                          * EFI_MEMORY_MAPPED_IO
> >                          * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
> >                          */
> >                         e820_type = E820_RESERVED;
> >                         break;
> 
> If any other regions has the WB attribute, it will be added to memblock.memory
> and memblock.nomap. If it doesn't, it will appear in neither.
> 
> 
> >                 }
> > [snip]
> > }
> 
> Does this help at all?
> 

Yes, thanks a lot!
Dave

> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 5/9] arm64: kdump: add kdump support
  2016-08-24 14:44       ` Ard Biesheuvel
@ 2016-08-26  6:22         ` AKASHI Takahiro
  0 siblings, 0 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-26  6:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 24, 2016 at 04:44:09PM +0200, Ard Biesheuvel wrote:
> On 9 August 2016 at 03:56, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:
> > On crash dump kernel, all the information about primary kernel's system
> > memory (core image) is available in elf core header.
> > The primary kernel will set aside this header with reserve_elfcorehdr()
> > at boot time and inform crash dump kernel of its location via a new
> > device-tree property, "linux,elfcorehdr".
> >
> > Please note that all other architectures use traditional "elfcorehdr="
> > kernel parameter for this purpose.
> >
> > Then crash dump kernel will access the primary kernel's memory with
> > copy_oldmem_page(), which reads one page by ioremap'ing it since it does
> > not reside in linear mapping on crash dump kernel.
> >
> > We also need our own elfcorehdr_read() here since the header is placed
> > within crash dump kernel's usable memory.
> >
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > ---
> >  arch/arm64/Kconfig             | 11 +++++++
> >  arch/arm64/kernel/Makefile     |  1 +
> >  arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
> >  arch/arm64/mm/init.c           | 54 ++++++++++++++++++++++++++++++++
> >  4 files changed, 137 insertions(+)
> >  create mode 100644 arch/arm64/kernel/crash_dump.c
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 69c8787..9ede54b 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -682,6 +682,17 @@ config KEXEC
> >           but it is independent of the system firmware.   And like a reboot
> >           you can start any kernel with it, not just Linux.
> >
> > +config CRASH_DUMP
> > +       bool "Build kdump crash kernel"
> > +       help
> > +         Generate crash dump after being started by kexec. This should
> > +         be normally only set in special crash dump kernels which are
> > +         loaded in the main kernel with kexec-tools into a specially
> > +         reserved region and then later executed after a crash by
> > +         kdump/kexec.
> > +
> > +         For more details see Documentation/kdump/kdump.txt
> > +
> >  config XEN_DOM0
> >         def_bool y
> >         depends on XEN
> > diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> > index 14f7b65..f1cbfc8 100644
> > --- a/arch/arm64/kernel/Makefile
> > +++ b/arch/arm64/kernel/Makefile
> > @@ -48,6 +48,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE)    += kaslr.o
> >  arm64-obj-$(CONFIG_HIBERNATION)                += hibernate.o hibernate-asm.o
> >  arm64-obj-$(CONFIG_KEXEC)              += machine_kexec.o relocate_kernel.o    \
> >                                            cpu-reset.o
> > +arm64-obj-$(CONFIG_CRASH_DUMP)         += crash_dump.o
> >
> >  obj-y                                  += $(arm64-obj-y) vdso/ probes/
> >  obj-m                                  += $(arm64-obj-m)
> > diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> > new file mode 100644
> > index 0000000..2dc54d1
> > --- /dev/null
> > +++ b/arch/arm64/kernel/crash_dump.c
> > @@ -0,0 +1,71 @@
> > +/*
> > + * Routines for doing kexec-based kdump
> > + *
> > + * Copyright (C) 2014 Linaro Limited
> > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#include <linux/crash_dump.h>
> > +#include <linux/errno.h>
> > +#include <linux/io.h>
> > +#include <linux/memblock.h>
> > +#include <linux/uaccess.h>
> > +#include <asm/memory.h>
> > +
> > +/**
> > + * copy_oldmem_page() - copy one page from old kernel memory
> > + * @pfn: page frame number to be copied
> > + * @buf: buffer where the copied page is placed
> > + * @csize: number of bytes to copy
> > + * @offset: offset in bytes into the page
> > + * @userbuf: if set, @buf is in a user address space
> > + *
> > + * This function copies one page from old kernel memory into buffer pointed by
> > + * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
> > + * copied or negative error in case of failure.
> > + */
> > +ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> > +                        size_t csize, unsigned long offset,
> > +                        int userbuf)
> > +{
> > +       void *vaddr;
> > +
> > +       if (!csize)
> > +               return 0;
> > +
> > +       vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
> 
> Could we please use memremap() here?
> 
> > +       if (!vaddr)
> > +               return -ENOMEM;
> > +
> > +       if (userbuf) {
> > +               if (copy_to_user(buf, vaddr + offset, csize)) {
> > +                       iounmap(vaddr);
> > +                       return -EFAULT;
> > +               }
> > +       } else {
> > +               memcpy(buf, vaddr + offset, csize);
> > +       }
> > +
> > +       iounmap(vaddr);
> 
> ... and memunmap here?
> 
> ioremap_cache() is not very well defined, and memremap() has been
> introduced specifically to replace it, so I think we should use it in
> new code.

Sure. I will use memremap(MEMREMAP_WB) instead.

Thanks,
-Takahiro AKASHI

> Thanks,
> Ard.
> 
> 
> > +
> > +       return csize;
> > +}
> > +
> > +/**
> > + * elfcorehdr_read - read from ELF core header
> > + * @buf: buffer where the data is placed
> > + * @csize: number of bytes to read
> > + * @ppos: address in the memory
> > + *
> > + * This function reads @count bytes from elf core header which exists
> > + * on crash dump kernel's memory.
> > + */
> > +ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> > +{
> > +       memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
> > +       return count;
> > +}
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index e3771c4..bba1e39 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -36,6 +36,7 @@
> >  #include <linux/efi.h>
> >  #include <linux/swiotlb.h>
> >  #include <linux/kexec.h>
> > +#include <linux/crash_dump.h>
> >
> >  #include <asm/boot.h>
> >  #include <asm/fixmap.h>
> > @@ -186,6 +187,57 @@ static void __init reserve_crashkernel(void)
> >  }
> >  #endif /* CONFIG_KEXEC_CORE */
> >
> > +#ifdef CONFIG_CRASH_DUMP
> > +static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
> > +               const char *uname, int depth, void *data)
> > +{
> > +       const __be32 *reg;
> > +       int len;
> > +
> > +       if (depth != 1 || strcmp(uname, "chosen") != 0)
> > +               return 0;
> > +
> > +       reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
> > +       if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > +               return 1;
> > +
> > +       elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > +       elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > +
> > +       return 1;
> > +}
> > +
> > +/*
> > + * reserve_elfcorehdr() - reserves memory for elf core header
> > + *
> > + * This function reserves elf core header given in "elfcorehdr=" kernel
> > + * command line parameter. This region contains all the information about
> > + * primary kernel's core image and is used by a dump capture kernel to
> > + * access the system memory on primary kernel.
> > + */
> > +static void __init reserve_elfcorehdr(void)
> > +{
> > +       of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
> > +
> > +       if (!elfcorehdr_size)
> > +               return;
> > +
> > +       if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
> > +               pr_warn("elfcorehdr is overlapped\n");
> > +               return;
> > +       }
> > +
> > +       memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
> > +
> > +       pr_info("Reserving %lldKB of memory at 0x%llx for elfcorehdr\n",
> > +               elfcorehdr_size >> 10, elfcorehdr_addr);
> > +}
> > +#else
> > +static void __init reserve_elfcorehdr(void)
> > +{
> > +       ;
> > +}
> > +#endif /* CONFIG_CRASH_DUMP */
> >  /*
> >   * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
> >   * currently assumes that for memory starting above 4G, 32-bit devices will
> > @@ -444,6 +496,8 @@ void __init arm64_memblock_init(void)
> >
> >         reserve_crashkernel();
> >
> > +       reserve_elfcorehdr();
> > +
> >         dma_contiguous_reserve(arm64_dma_phys_limit);
> >
> >         memblock_allow_resize();
> > --
> > 2.9.0
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-08-22  4:28       ` AKASHI Takahiro
@ 2016-08-30 16:34         ` Rob Herring
  2016-08-30 23:45           ` AKASHI Takahiro
  0 siblings, 1 reply; 52+ messages in thread
From: Rob Herring @ 2016-08-30 16:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Aug 21, 2016 at 11:28 PM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> Rob,
> [Cc: Mark]
>
> On Fri, Aug 19, 2016 at 08:26:41AM -0500, Rob Herring wrote:
>> On Tue, Aug 09, 2016 at 10:57:47AM +0900, AKASHI Takahiro wrote:
>> > From: James Morse <james.morse@arm.com>
>> >
>> > Add documentation for
>> >     linux,crashkernel-base and crashkernel-size,
>> >     linux,usable-memory-range, and
>> >     linux,elfcorehdr
>> > used by arm64 kexec/kdump to decribe the kdump reserved area, and
>> > the elfcorehdr's location within it.
>> >
>> > Signed-off-by: James Morse <james.morse@arm.com>
>> > [takahiro.akashi at linaro.org:
>> >     renamed "usable-memory" to "usable-memory-range",
>> >     added "linux,crashkernel-base" and "-size" ]
>> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> > ---
>> >  Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
>> >  1 file changed, 50 insertions(+)
>> >
>> > diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
>> > index 6ae9d82..236188a 100644
>> > --- a/Documentation/devicetree/bindings/chosen.txt
>> > +++ b/Documentation/devicetree/bindings/chosen.txt
>> > @@ -52,3 +52,53 @@ This property is set (currently only on PowerPC, and only needed on
>> >  book3e) by some versions of kexec-tools to tell the new kernel that it
>> >  is being booted by kexec, as the booting environment may differ (e.g.
>> >  a different secondary CPU release mechanism)
>> > +
>> > +linux,crashkernel-base
>> > +linux,crashkernel-size
>> > +----------------------
>> > +These properties (currently used on PowerPC and arm64) indicates
>> > +the base address and the size, respectively, of the reserved memory
>> > +range for crash dump kernel.
>> > +e.g.
>> > +
>> > +/ {
>> > +   chosen {
>> > +           linux,crashkernel-base = <0x9 0xf0000000>;
>> > +           linux,crashkernel-size = <0x0 0x10000000>;
>> > +   };
>> > +};
>> > +
>> > +linux,usable-memory-range
>> > +-------------------------
>> > +
>> > +This property (currently used only on arm64) holds the memory range,
>> > +the base address and the size, which can be used as system ram on
>> > +the *current* kernel. Note that, if this property is present, any memory
>> > +regions under "memory" nodes in DT blob or ones marked as "conventional
>> > +memory" in EFI memory map should be ignored.
>> > +e.g.
>> > +
>> > +/ {
>> > +   chosen {
>> > +           linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
>> > +   };
>> > +};
>>
>> I've read the discussion on this. I think you should use the existing
>> linux,usable-memory property in the memory nodes. If UEFI systems don't
>> have memory nodes, then you can find an UEFI way to describe this. DT is
>> not the dumping ground for what doesn't fit in UEFI. How do x86 systems
>> work?
>
> Kdump for x86 passes the range of usable memory to crash dump kernel
> either via:
> (a) "memmap=nn at ss" command line parameter, or
> (b) faked BIOS e820 map (a sort of memory map table)
>
> (a) was rejected by Mark [1], and (b) is x86-specific.
>
> I believe that my approach is better because it works in the same way
> either on UEFI systems or DT-based systems.

So we have a new way for both UEFI and DT. If UEFI folks can't come up
with a standard way to do things in the UEFI standard, then we just
dump them into DT? I'd rather see alignment with existing DT systems
(PPC) and in particular the tools.

Rob

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-08-30 16:34         ` Rob Herring
@ 2016-08-30 23:45           ` AKASHI Takahiro
  2016-08-31  5:02             ` AKASHI Takahiro
  0 siblings, 1 reply; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-30 23:45 UTC (permalink / raw)
  To: linux-arm-kernel

Rob,

On Tue, Aug 30, 2016 at 11:34:10AM -0500, Rob Herring wrote:
> On Sun, Aug 21, 2016 at 11:28 PM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > Rob,
> > [Cc: Mark]
> >
> > On Fri, Aug 19, 2016 at 08:26:41AM -0500, Rob Herring wrote:
> >> On Tue, Aug 09, 2016 at 10:57:47AM +0900, AKASHI Takahiro wrote:
> >> > From: James Morse <james.morse@arm.com>
> >> >
> >> > Add documentation for
> >> >     linux,crashkernel-base and crashkernel-size,
> >> >     linux,usable-memory-range, and
> >> >     linux,elfcorehdr
> >> > used by arm64 kexec/kdump to decribe the kdump reserved area, and
> >> > the elfcorehdr's location within it.
> >> >
> >> > Signed-off-by: James Morse <james.morse@arm.com>
> >> > [takahiro.akashi at linaro.org:
> >> >     renamed "usable-memory" to "usable-memory-range",
> >> >     added "linux,crashkernel-base" and "-size" ]
> >> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >> > ---
> >> >  Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
> >> >  1 file changed, 50 insertions(+)
> >> >
> >> > diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> >> > index 6ae9d82..236188a 100644
> >> > --- a/Documentation/devicetree/bindings/chosen.txt
> >> > +++ b/Documentation/devicetree/bindings/chosen.txt
> >> > @@ -52,3 +52,53 @@ This property is set (currently only on PowerPC, and only needed on
> >> >  book3e) by some versions of kexec-tools to tell the new kernel that it
> >> >  is being booted by kexec, as the booting environment may differ (e.g.
> >> >  a different secondary CPU release mechanism)
> >> > +
> >> > +linux,crashkernel-base
> >> > +linux,crashkernel-size
> >> > +----------------------
> >> > +These properties (currently used on PowerPC and arm64) indicates
> >> > +the base address and the size, respectively, of the reserved memory
> >> > +range for crash dump kernel.
> >> > +e.g.
> >> > +
> >> > +/ {
> >> > +   chosen {
> >> > +           linux,crashkernel-base = <0x9 0xf0000000>;
> >> > +           linux,crashkernel-size = <0x0 0x10000000>;
> >> > +   };
> >> > +};
> >> > +
> >> > +linux,usable-memory-range
> >> > +-------------------------
> >> > +
> >> > +This property (currently used only on arm64) holds the memory range,
> >> > +the base address and the size, which can be used as system ram on
> >> > +the *current* kernel. Note that, if this property is present, any memory
> >> > +regions under "memory" nodes in DT blob or ones marked as "conventional
> >> > +memory" in EFI memory map should be ignored.
> >> > +e.g.
> >> > +
> >> > +/ {
> >> > +   chosen {
> >> > +           linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
> >> > +   };
> >> > +};
> >>
> >> I've read the discussion on this. I think you should use the existing
> >> linux,usable-memory property in the memory nodes. If UEFI systems don't
> >> have memory nodes, then you can find an UEFI way to describe this. DT is
> >> not the dumping ground for what doesn't fit in UEFI. How do x86 systems
> >> work?
> >
> > Kdump for x86 passes the range of usable memory to crash dump kernel
> > either via:
> > (a) "memmap=nn at ss" command line parameter, or
> > (b) faked BIOS e820 map (a sort of memory map table)
> >
> > (a) was rejected by Mark [1], and (b) is x86-specific.
> >
> > I believe that my approach is better because it works in the same way
> > either on UEFI systems or DT-based systems.
> 
> So we have a new way for both UEFI and DT. If UEFI folks can't come up
> with a standard way to do things in the UEFI standard, then we just
> dump them into DT?

No, I don't think so.
According to "Documentation/devicetree/bindings/chosen.txt,
  "The chosen node does not represent a real device, but serves as a place
   for passing data between firmware and the operating system, like boot
   arguments."

Since kexec/dump is some sort of boot loader, it all makes sense to
add a new property as an interface from the old kernel(kexec/dump)
to the new kernel. So my approach doesn't break any DT rules.

Please note that, even on UEFI/APCI systems, there are already a few
properties used under /chosen, like "linux,uefi-system-table" and 
"linux,mmap-*."
(Those properties are currently *not* defined in this document, though.)

> I'd rather see alignment with existing DT systems
> (PPC) and in particular the tools.

I'm not sure what you mean here, but I guess:
1) On DT-only systems, we should follow the way that PPC does.
   (That is, adding "usable-memory" property to each "memory" node.)
2) On UEFI/ACPI system, we must invent a new own way for our purpose
   because, as I said before, there is no standard, but x86-specific way. 

Is this what you want to do?

If so,
do you agree to my approach of adding "linux,usable-memory-range" property
to /chosen *for UEFI/ACPI use*?

Thanks,
-Takahiro AKASHI

> Rob

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 0/9] arm64: add kdump support
  2016-08-09  1:52 [PATCH v24 0/9] arm64: add kdump support AKASHI Takahiro
  2016-08-09  1:52 ` [PATCH v24 1/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
  2016-08-09  2:04 ` [PATCH v24 0/9] arm64: add kdump support AKASHI Takahiro
@ 2016-08-31  3:41 ` Manish Jaggi
  2016-08-31  5:31   ` AKASHI Takahiro
  2 siblings, 1 reply; 52+ messages in thread
From: Manish Jaggi @ 2016-08-31  3:41 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Akashi,

On 08/09/2016 07:22 AM, AKASHI Takahiro wrote:
> This patch series adds kdump support on arm64.
> 
> To load a crash-dump kernel to the systems, a series of patches to
> kexec-tools, which have not yet been merged upstream, are needed.
> Please use my kdump patches [1].
> 
> To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
>   - crash utility (coming v7.1.6 or later) [2]
>     (Necessary patches have already been queued in the master.)
> 
> [1] T.B.D.
> [2] https://github.com/crash-utility/crash.git
> 
> Changes for v24 (Aug 9, 2016):
>   o Rebase to Linux-4.8-rc1
>   o Update descriptions about newly added DT proerties
> 
> Changes for v23 (July 26, 2016):
> 
>   o Move memblock_reserve() to a single place in reserve_crashkernel()
>   o Use  cpu_park_loop() in ipi_cpu_crash_stop()
>   o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
>   o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
>     (for ACPI) from usable memory at crash kernel
> 
> Changes for v22 (July 12, 2016):
> 
>   o Export "crashkernel-base" and "crashkernel-size" via device-tree,
>     and add some descriptions about them in chosen.txt
>   o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
>     with powerpc's "usable-memory"
>   o Make cosmetic changes regarding "ifdef" usage
>   o Correct some wordings in kdump.txt
> 
> Changes for v21 (July 6, 2016):
> 
>   o Remove kexec patches.
>   o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
>   o Clarify the description about kvm in kdump.txt.
> 
> See the following link [3] for older changes:
> [3]  http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html
> 
> AKASHI Takahiro (8):
>   arm64: kdump: reserve memory for crash dump kernel
>   memblock: add memblock_cap_memory_range()
>   arm64: limit memory regions based on DT property, usable-memory-range
>   arm64: kdump: implement machine_crash_shutdown()
>   arm64: kdump: add kdump support
>   arm64: kdump: add VMCOREINFO's for user-space coredump tools
>   arm64: kdump: enable kdump in the arm64 defconfig
>   arm64: kdump: update a kernel doc
> 
> James Morse (1):
>   Documentation: dt: chosen properties for arm64 kdump
> 
>  Documentation/devicetree/bindings/chosen.txt |  45 ++++++
>  Documentation/kdump/kdump.txt                |  16 ++-
>  arch/arm64/Kconfig                           |  11 ++
>  arch/arm64/configs/defconfig                 |   1 +
>  arch/arm64/include/asm/hardirq.h             |   2 +-
>  arch/arm64/include/asm/kexec.h               |  41 +++++-
>  arch/arm64/include/asm/smp.h                 |   2 +
>  arch/arm64/kernel/Makefile                   |   1 +
>  arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
>  arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
>  arch/arm64/kernel/setup.c                    |   7 +-
>  arch/arm64/kernel/smp.c                      |  63 +++++++++
>  arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
>  include/linux/memblock.h                     |   1 +
>  mm/memblock.c                                |  28 ++++
>  15 files changed, 551 insertions(+), 7 deletions(-)
>  create mode 100644 arch/arm64/kernel/crash_dump.c
> 
Couple of points
a) Just a note, while testing, the crashkernel reserved memory should be less than ARCH_LOW_ADDRESS_LIMIT (=arm64_dma_phys_limit).

b) Has anyone tested this on a SoC with Gicv3 ITS ?
Should the GICD/R be reset prior to switching to crash kernel ?
I am seeing lot of GICv3: RWP timeout, gone fishing while crash kernel boots.

Thanks,
Manish

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-08-30 23:45           ` AKASHI Takahiro
@ 2016-08-31  5:02             ` AKASHI Takahiro
  2016-09-02 10:11               ` AKASHI Takahiro
  0 siblings, 1 reply; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-31  5:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 31, 2016 at 08:45:46AM +0900, AKASHI Takahiro wrote:
> Rob,
> 
> On Tue, Aug 30, 2016 at 11:34:10AM -0500, Rob Herring wrote:
> > On Sun, Aug 21, 2016 at 11:28 PM, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> > > Rob,
> > > [Cc: Mark]
> > >
> > > On Fri, Aug 19, 2016 at 08:26:41AM -0500, Rob Herring wrote:
> > >> On Tue, Aug 09, 2016 at 10:57:47AM +0900, AKASHI Takahiro wrote:
> > >> > From: James Morse <james.morse@arm.com>
> > >> >
> > >> > Add documentation for
> > >> >     linux,crashkernel-base and crashkernel-size,
> > >> >     linux,usable-memory-range, and
> > >> >     linux,elfcorehdr
> > >> > used by arm64 kexec/kdump to decribe the kdump reserved area, and
> > >> > the elfcorehdr's location within it.
> > >> >
> > >> > Signed-off-by: James Morse <james.morse@arm.com>
> > >> > [takahiro.akashi at linaro.org:
> > >> >     renamed "usable-memory" to "usable-memory-range",
> > >> >     added "linux,crashkernel-base" and "-size" ]
> > >> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > >> > ---
> > >> >  Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
> > >> >  1 file changed, 50 insertions(+)
> > >> >
> > >> > diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> > >> > index 6ae9d82..236188a 100644
> > >> > --- a/Documentation/devicetree/bindings/chosen.txt
> > >> > +++ b/Documentation/devicetree/bindings/chosen.txt
> > >> > @@ -52,3 +52,53 @@ This property is set (currently only on PowerPC, and only needed on
> > >> >  book3e) by some versions of kexec-tools to tell the new kernel that it
> > >> >  is being booted by kexec, as the booting environment may differ (e.g.
> > >> >  a different secondary CPU release mechanism)
> > >> > +
> > >> > +linux,crashkernel-base
> > >> > +linux,crashkernel-size
> > >> > +----------------------
> > >> > +These properties (currently used on PowerPC and arm64) indicates
> > >> > +the base address and the size, respectively, of the reserved memory
> > >> > +range for crash dump kernel.
> > >> > +e.g.
> > >> > +
> > >> > +/ {
> > >> > +   chosen {
> > >> > +           linux,crashkernel-base = <0x9 0xf0000000>;
> > >> > +           linux,crashkernel-size = <0x0 0x10000000>;
> > >> > +   };
> > >> > +};
> > >> > +
> > >> > +linux,usable-memory-range
> > >> > +-------------------------
> > >> > +
> > >> > +This property (currently used only on arm64) holds the memory range,
> > >> > +the base address and the size, which can be used as system ram on
> > >> > +the *current* kernel. Note that, if this property is present, any memory
> > >> > +regions under "memory" nodes in DT blob or ones marked as "conventional
> > >> > +memory" in EFI memory map should be ignored.
> > >> > +e.g.
> > >> > +
> > >> > +/ {
> > >> > +   chosen {
> > >> > +           linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
> > >> > +   };
> > >> > +};
> > >>
> > >> I've read the discussion on this. I think you should use the existing
> > >> linux,usable-memory property in the memory nodes. If UEFI systems don't
> > >> have memory nodes, then you can find an UEFI way to describe this. DT is
> > >> not the dumping ground for what doesn't fit in UEFI. How do x86 systems
> > >> work?
> > >
> > > Kdump for x86 passes the range of usable memory to crash dump kernel
> > > either via:
> > > (a) "memmap=nn at ss" command line parameter, or
> > > (b) faked BIOS e820 map (a sort of memory map table)
> > >
> > > (a) was rejected by Mark [1], and (b) is x86-specific.
> > >
> > > I believe that my approach is better because it works in the same way
> > > either on UEFI systems or DT-based systems.
> > 
> > So we have a new way for both UEFI and DT. If UEFI folks can't come up
> > with a standard way to do things in the UEFI standard, then we just
> > dump them into DT?
> 
> No, I don't think so.
> According to "Documentation/devicetree/bindings/chosen.txt,
>   "The chosen node does not represent a real device, but serves as a place
>    for passing data between firmware and the operating system, like boot
>    arguments."
> 
> Since kexec/dump is some sort of boot loader, it all makes sense to
> add a new property as an interface from the old kernel(kexec/dump)
> to the new kernel. So my approach doesn't break any DT rules.
> 
> Please note that, even on UEFI/APCI systems, there are already a few
> properties used under /chosen, like "linux,uefi-system-table" and 
> "linux,mmap-*."
> (Those properties are currently *not* defined in this document, though.)
> 
> > I'd rather see alignment with existing DT systems
> > (PPC) and in particular the tools.
> 
> I'm not sure what you mean here, but I guess:
> 1) On DT-only systems, we should follow the way that PPC does.
>    (That is, adding "usable-memory" property to each "memory" node.)
> 2) On UEFI/ACPI system, we must invent a new own way for our purpose
>    because, as I said before, there is no standard, but x86-specific way. 
> 
> Is this what you want to do?
> 
> If so,
> do you agree to my approach of adding "linux,usable-memory-range" property
> to /chosen *for UEFI/ACPI use*?

I've got another idea here: using "/memreserve/" field.
All the memory regions used by the crashed kernel would be added
as "/memreserve/" and be memblock_reserve'd later in crash dump kernel.
Consequently, they will be excluded from the usable memory.

Obviously,
- this approach will work both on UEFI/ACPI and DT-only systems, and
- patch#2 ("memblock: add memblock_cap_memory_range()"), as well as
  "linux,usable-memory-range" property, is no longer necessary.
  (So nothing new, except for "linux.elfcorehdr," will be invented.)

But on the other hand,
- arm64 and ppc do the different ways even though both are DT systems,
- those reserved regions are still *in* linear mapping and appear as
  "System RAM" in /proc/iomem, and
- this may imply that we cannot do kexec on crash dump kernel.

-Takahiro AKASHI


> Thanks,
> -Takahiro AKASHI
> 
> > Rob

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 0/9] arm64: add kdump support
  2016-08-31  3:41 ` Manish Jaggi
@ 2016-08-31  5:31   ` AKASHI Takahiro
  2016-09-02 12:53     ` Manish Jaggi
  0 siblings, 1 reply; 52+ messages in thread
From: AKASHI Takahiro @ 2016-08-31  5:31 UTC (permalink / raw)
  To: linux-arm-kernel

Manish,

Thank you for testing my kdump and reporting issues.

On Wed, Aug 31, 2016 at 09:11:52AM +0530, Manish Jaggi wrote:
> Hi Akashi,
> 
> On 08/09/2016 07:22 AM, AKASHI Takahiro wrote:
> > This patch series adds kdump support on arm64.
> > 
> > To load a crash-dump kernel to the systems, a series of patches to
> > kexec-tools, which have not yet been merged upstream, are needed.
> > Please use my kdump patches [1].
> > 
> > To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
> >   - crash utility (coming v7.1.6 or later) [2]
> >     (Necessary patches have already been queued in the master.)
> > 
> > [1] T.B.D.
> > [2] https://github.com/crash-utility/crash.git
> > 
> > Changes for v24 (Aug 9, 2016):
> >   o Rebase to Linux-4.8-rc1
> >   o Update descriptions about newly added DT proerties
> > 
> > Changes for v23 (July 26, 2016):
> > 
> >   o Move memblock_reserve() to a single place in reserve_crashkernel()
> >   o Use  cpu_park_loop() in ipi_cpu_crash_stop()
> >   o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
> >   o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
> >     (for ACPI) from usable memory at crash kernel
> > 
> > Changes for v22 (July 12, 2016):
> > 
> >   o Export "crashkernel-base" and "crashkernel-size" via device-tree,
> >     and add some descriptions about them in chosen.txt
> >   o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
> >     with powerpc's "usable-memory"
> >   o Make cosmetic changes regarding "ifdef" usage
> >   o Correct some wordings in kdump.txt
> > 
> > Changes for v21 (July 6, 2016):
> > 
> >   o Remove kexec patches.
> >   o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
> >   o Clarify the description about kvm in kdump.txt.
> > 
> > See the following link [3] for older changes:
> > [3]  http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html
> > 
> > AKASHI Takahiro (8):
> >   arm64: kdump: reserve memory for crash dump kernel
> >   memblock: add memblock_cap_memory_range()
> >   arm64: limit memory regions based on DT property, usable-memory-range
> >   arm64: kdump: implement machine_crash_shutdown()
> >   arm64: kdump: add kdump support
> >   arm64: kdump: add VMCOREINFO's for user-space coredump tools
> >   arm64: kdump: enable kdump in the arm64 defconfig
> >   arm64: kdump: update a kernel doc
> > 
> > James Morse (1):
> >   Documentation: dt: chosen properties for arm64 kdump
> > 
> >  Documentation/devicetree/bindings/chosen.txt |  45 ++++++
> >  Documentation/kdump/kdump.txt                |  16 ++-
> >  arch/arm64/Kconfig                           |  11 ++
> >  arch/arm64/configs/defconfig                 |   1 +
> >  arch/arm64/include/asm/hardirq.h             |   2 +-
> >  arch/arm64/include/asm/kexec.h               |  41 +++++-
> >  arch/arm64/include/asm/smp.h                 |   2 +
> >  arch/arm64/kernel/Makefile                   |   1 +
> >  arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
> >  arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
> >  arch/arm64/kernel/setup.c                    |   7 +-
> >  arch/arm64/kernel/smp.c                      |  63 +++++++++
> >  arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
> >  include/linux/memblock.h                     |   1 +
> >  mm/memblock.c                                |  28 ++++
> >  15 files changed, 551 insertions(+), 7 deletions(-)
> >  create mode 100644 arch/arm64/kernel/crash_dump.c
> > 
> Couple of points
> a) Just a note, while testing, the crashkernel reserved memory should be less than ARCH_LOW_ADDRESS_LIMIT (=arm64_dma_phys_limit).

I think that this is a common mistake not only for kdump, but also
for general kernels.
Since request_standard_resources() calls alloc_bootmem_low(),
the kernel will panic if any of usable "System RAM" is located
above ARCH_LOW_ADDRESS_LIMIT.
For kdump, using "crashkernel=SS" notation is a convenient way
to avoid this issue.

> b) Has anyone tested this on a SoC with Gicv3 ITS ?
> Should the GICD/R be reset prior to switching to crash kernel ?
> I am seeing lot of GICv3: RWP timeout, gone fishing while crash kernel boots.

I've never seen this kind of messages.
I usually do my testing on a fast model.
"compatible" of interrupt-controller is "arm,gic-v3."

Thanks,
-Takahiro AKASHI

> Thanks,
> Manish

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-08-31  5:02             ` AKASHI Takahiro
@ 2016-09-02 10:11               ` AKASHI Takahiro
  0 siblings, 0 replies; 52+ messages in thread
From: AKASHI Takahiro @ 2016-09-02 10:11 UTC (permalink / raw)
  To: linux-arm-kernel

Rob,

On Wed, Aug 31, 2016 at 02:02:16PM +0900, AKASHI Takahiro wrote:
> On Wed, Aug 31, 2016 at 08:45:46AM +0900, AKASHI Takahiro wrote:
> > Rob,
> > 
> > On Tue, Aug 30, 2016 at 11:34:10AM -0500, Rob Herring wrote:
> > > On Sun, Aug 21, 2016 at 11:28 PM, AKASHI Takahiro
> > > <takahiro.akashi@linaro.org> wrote:
> > > > Rob,
> > > > [Cc: Mark]
> > > >
> > > > On Fri, Aug 19, 2016 at 08:26:41AM -0500, Rob Herring wrote:
> > > >> On Tue, Aug 09, 2016 at 10:57:47AM +0900, AKASHI Takahiro wrote:
> > > >> > From: James Morse <james.morse@arm.com>
> > > >> >
> > > >> > Add documentation for
> > > >> >     linux,crashkernel-base and crashkernel-size,
> > > >> >     linux,usable-memory-range, and
> > > >> >     linux,elfcorehdr
> > > >> > used by arm64 kexec/kdump to decribe the kdump reserved area, and
> > > >> > the elfcorehdr's location within it.
> > > >> >
> > > >> > Signed-off-by: James Morse <james.morse@arm.com>
> > > >> > [takahiro.akashi at linaro.org:
> > > >> >     renamed "usable-memory" to "usable-memory-range",
> > > >> >     added "linux,crashkernel-base" and "-size" ]
> > > >> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > >> > ---
> > > >> >  Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
> > > >> >  1 file changed, 50 insertions(+)
> > > >> >
> > > >> > diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> > > >> > index 6ae9d82..236188a 100644
> > > >> > --- a/Documentation/devicetree/bindings/chosen.txt
> > > >> > +++ b/Documentation/devicetree/bindings/chosen.txt
> > > >> > @@ -52,3 +52,53 @@ This property is set (currently only on PowerPC, and only needed on
> > > >> >  book3e) by some versions of kexec-tools to tell the new kernel that it
> > > >> >  is being booted by kexec, as the booting environment may differ (e.g.
> > > >> >  a different secondary CPU release mechanism)
> > > >> > +
> > > >> > +linux,crashkernel-base
> > > >> > +linux,crashkernel-size
> > > >> > +----------------------
> > > >> > +These properties (currently used on PowerPC and arm64) indicates
> > > >> > +the base address and the size, respectively, of the reserved memory
> > > >> > +range for crash dump kernel.
> > > >> > +e.g.
> > > >> > +
> > > >> > +/ {
> > > >> > +   chosen {
> > > >> > +           linux,crashkernel-base = <0x9 0xf0000000>;
> > > >> > +           linux,crashkernel-size = <0x0 0x10000000>;
> > > >> > +   };
> > > >> > +};
> > > >> > +
> > > >> > +linux,usable-memory-range
> > > >> > +-------------------------
> > > >> > +
> > > >> > +This property (currently used only on arm64) holds the memory range,
> > > >> > +the base address and the size, which can be used as system ram on
> > > >> > +the *current* kernel. Note that, if this property is present, any memory
> > > >> > +regions under "memory" nodes in DT blob or ones marked as "conventional
> > > >> > +memory" in EFI memory map should be ignored.
> > > >> > +e.g.
> > > >> > +
> > > >> > +/ {
> > > >> > +   chosen {
> > > >> > +           linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
> > > >> > +   };
> > > >> > +};
> > > >>
> > > >> I've read the discussion on this. I think you should use the existing
> > > >> linux,usable-memory property in the memory nodes. If UEFI systems don't
> > > >> have memory nodes, then you can find an UEFI way to describe this. DT is
> > > >> not the dumping ground for what doesn't fit in UEFI. How do x86 systems
> > > >> work?
> > > >
> > > > Kdump for x86 passes the range of usable memory to crash dump kernel
> > > > either via:
> > > > (a) "memmap=nn at ss" command line parameter, or
> > > > (b) faked BIOS e820 map (a sort of memory map table)
> > > >
> > > > (a) was rejected by Mark [1], and (b) is x86-specific.
> > > >
> > > > I believe that my approach is better because it works in the same way
> > > > either on UEFI systems or DT-based systems.
> > > 
> > > So we have a new way for both UEFI and DT. If UEFI folks can't come up
> > > with a standard way to do things in the UEFI standard, then we just
> > > dump them into DT?
> > 
> > No, I don't think so.
> > According to "Documentation/devicetree/bindings/chosen.txt,
> >   "The chosen node does not represent a real device, but serves as a place
> >    for passing data between firmware and the operating system, like boot
> >    arguments."
> > 
> > Since kexec/dump is some sort of boot loader, it all makes sense to
> > add a new property as an interface from the old kernel(kexec/dump)
> > to the new kernel. So my approach doesn't break any DT rules.
> > 
> > Please note that, even on UEFI/APCI systems, there are already a few
> > properties used under /chosen, like "linux,uefi-system-table" and 
> > "linux,mmap-*."
> > (Those properties are currently *not* defined in this document, though.)
> > 
> > > I'd rather see alignment with existing DT systems
> > > (PPC) and in particular the tools.
> > 
> > I'm not sure what you mean here, but I guess:
> > 1) On DT-only systems, we should follow the way that PPC does.
> >    (That is, adding "usable-memory" property to each "memory" node.)
> > 2) On UEFI/ACPI system, we must invent a new own way for our purpose
> >    because, as I said before, there is no standard, but x86-specific way. 
> > 
> > Is this what you want to do?

Yes?

> > If so,
> > do you agree to my approach of adding "linux,usable-memory-range" property
> > to /chosen *for UEFI/ACPI use*?
> 
> I've got another idea here: using "/memreserve/" field.

We'd better to use "reserved-memory" node:
  reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;

    crash_dump at 80000000 {
      reg = <0x0 0x80000000 0x0 0x8000000>;
      no-map;
    }
    ...
  }
so that we can clearly state that those regions are for kdump.
Unfortunately, we can't place "reserved-memory" under /chosen.

> All the memory regions used by the crashed kernel would be added
> as "/memreserve/" and be memblock_reserve'd later in crash dump kernel.
> Consequently, they will be excluded from the usable memory.
> 
> Obviously,
> - this approach will work both on UEFI/ACPI and DT-only systems, and
> - patch#2 ("memblock: add memblock_cap_memory_range()"), as well as
>   "linux,usable-memory-range" property, is no longer necessary.
>   (So nothing new, except for "linux.elfcorehdr," will be invented.)
> 
> But on the other hand,
> - arm64 and ppc do the different ways even though both are DT systems,
> - those reserved regions are still *in* linear mapping and appear as
>   "System RAM" in /proc/iomem, and
> - this may imply that we cannot do kexec on crash dump kernel.

If you never accept my approach of adding "linux,usable-memory-range,"
I will send a new version, v26, based on "reserved-memory" next week.

(The only change is to remove patch#2/#3 from v25, though.)

-Takahiro AKASHI


> -Takahiro AKASHI
> 
> 
> > Thanks,
> > -Takahiro AKASHI
> > 
> > > Rob

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 0/9] arm64: add kdump support
  2016-08-31  5:31   ` AKASHI Takahiro
@ 2016-09-02 12:53     ` Manish Jaggi
  2016-09-05  8:15       ` AKASHI Takahiro
  0 siblings, 1 reply; 52+ messages in thread
From: Manish Jaggi @ 2016-09-02 12:53 UTC (permalink / raw)
  To: linux-arm-kernel



On 08/31/2016 11:01 AM, AKASHI Takahiro wrote:
> Manish,
> 
> Thank you for testing my kdump and reporting issues.
> 
> On Wed, Aug 31, 2016 at 09:11:52AM +0530, Manish Jaggi wrote:
>> Hi Akashi,
>>
>> On 08/09/2016 07:22 AM, AKASHI Takahiro wrote:
>>> This patch series adds kdump support on arm64.
>>>
>>> To load a crash-dump kernel to the systems, a series of patches to
>>> kexec-tools, which have not yet been merged upstream, are needed.
>>> Please use my kdump patches [1].
>>>
>>> To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
>>>   - crash utility (coming v7.1.6 or later) [2]
>>>     (Necessary patches have already been queued in the master.)
>>>
>>> [1] T.B.D.
>>> [2] https://github.com/crash-utility/crash.git
>>>
>>> Changes for v24 (Aug 9, 2016):
>>>   o Rebase to Linux-4.8-rc1
>>>   o Update descriptions about newly added DT proerties
>>>
>>> Changes for v23 (July 26, 2016):
>>>
>>>   o Move memblock_reserve() to a single place in reserve_crashkernel()
>>>   o Use  cpu_park_loop() in ipi_cpu_crash_stop()
>>>   o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
>>>   o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
>>>     (for ACPI) from usable memory at crash kernel
>>>
>>> Changes for v22 (July 12, 2016):
>>>
>>>   o Export "crashkernel-base" and "crashkernel-size" via device-tree,
>>>     and add some descriptions about them in chosen.txt
>>>   o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
>>>     with powerpc's "usable-memory"
>>>   o Make cosmetic changes regarding "ifdef" usage
>>>   o Correct some wordings in kdump.txt
>>>
>>> Changes for v21 (July 6, 2016):
>>>
>>>   o Remove kexec patches.
>>>   o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
>>>   o Clarify the description about kvm in kdump.txt.
>>>
>>> See the following link [3] for older changes:
>>> [3]  http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html
>>>
>>> AKASHI Takahiro (8):
>>>   arm64: kdump: reserve memory for crash dump kernel
>>>   memblock: add memblock_cap_memory_range()
>>>   arm64: limit memory regions based on DT property, usable-memory-range
>>>   arm64: kdump: implement machine_crash_shutdown()
>>>   arm64: kdump: add kdump support
>>>   arm64: kdump: add VMCOREINFO's for user-space coredump tools
>>>   arm64: kdump: enable kdump in the arm64 defconfig
>>>   arm64: kdump: update a kernel doc
>>>
>>> James Morse (1):
>>>   Documentation: dt: chosen properties for arm64 kdump
>>>
>>>  Documentation/devicetree/bindings/chosen.txt |  45 ++++++
>>>  Documentation/kdump/kdump.txt                |  16 ++-
>>>  arch/arm64/Kconfig                           |  11 ++
>>>  arch/arm64/configs/defconfig                 |   1 +
>>>  arch/arm64/include/asm/hardirq.h             |   2 +-
>>>  arch/arm64/include/asm/kexec.h               |  41 +++++-
>>>  arch/arm64/include/asm/smp.h                 |   2 +
>>>  arch/arm64/kernel/Makefile                   |   1 +
>>>  arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
>>>  arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
>>>  arch/arm64/kernel/setup.c                    |   7 +-
>>>  arch/arm64/kernel/smp.c                      |  63 +++++++++
>>>  arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
>>>  include/linux/memblock.h                     |   1 +
>>>  mm/memblock.c                                |  28 ++++
>>>  15 files changed, 551 insertions(+), 7 deletions(-)
>>>  create mode 100644 arch/arm64/kernel/crash_dump.c
>>>
>> Couple of points
>> a) Just a note, while testing, the crashkernel reserved memory should be less than ARCH_LOW_ADDRESS_LIMIT (=arm64_dma_phys_limit).
> 
> I think that this is a common mistake not only for kdump, but also
> for general kernels.
> Since request_standard_resources() calls alloc_bootmem_low(),
> the kernel will panic if any of usable "System RAM" is located
> above ARCH_LOW_ADDRESS_LIMIT.
> For kdump, using "crashkernel=SS" notation is a convenient way
> to avoid this issue.
> 
>> b) Has anyone tested this on a SoC with Gicv3 ITS ?
>> Should the GICD/R be reset prior to switching to crash kernel ?
>> I am seeing lot of GICv3: RWP timeout, gone fishing while crash kernel boots.
> 
> I've never seen this kind of messages.
> I usually do my testing on a fast model.
> "compatible" of interrupt-controller is "arm,gic-v3."
> 
I suspect gic_cpu_pm_notifier is not being called on any of the cores prior to start of crash kernel.
We might have to call it explicitly.
> Thanks,
> -Takahiro AKASHI
> 
>> Thanks,
>> Manish

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 0/9] arm64: add kdump support
  2016-09-02 12:53     ` Manish Jaggi
@ 2016-09-05  8:15       ` AKASHI Takahiro
  2016-09-05 12:42         ` Manish Jaggi
  0 siblings, 1 reply; 52+ messages in thread
From: AKASHI Takahiro @ 2016-09-05  8:15 UTC (permalink / raw)
  To: linux-arm-kernel

[Cc: Marc]

On Fri, Sep 02, 2016 at 06:23:25PM +0530, Manish Jaggi wrote:
> 
> 
> On 08/31/2016 11:01 AM, AKASHI Takahiro wrote:
> > Manish,
> > 
> > Thank you for testing my kdump and reporting issues.
> > 
> > On Wed, Aug 31, 2016 at 09:11:52AM +0530, Manish Jaggi wrote:
> >> Hi Akashi,
> >>
> >> On 08/09/2016 07:22 AM, AKASHI Takahiro wrote:
> >>> This patch series adds kdump support on arm64.
> >>>
> >>> To load a crash-dump kernel to the systems, a series of patches to
> >>> kexec-tools, which have not yet been merged upstream, are needed.
> >>> Please use my kdump patches [1].
> >>>
> >>> To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
> >>>   - crash utility (coming v7.1.6 or later) [2]
> >>>     (Necessary patches have already been queued in the master.)
> >>>
> >>> [1] T.B.D.
> >>> [2] https://github.com/crash-utility/crash.git
> >>>
> >>> Changes for v24 (Aug 9, 2016):
> >>>   o Rebase to Linux-4.8-rc1
> >>>   o Update descriptions about newly added DT proerties
> >>>
> >>> Changes for v23 (July 26, 2016):
> >>>
> >>>   o Move memblock_reserve() to a single place in reserve_crashkernel()
> >>>   o Use  cpu_park_loop() in ipi_cpu_crash_stop()
> >>>   o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
> >>>   o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
> >>>     (for ACPI) from usable memory at crash kernel
> >>>
> >>> Changes for v22 (July 12, 2016):
> >>>
> >>>   o Export "crashkernel-base" and "crashkernel-size" via device-tree,
> >>>     and add some descriptions about them in chosen.txt
> >>>   o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
> >>>     with powerpc's "usable-memory"
> >>>   o Make cosmetic changes regarding "ifdef" usage
> >>>   o Correct some wordings in kdump.txt
> >>>
> >>> Changes for v21 (July 6, 2016):
> >>>
> >>>   o Remove kexec patches.
> >>>   o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
> >>>   o Clarify the description about kvm in kdump.txt.
> >>>
> >>> See the following link [3] for older changes:
> >>> [3]  http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html
> >>>
> >>> AKASHI Takahiro (8):
> >>>   arm64: kdump: reserve memory for crash dump kernel
> >>>   memblock: add memblock_cap_memory_range()
> >>>   arm64: limit memory regions based on DT property, usable-memory-range
> >>>   arm64: kdump: implement machine_crash_shutdown()
> >>>   arm64: kdump: add kdump support
> >>>   arm64: kdump: add VMCOREINFO's for user-space coredump tools
> >>>   arm64: kdump: enable kdump in the arm64 defconfig
> >>>   arm64: kdump: update a kernel doc
> >>>
> >>> James Morse (1):
> >>>   Documentation: dt: chosen properties for arm64 kdump
> >>>
> >>>  Documentation/devicetree/bindings/chosen.txt |  45 ++++++
> >>>  Documentation/kdump/kdump.txt                |  16 ++-
> >>>  arch/arm64/Kconfig                           |  11 ++
> >>>  arch/arm64/configs/defconfig                 |   1 +
> >>>  arch/arm64/include/asm/hardirq.h             |   2 +-
> >>>  arch/arm64/include/asm/kexec.h               |  41 +++++-
> >>>  arch/arm64/include/asm/smp.h                 |   2 +
> >>>  arch/arm64/kernel/Makefile                   |   1 +
> >>>  arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
> >>>  arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
> >>>  arch/arm64/kernel/setup.c                    |   7 +-
> >>>  arch/arm64/kernel/smp.c                      |  63 +++++++++
> >>>  arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
> >>>  include/linux/memblock.h                     |   1 +
> >>>  mm/memblock.c                                |  28 ++++
> >>>  15 files changed, 551 insertions(+), 7 deletions(-)
> >>>  create mode 100644 arch/arm64/kernel/crash_dump.c
> >>>
> >> Couple of points
> >> a) Just a note, while testing, the crashkernel reserved memory should be less than ARCH_LOW_ADDRESS_LIMIT (=arm64_dma_phys_limit).
> > 
> > I think that this is a common mistake not only for kdump, but also
> > for general kernels.
> > Since request_standard_resources() calls alloc_bootmem_low(),
> > the kernel will panic if any of usable "System RAM" is located
> > above ARCH_LOW_ADDRESS_LIMIT.
> > For kdump, using "crashkernel=SS" notation is a convenient way
> > to avoid this issue.
> > 
> >> b) Has anyone tested this on a SoC with Gicv3 ITS ?
> >> Should the GICD/R be reset prior to switching to crash kernel ?
> >> I am seeing lot of GICv3: RWP timeout, gone fishing while crash kernel boots.
> > 
> > I've never seen this kind of messages.
> > I usually do my testing on a fast model.
> > "compatible" of interrupt-controller is "arm,gic-v3."
> > 
> I suspect gic_cpu_pm_notifier is not being called on any of the cores prior to start of crash kernel.
> We might have to call it explicitly.

I'm not sure that it is the cause, but anyway none of any cpu_pm_notifier's
will be called at panic. That is the reason why "maxcpus=1" should be
specified (for kdump on arm64).

-Takahiro AKASHI

> > Thanks,
> > -Takahiro AKASHI
> > 
> >> Thanks,
> >> Manish

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 0/9] arm64: add kdump support
  2016-09-05  8:15       ` AKASHI Takahiro
@ 2016-09-05 12:42         ` Manish Jaggi
  2016-09-06 15:33           ` Marc Zyngier
  0 siblings, 1 reply; 52+ messages in thread
From: Manish Jaggi @ 2016-09-05 12:42 UTC (permalink / raw)
  To: linux-arm-kernel



On 09/05/2016 01:45 PM, AKASHI Takahiro wrote:
> [Cc: Marc]
> 
> On Fri, Sep 02, 2016 at 06:23:25PM +0530, Manish Jaggi wrote:
>>
>>
>> On 08/31/2016 11:01 AM, AKASHI Takahiro wrote:
>>> Manish,
>>>
>>> Thank you for testing my kdump and reporting issues.
>>>
>>> On Wed, Aug 31, 2016 at 09:11:52AM +0530, Manish Jaggi wrote:
>>>> Hi Akashi,
>>>>
>>>> On 08/09/2016 07:22 AM, AKASHI Takahiro wrote:
>>>>> This patch series adds kdump support on arm64.
>>>>>
>>>>> To load a crash-dump kernel to the systems, a series of patches to
>>>>> kexec-tools, which have not yet been merged upstream, are needed.
>>>>> Please use my kdump patches [1].
>>>>>
>>>>> To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
>>>>>   - crash utility (coming v7.1.6 or later) [2]
>>>>>     (Necessary patches have already been queued in the master.)
>>>>>
>>>>> [1] T.B.D.
>>>>> [2] https://github.com/crash-utility/crash.git
>>>>>
>>>>> Changes for v24 (Aug 9, 2016):
>>>>>   o Rebase to Linux-4.8-rc1
>>>>>   o Update descriptions about newly added DT proerties
>>>>>
>>>>> Changes for v23 (July 26, 2016):
>>>>>
>>>>>   o Move memblock_reserve() to a single place in reserve_crashkernel()
>>>>>   o Use  cpu_park_loop() in ipi_cpu_crash_stop()
>>>>>   o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
>>>>>   o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
>>>>>     (for ACPI) from usable memory at crash kernel
>>>>>
>>>>> Changes for v22 (July 12, 2016):
>>>>>
>>>>>   o Export "crashkernel-base" and "crashkernel-size" via device-tree,
>>>>>     and add some descriptions about them in chosen.txt
>>>>>   o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
>>>>>     with powerpc's "usable-memory"
>>>>>   o Make cosmetic changes regarding "ifdef" usage
>>>>>   o Correct some wordings in kdump.txt
>>>>>
>>>>> Changes for v21 (July 6, 2016):
>>>>>
>>>>>   o Remove kexec patches.
>>>>>   o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
>>>>>   o Clarify the description about kvm in kdump.txt.
>>>>>
>>>>> See the following link [3] for older changes:
>>>>> [3]  http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html
>>>>>
>>>>> AKASHI Takahiro (8):
>>>>>   arm64: kdump: reserve memory for crash dump kernel
>>>>>   memblock: add memblock_cap_memory_range()
>>>>>   arm64: limit memory regions based on DT property, usable-memory-range
>>>>>   arm64: kdump: implement machine_crash_shutdown()
>>>>>   arm64: kdump: add kdump support
>>>>>   arm64: kdump: add VMCOREINFO's for user-space coredump tools
>>>>>   arm64: kdump: enable kdump in the arm64 defconfig
>>>>>   arm64: kdump: update a kernel doc
>>>>>
>>>>> James Morse (1):
>>>>>   Documentation: dt: chosen properties for arm64 kdump
>>>>>
>>>>>  Documentation/devicetree/bindings/chosen.txt |  45 ++++++
>>>>>  Documentation/kdump/kdump.txt                |  16 ++-
>>>>>  arch/arm64/Kconfig                           |  11 ++
>>>>>  arch/arm64/configs/defconfig                 |   1 +
>>>>>  arch/arm64/include/asm/hardirq.h             |   2 +-
>>>>>  arch/arm64/include/asm/kexec.h               |  41 +++++-
>>>>>  arch/arm64/include/asm/smp.h                 |   2 +
>>>>>  arch/arm64/kernel/Makefile                   |   1 +
>>>>>  arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
>>>>>  arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
>>>>>  arch/arm64/kernel/setup.c                    |   7 +-
>>>>>  arch/arm64/kernel/smp.c                      |  63 +++++++++
>>>>>  arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
>>>>>  include/linux/memblock.h                     |   1 +
>>>>>  mm/memblock.c                                |  28 ++++
>>>>>  15 files changed, 551 insertions(+), 7 deletions(-)
>>>>>  create mode 100644 arch/arm64/kernel/crash_dump.c
>>>>>
>>>> Couple of points
>>>> a) Just a note, while testing, the crashkernel reserved memory should be less than ARCH_LOW_ADDRESS_LIMIT (=arm64_dma_phys_limit).
>>>
>>> I think that this is a common mistake not only for kdump, but also
>>> for general kernels.
>>> Since request_standard_resources() calls alloc_bootmem_low(),
>>> the kernel will panic if any of usable "System RAM" is located
>>> above ARCH_LOW_ADDRESS_LIMIT.
>>> For kdump, using "crashkernel=SS" notation is a convenient way
>>> to avoid this issue.
>>>
>>>> b) Has anyone tested this on a SoC with Gicv3 ITS ?
>>>> Should the GICD/R be reset prior to switching to crash kernel ?
>>>> I am seeing lot of GICv3: RWP timeout, gone fishing while crash kernel boots.
>>>
>>> I've never seen this kind of messages.
>>> I usually do my testing on a fast model.
>>> "compatible" of interrupt-controller is "arm,gic-v3."
>>>
>> I suspect gic_cpu_pm_notifier is not being called on any of the cores prior to start of crash kernel.
>> We might have to call it explicitly.
> 
> I'm not sure that it is the cause, but anyway none of any cpu_pm_notifier's
> will be called at panic. That is the reason why "maxcpus=1" should be
> specified (for kdump on arm64).
> 
What I meant was that since cpu_pm_notifier is not called before crash kernel is started,
GIC Distributor/re-distributor/ITS is not set in quiescent state.
In my setup the GICD_CTRL[RWP] bit is not cleared in the crashkernels' distributor init function.
Marc what do you think?
> -Takahiro AKASHI
> 
>>> Thanks,
>>> -Takahiro AKASHI
>>>
>>>> Thanks,
>>>> Manish

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 0/9] arm64: add kdump support
  2016-09-05 12:42         ` Manish Jaggi
@ 2016-09-06 15:33           ` Marc Zyngier
  2016-09-06 16:15             ` Manish Jaggi
  0 siblings, 1 reply; 52+ messages in thread
From: Marc Zyngier @ 2016-09-06 15:33 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/09/16 13:42, Manish Jaggi wrote:
> 
> 
> On 09/05/2016 01:45 PM, AKASHI Takahiro wrote:
>> [Cc: Marc]
>>
>> On Fri, Sep 02, 2016 at 06:23:25PM +0530, Manish Jaggi wrote:
>>>
>>>
>>> On 08/31/2016 11:01 AM, AKASHI Takahiro wrote:
>>>> Manish,
>>>>
>>>> Thank you for testing my kdump and reporting issues.
>>>>
>>>> On Wed, Aug 31, 2016 at 09:11:52AM +0530, Manish Jaggi wrote:
>>>>> Hi Akashi,
>>>>>
>>>>> On 08/09/2016 07:22 AM, AKASHI Takahiro wrote:
>>>>>> This patch series adds kdump support on arm64.
>>>>>>
>>>>>> To load a crash-dump kernel to the systems, a series of patches to
>>>>>> kexec-tools, which have not yet been merged upstream, are needed.
>>>>>> Please use my kdump patches [1].
>>>>>>
>>>>>> To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
>>>>>>   - crash utility (coming v7.1.6 or later) [2]
>>>>>>     (Necessary patches have already been queued in the master.)
>>>>>>
>>>>>> [1] T.B.D.
>>>>>> [2] https://github.com/crash-utility/crash.git
>>>>>>
>>>>>> Changes for v24 (Aug 9, 2016):
>>>>>>   o Rebase to Linux-4.8-rc1
>>>>>>   o Update descriptions about newly added DT proerties
>>>>>>
>>>>>> Changes for v23 (July 26, 2016):
>>>>>>
>>>>>>   o Move memblock_reserve() to a single place in reserve_crashkernel()
>>>>>>   o Use  cpu_park_loop() in ipi_cpu_crash_stop()
>>>>>>   o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
>>>>>>   o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
>>>>>>     (for ACPI) from usable memory at crash kernel
>>>>>>
>>>>>> Changes for v22 (July 12, 2016):
>>>>>>
>>>>>>   o Export "crashkernel-base" and "crashkernel-size" via device-tree,
>>>>>>     and add some descriptions about them in chosen.txt
>>>>>>   o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
>>>>>>     with powerpc's "usable-memory"
>>>>>>   o Make cosmetic changes regarding "ifdef" usage
>>>>>>   o Correct some wordings in kdump.txt
>>>>>>
>>>>>> Changes for v21 (July 6, 2016):
>>>>>>
>>>>>>   o Remove kexec patches.
>>>>>>   o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
>>>>>>   o Clarify the description about kvm in kdump.txt.
>>>>>>
>>>>>> See the following link [3] for older changes:
>>>>>> [3]  http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html
>>>>>>
>>>>>> AKASHI Takahiro (8):
>>>>>>   arm64: kdump: reserve memory for crash dump kernel
>>>>>>   memblock: add memblock_cap_memory_range()
>>>>>>   arm64: limit memory regions based on DT property, usable-memory-range
>>>>>>   arm64: kdump: implement machine_crash_shutdown()
>>>>>>   arm64: kdump: add kdump support
>>>>>>   arm64: kdump: add VMCOREINFO's for user-space coredump tools
>>>>>>   arm64: kdump: enable kdump in the arm64 defconfig
>>>>>>   arm64: kdump: update a kernel doc
>>>>>>
>>>>>> James Morse (1):
>>>>>>   Documentation: dt: chosen properties for arm64 kdump
>>>>>>
>>>>>>  Documentation/devicetree/bindings/chosen.txt |  45 ++++++
>>>>>>  Documentation/kdump/kdump.txt                |  16 ++-
>>>>>>  arch/arm64/Kconfig                           |  11 ++
>>>>>>  arch/arm64/configs/defconfig                 |   1 +
>>>>>>  arch/arm64/include/asm/hardirq.h             |   2 +-
>>>>>>  arch/arm64/include/asm/kexec.h               |  41 +++++-
>>>>>>  arch/arm64/include/asm/smp.h                 |   2 +
>>>>>>  arch/arm64/kernel/Makefile                   |   1 +
>>>>>>  arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
>>>>>>  arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
>>>>>>  arch/arm64/kernel/setup.c                    |   7 +-
>>>>>>  arch/arm64/kernel/smp.c                      |  63 +++++++++
>>>>>>  arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
>>>>>>  include/linux/memblock.h                     |   1 +
>>>>>>  mm/memblock.c                                |  28 ++++
>>>>>>  15 files changed, 551 insertions(+), 7 deletions(-)
>>>>>>  create mode 100644 arch/arm64/kernel/crash_dump.c
>>>>>>
>>>>> Couple of points
>>>>> a) Just a note, while testing, the crashkernel reserved memory should be less than ARCH_LOW_ADDRESS_LIMIT (=arm64_dma_phys_limit).
>>>>
>>>> I think that this is a common mistake not only for kdump, but also
>>>> for general kernels.
>>>> Since request_standard_resources() calls alloc_bootmem_low(),
>>>> the kernel will panic if any of usable "System RAM" is located
>>>> above ARCH_LOW_ADDRESS_LIMIT.
>>>> For kdump, using "crashkernel=SS" notation is a convenient way
>>>> to avoid this issue.
>>>>
>>>>> b) Has anyone tested this on a SoC with Gicv3 ITS ?
>>>>> Should the GICD/R be reset prior to switching to crash kernel ?
>>>>> I am seeing lot of GICv3: RWP timeout, gone fishing while crash kernel boots.
>>>>
>>>> I've never seen this kind of messages.
>>>> I usually do my testing on a fast model.
>>>> "compatible" of interrupt-controller is "arm,gic-v3."
>>>>
>>> I suspect gic_cpu_pm_notifier is not being called on any of the cores prior to start of crash kernel.
>>> We might have to call it explicitly.
>>
>> I'm not sure that it is the cause, but anyway none of any cpu_pm_notifier's
>> will be called at panic. That is the reason why "maxcpus=1" should be
>> specified (for kdump on arm64).
>>
> What I meant was that since cpu_pm_notifier is not called before
> crash kernel is started, GIC Distributor/re-distributor/ITS is not
> set in quiescent state.

Which is fine, they are not expected to be in a sane state anyway
(that's what a crash is about...). The ITS now has provision to be put
in a disabled state before being reinitialized. As for GICD, it is
disabled before being reprogrammed, which should be enough.

> In my setup the GICD_CTRL[RWP] bit is not cleared in the
> crashkernels' distributor init function.

Which instance is failing? The initial one (just after the initial
disable)? Or the one called from gic_dist_config()?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 0/9] arm64: add kdump support
  2016-09-06 15:33           ` Marc Zyngier
@ 2016-09-06 16:15             ` Manish Jaggi
  2016-09-06 16:42               ` Marc Zyngier
  0 siblings, 1 reply; 52+ messages in thread
From: Manish Jaggi @ 2016-09-06 16:15 UTC (permalink / raw)
  To: linux-arm-kernel



On 09/06/2016 09:03 PM, Marc Zyngier wrote:
> On 05/09/16 13:42, Manish Jaggi wrote:
>>
>>
>> On 09/05/2016 01:45 PM, AKASHI Takahiro wrote:
>>> [Cc: Marc]
>>>
>>> On Fri, Sep 02, 2016 at 06:23:25PM +0530, Manish Jaggi wrote:
>>>>
>>>>
>>>> On 08/31/2016 11:01 AM, AKASHI Takahiro wrote:
>>>>> Manish,
>>>>>
>>>>> Thank you for testing my kdump and reporting issues.
>>>>>
>>>>> On Wed, Aug 31, 2016 at 09:11:52AM +0530, Manish Jaggi wrote:
>>>>>> Hi Akashi,
>>>>>>
>>>>>> On 08/09/2016 07:22 AM, AKASHI Takahiro wrote:
>>>>>>> This patch series adds kdump support on arm64.
>>>>>>>
>>>>>>> To load a crash-dump kernel to the systems, a series of patches to
>>>>>>> kexec-tools, which have not yet been merged upstream, are needed.
>>>>>>> Please use my kdump patches [1].
>>>>>>>
>>>>>>> To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
>>>>>>>   - crash utility (coming v7.1.6 or later) [2]
>>>>>>>     (Necessary patches have already been queued in the master.)
>>>>>>>
>>>>>>> [1] T.B.D.
>>>>>>> [2] https://github.com/crash-utility/crash.git
>>>>>>>
>>>>>>> Changes for v24 (Aug 9, 2016):
>>>>>>>   o Rebase to Linux-4.8-rc1
>>>>>>>   o Update descriptions about newly added DT proerties
>>>>>>>
>>>>>>> Changes for v23 (July 26, 2016):
>>>>>>>
>>>>>>>   o Move memblock_reserve() to a single place in reserve_crashkernel()
>>>>>>>   o Use  cpu_park_loop() in ipi_cpu_crash_stop()
>>>>>>>   o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
>>>>>>>   o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
>>>>>>>     (for ACPI) from usable memory at crash kernel
>>>>>>>
>>>>>>> Changes for v22 (July 12, 2016):
>>>>>>>
>>>>>>>   o Export "crashkernel-base" and "crashkernel-size" via device-tree,
>>>>>>>     and add some descriptions about them in chosen.txt
>>>>>>>   o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
>>>>>>>     with powerpc's "usable-memory"
>>>>>>>   o Make cosmetic changes regarding "ifdef" usage
>>>>>>>   o Correct some wordings in kdump.txt
>>>>>>>
>>>>>>> Changes for v21 (July 6, 2016):
>>>>>>>
>>>>>>>   o Remove kexec patches.
>>>>>>>   o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
>>>>>>>   o Clarify the description about kvm in kdump.txt.
>>>>>>>
>>>>>>> See the following link [3] for older changes:
>>>>>>> [3]  http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html
>>>>>>>
>>>>>>> AKASHI Takahiro (8):
>>>>>>>   arm64: kdump: reserve memory for crash dump kernel
>>>>>>>   memblock: add memblock_cap_memory_range()
>>>>>>>   arm64: limit memory regions based on DT property, usable-memory-range
>>>>>>>   arm64: kdump: implement machine_crash_shutdown()
>>>>>>>   arm64: kdump: add kdump support
>>>>>>>   arm64: kdump: add VMCOREINFO's for user-space coredump tools
>>>>>>>   arm64: kdump: enable kdump in the arm64 defconfig
>>>>>>>   arm64: kdump: update a kernel doc
>>>>>>>
>>>>>>> James Morse (1):
>>>>>>>   Documentation: dt: chosen properties for arm64 kdump
>>>>>>>
>>>>>>>  Documentation/devicetree/bindings/chosen.txt |  45 ++++++
>>>>>>>  Documentation/kdump/kdump.txt                |  16 ++-
>>>>>>>  arch/arm64/Kconfig                           |  11 ++
>>>>>>>  arch/arm64/configs/defconfig                 |   1 +
>>>>>>>  arch/arm64/include/asm/hardirq.h             |   2 +-
>>>>>>>  arch/arm64/include/asm/kexec.h               |  41 +++++-
>>>>>>>  arch/arm64/include/asm/smp.h                 |   2 +
>>>>>>>  arch/arm64/kernel/Makefile                   |   1 +
>>>>>>>  arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
>>>>>>>  arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
>>>>>>>  arch/arm64/kernel/setup.c                    |   7 +-
>>>>>>>  arch/arm64/kernel/smp.c                      |  63 +++++++++
>>>>>>>  arch/arm64/mm/init.c                         | 202 +++++++++++++++++++++++++++
>>>>>>>  include/linux/memblock.h                     |   1 +
>>>>>>>  mm/memblock.c                                |  28 ++++
>>>>>>>  15 files changed, 551 insertions(+), 7 deletions(-)
>>>>>>>  create mode 100644 arch/arm64/kernel/crash_dump.c
>>>>>>>
>>>>>> Couple of points
>>>>>> a) Just a note, while testing, the crashkernel reserved memory should be less than ARCH_LOW_ADDRESS_LIMIT (=arm64_dma_phys_limit).
>>>>>
>>>>> I think that this is a common mistake not only for kdump, but also
>>>>> for general kernels.
>>>>> Since request_standard_resources() calls alloc_bootmem_low(),
>>>>> the kernel will panic if any of usable "System RAM" is located
>>>>> above ARCH_LOW_ADDRESS_LIMIT.
>>>>> For kdump, using "crashkernel=SS" notation is a convenient way
>>>>> to avoid this issue.
>>>>>
>>>>>> b) Has anyone tested this on a SoC with Gicv3 ITS ?
>>>>>> Should the GICD/R be reset prior to switching to crash kernel ?
>>>>>> I am seeing lot of GICv3: RWP timeout, gone fishing while crash kernel boots.
>>>>>
>>>>> I've never seen this kind of messages.
>>>>> I usually do my testing on a fast model.
>>>>> "compatible" of interrupt-controller is "arm,gic-v3."
>>>>>
>>>> I suspect gic_cpu_pm_notifier is not being called on any of the cores prior to start of crash kernel.
>>>> We might have to call it explicitly.
>>>
>>> I'm not sure that it is the cause, but anyway none of any cpu_pm_notifier's
>>> will be called at panic. That is the reason why "maxcpus=1" should be
>>> specified (for kdump on arm64).
>>>
>> What I meant was that since cpu_pm_notifier is not called before
>> crash kernel is started, GIC Distributor/re-distributor/ITS is not
>> set in quiescent state.
> 
> Which is fine, they are not expected to be in a sane state anyway
> (that's what a crash is about...). The ITS now has provision to be put
> in a disabled state before being reinitialized. As for GICD, it is
> disabled before being reprogrammed, which should be enough.
> 
>> In my setup the GICD_CTRL[RWP] bit is not cleared in the
>> crashkernels' distributor init function.
> 
> Which instance is failing? The initial one (just after the initial
> disable)? Or the one called from gic_dist_config()?
> 
In crash kernel, when the GICD_CTRL is set to 0x0, RWP is not getting clear.
And is never cleared for any subsequent writes.
> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 0/9] arm64: add kdump support
  2016-09-06 16:15             ` Manish Jaggi
@ 2016-09-06 16:42               ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2016-09-06 16:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/09/16 17:15, Manish Jaggi wrote:
> 
>>> In my setup the GICD_CTRL[RWP] bit is not cleared in the
>>> crashkernels' distributor init function.
>>
>> Which instance is failing? The initial one (just after the initial
>> disable)? Or the one called from gic_dist_config()?
>>
> In crash kernel, when the GICD_CTRL is set to 0x0, RWP is not getting clear.
> And is never cleared for any subsequent writes.

That's weird. It means writes are still pending, and never drained. What
happens if you put a dsb(sy) in the wait_for_rwp() loop?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-08-19 13:26     ` Rob Herring
  2016-08-22  4:28       ` AKASHI Takahiro
@ 2016-09-27 23:39       ` Mark Rutland
  1 sibling, 0 replies; 52+ messages in thread
From: Mark Rutland @ 2016-09-27 23:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Rob,

Reviving an old thread -- "rock" and "hard place" come to mind.

On Fri, Aug 19, 2016 at 08:26:41AM -0500, Rob Herring wrote:
> On Tue, Aug 09, 2016 at 10:57:47AM +0900, AKASHI Takahiro wrote:
> > From: James Morse <james.morse@arm.com>

> > +linux,usable-memory-range
> > +-------------------------
> > +
> > +This property (currently used only on arm64) holds the memory range,
> > +the base address and the size, which can be used as system ram on
> > +the *current* kernel. Note that, if this property is present, any memory
> > +regions under "memory" nodes in DT blob or ones marked as "conventional
> > +memory" in EFI memory map should be ignored.
> > +e.g.
> > +
> > +/ {
> > +	chosen {
> > +		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
> > +	};
> > +};
> 
> I've read the discussion on this. I think you should use the existing 
> linux,usable-memory property in the memory nodes. If UEFI systems don't 
> have memory nodes, then you can find an UEFI way to describe this. DT is 
> not the dumping ground for what doesn't fit in UEFI. How do x86 systems 
> work?

Having looked at the proposals, I'm personally much keener on the approach
above (modulo naming and wording) than trying to bolt memory nodes onto a UEFI
system, or using reserved-memory to describe the inverse of the allowable
range.

I completely agree that we want one solution for DT-only and DT+UEFI.

While those approaches reuse existing infrastructure, they end up creating more
work, and I believe that they create more scope for painful problems. As kdump
is already a constrained case, I think that it's reasonable to have a
linux-specific property as above.

I also think we need to figure out how we expect this to work for the
kexec_file_load case, as that has a criticial impact on the above (e.g. what's
preferable out of cmdline options vs dt properties).

Can we try to sync at connect to discuss this?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2016-09-27 23:39 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-09  1:52 [PATCH v24 0/9] arm64: add kdump support AKASHI Takahiro
2016-08-09  1:52 ` [PATCH v24 1/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
2016-08-09  1:55   ` [PATCH v24 2/9] memblock: add memblock_cap_memory_range() AKASHI Takahiro
2016-08-10 16:26     ` James Morse
2016-08-09  1:56   ` [PATCH v24 3/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
2016-08-09  1:56     ` [PATCH v24 4/9] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
2016-08-09  1:56     ` [PATCH v24 5/9] arm64: kdump: add kdump support AKASHI Takahiro
2016-08-10 16:38       ` James Morse
2016-08-10 18:18         ` Pratyush Anand
2016-08-11 10:03           ` Pratyush Anand
2016-08-16 10:13             ` James Morse
2016-08-17 15:33               ` [PATCH] fixup! " James Morse
2016-08-18  7:32                 ` AKASHI Takahiro
2016-08-19  8:00                 ` Pratyush Anand
2016-08-19 13:34                   ` James Morse
2016-08-19 15:19                     ` Pratyush Anand
2016-08-18  7:15         ` [PATCH v24 5/9] " AKASHI Takahiro
2016-08-18  7:19           ` Dave Young
2016-08-19  1:26           ` AKASHI Takahiro
2016-08-19 11:22             ` Pratyush Anand
2016-08-22  1:29               ` AKASHI Takahiro
2016-08-22  7:07                 ` Pratyush Anand
2016-08-22 13:47                 ` James Morse
2016-08-23  0:38                   ` AKASHI Takahiro
2016-08-23 11:23                     ` Pratyush Anand
2016-08-24  8:04                       ` Dave Young
2016-08-24 10:25                         ` James Morse
2016-08-25  1:04                           ` Dave Young
2016-08-19 13:28             ` James Morse
2016-08-22  1:23               ` AKASHI Takahiro
2016-08-24 14:44       ` Ard Biesheuvel
2016-08-26  6:22         ` AKASHI Takahiro
2016-08-09  1:56     ` [PATCH v24 6/9] arm64: kdump: add VMCOREINFO's for user-space coredump tools AKASHI Takahiro
2016-08-09  1:56     ` [PATCH v24 7/9] arm64: kdump: enable kdump in the arm64 defconfig AKASHI Takahiro
2016-08-09  1:56     ` [PATCH v24 8/9] arm64: kdump: update a kernel doc AKASHI Takahiro
2016-08-09  1:57   ` [PATCH v24 9/9] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
2016-08-19 13:26     ` Rob Herring
2016-08-22  4:28       ` AKASHI Takahiro
2016-08-30 16:34         ` Rob Herring
2016-08-30 23:45           ` AKASHI Takahiro
2016-08-31  5:02             ` AKASHI Takahiro
2016-09-02 10:11               ` AKASHI Takahiro
2016-09-27 23:39       ` Mark Rutland
2016-08-09  2:04 ` [PATCH v24 0/9] arm64: add kdump support AKASHI Takahiro
2016-08-31  3:41 ` Manish Jaggi
2016-08-31  5:31   ` AKASHI Takahiro
2016-09-02 12:53     ` Manish Jaggi
2016-09-05  8:15       ` AKASHI Takahiro
2016-09-05 12:42         ` Manish Jaggi
2016-09-06 15:33           ` Marc Zyngier
2016-09-06 16:15             ` Manish Jaggi
2016-09-06 16:42               ` Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).