linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v29 0/9] arm64: add kdump support
@ 2016-12-28  4:33 AKASHI Takahiro
  2016-12-28  4:35 ` [PATCH v29 1/9] memblock: add memblock_cap_memory_range() AKASHI Takahiro
                   ` (9 more replies)
  0 siblings, 10 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2016-12-28  4:33 UTC (permalink / raw)
  To: linux-arm-kernel

This patch series adds kdump support on arm64.

To load a crash-dump kernel to the systems, a series of patches to
kexec-tools[1] are also needed. Please use the latest one, v4 [2].
For your convinience, you can pick them up from:
   https://git.linaro.org/people/takahiro.akashi/linux-aarch64.git arm64/kdump
   https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/kdump

To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
  - crash utility (v7.1.7 or later) [3]
    (As of today, the very latest tree cannot handle a core image correctly
     due to the commit 7fd8329ba502 ("taint/module: Clean up global and module
     taint flags handling")


The previous version, v27, was also:
Tested-by: Pratyush Anand <panand@redhat.com> (mustang and seattle)
Tested-by: James Morse <james.morse@arm.com> (Juno)


Changes for v29 (Dec 28, 2016)
  o rebased to Linux-v4.10-rc1
  o change asm constraints in crash_setup_regs() per Catalin

Changes for v28 (Nov 22, 2016)
  o rebased to Linux-v4.9-rc6
  o revamp patch #1 and merge memblock_cap_memory_range() with
    memblock_mem_limit_remove_map()

Changes for v27 (Nov 1, 2016)
  o rebased to Linux-v4.9-rc3
  o revert v26 change, i.e. revive "linux,usable-memory-range" property
    (patch #2/#3, updating patch #9)
  o minor fixes per review comments (patch #3/#4/#6/#8)
  o re-order patches and improve commit messages for readability

Changes for v26 (Sep 7, 2016):
  o Use /reserved-memory instead of "linux,usable-memory-range" property
    (dropping v25's patch#2 and #3, updating ex-patch#9.)

Changes for v25 (Aug 29, 2016):
  o Rebase to Linux-4.8-rc4
  o Use memremap() instead of ioremap_cache() [patch#5]

Changes for v24 (Aug 9, 2016):
  o Rebase to Linux-4.8-rc1
  o Update descriptions about newly added DT proerties

Changes for v23 (July 26, 2016):

  o Move memblock_reserve() to a single place in reserve_crashkernel()
  o Use  cpu_park_loop() in ipi_cpu_crash_stop()
  o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
  o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
    (for ACPI) from usable memory at crash kernel

Changes for v22 (July 12, 2016):

  o Export "crashkernel-base" and "crashkernel-size" via device-tree,
    and add some descriptions about them in chosen.txt
  o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
    with powerpc's "usable-memory"
  o Make cosmetic changes regarding "ifdef" usage
  o Correct some wordings in kdump.txt

Changes for v21 (July 6, 2016):

  o Remove kexec patches.
  o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
  o Clarify the description about kvm in kdump.txt.

See the link [4] for older changes.


[1] https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
[2] http://lists.infradead.org/pipermail/kexec/2016-November/017555.html
[3] https://github.com/crash-utility/crash.git
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html

AKASHI Takahiro (8):
  memblock: add memblock_cap_memory_range()
  arm64: limit memory regions based on DT property, usable-memory-range
  arm64: kdump: reserve memory for crash dump kernel
  arm64: kdump: implement machine_crash_shutdown()
  arm64: kdump: add VMCOREINFO's for user-space tools
  arm64: kdump: provide /proc/vmcore file
  arm64: kdump: enable kdump in defconfig
  Documentation: kdump: describe arm64 port

James Morse (1):
  Documentation: dt: chosen properties for arm64 kdump

 Documentation/devicetree/bindings/chosen.txt |  50 +++++++
 Documentation/kdump/kdump.txt                |  16 ++-
 arch/arm64/Kconfig                           |  11 ++
 arch/arm64/configs/defconfig                 |   1 +
 arch/arm64/include/asm/hardirq.h             |   2 +-
 arch/arm64/include/asm/kexec.h               |  42 +++++-
 arch/arm64/include/asm/smp.h                 |   2 +
 arch/arm64/kernel/Makefile                   |   1 +
 arch/arm64/kernel/crash_dump.c               |  71 ++++++++++
 arch/arm64/kernel/machine_kexec.c            |  67 ++++++++-
 arch/arm64/kernel/setup.c                    |   7 +-
 arch/arm64/kernel/smp.c                      |  63 +++++++++
 arch/arm64/mm/init.c                         | 199 +++++++++++++++++++++++++++
 include/linux/memblock.h                     |   1 +
 mm/memblock.c                                |  44 ++++--
 15 files changed, 555 insertions(+), 22 deletions(-)
 create mode 100644 arch/arm64/kernel/crash_dump.c

-- 
2.11.0

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 1/9] memblock: add memblock_cap_memory_range()
  2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
@ 2016-12-28  4:35 ` AKASHI Takahiro
  2017-01-10 11:16   ` Will Deacon
  2016-12-28  4:35 ` [PATCH v29 2/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: AKASHI Takahiro @ 2016-12-28  4:35 UTC (permalink / raw)
  To: linux-arm-kernel

Add memblock_cap_memory_range() which will remove all the memblock regions
except the memory range specified in the arguments. In addition, rework is
done on memblock_mem_limit_remove_map() to re-implement it using
memblock_cap_memory_range().

This function, like memblock_mem_limit_remove_map(), will not remove
memblocks with MEMMAP_NOMAP attribute as they may be mapped and accessed
later as "device memory."
See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
address the mem limit issue").

This function is used, in a succeeding patch in the series of arm64 kdump
suuport, to limit the range of usable memory, or System RAM, on crash dump
kernel.
(Please note that "mem=" parameter is of little use for this purpose.)

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-mm at kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/memblock.h |  1 +
 mm/memblock.c            | 44 +++++++++++++++++++++++++++++---------------
 2 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5b759c9acf97..fbfcacc50c29 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -333,6 +333,7 @@ phys_addr_t memblock_mem_size(unsigned long limit_pfn);
 phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
 void memblock_enforce_memory_limit(phys_addr_t memory_limit);
+void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
 void memblock_mem_limit_remove_map(phys_addr_t limit);
 bool memblock_is_memory(phys_addr_t addr);
 int memblock_is_map_memory(phys_addr_t addr);
diff --git a/mm/memblock.c b/mm/memblock.c
index 7608bc305936..fea1688fef60 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1514,11 +1514,37 @@ void __init memblock_enforce_memory_limit(phys_addr_t limit)
 			      (phys_addr_t)ULLONG_MAX);
 }
 
+void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
+{
+	int start_rgn, end_rgn;
+	int i, ret;
+
+	if (!size)
+		return;
+
+	ret = memblock_isolate_range(&memblock.memory, base, size,
+						&start_rgn, &end_rgn);
+	if (ret)
+		return;
+
+	/* remove all the MAP regions */
+	for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
+		if (!memblock_is_nomap(&memblock.memory.regions[i]))
+			memblock_remove_region(&memblock.memory, i);
+
+	for (i = start_rgn - 1; i >= 0; i--)
+		if (!memblock_is_nomap(&memblock.memory.regions[i]))
+			memblock_remove_region(&memblock.memory, i);
+
+	/* truncate the reserved regions */
+	memblock_remove_range(&memblock.reserved, 0, base);
+	memblock_remove_range(&memblock.reserved,
+			base + size, (phys_addr_t)ULLONG_MAX);
+}
+
 void __init memblock_mem_limit_remove_map(phys_addr_t limit)
 {
-	struct memblock_type *type = &memblock.memory;
 	phys_addr_t max_addr;
-	int i, ret, start_rgn, end_rgn;
 
 	if (!limit)
 		return;
@@ -1529,19 +1555,7 @@ void __init memblock_mem_limit_remove_map(phys_addr_t limit)
 	if (max_addr == (phys_addr_t)ULLONG_MAX)
 		return;
 
-	ret = memblock_isolate_range(type, max_addr, (phys_addr_t)ULLONG_MAX,
-				&start_rgn, &end_rgn);
-	if (ret)
-		return;
-
-	/* remove all the MAP regions above the limit */
-	for (i = end_rgn - 1; i >= start_rgn; i--) {
-		if (!memblock_is_nomap(&type->regions[i]))
-			memblock_remove_region(type, i);
-	}
-	/* truncate the reserved regions */
-	memblock_remove_range(&memblock.reserved, max_addr,
-			      (phys_addr_t)ULLONG_MAX);
+	memblock_cap_memory_range(0, max_addr);
 }
 
 static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 2/9] arm64: limit memory regions based on DT property, usable-memory-range
  2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
  2016-12-28  4:35 ` [PATCH v29 1/9] memblock: add memblock_cap_memory_range() AKASHI Takahiro
@ 2016-12-28  4:35 ` AKASHI Takahiro
  2016-12-28  4:36 ` [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2016-12-28  4:35 UTC (permalink / raw)
  To: linux-arm-kernel

Crash dump kernel utilizes only a subset of available memory as System RAM.
On arm64 kdump, This memory range is advertized to crash dump kernel via
a device-tree property under /chosen,
   linux,usable-memory-range = <BASE SIZE>

Crash dump kernel reads this property at boot time and calls
memblock_cap_memory_range() to limit usable memory ranges which are
described as entries in UEFI memory map table or "memory" nodes in
a device tree blob.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Geoff Levand <geoff@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/mm/init.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 212c4d1e2f26..65f1241c372c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -187,10 +187,45 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+static int __init early_init_dt_scan_usablemem(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	struct memblock_region *usablemem = (struct memblock_region *)data;
+	const __be32 *reg;
+	int len;
+
+	usablemem->size = 0;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
+static void __init fdt_enforce_memory_region(void)
+{
+	struct memblock_region reg;
+
+	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
+
+	if (reg.size)
+		memblock_cap_memory_range(reg.base, reg.size);
+}
+
 void __init arm64_memblock_init(void)
 {
 	const s64 linear_region_size = -(s64)PAGE_OFFSET;
 
+	/* Handle linux,usable-memory-range property */
+	fdt_enforce_memory_region();
+
 	/*
 	 * Ensure that the linear region takes up exactly half of the kernel
 	 * virtual address space. This way, we can distinguish a linear address
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
  2016-12-28  4:35 ` [PATCH v29 1/9] memblock: add memblock_cap_memory_range() AKASHI Takahiro
  2016-12-28  4:35 ` [PATCH v29 2/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
@ 2016-12-28  4:36 ` AKASHI Takahiro
  2017-01-12 15:09   ` Mark Rutland
  2016-12-28  4:36 ` [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: AKASHI Takahiro @ 2016-12-28  4:36 UTC (permalink / raw)
  To: linux-arm-kernel

"crashkernel=" kernel parameter specifies the size (and optionally
the start address) of the system ram used by crash dump kernel.
reserve_crashkernel() will allocate and reserve the memory at the startup
of primary kernel.

This memory range will be exported to userspace via:
	- an entry named "Crash kernel" in /proc/iomem, and
	- "linux,crashkernel-base" and "linux,crashkernel-size" under
	  /sys/firmware/devicetree/base/chosen

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Pratyush Anand <panand@redhat.com>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/kernel/setup.c |   7 ++-
 arch/arm64/mm/init.c      | 110 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index b051367e2149..4083069057b5 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
 #include <linux/screen_info.h>
 #include <linux/init.h>
 #include <linux/kexec.h>
-#include <linux/crash_dump.h>
 #include <linux/root_dev.h>
 #include <linux/cpu.h>
 #include <linux/interrupt.h>
@@ -225,6 +224,12 @@ static void __init request_standard_resources(void)
 		    kernel_data.end <= res->end)
 			request_resource(res, &kernel_data);
 	}
+
+#ifdef CONFIG_KEXEC_CORE
+	/* User space tools will find "Crash kernel" region in /proc/iomem. */
+	if (crashk_res.end)
+		insert_resource(&iomem_resource, &crashk_res);
+#endif
 }
 
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 65f1241c372c..1d62bf71b531 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -30,12 +30,14 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
 #include <linux/vmalloc.h>
+#include <linux/kexec.h>
 
 #include <asm/boot.h>
 #include <asm/fixmap.h>
@@ -76,6 +78,111 @@ static int __init early_initrd(char *p)
 early_param("initrd", early_initrd);
 #endif
 
+#ifdef CONFIG_KEXEC_CORE
+static unsigned long long crash_size, crash_base;
+static struct property crash_base_prop = {
+	.name = "linux,crashkernel-base",
+	.length = sizeof(u64),
+	.value = &crash_base
+};
+static struct property crash_size_prop = {
+	.name = "linux,crashkernel-size",
+	.length = sizeof(u64),
+	.value = &crash_size,
+};
+
+static int __init export_crashkernel(void)
+{
+	struct device_node *node;
+	int ret;
+
+	if (!crash_size)
+		return 0;
+
+	/* Add /chosen/linux,crashkernel-* properties */
+	node = of_find_node_by_path("/chosen");
+	if (!node)
+		return -ENOENT;
+
+	/*
+	 * There might be existing crash kernel properties, but we can't
+	 * be sure what's in them, so remove them.
+	 */
+	of_remove_property(node, of_find_property(node,
+				"linux,crashkernel-base", NULL));
+	of_remove_property(node, of_find_property(node,
+				"linux,crashkernel-size", NULL));
+
+	ret = of_add_property(node, &crash_base_prop);
+	if (ret)
+		goto ret_err;
+
+	ret = of_add_property(node, &crash_size_prop);
+	if (ret)
+		goto ret_err;
+
+	return 0;
+
+ret_err:
+	pr_warn("Exporting crashkernel region to device tree failed\n");
+	return ret;
+}
+late_initcall(export_crashkernel);
+
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+	int ret;
+
+	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+				&crash_size, &crash_base);
+	/* no crashkernel= or invalid value specified */
+	if (ret || !crash_size)
+		return;
+
+	if (crash_base == 0) {
+		/* Current arm64 boot protocol requires 2MB alignment */
+		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+				crash_size, SZ_2M);
+		if (crash_base == 0) {
+			pr_warn("Unable to allocate crashkernel (size:%llx)\n",
+				crash_size);
+			return;
+		}
+	} else {
+		/* User specifies base address explicitly. */
+		if (!memblock_is_region_memory(crash_base, crash_size) ||
+			memblock_is_region_reserved(crash_base, crash_size)) {
+			pr_warn("crashkernel has wrong address or size\n");
+			return;
+		}
+
+		if (!IS_ALIGNED(crash_base, SZ_2M)) {
+			pr_warn("crashkernel base address is not 2MB aligned\n");
+			return;
+		}
+	}
+	memblock_reserve(crash_base, crash_size);
+
+	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
+		crash_size >> 20, crash_base >> 20);
+
+	crashk_res.start = crash_base;
+	crashk_res.end = crash_base + crash_size - 1;
+}
+#else
+static void __init reserve_crashkernel(void)
+{
+	;
+}
+#endif /* CONFIG_KEXEC_CORE */
+
 /*
  * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
  * currently assumes that for memory starting above 4G, 32-bit devices will
@@ -331,6 +438,9 @@ void __init arm64_memblock_init(void)
 		arm64_dma_phys_limit = max_zone_dma_phys();
 	else
 		arm64_dma_phys_limit = PHYS_MASK + 1;
+
+	reserve_crashkernel();
+
 	dma_contiguous_reserve(arm64_dma_phys_limit);
 
 	memblock_allow_resize();
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown()
  2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
                   ` (2 preceding siblings ...)
  2016-12-28  4:36 ` [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
@ 2016-12-28  4:36 ` AKASHI Takahiro
  2017-01-10 11:32   ` Will Deacon
  2016-12-28  4:36 ` [PATCH v29 5/9] arm64: kdump: add VMCOREINFO's for user-space tools AKASHI Takahiro
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: AKASHI Takahiro @ 2016-12-28  4:36 UTC (permalink / raw)
  To: linux-arm-kernel

Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
and save registers' status in per-cpu ELF notes before starting crash
dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we do kdump anyway.

As we don't have to make non-boot(crashed) cpus offline (to preserve
correct status of cpus at crash dump) before shutting down, this patch
also adds a variant of smp_send_stop().

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/hardirq.h  |  2 +-
 arch/arm64/include/asm/kexec.h    | 42 +++++++++++++++++++++++++-
 arch/arm64/include/asm/smp.h      |  2 ++
 arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
 arch/arm64/kernel/smp.c           | 63 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 160 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 8740297dac77..1473fc2f7ab7 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
 #include <linux/threads.h>
 #include <asm/irq.h>
 
-#define NR_IPI	6
+#define NR_IPI	7
 
 typedef struct {
 	unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc5fb61..f40ace1fa21a 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,47 @@
 static inline void crash_setup_regs(struct pt_regs *newregs,
 				    struct pt_regs *oldregs)
 {
-	/* Empty routine needed to avoid build errors. */
+	if (oldregs) {
+		memcpy(newregs, oldregs, sizeof(*newregs));
+	} else {
+		u64 tmp1, tmp2;
+
+		__asm__ __volatile__ (
+			"stp	 x0,   x1, [%2, #16 *  0]\n"
+			"stp	 x2,   x3, [%2, #16 *  1]\n"
+			"stp	 x4,   x5, [%2, #16 *  2]\n"
+			"stp	 x6,   x7, [%2, #16 *  3]\n"
+			"stp	 x8,   x9, [%2, #16 *  4]\n"
+			"stp	x10,  x11, [%2, #16 *  5]\n"
+			"stp	x12,  x13, [%2, #16 *  6]\n"
+			"stp	x14,  x15, [%2, #16 *  7]\n"
+			"stp	x16,  x17, [%2, #16 *  8]\n"
+			"stp	x18,  x19, [%2, #16 *  9]\n"
+			"stp	x20,  x21, [%2, #16 * 10]\n"
+			"stp	x22,  x23, [%2, #16 * 11]\n"
+			"stp	x24,  x25, [%2, #16 * 12]\n"
+			"stp	x26,  x27, [%2, #16 * 13]\n"
+			"stp	x28,  x29, [%2, #16 * 14]\n"
+			"mov	 %0,  sp\n"
+			"stp	x30,  %0,  [%2, #16 * 15]\n"
+
+			"/* faked current PSTATE */\n"
+			"mrs	 %0, CurrentEL\n"
+			"mrs	 %1, SPSEL\n"
+			"orr	 %0, %0, %1\n"
+			"mrs	 %1, DAIF\n"
+			"orr	 %0, %0, %1\n"
+			"mrs	 %1, NZCV\n"
+			"orr	 %0, %0, %1\n"
+			/* pc */
+			"adr	 %1, 1f\n"
+		"1:\n"
+			"stp	 %1, %0,   [%2, #16 * 16]\n"
+			: "=&r" (tmp1), "=&r" (tmp2)
+			: "r" (newregs)
+			: "memory"
+		);
+	}
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index d050d720a1b4..cea009f2657d 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -148,6 +148,8 @@ static inline void cpu_panic_kernel(void)
  */
 bool cpus_are_stuck_in_kernel(void);
 
+extern void smp_send_crash_stop(void);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a7fc79..c60346d33bb1 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,9 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
 #include <linux/kexec.h>
 #include <linux/smp.h>
 
@@ -22,6 +25,7 @@
 extern const unsigned char arm64_relocate_new_kernel[];
 extern const unsigned long arm64_relocate_new_kernel_size;
 
+static bool in_crash_kexec;
 static unsigned long kimage_start;
 
 /**
@@ -148,7 +152,8 @@ void machine_kexec(struct kimage *kimage)
 	/*
 	 * New cpus may have become stuck_in_kernel after we loaded the image.
 	 */
-	BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
+	BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
+			!WARN_ON(in_crash_kexec));
 
 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
 	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
@@ -200,13 +205,58 @@ void machine_kexec(struct kimage *kimage)
 	 * relocation is complete.
 	 */
 
-	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
+	cpu_soft_restart(!in_crash_kexec, reboot_code_buffer_phys, kimage->head,
 		kimage_start, 0);
 
 	BUG(); /* Should never get here. */
 }
 
+static void machine_kexec_mask_interrupts(void)
+{
+	unsigned int i;
+	struct irq_desc *desc;
+
+	for_each_irq_desc(i, desc) {
+		struct irq_chip *chip;
+		int ret;
+
+		chip = irq_desc_get_chip(desc);
+		if (!chip)
+			continue;
+
+		/*
+		 * First try to remove the active state. If this
+		 * fails, try to EOI the interrupt.
+		 */
+		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+		    chip->irq_eoi)
+			chip->irq_eoi(&desc->irq_data);
+
+		if (chip->irq_mask)
+			chip->irq_mask(&desc->irq_data);
+
+		if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+			chip->irq_disable(&desc->irq_data);
+	}
+}
+
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	/* Empty routine needed to avoid build errors. */
+	local_irq_disable();
+
+	in_crash_kexec = true;
+
+	/* shutdown non-crashing cpus */
+	smp_send_crash_stop();
+
+	/* for crashing cpu */
+	crash_save_cpu(regs, smp_processor_id());
+	machine_kexec_mask_interrupts();
+
+	pr_info("Starting crashdump kernel...\n");
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index cb87234cfcf2..446c6d48f8ec 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
 #include <linux/completion.h>
 #include <linux/of.h>
 #include <linux/irq_work.h>
+#include <linux/kexec.h>
 
 #include <asm/alternative.h>
 #include <asm/atomic.h>
@@ -74,6 +75,7 @@ enum ipi_msg_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_CPU_CRASH_STOP,
 	IPI_TIMER,
 	IPI_IRQ_WORK,
 	IPI_WAKEUP
@@ -753,6 +755,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
 	S(IPI_RESCHEDULE, "Rescheduling interrupts"),
 	S(IPI_CALL_FUNC, "Function call interrupts"),
 	S(IPI_CPU_STOP, "CPU stop interrupts"),
+	S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
 	S(IPI_TIMER, "Timer broadcast interrupts"),
 	S(IPI_IRQ_WORK, "IRQ work interrupts"),
 	S(IPI_WAKEUP, "CPU wake-up interrupts"),
@@ -827,6 +830,29 @@ static void ipi_cpu_stop(unsigned int cpu)
 		cpu_relax();
 }
 
+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi;
+#endif
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+	crash_save_cpu(regs, cpu);
+
+	atomic_dec(&waiting_for_crash_ipi);
+
+	local_irq_disable();
+
+#ifdef CONFIG_HOTPLUG_CPU
+	if (cpu_ops[cpu]->cpu_die)
+		cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+	/* just in case */
+	cpu_park_loop();
+#endif
+}
+
 /*
  * Main handler for inter-processor interrupts
  */
@@ -857,6 +883,15 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 		irq_exit();
 		break;
 
+	case IPI_CPU_CRASH_STOP:
+		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+			irq_enter();
+			ipi_cpu_crash_stop(cpu, regs);
+
+			unreachable();
+		}
+		break;
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 	case IPI_TIMER:
 		irq_enter();
@@ -929,6 +964,34 @@ void smp_send_stop(void)
 			   cpumask_pr_args(cpu_online_mask));
 }
 
+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+	cpumask_t mask;
+	unsigned long timeout;
+
+	if (num_online_cpus() == 1)
+		return;
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &mask);
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+	pr_crit("SMP: stopping secondary CPUs\n");
+	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+	/* Wait up to one second for other CPUs to stop */
+	timeout = USEC_PER_SEC;
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+		udelay(1);
+
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+			   cpumask_pr_args(cpu_online_mask));
+}
+#endif
+
 /*
  * not supported here
  */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 5/9] arm64: kdump: add VMCOREINFO's for user-space tools
  2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
                   ` (3 preceding siblings ...)
  2016-12-28  4:36 ` [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
@ 2016-12-28  4:36 ` AKASHI Takahiro
  2016-12-28  4:36 ` [PATCH v29 6/9] arm64: kdump: provide /proc/vmcore file AKASHI Takahiro
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2016-12-28  4:36 UTC (permalink / raw)
  To: linux-arm-kernel

In addition to common VMCOREINFO's defined in
crash_save_vmcoreinfo_init(), we need to know, for crash utility,
  - kimage_voffset
  - PHYS_OFFSET
to examine the contents of a dump file (/proc/vmcore) correctly
due to the introduction of KASLR (CONFIG_RANDOMIZE_BASE) in v4.6.

  - VA_BITS
is also required for makedumpfile command.

arch_crash_save_vmcoreinfo() appends them to the dump file.
More VMCOREINFO's may be added later.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/kernel/machine_kexec.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index c60346d33bb1..994fe0bc5cc0 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -17,6 +17,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/cpu_ops.h>
+#include <asm/memory.h>
 #include <asm/mmu_context.h>
 
 #include "cpu-reset.h"
@@ -260,3 +261,13 @@ void machine_crash_shutdown(struct pt_regs *regs)
 
 	pr_info("Starting crashdump kernel...\n");
 }
+
+void arch_crash_save_vmcoreinfo(void)
+{
+	VMCOREINFO_NUMBER(VA_BITS);
+	/* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
+	vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
+						kimage_voffset);
+	vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
+						PHYS_OFFSET);
+}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 6/9] arm64: kdump: provide /proc/vmcore file
  2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
                   ` (4 preceding siblings ...)
  2016-12-28  4:36 ` [PATCH v29 5/9] arm64: kdump: add VMCOREINFO's for user-space tools AKASHI Takahiro
@ 2016-12-28  4:36 ` AKASHI Takahiro
  2016-12-28  4:36 ` [PATCH v29 7/9] arm64: kdump: enable kdump in defconfig AKASHI Takahiro
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2016-12-28  4:36 UTC (permalink / raw)
  To: linux-arm-kernel

Add arch-specific functions to provide a dump file, /proc/vmcore.

This file is in ELF format and its ELF header needs to be prepared by
userspace tools, like kexec-tools, in adance. The primary kernel is
responsible to allocate the region with reserve_elfcorehdr() at boot time
and advertize its location to crash dump kernel via a new device-tree
property, "linux,elfcorehdr".

Then crash dump kernel will access the primary kernel's memory with
copy_oldmem_page(), which feeds the data page-by-page by ioremap'ing it
since it does not reside in linear mapping on crash dump kernel.

We also need our own elfcorehdr_read() here since the header is placed
within crash dump kernel's usable memory.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/Kconfig             | 11 +++++++
 arch/arm64/kernel/Makefile     |  1 +
 arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
 arch/arm64/mm/init.c           | 54 ++++++++++++++++++++++++++++++++
 4 files changed, 137 insertions(+)
 create mode 100644 arch/arm64/kernel/crash_dump.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 111742126897..2bd6a1a062b9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -693,6 +693,17 @@ config KEXEC
 	  but it is independent of the system firmware.   And like a reboot
 	  you can start any kernel with it, not just Linux.
 
+config CRASH_DUMP
+	bool "Build kdump crash kernel"
+	help
+	  Generate crash dump after being started by kexec. This should
+	  be normally only set in special crash dump kernels which are
+	  loaded in the main kernel with kexec-tools into a specially
+	  reserved region and then later executed after a crash by
+	  kdump/kexec.
+
+	  For more details see Documentation/kdump/kdump.txt
+
 config XEN_DOM0
 	def_bool y
 	depends on XEN
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 7d66bbaafc0c..6a7384eee08d 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -50,6 +50,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
 arm64-obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
 					   cpu-reset.o
+arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
 
 obj-y					+= $(arm64-obj-y) vdso/ probes/
 obj-m					+= $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 000000000000..c3d5a21c081e
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <asm/memory.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+			 size_t csize, unsigned long offset,
+			 int userbuf)
+{
+	void *vaddr;
+
+	if (!csize)
+		return 0;
+
+	vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+	if (!vaddr)
+		return -ENOMEM;
+
+	if (userbuf) {
+		if (copy_to_user((char __user *)buf, vaddr + offset, csize)) {
+			memunmap(vaddr);
+			return -EFAULT;
+		}
+	} else {
+		memcpy(buf, vaddr + offset, csize);
+	}
+
+	memunmap(vaddr);
+
+	return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ * @buf: buffer where the data is placed
+ * @csize: number of bytes to read
+ * @ppos: address in the memory
+ *
+ * This function reads @count bytes from elf core header which exists
+ * on crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+	memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+	return count;
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1d62bf71b531..ef8adfde32e7 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -38,6 +38,7 @@
 #include <linux/swiotlb.h>
 #include <linux/vmalloc.h>
 #include <linux/kexec.h>
+#include <linux/crash_dump.h>
 
 #include <asm/boot.h>
 #include <asm/fixmap.h>
@@ -183,6 +184,57 @@ static void __init reserve_crashkernel(void)
 }
 #endif /* CONFIG_KEXEC_CORE */
 
+#ifdef CONFIG_CRASH_DUMP
+static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	const __be32 *reg;
+	int len;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
+/*
+ * reserve_elfcorehdr() - reserves memory for elf core header
+ *
+ * This function reserves elf core header given in "elfcorehdr=" kernel
+ * command line parameter. This region contains all the information about
+ * primary kernel's core image and is used by a dump capture kernel to
+ * access the system memory on primary kernel.
+ */
+static void __init reserve_elfcorehdr(void)
+{
+	of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
+
+	if (!elfcorehdr_size)
+		return;
+
+	if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
+		pr_warn("elfcorehdr is overlapped\n");
+		return;
+	}
+
+	memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
+
+	pr_info("Reserving %lldKB of memory@0x%llx for elfcorehdr\n",
+		elfcorehdr_size >> 10, elfcorehdr_addr);
+}
+#else
+static void __init reserve_elfcorehdr(void)
+{
+	;
+}
+#endif /* CONFIG_CRASH_DUMP */
 /*
  * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
  * currently assumes that for memory starting above 4G, 32-bit devices will
@@ -441,6 +493,8 @@ void __init arm64_memblock_init(void)
 
 	reserve_crashkernel();
 
+	reserve_elfcorehdr();
+
 	dma_contiguous_reserve(arm64_dma_phys_limit);
 
 	memblock_allow_resize();
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 7/9] arm64: kdump: enable kdump in defconfig
  2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
                   ` (5 preceding siblings ...)
  2016-12-28  4:36 ` [PATCH v29 6/9] arm64: kdump: provide /proc/vmcore file AKASHI Takahiro
@ 2016-12-28  4:36 ` AKASHI Takahiro
  2016-12-28  4:36 ` [PATCH v29 8/9] Documentation: kdump: describe arm64 port AKASHI Takahiro
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2016-12-28  4:36 UTC (permalink / raw)
  To: linux-arm-kernel

Kdump is enabled by default as kexec is.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 869dded0f09f..5ce7cd00f4ea 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -79,6 +79,7 @@ CONFIG_CMA=y
 CONFIG_SECCOMP=y
 CONFIG_XEN=y
 CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_COMPAT=y
 CONFIG_CPU_IDLE=y
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 8/9] Documentation: kdump: describe arm64 port
  2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
                   ` (6 preceding siblings ...)
  2016-12-28  4:36 ` [PATCH v29 7/9] arm64: kdump: enable kdump in defconfig AKASHI Takahiro
@ 2016-12-28  4:36 ` AKASHI Takahiro
  2016-12-28  4:37 ` [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
  2017-01-06  3:26 ` [PATCH v29 0/9] arm64: add kdump support Pratyush Anand
  9 siblings, 0 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2016-12-28  4:36 UTC (permalink / raw)
  To: linux-arm-kernel

Add arch specific descriptions about kdump usage on arm64 to kdump.txt.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 Documentation/kdump/kdump.txt | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index b0eb27b956d9..615434d81108 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
 a remote system.
 
 Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x and arm architectures.
+s390x, arm and arm64 architectures.
 
 When the system kernel boots, it reserves a small section of memory for
 the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)
 
     AUTO_ZRELADDR=y
 
+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+  on non-VHE systems even if it is configured. This is because the CPU
+  will not be reset to EL2 on panic.
+
 Extended crashkernel syntax
 ===========================
 
@@ -305,6 +312,8 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
+   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
 
 Load the Dump-capture Kernel
 ============================
@@ -327,6 +336,8 @@ For s390x:
 	- Use image or bzImage
 For arm:
 	- Use zImage
+For arm64:
+	- Use vmlinux or Image
 
 If you are using a uncompressed vmlinux image then use following command
 to load dump-capture kernel.
@@ -370,6 +381,9 @@ For s390x:
 For arm:
 	"1 maxcpus=1 reset_devices"
 
+For arm64:
+	"1 maxcpus=1 reset_devices"
+
 Notes on loading the dump-capture kernel:
 
 * By default, the ELF headers are stored in ELF64 format to support
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
                   ` (7 preceding siblings ...)
  2016-12-28  4:36 ` [PATCH v29 8/9] Documentation: kdump: describe arm64 port AKASHI Takahiro
@ 2016-12-28  4:37 ` AKASHI Takahiro
  2017-01-10 11:10   ` Will Deacon
  2017-01-12 15:39   ` Mark Rutland
  2017-01-06  3:26 ` [PATCH v29 0/9] arm64: add kdump support Pratyush Anand
  9 siblings, 2 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2016-12-28  4:37 UTC (permalink / raw)
  To: linux-arm-kernel

From: James Morse <james.morse@arm.com>

Add documentation for
	linux,crashkernel-base and crashkernel-size,
	linux,usable-memory-range
	linux,elfcorehdr
used by arm64 kdump to decribe the kdump reserved area, and
the elfcorehdr's location within it.

Signed-off-by: James Morse <james.morse@arm.com>
[takahiro.akashi at linaro.org: added "linux,crashkernel-base" and "-size" ]
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: devicetree at vger.kernel.org
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
---
 Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 6ae9d82d4c37..7b115165e9ec 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -52,3 +52,53 @@ This property is set (currently only on PowerPC, and only needed on
 book3e) by some versions of kexec-tools to tell the new kernel that it
 is being booted by kexec, as the booting environment may differ (e.g.
 a different secondary CPU release mechanism)
+
+linux,crashkernel-base
+linux,crashkernel-size
+----------------------
+
+These properties (currently used on PowerPC and arm64) indicates
+the base address and the size, respectively, of the reserved memory
+range for crash dump kernel.
+e.g.
+
+/ {
+	chosen {
+		linux,crashkernel-base = <0x9 0xf0000000>;
+		linux,crashkernel-size = <0x0 0x10000000>;
+	};
+};
+
+linux,usable-memory-range
+-------------------------
+
+This property (currently used only on arm64) holds the memory range,
+the base address and the size, which can be used as system ram on
+the *current* kernel. Note that, if this property is present, any memory
+regions under "memory" nodes in DT blob or ones marked as "conventional
+memory" in EFI memory map should be ignored.
+e.g.
+
+/ {
+	chosen {
+		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
+	};
+};
+
+The main usage is for crash dump kernel to identify its own usable
+memory and exclude, at its boot time, any other memory areas that are
+part of the panicked kernel's memory.
+
+linux,elfcorehdr
+----------------
+
+This property (currently used only on arm64) holds the memory range,
+the address and the size, of the elf core header which mainly describes
+the panicked kernel's memory layout as PT_LOAD segments of elf format.
+e.g.
+
+/ {
+	chosen {
+		linux,elfcorehdr = <0x9 0xfffff000 0x0 0x800>;
+	};
+};
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 0/9] arm64: add kdump support
  2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
                   ` (8 preceding siblings ...)
  2016-12-28  4:37 ` [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
@ 2017-01-06  3:26 ` Pratyush Anand
  2017-01-06  4:34   ` AKASHI Takahiro
  9 siblings, 1 reply; 39+ messages in thread
From: Pratyush Anand @ 2017-01-06  3:26 UTC (permalink / raw)
  To: linux-arm-kernel



On Wednesday 28 December 2016 10:03 AM, AKASHI Takahiro wrote:
> This patch series adds kdump support on arm64.
>
> To load a crash-dump kernel to the systems, a series of patches to
> kexec-tools[1] are also needed. Please use the latest one, v4 [2].
> For your convinience, you can pick them up from:
>    https://git.linaro.org/people/takahiro.akashi/linux-aarch64.git arm64/kdump
>    https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/kdump
>
> To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
>   - crash utility (v7.1.7 or later) [3]
>     (As of today, the very latest tree cannot handle a core image correctly
>      due to the commit 7fd8329ba502 ("taint/module: Clean up global and module
>      taint flags handling")
>
>
> The previous version, v27, was also:
> Tested-by: Pratyush Anand <panand@redhat.com> (mustang and seattle)

You have my "Tested-by" for this version as well :-). Patches works 
great after applying them over v4.10-rc2.

I used crash-utility from below where there are few patches queued for 
crash-7.1.8
https://github.com/crash-utility/crash

~Pratyush

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 0/9] arm64: add kdump support
  2017-01-06  3:26 ` [PATCH v29 0/9] arm64: add kdump support Pratyush Anand
@ 2017-01-06  4:34   ` AKASHI Takahiro
  0 siblings, 0 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-06  4:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 06, 2017 at 08:56:30AM +0530, Pratyush Anand wrote:
> 
> 
> On Wednesday 28 December 2016 10:03 AM, AKASHI Takahiro wrote:
> >This patch series adds kdump support on arm64.
> >
> >To load a crash-dump kernel to the systems, a series of patches to
> >kexec-tools[1] are also needed. Please use the latest one, v4 [2].
> >For your convinience, you can pick them up from:
> >   https://git.linaro.org/people/takahiro.akashi/linux-aarch64.git arm64/kdump
> >   https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/kdump
> >
> >To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
> >  - crash utility (v7.1.7 or later) [3]
> >    (As of today, the very latest tree cannot handle a core image correctly
> >     due to the commit 7fd8329ba502 ("taint/module: Clean up global and module
> >     taint flags handling")
> >
> >
> >The previous version, v27, was also:
> >Tested-by: Pratyush Anand <panand@redhat.com> (mustang and seattle)
> 
> You have my "Tested-by" for this version as well :-). Patches works great
> after applying them over v4.10-rc2.

Great thanks, again.

> I used crash-utility from below where there are few patches queued for
> crash-7.1.8
> https://github.com/crash-utility/crash

Yes, we need some patch for v4.10 kernel.

-Takahiro AKASHI

> ~Pratyush

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-12-28  4:37 ` [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
@ 2017-01-10 11:10   ` Will Deacon
  2017-01-12 15:39   ` Mark Rutland
  1 sibling, 0 replies; 39+ messages in thread
From: Will Deacon @ 2017-01-10 11:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi James, Akashi,

On Wed, Dec 28, 2016 at 01:37:34PM +0900, AKASHI Takahiro wrote:
> From: James Morse <james.morse@arm.com>
> 
> Add documentation for
> 	linux,crashkernel-base and crashkernel-size,
> 	linux,usable-memory-range
> 	linux,elfcorehdr
> used by arm64 kdump to decribe the kdump reserved area, and
> the elfcorehdr's location within it.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> [takahiro.akashi at linaro.org: added "linux,crashkernel-base" and "-size" ]
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Cc: devicetree at vger.kernel.org
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> ---
>  Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)

I need an ack from Rob or Mark before I can merge this.

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 1/9] memblock: add memblock_cap_memory_range()
  2016-12-28  4:35 ` [PATCH v29 1/9] memblock: add memblock_cap_memory_range() AKASHI Takahiro
@ 2017-01-10 11:16   ` Will Deacon
  2017-01-11  1:57     ` Dennis Chen
  0 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2017-01-10 11:16 UTC (permalink / raw)
  To: linux-arm-kernel

[adding some more folks to cc]

On Wed, Dec 28, 2016 at 01:35:25PM +0900, AKASHI Takahiro wrote:
> Add memblock_cap_memory_range() which will remove all the memblock regions
> except the memory range specified in the arguments. In addition, rework is
> done on memblock_mem_limit_remove_map() to re-implement it using
> memblock_cap_memory_range().
> 
> This function, like memblock_mem_limit_remove_map(), will not remove
> memblocks with MEMMAP_NOMAP attribute as they may be mapped and accessed
> later as "device memory."
> See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
> address the mem limit issue").
> 
> This function is used, in a succeeding patch in the series of arm64 kdump
> suuport, to limit the range of usable memory, or System RAM, on crash dump
> kernel.
> (Please note that "mem=" parameter is of little use for this purpose.)
> 
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Reviewed-by: Will Deacon <will.deacon@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: linux-mm at kvack.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/linux/memblock.h |  1 +
>  mm/memblock.c            | 44 +++++++++++++++++++++++++++++---------------
>  2 files changed, 30 insertions(+), 15 deletions(-)

Whilst this patch looks fine to me, it would be nice if Dennis (author of
memblock_mem_limit_remove_map) or one of the mm chaps could provide an ack
before I merge it via arm64.

Thanks,

Will

> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 5b759c9acf97..fbfcacc50c29 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -333,6 +333,7 @@ phys_addr_t memblock_mem_size(unsigned long limit_pfn);
>  phys_addr_t memblock_start_of_DRAM(void);
>  phys_addr_t memblock_end_of_DRAM(void);
>  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> +void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
>  void memblock_mem_limit_remove_map(phys_addr_t limit);
>  bool memblock_is_memory(phys_addr_t addr);
>  int memblock_is_map_memory(phys_addr_t addr);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 7608bc305936..fea1688fef60 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1514,11 +1514,37 @@ void __init memblock_enforce_memory_limit(phys_addr_t limit)
>  			      (phys_addr_t)ULLONG_MAX);
>  }
>  
> +void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
> +{
> +	int start_rgn, end_rgn;
> +	int i, ret;
> +
> +	if (!size)
> +		return;
> +
> +	ret = memblock_isolate_range(&memblock.memory, base, size,
> +						&start_rgn, &end_rgn);
> +	if (ret)
> +		return;
> +
> +	/* remove all the MAP regions */
> +	for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
> +		if (!memblock_is_nomap(&memblock.memory.regions[i]))
> +			memblock_remove_region(&memblock.memory, i);
> +
> +	for (i = start_rgn - 1; i >= 0; i--)
> +		if (!memblock_is_nomap(&memblock.memory.regions[i]))
> +			memblock_remove_region(&memblock.memory, i);
> +
> +	/* truncate the reserved regions */
> +	memblock_remove_range(&memblock.reserved, 0, base);
> +	memblock_remove_range(&memblock.reserved,
> +			base + size, (phys_addr_t)ULLONG_MAX);
> +}
> +
>  void __init memblock_mem_limit_remove_map(phys_addr_t limit)
>  {
> -	struct memblock_type *type = &memblock.memory;
>  	phys_addr_t max_addr;
> -	int i, ret, start_rgn, end_rgn;
>  
>  	if (!limit)
>  		return;
> @@ -1529,19 +1555,7 @@ void __init memblock_mem_limit_remove_map(phys_addr_t limit)
>  	if (max_addr == (phys_addr_t)ULLONG_MAX)
>  		return;
>  
> -	ret = memblock_isolate_range(type, max_addr, (phys_addr_t)ULLONG_MAX,
> -				&start_rgn, &end_rgn);
> -	if (ret)
> -		return;
> -
> -	/* remove all the MAP regions above the limit */
> -	for (i = end_rgn - 1; i >= start_rgn; i--) {
> -		if (!memblock_is_nomap(&type->regions[i]))
> -			memblock_remove_region(type, i);
> -	}
> -	/* truncate the reserved regions */
> -	memblock_remove_range(&memblock.reserved, max_addr,
> -			      (phys_addr_t)ULLONG_MAX);
> +	memblock_cap_memory_range(0, max_addr);
>  }
>  
>  static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown()
  2016-12-28  4:36 ` [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
@ 2017-01-10 11:32   ` Will Deacon
  2017-01-11  6:36     ` AKASHI Takahiro
  0 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2017-01-10 11:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 28, 2016 at 01:36:01PM +0900, AKASHI Takahiro wrote:
> Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
> and save registers' status in per-cpu ELF notes before starting crash
> dump kernel. See kernel_kexec().
> Even if not all secondary cpus have shut down, we do kdump anyway.
> 
> As we don't have to make non-boot(crashed) cpus offline (to preserve
> correct status of cpus at crash dump) before shutting down, this patch
> also adds a variant of smp_send_stop().
> 
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Reviewed-by: James Morse <james.morse@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/include/asm/hardirq.h  |  2 +-
>  arch/arm64/include/asm/kexec.h    | 42 +++++++++++++++++++++++++-
>  arch/arm64/include/asm/smp.h      |  2 ++
>  arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
>  arch/arm64/kernel/smp.c           | 63 +++++++++++++++++++++++++++++++++++++++
>  5 files changed, 160 insertions(+), 5 deletions(-)

[...]

> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index d050d720a1b4..cea009f2657d 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -148,6 +148,8 @@ static inline void cpu_panic_kernel(void)
>   */
>  bool cpus_are_stuck_in_kernel(void);
>  
> +extern void smp_send_crash_stop(void);
> +
>  #endif /* ifndef __ASSEMBLY__ */
>  
>  #endif /* ifndef __ASM_SMP_H */
> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index bc96c8a7fc79..c60346d33bb1 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -9,6 +9,9 @@
>   * published by the Free Software Foundation.
>   */
>  
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/kernel.h>
>  #include <linux/kexec.h>
>  #include <linux/smp.h>
>  
> @@ -22,6 +25,7 @@
>  extern const unsigned char arm64_relocate_new_kernel[];
>  extern const unsigned long arm64_relocate_new_kernel_size;
>  
> +static bool in_crash_kexec;

Do you actually need this bool? Why not call kexec_crash_loaded() instead?

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 1/9] memblock: add memblock_cap_memory_range()
  2017-01-10 11:16   ` Will Deacon
@ 2017-01-11  1:57     ` Dennis Chen
  0 siblings, 0 replies; 39+ messages in thread
From: Dennis Chen @ 2017-01-11  1:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 10, 2017 at 11:16:18AM +0000, Will Deacon wrote:
> [adding some more folks to cc]
> 
> On Wed, Dec 28, 2016 at 01:35:25PM +0900, AKASHI Takahiro wrote:
> > Add memblock_cap_memory_range() which will remove all the memblock regions
> > except the memory range specified in the arguments. In addition, rework is
> > done on memblock_mem_limit_remove_map() to re-implement it using
> > memblock_cap_memory_range().
> > 
> > This function, like memblock_mem_limit_remove_map(), will not remove
> > memblocks with MEMMAP_NOMAP attribute as they may be mapped and accessed
> > later as "device memory."
> > See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
> > address the mem limit issue").
> > 
> > This function is used, in a succeeding patch in the series of arm64 kdump
> > suuport, to limit the range of usable memory, or System RAM, on crash dump
> > kernel.
> > (Please note that "mem=" parameter is of little use for this purpose.)
> > 
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Reviewed-by: Will Deacon <will.deacon@arm.com>
> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: linux-mm at kvack.org
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >  include/linux/memblock.h |  1 +
> >  mm/memblock.c            | 44 +++++++++++++++++++++++++++++---------------
> >  2 files changed, 30 insertions(+), 15 deletions(-)
> 
> Whilst this patch looks fine to me, it would be nice if Dennis (author of
> memblock_mem_limit_remove_map) or one of the mm chaps could provide an ack
> before I merge it via arm64.
>
Will feel free to add
Acked-by: Dennis Chen <dennis.chen@arm.com>

Thanks,
Dennis
> 
> Thanks,
> 
> Will
> 
> > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > index 5b759c9acf97..fbfcacc50c29 100644
> > --- a/include/linux/memblock.h
> > +++ b/include/linux/memblock.h
> > @@ -333,6 +333,7 @@ phys_addr_t memblock_mem_size(unsigned long limit_pfn);
> >  phys_addr_t memblock_start_of_DRAM(void);
> >  phys_addr_t memblock_end_of_DRAM(void);
> >  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> > +void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> >  void memblock_mem_limit_remove_map(phys_addr_t limit);
> >  bool memblock_is_memory(phys_addr_t addr);
> >  int memblock_is_map_memory(phys_addr_t addr);
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 7608bc305936..fea1688fef60 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -1514,11 +1514,37 @@ void __init memblock_enforce_memory_limit(phys_addr_t limit)
> >  			      (phys_addr_t)ULLONG_MAX);
> >  }
> >  
> > +void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
> > +{
> > +	int start_rgn, end_rgn;
> > +	int i, ret;
> > +
> > +	if (!size)
> > +		return;
> > +
> > +	ret = memblock_isolate_range(&memblock.memory, base, size,
> > +						&start_rgn, &end_rgn);
> > +	if (ret)
> > +		return;
> > +
> > +	/* remove all the MAP regions */
> > +	for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
> > +		if (!memblock_is_nomap(&memblock.memory.regions[i]))
> > +			memblock_remove_region(&memblock.memory, i);
> > +
> > +	for (i = start_rgn - 1; i >= 0; i--)
> > +		if (!memblock_is_nomap(&memblock.memory.regions[i]))
> > +			memblock_remove_region(&memblock.memory, i);
> > +
> > +	/* truncate the reserved regions */
> > +	memblock_remove_range(&memblock.reserved, 0, base);
> > +	memblock_remove_range(&memblock.reserved,
> > +			base + size, (phys_addr_t)ULLONG_MAX);
> > +}
> > +
> >  void __init memblock_mem_limit_remove_map(phys_addr_t limit)
> >  {
> > -	struct memblock_type *type = &memblock.memory;
> >  	phys_addr_t max_addr;
> > -	int i, ret, start_rgn, end_rgn;
> >  
> >  	if (!limit)
> >  		return;
> > @@ -1529,19 +1555,7 @@ void __init memblock_mem_limit_remove_map(phys_addr_t limit)
> >  	if (max_addr == (phys_addr_t)ULLONG_MAX)
> >  		return;
> >  
> > -	ret = memblock_isolate_range(type, max_addr, (phys_addr_t)ULLONG_MAX,
> > -				&start_rgn, &end_rgn);
> > -	if (ret)
> > -		return;
> > -
> > -	/* remove all the MAP regions above the limit */
> > -	for (i = end_rgn - 1; i >= start_rgn; i--) {
> > -		if (!memblock_is_nomap(&type->regions[i]))
> > -			memblock_remove_region(type, i);
> > -	}
> > -	/* truncate the reserved regions */
> > -	memblock_remove_range(&memblock.reserved, max_addr,
> > -			      (phys_addr_t)ULLONG_MAX);
> > +	memblock_cap_memory_range(0, max_addr);
> >  }
> >  
> >  static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
> > -- 
> > 2.11.0
> > 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown()
  2017-01-10 11:32   ` Will Deacon
@ 2017-01-11  6:36     ` AKASHI Takahiro
  2017-01-11 10:54       ` Will Deacon
  0 siblings, 1 reply; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-11  6:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Tue, Jan 10, 2017 at 11:32:48AM +0000, Will Deacon wrote:
> On Wed, Dec 28, 2016 at 01:36:01PM +0900, AKASHI Takahiro wrote:
> > Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
> > and save registers' status in per-cpu ELF notes before starting crash
> > dump kernel. See kernel_kexec().
> > Even if not all secondary cpus have shut down, we do kdump anyway.
> > 
> > As we don't have to make non-boot(crashed) cpus offline (to preserve
> > correct status of cpus at crash dump) before shutting down, this patch
> > also adds a variant of smp_send_stop().
> > 
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Reviewed-by: James Morse <james.morse@arm.com>
> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> > ---
> >  arch/arm64/include/asm/hardirq.h  |  2 +-
> >  arch/arm64/include/asm/kexec.h    | 42 +++++++++++++++++++++++++-
> >  arch/arm64/include/asm/smp.h      |  2 ++
> >  arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
> >  arch/arm64/kernel/smp.c           | 63 +++++++++++++++++++++++++++++++++++++++
> >  5 files changed, 160 insertions(+), 5 deletions(-)
> 
> [...]
> 
> > diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> > index d050d720a1b4..cea009f2657d 100644
> > --- a/arch/arm64/include/asm/smp.h
> > +++ b/arch/arm64/include/asm/smp.h
> > @@ -148,6 +148,8 @@ static inline void cpu_panic_kernel(void)
> >   */
> >  bool cpus_are_stuck_in_kernel(void);
> >  
> > +extern void smp_send_crash_stop(void);
> > +
> >  #endif /* ifndef __ASSEMBLY__ */
> >  
> >  #endif /* ifndef __ASM_SMP_H */
> > diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> > index bc96c8a7fc79..c60346d33bb1 100644
> > --- a/arch/arm64/kernel/machine_kexec.c
> > +++ b/arch/arm64/kernel/machine_kexec.c
> > @@ -9,6 +9,9 @@
> >   * published by the Free Software Foundation.
> >   */
> >  
> > +#include <linux/interrupt.h>
> > +#include <linux/irq.h>
> > +#include <linux/kernel.h>
> >  #include <linux/kexec.h>
> >  #include <linux/smp.h>
> >  
> > @@ -22,6 +25,7 @@
> >  extern const unsigned char arm64_relocate_new_kernel[];
> >  extern const unsigned long arm64_relocate_new_kernel_size;
> >  
> > +static bool in_crash_kexec;
> 
> Do you actually need this bool? Why not call kexec_crash_loaded() instead?

The two have different meanings:
"in_crash_kexec" indicates that kdump is taking place, while
kexec_crash_loaded() tells us only whether crash dump kernel has been
loaded or not.

It is crucial to distinguish them especially for machine_kexec()
which can be called on normal kexec even if kdump has been set up.

Thanks,
-Takahiro AKASHI

> Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown()
  2017-01-11  6:36     ` AKASHI Takahiro
@ 2017-01-11 10:54       ` Will Deacon
  2017-01-12  4:21         ` AKASHI Takahiro
  0 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2017-01-11 10:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 11, 2017 at 03:36:28PM +0900, AKASHI Takahiro wrote:
> On Tue, Jan 10, 2017 at 11:32:48AM +0000, Will Deacon wrote:
> > On Wed, Dec 28, 2016 at 01:36:01PM +0900, AKASHI Takahiro wrote:
> > > @@ -22,6 +25,7 @@
> > >  extern const unsigned char arm64_relocate_new_kernel[];
> > >  extern const unsigned long arm64_relocate_new_kernel_size;
> > >  
> > > +static bool in_crash_kexec;
> > 
> > Do you actually need this bool? Why not call kexec_crash_loaded() instead?
> 
> The two have different meanings:
> "in_crash_kexec" indicates that kdump is taking place, while
> kexec_crash_loaded() tells us only whether crash dump kernel has been
> loaded or not.
> 
> It is crucial to distinguish them especially for machine_kexec()
> which can be called on normal kexec even if kdump has been set up.

Ah, I see. So how about just doing:

  if (kimage == kexec_crash_image)

in machine_kexec?

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown()
  2017-01-11 10:54       ` Will Deacon
@ 2017-01-12  4:21         ` AKASHI Takahiro
  2017-01-12 12:01           ` Will Deacon
  0 siblings, 1 reply; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-12  4:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 11, 2017 at 10:54:05AM +0000, Will Deacon wrote:
> On Wed, Jan 11, 2017 at 03:36:28PM +0900, AKASHI Takahiro wrote:
> > On Tue, Jan 10, 2017 at 11:32:48AM +0000, Will Deacon wrote:
> > > On Wed, Dec 28, 2016 at 01:36:01PM +0900, AKASHI Takahiro wrote:
> > > > @@ -22,6 +25,7 @@
> > > >  extern const unsigned char arm64_relocate_new_kernel[];
> > > >  extern const unsigned long arm64_relocate_new_kernel_size;
> > > >  
> > > > +static bool in_crash_kexec;
> > > 
> > > Do you actually need this bool? Why not call kexec_crash_loaded() instead?
> > 
> > The two have different meanings:
> > "in_crash_kexec" indicates that kdump is taking place, while
> > kexec_crash_loaded() tells us only whether crash dump kernel has been
> > loaded or not.
> > 
> > It is crucial to distinguish them especially for machine_kexec()
> > which can be called on normal kexec even if kdump has been set up.
> 
> Ah, I see. So how about just doing:
> 
>   if (kimage == kexec_crash_image)
> 
> in machine_kexec?

Yeah, it should work.
Do you want to merge the following hunk,
or expect that I will re-send the whole patch series
(with other changes if any)?

-Takahiro AkASHI

> Will
===8<==
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 994fe0bc5cc0..c0fc3d458195 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -26,7 +26,6 @@
 extern const unsigned char arm64_relocate_new_kernel[];
 extern const unsigned long arm64_relocate_new_kernel_size;
 
-static bool in_crash_kexec;
 static unsigned long kimage_start;
 
 /**
@@ -154,7 +153,7 @@ void machine_kexec(struct kimage *kimage)
 	 * New cpus may have become stuck_in_kernel after we loaded the image.
 	 */
 	BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
-			!WARN_ON(in_crash_kexec));
+			!WARN_ON(kimage == kexec_crash_image));
 
 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
 	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
@@ -206,8 +205,8 @@ void machine_kexec(struct kimage *kimage)
 	 * relocation is complete.
 	 */
 
-	cpu_soft_restart(!in_crash_kexec, reboot_code_buffer_phys, kimage->head,
-		kimage_start, 0);
+	cpu_soft_restart(kimage != kexec_crash_image,
+		reboot_code_buffer_phys, kimage->head, kimage_start, 0);
 
 	BUG(); /* Should never get here. */
 }
@@ -250,8 +249,6 @@ void machine_crash_shutdown(struct pt_regs *regs)
 {
 	local_irq_disable();
 
-	in_crash_kexec = true;
-
 	/* shutdown non-crashing cpus */
 	smp_send_crash_stop();
 

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown()
  2017-01-12  4:21         ` AKASHI Takahiro
@ 2017-01-12 12:01           ` Will Deacon
  2017-01-23 17:46             ` Will Deacon
  0 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2017-01-12 12:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 12, 2017 at 01:21:44PM +0900, AKASHI Takahiro wrote:
> On Wed, Jan 11, 2017 at 10:54:05AM +0000, Will Deacon wrote:
> > On Wed, Jan 11, 2017 at 03:36:28PM +0900, AKASHI Takahiro wrote:
> > > On Tue, Jan 10, 2017 at 11:32:48AM +0000, Will Deacon wrote:
> > > > On Wed, Dec 28, 2016 at 01:36:01PM +0900, AKASHI Takahiro wrote:
> > > > > @@ -22,6 +25,7 @@
> > > > >  extern const unsigned char arm64_relocate_new_kernel[];
> > > > >  extern const unsigned long arm64_relocate_new_kernel_size;
> > > > >  
> > > > > +static bool in_crash_kexec;
> > > > 
> > > > Do you actually need this bool? Why not call kexec_crash_loaded() instead?
> > > 
> > > The two have different meanings:
> > > "in_crash_kexec" indicates that kdump is taking place, while
> > > kexec_crash_loaded() tells us only whether crash dump kernel has been
> > > loaded or not.
> > > 
> > > It is crucial to distinguish them especially for machine_kexec()
> > > which can be called on normal kexec even if kdump has been set up.
> > 
> > Ah, I see. So how about just doing:
> > 
> >   if (kimage == kexec_crash_image)
> > 
> > in machine_kexec?
> 
> Yeah, it should work.
> Do you want to merge the following hunk,
> or expect that I will re-send the whole patch series
> (with other changes if any)?

Thanks, I'll fold it in and shout if I run into any problems. My plan is
to queue this for 4.11.

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2016-12-28  4:36 ` [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
@ 2017-01-12 15:09   ` Mark Rutland
  2017-01-13  8:16     ` AKASHI Takahiro
  0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2017-01-12 15:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

As a general note, I must apologise for my minimial review of the series
until this point. Judging by the way the DT parts are organised. I'm
very concerned with the way the DT parts are organised, and clearly I
did not communicate my concerns and suggestions effectively in prior
rounds of review.

On Wed, Dec 28, 2016 at 01:36:00PM +0900, AKASHI Takahiro wrote:
> "crashkernel=" kernel parameter specifies the size (and optionally
> the start address) of the system ram used by crash dump kernel.
> reserve_crashkernel() will allocate and reserve the memory at the startup
> of primary kernel.
> 
> This memory range will be exported to userspace via:
> 	- an entry named "Crash kernel" in /proc/iomem, and
> 	- "linux,crashkernel-base" and "linux,crashkernel-size" under
> 	  /sys/firmware/devicetree/base/chosen

> +#ifdef CONFIG_KEXEC_CORE
> +static unsigned long long crash_size, crash_base;
> +static struct property crash_base_prop = {
> +	.name = "linux,crashkernel-base",
> +	.length = sizeof(u64),
> +	.value = &crash_base
> +};
> +static struct property crash_size_prop = {
> +	.name = "linux,crashkernel-size",
> +	.length = sizeof(u64),
> +	.value = &crash_size,
> +};
> +
> +static int __init export_crashkernel(void)
> +{
> +	struct device_node *node;
> +	int ret;
> +
> +	if (!crash_size)
> +		return 0;
> +
> +	/* Add /chosen/linux,crashkernel-* properties */
> +	node = of_find_node_by_path("/chosen");
> +	if (!node)
> +		return -ENOENT;
> +
> +	/*
> +	 * There might be existing crash kernel properties, but we can't
> +	 * be sure what's in them, so remove them.
> +	 */
> +	of_remove_property(node, of_find_property(node,
> +				"linux,crashkernel-base", NULL));
> +	of_remove_property(node, of_find_property(node,
> +				"linux,crashkernel-size", NULL));
> +
> +	ret = of_add_property(node, &crash_base_prop);
> +	if (ret)
> +		goto ret_err;
> +
> +	ret = of_add_property(node, &crash_size_prop);
> +	if (ret)
> +		goto ret_err;
> +
> +	return 0;
> +
> +ret_err:
> +	pr_warn("Exporting crashkernel region to device tree failed\n");
> +	return ret;
> +}
> +late_initcall(export_crashkernel);

I very much do not like this.

I don't think we should be modifying the DT exposed to userspace in this
manner, in the usual boot path, especially given that the kernel itself
does not appear to be a consumer of this property. I do not think that
it is right to use the DT exposed to userspace as a communication
channel solely between the kernel and userspace.

So I think we should drop the above, and for arm64 have userspace
consistently use /proc/iomem (or perhaps a new kexec-specific file) to
determine the region reserved for the crash kernel, if it needs to know
this.

I'll have further comments on this front in the binding patch.

> +/*
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +static void __init reserve_crashkernel(void)
> +{
> +	int ret;
> +
> +	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> +				&crash_size, &crash_base);
> +	/* no crashkernel= or invalid value specified */
> +	if (ret || !crash_size)
> +		return;
> +
> +	if (crash_base == 0) {
> +		/* Current arm64 boot protocol requires 2MB alignment */
> +		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
> +				crash_size, SZ_2M);
> +		if (crash_base == 0) {
> +			pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> +				crash_size);
> +			return;
> +		}
> +	} else {
> +		/* User specifies base address explicitly. */
> +		if (!memblock_is_region_memory(crash_base, crash_size) ||
> +			memblock_is_region_reserved(crash_base, crash_size)) {
> +			pr_warn("crashkernel has wrong address or size\n");
> +			return;
> +		}
> +
> +		if (!IS_ALIGNED(crash_base, SZ_2M)) {
> +			pr_warn("crashkernel base address is not 2MB aligned\n");
> +			return;
> +		}
> +	}
> +	memblock_reserve(crash_base, crash_size);

This will mean that the crash kernel will have a permanent alias in the linear
map which is vulnerable to being clobbered. There could also be issues
with mismatched attributes in future.

We're probably ok for now, but in future we'll likely want to fix this
up to remove the region (or mark it nomap), and only map it temporarily
when loading things into the region.

> +
> +	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> +		crash_size >> 20, crash_base >> 20);
> +
> +	crashk_res.start = crash_base;
> +	crashk_res.end = crash_base + crash_size - 1;
> +}
> +#else
> +static void __init reserve_crashkernel(void)
> +{
> +	;

Nit: the ';' line can go.

> +}
> +#endif /* CONFIG_KEXEC_CORE */
> +
>  /*
>   * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
>   * currently assumes that for memory starting above 4G, 32-bit devices will
> @@ -331,6 +438,9 @@ void __init arm64_memblock_init(void)
>  		arm64_dma_phys_limit = max_zone_dma_phys();
>  	else
>  		arm64_dma_phys_limit = PHYS_MASK + 1;
> +
> +	reserve_crashkernel();
> +
>  	dma_contiguous_reserve(arm64_dma_phys_limit);
>  
>  	memblock_allow_resize();
> -- 
> 2.11.0

Other than my comments regarding the DT usage above, this looks fine to
me.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump
  2016-12-28  4:37 ` [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
  2017-01-10 11:10   ` Will Deacon
@ 2017-01-12 15:39   ` Mark Rutland
  2017-01-13  9:13     ` AKASHI Takahiro
  1 sibling, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2017-01-12 15:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Wed, Dec 28, 2016 at 01:37:34PM +0900, AKASHI Takahiro wrote:
> From: James Morse <james.morse@arm.com>
> 
> Add documentation for
> 	linux,crashkernel-base and crashkernel-size,
> 	linux,usable-memory-range
> 	linux,elfcorehdr
> used by arm64 kdump to decribe the kdump reserved area, and
> the elfcorehdr's location within it.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> [takahiro.akashi at linaro.org: added "linux,crashkernel-base" and "-size" ]
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Cc: devicetree at vger.kernel.org
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> ---
>  Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> index 6ae9d82d4c37..7b115165e9ec 100644
> --- a/Documentation/devicetree/bindings/chosen.txt
> +++ b/Documentation/devicetree/bindings/chosen.txt
> @@ -52,3 +52,53 @@ This property is set (currently only on PowerPC, and only needed on
>  book3e) by some versions of kexec-tools to tell the new kernel that it
>  is being booted by kexec, as the booting environment may differ (e.g.
>  a different secondary CPU release mechanism)
> +
> +linux,crashkernel-base
> +linux,crashkernel-size
> +----------------------
> +
> +These properties (currently used on PowerPC and arm64) indicates
> +the base address and the size, respectively, of the reserved memory
> +range for crash dump kernel.

>From this description, it's not clear to me what the (expected)
consumers of this property are, nor what is expected to provide it.

In previous rounds of review, I had assumed that this was used to
describe a preference to the first kernel as to what region of memory
should be used for a subsequent kdump kernel. Looking around, I'm not
sure if I was correct in that assessment.

I see that arch/powerpc seems to consume this property to configure
crashk_res, but it also rewrites it based on crashk_res, presumably for
the benefit of userspace. It's not clear to me how on powerpc the kdump
kernel knows its memory range -- is more DT modification done in the
kernel and/or userspace?

I disagree with modifying this property to expose it to userspace. For
arm64 we should either ensure that /proc/iomem is consistently usable
(and have userspace consistently use it), or we should expose a new file
specifically to expose this information.

Further, I do not think we need this property. It makes more sense to me
for the preference of a a region to be described to the *first* kernel
using the command line consistently.

So I think we should drop this property, and not use it on arm64. Please
document this as powerpc only.

> +e.g.
> +
> +/ {
> +	chosen {
> +		linux,crashkernel-base = <0x9 0xf0000000>;
> +		linux,crashkernel-size = <0x0 0x10000000>;
> +	};
> +};

> +
> +linux,usable-memory-range
> +-------------------------
> +
> +This property (currently used only on arm64) holds the memory range,
> +the base address and the size, which can be used as system ram on
> +the *current* kernel. Note that, if this property is present, any memory
> +regions under "memory" nodes in DT blob or ones marked as "conventional
> +memory" in EFI memory map should be ignored.

Could you please replace this with:

  This property (arm64 only) holds a base address and size, describing a
  limited region in which memory may be considered available for use by
  the kernel. Memory outside of this range is not available for use.
  
  This property describes a limitation: memory within this range is only
  valid when also described through another mechanism that the kernel
  would otherwise use to determine available memory (e.g. memory nodes
  or the EFI memory map). Valid memory may be sparse within the range.

To clarify why we need this, given by above comments w.r.r. the
linux,crashkernel-* properties:

* It preserves all the original memory map information (e.g. memory
  nodes and/or EFI memory map)

* It works consistently, regardless of how the kdump kernel would
  otherwise determine which memory to use (memory nodes, EFI, etc).

* It will be simply and reliable for an in-kernel purgatory to insert,
  if we need a kexec_file_load()-based kdump (e.g. without requiring
  memory map rewrites, and avoiding clashes with command line
  parameters). For a first kernel, this is not as big a concern.

> +linux,elfcorehdr
> +----------------
> +
> +This property (currently used only on arm64) holds the memory range,
> +the address and the size, of the elf core header which mainly describes
> +the panicked kernel's memory layout as PT_LOAD segments of elf format.
> +e.g.
> +
> +/ {
> +	chosen {
> +		linux,elfcorehdr = <0x9 0xfffff000 0x0 0x800>;
> +	};
> +};

This property looks fine to me.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2017-01-12 15:09   ` Mark Rutland
@ 2017-01-13  8:16     ` AKASHI Takahiro
  2017-01-13 11:39       ` Mark Rutland
  0 siblings, 1 reply; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-13  8:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On Thu, Jan 12, 2017 at 03:09:26PM +0000, Mark Rutland wrote:
> Hi,
> 
> As a general note, I must apologise for my minimial review of the series
> until this point. Judging by the way the DT parts are organised. I'm
> very concerned with the way the DT parts are organised, and clearly I
> did not communicate my concerns and suggestions effectively in prior
> rounds of review.
> 
> On Wed, Dec 28, 2016 at 01:36:00PM +0900, AKASHI Takahiro wrote:
> > "crashkernel=" kernel parameter specifies the size (and optionally
> > the start address) of the system ram used by crash dump kernel.
> > reserve_crashkernel() will allocate and reserve the memory at the startup
> > of primary kernel.
> > 
> > This memory range will be exported to userspace via:
> > 	- an entry named "Crash kernel" in /proc/iomem, and
> > 	- "linux,crashkernel-base" and "linux,crashkernel-size" under
> > 	  /sys/firmware/devicetree/base/chosen
> 
> > +#ifdef CONFIG_KEXEC_CORE
> > +static unsigned long long crash_size, crash_base;
> > +static struct property crash_base_prop = {
> > +	.name = "linux,crashkernel-base",
> > +	.length = sizeof(u64),
> > +	.value = &crash_base
> > +};
> > +static struct property crash_size_prop = {
> > +	.name = "linux,crashkernel-size",
> > +	.length = sizeof(u64),
> > +	.value = &crash_size,
> > +};
> > +
> > +static int __init export_crashkernel(void)
> > +{
> > +	struct device_node *node;
> > +	int ret;
> > +
> > +	if (!crash_size)
> > +		return 0;
> > +
> > +	/* Add /chosen/linux,crashkernel-* properties */
> > +	node = of_find_node_by_path("/chosen");
> > +	if (!node)
> > +		return -ENOENT;
> > +
> > +	/*
> > +	 * There might be existing crash kernel properties, but we can't
> > +	 * be sure what's in them, so remove them.
> > +	 */
> > +	of_remove_property(node, of_find_property(node,
> > +				"linux,crashkernel-base", NULL));
> > +	of_remove_property(node, of_find_property(node,
> > +				"linux,crashkernel-size", NULL));
> > +
> > +	ret = of_add_property(node, &crash_base_prop);
> > +	if (ret)
> > +		goto ret_err;
> > +
> > +	ret = of_add_property(node, &crash_size_prop);
> > +	if (ret)
> > +		goto ret_err;
> > +
> > +	return 0;
> > +
> > +ret_err:
> > +	pr_warn("Exporting crashkernel region to device tree failed\n");
> > +	return ret;
> > +}
> > +late_initcall(export_crashkernel);
> 
> I very much do not like this.
> 
> I don't think we should be modifying the DT exposed to userspace in this
> manner, in the usual boot path, especially given that the kernel itself
> does not appear to be a consumer of this property. I do not think that
> it is right to use the DT exposed to userspace as a communication
> channel solely between the kernel and userspace.

As you mentioned in your comments against my patch#9, this property
originates from PPC implementation.
I added it solely from the sympathy for dt-based architectures.

> So I think we should drop the above, and for arm64 have userspace
> consistently use /proc/iomem (or perhaps a new kexec-specific file) to
> determine the region reserved for the crash kernel, if it needs to know
> this.

As a matter of fact, my port of kexec-tools doesn't check this property
and dropping it won't cause any problem.

> I'll have further comments on this front in the binding patch.
> 
> > +/*
> > + * reserve_crashkernel() - reserves memory for crash kernel
> > + *
> > + * This function reserves memory area given in "crashkernel=" kernel command
> > + * line parameter. The memory reserved is used by dump capture kernel when
> > + * primary kernel is crashing.
> > + */
> > +static void __init reserve_crashkernel(void)
> > +{
> > +	int ret;
> > +
> > +	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > +				&crash_size, &crash_base);
> > +	/* no crashkernel= or invalid value specified */
> > +	if (ret || !crash_size)
> > +		return;
> > +
> > +	if (crash_base == 0) {
> > +		/* Current arm64 boot protocol requires 2MB alignment */
> > +		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
> > +				crash_size, SZ_2M);
> > +		if (crash_base == 0) {
> > +			pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> > +				crash_size);
> > +			return;
> > +		}
> > +	} else {
> > +		/* User specifies base address explicitly. */
> > +		if (!memblock_is_region_memory(crash_base, crash_size) ||
> > +			memblock_is_region_reserved(crash_base, crash_size)) {
> > +			pr_warn("crashkernel has wrong address or size\n");
> > +			return;
> > +		}
> > +
> > +		if (!IS_ALIGNED(crash_base, SZ_2M)) {
> > +			pr_warn("crashkernel base address is not 2MB aligned\n");
> > +			return;
> > +		}
> > +	}
> > +	memblock_reserve(crash_base, crash_size);
> 
> This will mean that the crash kernel will have a permanent alias in the linear
> map which is vulnerable to being clobbered. There could also be issues
> with mismatched attributes in future.

Good point, I've never thought of that except making the memblock
region "reserved."

> We're probably ok for now, but in future we'll likely want to fix this
> up to remove the region (or mark it nomap), and only map it temporarily
> when loading things into the region.

Well, I found that the following commit is already in:
        commit 9b492cf58077
        Author: Xunlei Pang <xlpang@redhat.com>
        Date:   Mon May 23 16:24:10 2016 -0700

            kexec: introduce a protection mechanism for the crashkernel
            reserved memory

To make best use of this framework, I'd like to re-use set_memory_ro/rx()
instead of removing the region from linear mapping. But to do so,
we need to
* make memblock_isolate_range() global,
* allow set_memory_ro/rx() to be applied to regions in linear mapping
since set_memory_ro/rx() works only on page-level mappings.

What do you think?
(See my tentative solution below.)

> > +
> > +	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> > +		crash_size >> 20, crash_base >> 20);
> > +
> > +	crashk_res.start = crash_base;
> > +	crashk_res.end = crash_base + crash_size - 1;
> > +}
> > +#else
> > +static void __init reserve_crashkernel(void)
> > +{
> > +	;
> 
> Nit: the ';' line can go.

OK

Thanks,
-Takahiro AKASHI

> > +}
> > +#endif /* CONFIG_KEXEC_CORE */
> > +
> >  /*
> >   * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
> >   * currently assumes that for memory starting above 4G, 32-bit devices will
> > @@ -331,6 +438,9 @@ void __init arm64_memblock_init(void)
> >  		arm64_dma_phys_limit = max_zone_dma_phys();
> >  	else
> >  		arm64_dma_phys_limit = PHYS_MASK + 1;
> > +
> > +	reserve_crashkernel();
> > +
> >  	dma_contiguous_reserve(arm64_dma_phys_limit);
> >  
> >  	memblock_allow_resize();
> > -- 
> > 2.11.0
> 
> Other than my comments regarding the DT usage above, this looks fine to
> me.
> 
> Thanks,
> Mark.

===8<===
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index c0fc3d458195..bb21c0473b8e 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -211,6 +211,44 @@ void machine_kexec(struct kimage *kimage)
 	BUG(); /* Should never get here. */
 }
 
+static int kexec_mark_range(unsigned long start, unsigned long end,
+							bool protect)
+{
+	unsigned int nr_pages;
+
+	if (!end || start >= end)
+		return 0;
+
+	nr_pages = (end >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
+
+	if (protect)
+		return set_memory_ro(__phys_to_virt(start), nr_pages);
+	else
+		return set_memory_rw(__phys_to_virt(start), nr_pages);
+}
+
+static void kexec_mark_crashkres(bool protect)
+{
+	unsigned long control;
+
+	/* Don't touch the control code page used in crash_kexec().*/
+	control = page_to_phys(kexec_crash_image->control_code_page);
+	kexec_mark_range(crashk_res.start, control - 1, protect);
+
+	control += KEXEC_CONTROL_PAGE_SIZE;
+	kexec_mark_range(control, crashk_res.end, protect);
+}
+
+void arch_kexec_protect_crashkres(void)
+{
+	kexec_mark_crashkres(true);
+}
+
+void arch_kexec_unprotect_crashkres(void)
+{
+	kexec_mark_crashkres(false);
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
 	unsigned int i;
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 569ec3325bc8..764ec89c4f76 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -90,6 +90,7 @@ early_param("initrd", early_initrd);
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_size, crash_base;
+	int start_rgn, end_rgn;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -121,6 +122,9 @@ static void __init reserve_crashkernel(void)
 		}
 	}
 	memblock_reserve(crash_base, crash_size);
+	memblock_isolate_range(&memblock.memory, crash_base, crash_size,
+			&start_rgn, &end_rgn);
+
 
 	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
 		crash_size >> 20, crash_base >> 20);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 17243e43184e..0f60f19c287b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -22,6 +22,7 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/init.h>
+#include <linux/kexec.h>
 #include <linux/libfdt.h>
 #include <linux/mman.h>
 #include <linux/nodemask.h>
@@ -362,6 +363,17 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 	unsigned long kernel_start = __pa(_text);
 	unsigned long kernel_end = __pa(__init_begin);
 
+#ifdef CONFIG_KEXEC_CORE
+	if (crashk_res.end && start >= crashk_res.start &&
+			end <= (crashk_res.end + 1)) {
+		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
+				     end - start, PAGE_KERNEL,
+				     early_pgtable_alloc,
+				     true);
+		return;
+	}
+#endif
+
 	/*
 	 * Take care not to create a writable alias for the
 	 * read-only text and rodata sections of the kernel image.
===>8===

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump
  2017-01-12 15:39   ` Mark Rutland
@ 2017-01-13  9:13     ` AKASHI Takahiro
  2017-01-13 11:17       ` Mark Rutland
  0 siblings, 1 reply; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-13  9:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 12, 2017 at 03:39:45PM +0000, Mark Rutland wrote:
> Hi,
> 
> On Wed, Dec 28, 2016 at 01:37:34PM +0900, AKASHI Takahiro wrote:
> > From: James Morse <james.morse@arm.com>
> > 
> > Add documentation for
> > 	linux,crashkernel-base and crashkernel-size,
> > 	linux,usable-memory-range
> > 	linux,elfcorehdr
> > used by arm64 kdump to decribe the kdump reserved area, and
> > the elfcorehdr's location within it.
> > 
> > Signed-off-by: James Morse <james.morse@arm.com>
> > [takahiro.akashi at linaro.org: added "linux,crashkernel-base" and "-size" ]
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Cc: devicetree at vger.kernel.org
> > Cc: Rob Herring <robh+dt@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > ---
> >  Documentation/devicetree/bindings/chosen.txt | 50 ++++++++++++++++++++++++++++
> >  1 file changed, 50 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
> > index 6ae9d82d4c37..7b115165e9ec 100644
> > --- a/Documentation/devicetree/bindings/chosen.txt
> > +++ b/Documentation/devicetree/bindings/chosen.txt
> > @@ -52,3 +52,53 @@ This property is set (currently only on PowerPC, and only needed on
> >  book3e) by some versions of kexec-tools to tell the new kernel that it
> >  is being booted by kexec, as the booting environment may differ (e.g.
> >  a different secondary CPU release mechanism)
> > +
> > +linux,crashkernel-base
> > +linux,crashkernel-size
> > +----------------------
> > +
> > +These properties (currently used on PowerPC and arm64) indicates
> > +the base address and the size, respectively, of the reserved memory
> > +range for crash dump kernel.
> 
> From this description, it's not clear to me what the (expected)
> consumers of this property are, nor what is expected to provide it.
> 
> In previous rounds of review, I had assumed that this was used to
> describe a preference to the first kernel as to what region of memory
> should be used for a subsequent kdump kernel. Looking around, I'm not
> sure if I was correct in that assessment.
> 
> I see that arch/powerpc seems to consume this property to configure
> crashk_res, but it also rewrites it based on crashk_res, presumably for
> the benefit of userspace. It's not clear to me how on powerpc the kdump
> kernel knows its memory range -- is more DT modification done in the
> kernel and/or userspace?

I don't believe that powerpc will rewrite the property any way.
As far as I know from *the source code*, powerpc kernel retrieves
the memory range for crash dump kernel from a kernel command line, i.e.
crashkernel=, and then exposes it through DT to userspace (assuming
kexec-tools).

> I disagree with modifying this property to expose it to userspace. For

Apart from the context of discussions, is this a shared consensus?

> arm64 we should either ensure that /proc/iomem is consistently usable
> (and have userspace consistently use it), or we should expose a new file
> specifically to expose this information.

The thing that I had in my mind when adding this property is that
/proc/iomem would be obsolete in the future, then we should have
an alternative in hand.

> Further, I do not think we need this property. It makes more sense to me
> for the preference of a a region to be described to the *first* kernel
> using the command line consistently.
> 
> So I think we should drop this property, and not use it on arm64. Please
> document this as powerpc only.

OK, but if we drop the property from arm64 code, we have no reason
to leave its description in this patch.
(In fact, there are a few more (undocumented) properties that only ppc
uses for kdump.)

> > +e.g.
> > +
> > +/ {
> > +	chosen {
> > +		linux,crashkernel-base = <0x9 0xf0000000>;
> > +		linux,crashkernel-size = <0x0 0x10000000>;
> > +	};
> > +};
> 
> > +
> > +linux,usable-memory-range
> > +-------------------------
> > +
> > +This property (currently used only on arm64) holds the memory range,
> > +the base address and the size, which can be used as system ram on
> > +the *current* kernel. Note that, if this property is present, any memory
> > +regions under "memory" nodes in DT blob or ones marked as "conventional
> > +memory" in EFI memory map should be ignored.
> 
> Could you please replace this with:
> 
>   This property (arm64 only) holds a base address and size, describing a
>   limited region in which memory may be considered available for use by
>   the kernel. Memory outside of this range is not available for use.
>   
>   This property describes a limitation: memory within this range is only
>   valid when also described through another mechanism that the kernel
>   would otherwise use to determine available memory (e.g. memory nodes
>   or the EFI memory map). Valid memory may be sparse within the range.

Sure.

Thanks,
-Takahiro AKASHI

> To clarify why we need this, given by above comments w.r.r. the
> linux,crashkernel-* properties:
> 
> * It preserves all the original memory map information (e.g. memory
>   nodes and/or EFI memory map)
> 
> * It works consistently, regardless of how the kdump kernel would
>   otherwise determine which memory to use (memory nodes, EFI, etc).
> 
> * It will be simply and reliable for an in-kernel purgatory to insert,
>   if we need a kexec_file_load()-based kdump (e.g. without requiring
>   memory map rewrites, and avoiding clashes with command line
>   parameters). For a first kernel, this is not as big a concern.
> 
> > +linux,elfcorehdr
> > +----------------
> > +
> > +This property (currently used only on arm64) holds the memory range,
> > +the address and the size, of the elf core header which mainly describes
> > +the panicked kernel's memory layout as PT_LOAD segments of elf format.
> > +e.g.
> > +
> > +/ {
> > +	chosen {
> > +		linux,elfcorehdr = <0x9 0xfffff000 0x0 0x800>;
> > +	};
> > +};
> 
> This property looks fine to me.
> 
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump
  2017-01-13  9:13     ` AKASHI Takahiro
@ 2017-01-13 11:17       ` Mark Rutland
  2017-01-16  8:25         ` AKASHI Takahiro
  0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2017-01-13 11:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 13, 2017 at 06:13:49PM +0900, AKASHI Takahiro wrote:
> On Thu, Jan 12, 2017 at 03:39:45PM +0000, Mark Rutland wrote:
> > On Wed, Dec 28, 2016 at 01:37:34PM +0900, AKASHI Takahiro wrote:
> > > +linux,crashkernel-base
> > > +linux,crashkernel-size
> > > +----------------------
> > > +
> > > +These properties (currently used on PowerPC and arm64) indicates
> > > +the base address and the size, respectively, of the reserved memory
> > > +range for crash dump kernel.
> > 
> > From this description, it's not clear to me what the (expected)
> > consumers of this property are, nor what is expected to provide it.
> > 
> > In previous rounds of review, I had assumed that this was used to
> > describe a preference to the first kernel as to what region of memory
> > should be used for a subsequent kdump kernel. Looking around, I'm not
> > sure if I was correct in that assessment.
> > 
> > I see that arch/powerpc seems to consume this property to configure
> > crashk_res, but it also rewrites it based on crashk_res, presumably for
> > the benefit of userspace. It's not clear to me how on powerpc the kdump
> > kernel knows its memory range -- is more DT modification done in the
> > kernel and/or userspace?
> 
> I don't believe that powerpc will rewrite the property any way.
> As far as I know from *the source code*, powerpc kernel retrieves
> the memory range for crash dump kernel from a kernel command line, i.e.
> crashkernel=, and then exposes it through DT to userspace (assuming
> kexec-tools).

The rewriting I describe is in export_crashk_values() in
arch/powerpc/kernel/machine_kexec.c, where the code deletes existing the
properties, and adds new ones, to the DT exposed to userspace.

So I think we're just quibbling over the definition of "rewrite".

> > arm64 we should either ensure that /proc/iomem is consistently usable
> > (and have userspace consistently use it), or we should expose a new file
> > specifically to expose this information.
> 
> The thing that I had in my mind when adding this property is that
> /proc/iomem would be obsolete in the future, then we should have
> an alternative in hand.

Ok.

My disagreement is with using the DT as a channel to convey information
from the kernel to userspace.

I'm more than happy for a new file or other mechanism to express this
information. For example, we could add
/sys/kernel/kexec_crash_{base,size} or similar.


> > Further, I do not think we need this property. It makes more sense to me
> > for the preference of a a region to be described to the *first* kernel
> > using the command line consistently.
> > 
> > So I think we should drop this property, and not use it on arm64. Please
> > document this as powerpc only.
> 
> OK, but if we drop the property from arm64 code, we have no reason
> to leave its description in this patch.
> (In fact, there are a few more (undocumented) properties that only ppc
> uses for kdump.)

I'm happy to drop it, then.

> > > +linux,usable-memory-range
> > > +-------------------------
> > > +
> > > +This property (currently used only on arm64) holds the memory range,
> > > +the base address and the size, which can be used as system ram on
> > > +the *current* kernel. Note that, if this property is present, any memory
> > > +regions under "memory" nodes in DT blob or ones marked as "conventional
> > > +memory" in EFI memory map should be ignored.
> > 
> > Could you please replace this with:
> > 
> >   This property (arm64 only) holds a base address and size, describing a
> >   limited region in which memory may be considered available for use by
> >   the kernel. Memory outside of this range is not available for use.
> >   
> >   This property describes a limitation: memory within this range is only
> >   valid when also described through another mechanism that the kernel
> >   would otherwise use to determine available memory (e.g. memory nodes
> >   or the EFI memory map). Valid memory may be sparse within the range.
> 
> Sure.

Cheers!

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2017-01-13  8:16     ` AKASHI Takahiro
@ 2017-01-13 11:39       ` Mark Rutland
  2017-01-17  8:20         ` AKASHI Takahiro
  0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2017-01-13 11:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 13, 2017 at 05:16:18PM +0900, AKASHI Takahiro wrote:
> On Thu, Jan 12, 2017 at 03:09:26PM +0000, Mark Rutland wrote:
> > > +static int __init export_crashkernel(void)

> > > +	/* Add /chosen/linux,crashkernel-* properties */

> > > +	of_remove_property(node, of_find_property(node,
> > > +				"linux,crashkernel-base", NULL));
> > > +	of_remove_property(node, of_find_property(node,
> > > +				"linux,crashkernel-size", NULL));
> > > +
> > > +	ret = of_add_property(node, &crash_base_prop);
> > > +	if (ret)
> > > +		goto ret_err;
> > > +
> > > +	ret = of_add_property(node, &crash_size_prop);
> > > +	if (ret)
> > > +		goto ret_err;

> > I very much do not like this.
> > 
> > I don't think we should be modifying the DT exposed to userspace in this
> > manner, in the usual boot path, especially given that the kernel itself
> > does not appear to be a consumer of this property. I do not think that
> > it is right to use the DT exposed to userspace as a communication
> > channel solely between the kernel and userspace.
> 
> As you mentioned in your comments against my patch#9, this property
> originates from PPC implementation.
> I added it solely from the sympathy for dt-based architectures.
>
> > So I think we should drop the above, and for arm64 have userspace
> > consistently use /proc/iomem (or perhaps a new kexec-specific file) to
> > determine the region reserved for the crash kernel, if it needs to know
> > this.
> 
> As a matter of fact, my port of kexec-tools doesn't check this property
> and dropping it won't cause any problem.

Ok. It sounds like we're both happy for this to go, then.

While it's unfortunate that architectures differ, I think we have
legitimate reasons to differ, and it's preferable to do so. We have a
different set of constraints (e.g. supporting EFI memory maps), and
following the PPC approach creates longer term issues for us, making it
harder to do the right thing consistently.

> > > +/*
> > > + * reserve_crashkernel() - reserves memory for crash kernel
> > > + *
> > > + * This function reserves memory area given in "crashkernel=" kernel command
> > > + * line parameter. The memory reserved is used by dump capture kernel when
> > > + * primary kernel is crashing.
> > > + */
> > > +static void __init reserve_crashkernel(void)

> > > +	memblock_reserve(crash_base, crash_size);
> > 
> > This will mean that the crash kernel will have a permanent alias in the linear
> > map which is vulnerable to being clobbered. There could also be issues
> > with mismatched attributes in future.
> 
> Good point, I've never thought of that except making the memblock
> region "reserved."
> 
> > We're probably ok for now, but in future we'll likely want to fix this
> > up to remove the region (or mark it nomap), and only map it temporarily
> > when loading things into the region.
> 
> Well, I found that the following commit is already in:
>         commit 9b492cf58077
>         Author: Xunlei Pang <xlpang@redhat.com>
>         Date:   Mon May 23 16:24:10 2016 -0700
> 
>             kexec: introduce a protection mechanism for the crashkernel
>             reserved memory
> 
> To make best use of this framework, I'd like to re-use set_memory_ro/rx()
> instead of removing the region from linear mapping. But to do so,
> we need to
> * make memblock_isolate_range() global,
> * allow set_memory_ro/rx() to be applied to regions in linear mapping
> since set_memory_ro/rx() works only on page-level mappings.
> 
> What do you think?
> (See my tentative solution below.)

Great! I think it would be better to follow the approach of
mark_rodata_ro(), rather than opening up set_memory_*(), but otherwise,
it looks like it should work.

Either way, this still leaves us with an RO alias on crashed cores (and
potential cache attribute mismatches in future). Do we need to read from
the region later, or could we unmap it entirely?

Thanks,
Mark.

> ===8<===
> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index c0fc3d458195..bb21c0473b8e 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -211,6 +211,44 @@ void machine_kexec(struct kimage *kimage)
>  	BUG(); /* Should never get here. */
>  }
>  
> +static int kexec_mark_range(unsigned long start, unsigned long end,
> +							bool protect)
> +{
> +	unsigned int nr_pages;
> +
> +	if (!end || start >= end)
> +		return 0;
> +
> +	nr_pages = (end >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
> +
> +	if (protect)
> +		return set_memory_ro(__phys_to_virt(start), nr_pages);
> +	else
> +		return set_memory_rw(__phys_to_virt(start), nr_pages);
> +}
> +
> +static void kexec_mark_crashkres(bool protect)
> +{
> +	unsigned long control;
> +
> +	/* Don't touch the control code page used in crash_kexec().*/
> +	control = page_to_phys(kexec_crash_image->control_code_page);
> +	kexec_mark_range(crashk_res.start, control - 1, protect);
> +
> +	control += KEXEC_CONTROL_PAGE_SIZE;
> +	kexec_mark_range(control, crashk_res.end, protect);
> +}
> +
> +void arch_kexec_protect_crashkres(void)
> +{
> +	kexec_mark_crashkres(true);
> +}
> +
> +void arch_kexec_unprotect_crashkres(void)
> +{
> +	kexec_mark_crashkres(false);
> +}
> +
>  static void machine_kexec_mask_interrupts(void)
>  {
>  	unsigned int i;
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 569ec3325bc8..764ec89c4f76 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -90,6 +90,7 @@ early_param("initrd", early_initrd);
>  static void __init reserve_crashkernel(void)
>  {
>  	unsigned long long crash_size, crash_base;
> +	int start_rgn, end_rgn;
>  	int ret;
>  
>  	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> @@ -121,6 +122,9 @@ static void __init reserve_crashkernel(void)
>  		}
>  	}
>  	memblock_reserve(crash_base, crash_size);
> +	memblock_isolate_range(&memblock.memory, crash_base, crash_size,
> +			&start_rgn, &end_rgn);
> +
>  
>  	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
>  		crash_size >> 20, crash_base >> 20);
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 17243e43184e..0f60f19c287b 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -22,6 +22,7 @@
>  #include <linux/kernel.h>
>  #include <linux/errno.h>
>  #include <linux/init.h>
> +#include <linux/kexec.h>
>  #include <linux/libfdt.h>
>  #include <linux/mman.h>
>  #include <linux/nodemask.h>
> @@ -362,6 +363,17 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>  	unsigned long kernel_start = __pa(_text);
>  	unsigned long kernel_end = __pa(__init_begin);
>  
> +#ifdef CONFIG_KEXEC_CORE
> +	if (crashk_res.end && start >= crashk_res.start &&
> +			end <= (crashk_res.end + 1)) {
> +		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
> +				     end - start, PAGE_KERNEL,
> +				     early_pgtable_alloc,
> +				     true);
> +		return;
> +	}
> +#endif
> +
>  	/*
>  	 * Take care not to create a writable alias for the
>  	 * read-only text and rodata sections of the kernel image.
> ===>8===

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump
  2017-01-13 11:17       ` Mark Rutland
@ 2017-01-16  8:25         ` AKASHI Takahiro
  2017-01-17  8:26           ` Dave Young
  2017-01-17 11:13           ` Mark Rutland
  0 siblings, 2 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-16  8:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 13, 2017 at 11:17:56AM +0000, Mark Rutland wrote:
> On Fri, Jan 13, 2017 at 06:13:49PM +0900, AKASHI Takahiro wrote:
> > On Thu, Jan 12, 2017 at 03:39:45PM +0000, Mark Rutland wrote:
> > > On Wed, Dec 28, 2016 at 01:37:34PM +0900, AKASHI Takahiro wrote:
> > > > +linux,crashkernel-base
> > > > +linux,crashkernel-size
> > > > +----------------------
> > > > +
> > > > +These properties (currently used on PowerPC and arm64) indicates
> > > > +the base address and the size, respectively, of the reserved memory
> > > > +range for crash dump kernel.
> > > 
> > > From this description, it's not clear to me what the (expected)
> > > consumers of this property are, nor what is expected to provide it.
> > > 
> > > In previous rounds of review, I had assumed that this was used to
> > > describe a preference to the first kernel as to what region of memory
> > > should be used for a subsequent kdump kernel. Looking around, I'm not
> > > sure if I was correct in that assessment.
> > > 
> > > I see that arch/powerpc seems to consume this property to configure
> > > crashk_res, but it also rewrites it based on crashk_res, presumably for
> > > the benefit of userspace. It's not clear to me how on powerpc the kdump
> > > kernel knows its memory range -- is more DT modification done in the
> > > kernel and/or userspace?
> > 
> > I don't believe that powerpc will rewrite the property any way.
> > As far as I know from *the source code*, powerpc kernel retrieves
> > the memory range for crash dump kernel from a kernel command line, i.e.
> > crashkernel=, and then exposes it through DT to userspace (assuming
> > kexec-tools).
> 
> The rewriting I describe is in export_crashk_values() in
> arch/powerpc/kernel/machine_kexec.c, where the code deletes existing the
> properties, and adds new ones, to the DT exposed to userspace.
> 
> So I think we're just quibbling over the definition of "rewrite".

Gotcha

> > > arm64 we should either ensure that /proc/iomem is consistently usable
> > > (and have userspace consistently use it), or we should expose a new file
> > > specifically to expose this information.
> > 
> > The thing that I had in my mind when adding this property is that
> > /proc/iomem would be obsolete in the future, then we should have
> > an alternative in hand.
> 
> Ok.
> 
> My disagreement is with using the DT as a channel to convey information
> from the kernel to userspace.
> 
> I'm more than happy for a new file or other mechanism to express this
> information. For example, we could add
> /sys/kernel/kexec_crash_{base,size} or similar.

It may make sense because /sys/kernel/kexec_crash_size already exists,
so why not kexec_crash_base?
My concern, however, is that this kind of interface might prevent us from
allowing multiple regions to be reserved for crash dump kernel in the future.
(There is an assumption that we have only one region at least on arm64 though.)

Thanks,
-Takahiro AKASHI

> 
> > > Further, I do not think we need this property. It makes more sense to me
> > > for the preference of a a region to be described to the *first* kernel
> > > using the command line consistently.
> > > 
> > > So I think we should drop this property, and not use it on arm64. Please
> > > document this as powerpc only.
> > 
> > OK, but if we drop the property from arm64 code, we have no reason
> > to leave its description in this patch.
> > (In fact, there are a few more (undocumented) properties that only ppc
> > uses for kdump.)
> 
> I'm happy to drop it, then.
> 
> > > > +linux,usable-memory-range
> > > > +-------------------------
> > > > +
> > > > +This property (currently used only on arm64) holds the memory range,
> > > > +the base address and the size, which can be used as system ram on
> > > > +the *current* kernel. Note that, if this property is present, any memory
> > > > +regions under "memory" nodes in DT blob or ones marked as "conventional
> > > > +memory" in EFI memory map should be ignored.
> > > 
> > > Could you please replace this with:
> > > 
> > >   This property (arm64 only) holds a base address and size, describing a
> > >   limited region in which memory may be considered available for use by
> > >   the kernel. Memory outside of this range is not available for use.
> > >   
> > >   This property describes a limitation: memory within this range is only
> > >   valid when also described through another mechanism that the kernel
> > >   would otherwise use to determine available memory (e.g. memory nodes
> > >   or the EFI memory map). Valid memory may be sparse within the range.
> > 
> > Sure.
> 
> Cheers!
> 
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2017-01-13 11:39       ` Mark Rutland
@ 2017-01-17  8:20         ` AKASHI Takahiro
  2017-01-17 11:54           ` Mark Rutland
  0 siblings, 1 reply; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-17  8:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote:
> On Fri, Jan 13, 2017 at 05:16:18PM +0900, AKASHI Takahiro wrote:
> > On Thu, Jan 12, 2017 at 03:09:26PM +0000, Mark Rutland wrote:
> > > > +static int __init export_crashkernel(void)
> 
> > > > +	/* Add /chosen/linux,crashkernel-* properties */
> 
> > > > +	of_remove_property(node, of_find_property(node,
> > > > +				"linux,crashkernel-base", NULL));
> > > > +	of_remove_property(node, of_find_property(node,
> > > > +				"linux,crashkernel-size", NULL));
> > > > +
> > > > +	ret = of_add_property(node, &crash_base_prop);
> > > > +	if (ret)
> > > > +		goto ret_err;
> > > > +
> > > > +	ret = of_add_property(node, &crash_size_prop);
> > > > +	if (ret)
> > > > +		goto ret_err;
> 
> > > I very much do not like this.
> > > 
> > > I don't think we should be modifying the DT exposed to userspace in this
> > > manner, in the usual boot path, especially given that the kernel itself
> > > does not appear to be a consumer of this property. I do not think that
> > > it is right to use the DT exposed to userspace as a communication
> > > channel solely between the kernel and userspace.
> > 
> > As you mentioned in your comments against my patch#9, this property
> > originates from PPC implementation.
> > I added it solely from the sympathy for dt-based architectures.
> >
> > > So I think we should drop the above, and for arm64 have userspace
> > > consistently use /proc/iomem (or perhaps a new kexec-specific file) to
> > > determine the region reserved for the crash kernel, if it needs to know
> > > this.
> > 
> > As a matter of fact, my port of kexec-tools doesn't check this property
> > and dropping it won't cause any problem.
> 
> Ok. It sounds like we're both happy for this to go, then.
> 
> While it's unfortunate that architectures differ, I think we have
> legitimate reasons to differ, and it's preferable to do so. We have a
> different set of constraints (e.g. supporting EFI memory maps), and
> following the PPC approach creates longer term issues for us, making it
> harder to do the right thing consistently.
> 
> > > > +/*
> > > > + * reserve_crashkernel() - reserves memory for crash kernel
> > > > + *
> > > > + * This function reserves memory area given in "crashkernel=" kernel command
> > > > + * line parameter. The memory reserved is used by dump capture kernel when
> > > > + * primary kernel is crashing.
> > > > + */
> > > > +static void __init reserve_crashkernel(void)
> 
> > > > +	memblock_reserve(crash_base, crash_size);
> > > 
> > > This will mean that the crash kernel will have a permanent alias in the linear
> > > map which is vulnerable to being clobbered. There could also be issues
> > > with mismatched attributes in future.
> > 
> > Good point, I've never thought of that except making the memblock
> > region "reserved."
> > 
> > > We're probably ok for now, but in future we'll likely want to fix this
> > > up to remove the region (or mark it nomap), and only map it temporarily
> > > when loading things into the region.
> > 
> > Well, I found that the following commit is already in:
> >         commit 9b492cf58077
> >         Author: Xunlei Pang <xlpang@redhat.com>
> >         Date:   Mon May 23 16:24:10 2016 -0700
> > 
> >             kexec: introduce a protection mechanism for the crashkernel
> >             reserved memory
> > 
> > To make best use of this framework, I'd like to re-use set_memory_ro/rx()
> > instead of removing the region from linear mapping. But to do so,
> > we need to
> > * make memblock_isolate_range() global,
> > * allow set_memory_ro/rx() to be applied to regions in linear mapping
> > since set_memory_ro/rx() works only on page-level mappings.
> > 
> > What do you think?
> > (See my tentative solution below.)
> 
> Great! I think it would be better to follow the approach of
> mark_rodata_ro(), rather than opening up set_memory_*(), but otherwise,
> it looks like it should work.

I'm not quite sure what the approach of mark_rodata_ro() means, but
I found that using create_mapping_late() may cause two problems:

1) it fails when PTE_CONT bits mismatch between an old and new mmu entry.
   This can happen, say, if the memory range for crash dump kernel
   starts in the mid of _continuous_ pages.

2) The control code page, of one-page size, is still written out in
   machine_kexec() which is called at a crash, and this means that
   the range must be writable even after kexec_load(), but
   create_mapping_late() does not handle a case of changing attributes
   for a single page which is in _section_ mapping.
   We cannot make single-page mapping for the control page since the address
   of that page is not determined at the boot time.

As for (1), we need to call memblock_isolate_range() to make the region
an independent one.

> Either way, this still leaves us with an RO alias on crashed cores (and
> potential cache attribute mismatches in future). Do we need to read from
> the region later,

I believe not, but the region must be _writable_ as I mentioned in (2) above.
To avoid this issue, we have to move copying the control code page
to machine_kexec_prepare() which is called in kexec_load() and so
the region is writable anyway then.
I want Geoff to affirm that this change is safe.

(See my second solution below.)

> or could we unmap it entirely?

given the change above, I think we can.
Is there any code to re-use especially for unmapping?

Thanks,
-Takahiro AKASHI

> Thanks,
> Mark.
===8<===
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index c0fc3d458195..80a52e9aaf73 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -26,8 +26,6 @@
 extern const unsigned char arm64_relocate_new_kernel[];
 extern const unsigned long arm64_relocate_new_kernel_size;
 
-static unsigned long kimage_start;
-
 /**
  * kexec_image_info - For debugging output.
  */
@@ -68,7 +66,7 @@ void machine_kexec_cleanup(struct kimage *kimage)
  */
 int machine_kexec_prepare(struct kimage *kimage)
 {
-	kimage_start = kimage->start;
+	void *reboot_code_buffer;
 
 	kexec_image_info(kimage);
 
@@ -77,6 +75,21 @@ int machine_kexec_prepare(struct kimage *kimage)
 		return -EBUSY;
 	}
 
+	reboot_code_buffer =
+			phys_to_virt(page_to_phys(kimage->control_code_page));
+
+	/*
+	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
+	 * after the kernel is shut down.
+	 */
+	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
+		arm64_relocate_new_kernel_size);
+
+	/* Flush the reboot_code_buffer in preparation for its execution. */
+	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
+	flush_icache_range((uintptr_t)reboot_code_buffer,
+		arm64_relocate_new_kernel_size);
+
 	return 0;
 }
 
@@ -147,7 +160,6 @@ static void kexec_segment_flush(const struct kimage *kimage)
 void machine_kexec(struct kimage *kimage)
 {
 	phys_addr_t reboot_code_buffer_phys;
-	void *reboot_code_buffer;
 
 	/*
 	 * New cpus may have become stuck_in_kernel after we loaded the image.
@@ -156,7 +168,6 @@ void machine_kexec(struct kimage *kimage)
 			!WARN_ON(kimage == kexec_crash_image));
 
 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
-	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
 
 	kexec_image_info(kimage);
 
@@ -164,26 +175,12 @@ void machine_kexec(struct kimage *kimage)
 		kimage->control_code_page);
 	pr_debug("%s:%d: reboot_code_buffer_phys:  %pa\n", __func__, __LINE__,
 		&reboot_code_buffer_phys);
-	pr_debug("%s:%d: reboot_code_buffer:       %p\n", __func__, __LINE__,
-		reboot_code_buffer);
 	pr_debug("%s:%d: relocate_new_kernel:      %p\n", __func__, __LINE__,
 		arm64_relocate_new_kernel);
 	pr_debug("%s:%d: relocate_new_kernel_size: 0x%lx(%lu) bytes\n",
 		__func__, __LINE__, arm64_relocate_new_kernel_size,
 		arm64_relocate_new_kernel_size);
 
-	/*
-	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
-	 * after the kernel is shut down.
-	 */
-	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
-		arm64_relocate_new_kernel_size);
-
-	/* Flush the reboot_code_buffer in preparation for its execution. */
-	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
-	flush_icache_range((uintptr_t)reboot_code_buffer,
-		arm64_relocate_new_kernel_size);
-
 	/* Flush the kimage list and its buffers. */
 	kexec_list_flush(kimage);
 
@@ -206,7 +203,7 @@ void machine_kexec(struct kimage *kimage)
 	 */
 
 	cpu_soft_restart(kimage != kexec_crash_image,
-		reboot_code_buffer_phys, kimage->head, kimage_start, 0);
+		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
 
 	BUG(); /* Should never get here. */
 }
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 569ec3325bc8..e4cc170edc0c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -90,6 +90,7 @@ early_param("initrd", early_initrd);
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_size, crash_base;
+	int start_rgn, end_rgn;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -120,6 +121,8 @@ static void __init reserve_crashkernel(void)
 			return;
 		}
 	}
+	memblock_isolate_range(&memblock.memory, crash_base, crash_size,
+			&start_rgn, &end_rgn);
 	memblock_reserve(crash_base, crash_size);
 
 	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 17243e43184e..b7c75845407a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -22,6 +22,8 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/kexec.h>
 #include <linux/libfdt.h>
 #include <linux/mman.h>
 #include <linux/nodemask.h>
@@ -817,3 +819,27 @@ int pmd_clear_huge(pmd_t *pmd)
 	pmd_clear(pmd);
 	return 1;
 }
+
+#ifdef CONFIG_KEXEC_CORE
+void arch_kexec_protect_crashkres(void)
+{
+	flush_tlb_all();
+
+	create_mapping_late(crashk_res.start, __phys_to_virt(crashk_res.start),
+			    resource_size(&crashk_res), PAGE_KERNEL_RO);
+
+	/* flush the TLBs after updating live kernel mappings */
+	flush_tlb_all();
+}
+
+void arch_kexec_unprotect_crashkres(void)
+{
+	flush_tlb_all();
+
+	create_mapping_late(crashk_res.start, __phys_to_virt(crashk_res.start),
+			    resource_size(&crashk_res), PAGE_KERNEL);
+
+	/* flush the TLBs after updating live kernel mappings */
+	flush_tlb_all();
+}
+#endif
===>8===

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump
  2017-01-16  8:25         ` AKASHI Takahiro
@ 2017-01-17  8:26           ` Dave Young
  2017-01-19  9:01             ` AKASHI Takahiro
  2017-01-17 11:13           ` Mark Rutland
  1 sibling, 1 reply; 39+ messages in thread
From: Dave Young @ 2017-01-17  8:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/16/17 at 05:25pm, AKASHI Takahiro wrote:
> On Fri, Jan 13, 2017 at 11:17:56AM +0000, Mark Rutland wrote:
> > On Fri, Jan 13, 2017 at 06:13:49PM +0900, AKASHI Takahiro wrote:
> > > On Thu, Jan 12, 2017 at 03:39:45PM +0000, Mark Rutland wrote:
> > > > On Wed, Dec 28, 2016 at 01:37:34PM +0900, AKASHI Takahiro wrote:
> > > > > +linux,crashkernel-base
> > > > > +linux,crashkernel-size
> > > > > +----------------------
> > > > > +
> > > > > +These properties (currently used on PowerPC and arm64) indicates
> > > > > +the base address and the size, respectively, of the reserved memory
> > > > > +range for crash dump kernel.
> > > > 
> > > > From this description, it's not clear to me what the (expected)
> > > > consumers of this property are, nor what is expected to provide it.
> > > > 
> > > > In previous rounds of review, I had assumed that this was used to
> > > > describe a preference to the first kernel as to what region of memory
> > > > should be used for a subsequent kdump kernel. Looking around, I'm not
> > > > sure if I was correct in that assessment.
> > > > 
> > > > I see that arch/powerpc seems to consume this property to configure
> > > > crashk_res, but it also rewrites it based on crashk_res, presumably for
> > > > the benefit of userspace. It's not clear to me how on powerpc the kdump
> > > > kernel knows its memory range -- is more DT modification done in the
> > > > kernel and/or userspace?
> > > 
> > > I don't believe that powerpc will rewrite the property any way.
> > > As far as I know from *the source code*, powerpc kernel retrieves
> > > the memory range for crash dump kernel from a kernel command line, i.e.
> > > crashkernel=, and then exposes it through DT to userspace (assuming
> > > kexec-tools).
> > 
> > The rewriting I describe is in export_crashk_values() in
> > arch/powerpc/kernel/machine_kexec.c, where the code deletes existing the
> > properties, and adds new ones, to the DT exposed to userspace.
> > 
> > So I think we're just quibbling over the definition of "rewrite".
> 
> Gotcha
> 
> > > > arm64 we should either ensure that /proc/iomem is consistently usable
> > > > (and have userspace consistently use it), or we should expose a new file
> > > > specifically to expose this information.
> > > 
> > > The thing that I had in my mind when adding this property is that
> > > /proc/iomem would be obsolete in the future, then we should have
> > > an alternative in hand.
> > 
> > Ok.
> > 
> > My disagreement is with using the DT as a channel to convey information
> > from the kernel to userspace.
> > 
> > I'm more than happy for a new file or other mechanism to express this
> > information. For example, we could add
> > /sys/kernel/kexec_crash_{base,size} or similar.
> 
> It may make sense because /sys/kernel/kexec_crash_size already exists,
> so why not kexec_crash_base?
> My concern, however, is that this kind of interface might prevent us from
> allowing multiple regions to be reserved for crash dump kernel in the future.
> (There is an assumption that we have only one region at least on arm64 though.)

In x86 there could be two ranges, one for softiotlb under 4G and another
for range over 4G, but kexec_crash_size only shows the size of
over-4g-range.

It is better to use /proc/iomem, most arches use /proc/iomem. Do you
have any reason why it will be obsolete? At least for the time being it
is fine.

> 
> Thanks,
> -Takahiro AKASHI
> 
> > 
> > > > Further, I do not think we need this property. It makes more sense to me
> > > > for the preference of a a region to be described to the *first* kernel
> > > > using the command line consistently.
> > > > 
> > > > So I think we should drop this property, and not use it on arm64. Please
> > > > document this as powerpc only.
> > > 
> > > OK, but if we drop the property from arm64 code, we have no reason
> > > to leave its description in this patch.
> > > (In fact, there are a few more (undocumented) properties that only ppc
> > > uses for kdump.)
> > 
> > I'm happy to drop it, then.
> > 
> > > > > +linux,usable-memory-range
> > > > > +-------------------------
> > > > > +
> > > > > +This property (currently used only on arm64) holds the memory range,
> > > > > +the base address and the size, which can be used as system ram on
> > > > > +the *current* kernel. Note that, if this property is present, any memory
> > > > > +regions under "memory" nodes in DT blob or ones marked as "conventional
> > > > > +memory" in EFI memory map should be ignored.
> > > > 
> > > > Could you please replace this with:
> > > > 
> > > >   This property (arm64 only) holds a base address and size, describing a
> > > >   limited region in which memory may be considered available for use by
> > > >   the kernel. Memory outside of this range is not available for use.
> > > >   
> > > >   This property describes a limitation: memory within this range is only
> > > >   valid when also described through another mechanism that the kernel
> > > >   would otherwise use to determine available memory (e.g. memory nodes
> > > >   or the EFI memory map). Valid memory may be sparse within the range.
> > > 
> > > Sure.
> > 
> > Cheers!
> > 
> > Thanks,
> > Mark.
> 
> _______________________________________________
> kexec mailing list
> kexec at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump
  2017-01-16  8:25         ` AKASHI Takahiro
  2017-01-17  8:26           ` Dave Young
@ 2017-01-17 11:13           ` Mark Rutland
  1 sibling, 0 replies; 39+ messages in thread
From: Mark Rutland @ 2017-01-17 11:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 16, 2017 at 05:25:07PM +0900, AKASHI Takahiro wrote:
> On Fri, Jan 13, 2017 at 11:17:56AM +0000, Mark Rutland wrote:
> > On Fri, Jan 13, 2017 at 06:13:49PM +0900, AKASHI Takahiro wrote:
> > > On Thu, Jan 12, 2017 at 03:39:45PM +0000, Mark Rutland wrote:

> > > > arm64 we should either ensure that /proc/iomem is consistently usable
> > > > (and have userspace consistently use it), or we should expose a new file
> > > > specifically to expose this information.
> > > 
> > > The thing that I had in my mind when adding this property is that
> > > /proc/iomem would be obsolete in the future, then we should have
> > > an alternative in hand.
> > 
> > Ok.
> > 
> > My disagreement is with using the DT as a channel to convey information
> > from the kernel to userspace.
> > 
> > I'm more than happy for a new file or other mechanism to express this
> > information. For example, we could add
> > /sys/kernel/kexec_crash_{base,size} or similar.
> 
> It may make sense because /sys/kernel/kexec_crash_size already exists,
> so why not kexec_crash_base?
> My concern, however, is that this kind of interface might prevent us from
> allowing multiple regions to be reserved for crash dump kernel in the future.
> (There is an assumption that we have only one region at least on arm64 though.)

Ok.

If we need to handle that, we should also update the description of
linux,usable-memory-range to allow multiple entries (and probably
s/range/ranges).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2017-01-17  8:20         ` AKASHI Takahiro
@ 2017-01-17 11:54           ` Mark Rutland
  2017-01-19  9:49             ` AKASHI Takahiro
  0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2017-01-17 11:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 05:20:44PM +0900, AKASHI Takahiro wrote:
> On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote:
> > Great! I think it would be better to follow the approach of
> > mark_rodata_ro(), rather than opening up set_memory_*(), but otherwise,
> > it looks like it should work.
> 
> I'm not quite sure what the approach of mark_rodata_ro() means, but
> I found that using create_mapping_late() may cause two problems:
> 
> 1) it fails when PTE_CONT bits mismatch between an old and new mmu entry.
>    This can happen, say, if the memory range for crash dump kernel
>    starts in the mid of _continuous_ pages.

That should only happen if we try to remap a segment different to what
we originally mapped.

I was intending that we'd explicitly map the reserved region separately
in the boot path, like we do for kernel segments in map_kernel(). We
would allow sections and/or CONT entires. 

Then, in __map_memblock() we'd then skip that range as we do for the
linear map alias of the kernel image.

That way, we can later use create_mapping_late for that same region, and
it should handle sections and/or CONT entries in the exact same way as
it does for the kernel image segments in mark_rodata_ro().

> 2) The control code page, of one-page size, is still written out in
>    machine_kexec() which is called at a crash, and this means that
>    the range must be writable even after kexec_load(), but
>    create_mapping_late() does not handle a case of changing attributes
>    for a single page which is in _section_ mapping.
>    We cannot make single-page mapping for the control page since the address
>    of that page is not determined at the boot time.

That is a problem. I'm not sure I follow how set_memory_*() helps here
though?

> As for (1), we need to call memblock_isolate_range() to make the region
> an independent one.
> 
> > Either way, this still leaves us with an RO alias on crashed cores (and
> > potential cache attribute mismatches in future). Do we need to read from
> > the region later,
> 
> I believe not, but the region must be _writable_ as I mentioned in (2) above.
> To avoid this issue, we have to move copying the control code page
> to machine_kexec_prepare() which is called in kexec_load() and so
> the region is writable anyway then.
> I want Geoff to affirm that this change is safe.
> 
> (See my second solution below.)

>From a quick scan that looks ok.

> > or could we unmap it entirely?
> 
> given the change above, I think we can.

Great!

> Is there any code to re-use especially for unmapping?

I don't think we have much code useful for unmapping. We could re-use 
create_mapping_late for this, passing a set of prot bits that means the
entries are invalid (e.g. have a PAGE_KERNEL_INVALID).

We'd have to perform the TLB invalidation ourselves, but that shouldn't
be too painful.

Thanks,
Mark.

> ===8<===
> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index c0fc3d458195..80a52e9aaf73 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -26,8 +26,6 @@
>  extern const unsigned char arm64_relocate_new_kernel[];
>  extern const unsigned long arm64_relocate_new_kernel_size;
>  
> -static unsigned long kimage_start;
> -
>  /**
>   * kexec_image_info - For debugging output.
>   */
> @@ -68,7 +66,7 @@ void machine_kexec_cleanup(struct kimage *kimage)
>   */
>  int machine_kexec_prepare(struct kimage *kimage)
>  {
> -	kimage_start = kimage->start;
> +	void *reboot_code_buffer;
>  
>  	kexec_image_info(kimage);
>  
> @@ -77,6 +75,21 @@ int machine_kexec_prepare(struct kimage *kimage)
>  		return -EBUSY;
>  	}
>  
> +	reboot_code_buffer =
> +			phys_to_virt(page_to_phys(kimage->control_code_page));
> +
> +	/*
> +	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
> +	 * after the kernel is shut down.
> +	 */
> +	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
> +		arm64_relocate_new_kernel_size);
> +
> +	/* Flush the reboot_code_buffer in preparation for its execution. */
> +	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
> +	flush_icache_range((uintptr_t)reboot_code_buffer,
> +		arm64_relocate_new_kernel_size);
> +
>  	return 0;
>  }
>  
> @@ -147,7 +160,6 @@ static void kexec_segment_flush(const struct kimage *kimage)
>  void machine_kexec(struct kimage *kimage)
>  {
>  	phys_addr_t reboot_code_buffer_phys;
> -	void *reboot_code_buffer;
>  
>  	/*
>  	 * New cpus may have become stuck_in_kernel after we loaded the image.
> @@ -156,7 +168,6 @@ void machine_kexec(struct kimage *kimage)
>  			!WARN_ON(kimage == kexec_crash_image));
>  
>  	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
> -	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
>  
>  	kexec_image_info(kimage);
>  
> @@ -164,26 +175,12 @@ void machine_kexec(struct kimage *kimage)
>  		kimage->control_code_page);
>  	pr_debug("%s:%d: reboot_code_buffer_phys:  %pa\n", __func__, __LINE__,
>  		&reboot_code_buffer_phys);
> -	pr_debug("%s:%d: reboot_code_buffer:       %p\n", __func__, __LINE__,
> -		reboot_code_buffer);
>  	pr_debug("%s:%d: relocate_new_kernel:      %p\n", __func__, __LINE__,
>  		arm64_relocate_new_kernel);
>  	pr_debug("%s:%d: relocate_new_kernel_size: 0x%lx(%lu) bytes\n",
>  		__func__, __LINE__, arm64_relocate_new_kernel_size,
>  		arm64_relocate_new_kernel_size);
>  
> -	/*
> -	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
> -	 * after the kernel is shut down.
> -	 */
> -	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
> -		arm64_relocate_new_kernel_size);
> -
> -	/* Flush the reboot_code_buffer in preparation for its execution. */
> -	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
> -	flush_icache_range((uintptr_t)reboot_code_buffer,
> -		arm64_relocate_new_kernel_size);
> -
>  	/* Flush the kimage list and its buffers. */
>  	kexec_list_flush(kimage);
>  
> @@ -206,7 +203,7 @@ void machine_kexec(struct kimage *kimage)
>  	 */
>  
>  	cpu_soft_restart(kimage != kexec_crash_image,
> -		reboot_code_buffer_phys, kimage->head, kimage_start, 0);
> +		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
>  
>  	BUG(); /* Should never get here. */
>  }
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 569ec3325bc8..e4cc170edc0c 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -90,6 +90,7 @@ early_param("initrd", early_initrd);
>  static void __init reserve_crashkernel(void)
>  {
>  	unsigned long long crash_size, crash_base;
> +	int start_rgn, end_rgn;
>  	int ret;
>  
>  	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> @@ -120,6 +121,8 @@ static void __init reserve_crashkernel(void)
>  			return;
>  		}
>  	}
> +	memblock_isolate_range(&memblock.memory, crash_base, crash_size,
> +			&start_rgn, &end_rgn);
>  	memblock_reserve(crash_base, crash_size);
>  
>  	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 17243e43184e..b7c75845407a 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -22,6 +22,8 @@
>  #include <linux/kernel.h>
>  #include <linux/errno.h>
>  #include <linux/init.h>
> +#include <linux/ioport.h>
> +#include <linux/kexec.h>
>  #include <linux/libfdt.h>
>  #include <linux/mman.h>
>  #include <linux/nodemask.h>
> @@ -817,3 +819,27 @@ int pmd_clear_huge(pmd_t *pmd)
>  	pmd_clear(pmd);
>  	return 1;
>  }
> +
> +#ifdef CONFIG_KEXEC_CORE
> +void arch_kexec_protect_crashkres(void)
> +{
> +	flush_tlb_all();
> +
> +	create_mapping_late(crashk_res.start, __phys_to_virt(crashk_res.start),
> +			    resource_size(&crashk_res), PAGE_KERNEL_RO);
> +
> +	/* flush the TLBs after updating live kernel mappings */
> +	flush_tlb_all();
> +}
> +
> +void arch_kexec_unprotect_crashkres(void)
> +{
> +	flush_tlb_all();
> +
> +	create_mapping_late(crashk_res.start, __phys_to_virt(crashk_res.start),
> +			    resource_size(&crashk_res), PAGE_KERNEL);
> +
> +	/* flush the TLBs after updating live kernel mappings */
> +	flush_tlb_all();
> +}
> +#endif
> ===>8===

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump
  2017-01-17  8:26           ` Dave Young
@ 2017-01-19  9:01             ` AKASHI Takahiro
  0 siblings, 0 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-19  9:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 04:26:29PM +0800, Dave Young wrote:
> On 01/16/17 at 05:25pm, AKASHI Takahiro wrote:
> > On Fri, Jan 13, 2017 at 11:17:56AM +0000, Mark Rutland wrote:
> > > On Fri, Jan 13, 2017 at 06:13:49PM +0900, AKASHI Takahiro wrote:
> > > > On Thu, Jan 12, 2017 at 03:39:45PM +0000, Mark Rutland wrote:
> > > > > On Wed, Dec 28, 2016 at 01:37:34PM +0900, AKASHI Takahiro wrote:
> > > > > > +linux,crashkernel-base
> > > > > > +linux,crashkernel-size
> > > > > > +----------------------
> > > > > > +
> > > > > > +These properties (currently used on PowerPC and arm64) indicates
> > > > > > +the base address and the size, respectively, of the reserved memory
> > > > > > +range for crash dump kernel.
> > > > > 
> > > > > From this description, it's not clear to me what the (expected)
> > > > > consumers of this property are, nor what is expected to provide it.
> > > > > 
> > > > > In previous rounds of review, I had assumed that this was used to
> > > > > describe a preference to the first kernel as to what region of memory
> > > > > should be used for a subsequent kdump kernel. Looking around, I'm not
> > > > > sure if I was correct in that assessment.
> > > > > 
> > > > > I see that arch/powerpc seems to consume this property to configure
> > > > > crashk_res, but it also rewrites it based on crashk_res, presumably for
> > > > > the benefit of userspace. It's not clear to me how on powerpc the kdump
> > > > > kernel knows its memory range -- is more DT modification done in the
> > > > > kernel and/or userspace?
> > > > 
> > > > I don't believe that powerpc will rewrite the property any way.
> > > > As far as I know from *the source code*, powerpc kernel retrieves
> > > > the memory range for crash dump kernel from a kernel command line, i.e.
> > > > crashkernel=, and then exposes it through DT to userspace (assuming
> > > > kexec-tools).
> > > 
> > > The rewriting I describe is in export_crashk_values() in
> > > arch/powerpc/kernel/machine_kexec.c, where the code deletes existing the
> > > properties, and adds new ones, to the DT exposed to userspace.
> > > 
> > > So I think we're just quibbling over the definition of "rewrite".
> > 
> > Gotcha
> > 
> > > > > arm64 we should either ensure that /proc/iomem is consistently usable
> > > > > (and have userspace consistently use it), or we should expose a new file
> > > > > specifically to expose this information.
> > > > 
> > > > The thing that I had in my mind when adding this property is that
> > > > /proc/iomem would be obsolete in the future, then we should have
> > > > an alternative in hand.
> > > 
> > > Ok.
> > > 
> > > My disagreement is with using the DT as a channel to convey information
> > > from the kernel to userspace.
> > > 
> > > I'm more than happy for a new file or other mechanism to express this
> > > information. For example, we could add
> > > /sys/kernel/kexec_crash_{base,size} or similar.
> > 
> > It may make sense because /sys/kernel/kexec_crash_size already exists,
> > so why not kexec_crash_base?
> > My concern, however, is that this kind of interface might prevent us from
> > allowing multiple regions to be reserved for crash dump kernel in the future.
> > (There is an assumption that we have only one region at least on arm64 though.)
> 
> In x86 there could be two ranges, one for softiotlb under 4G and another
> for range over 4G, but kexec_crash_size only shows the size of
> over-4g-range.
> 
> It is better to use /proc/iomem, most arches use /proc/iomem. Do you
> have any reason why it will be obsolete? At least for the time being it
> is fine.

I don't know.
I just think that I might have seen that someone said so somewhere
and that more _powerful_ (structured) tool could supersede it :)

-Takahiro AKASHI

> > 
> > Thanks,
> > -Takahiro AKASHI
> > 
> > > 
> > > > > Further, I do not think we need this property. It makes more sense to me
> > > > > for the preference of a a region to be described to the *first* kernel
> > > > > using the command line consistently.
> > > > > 
> > > > > So I think we should drop this property, and not use it on arm64. Please
> > > > > document this as powerpc only.
> > > > 
> > > > OK, but if we drop the property from arm64 code, we have no reason
> > > > to leave its description in this patch.
> > > > (In fact, there are a few more (undocumented) properties that only ppc
> > > > uses for kdump.)
> > > 
> > > I'm happy to drop it, then.
> > > 
> > > > > > +linux,usable-memory-range
> > > > > > +-------------------------
> > > > > > +
> > > > > > +This property (currently used only on arm64) holds the memory range,
> > > > > > +the base address and the size, which can be used as system ram on
> > > > > > +the *current* kernel. Note that, if this property is present, any memory
> > > > > > +regions under "memory" nodes in DT blob or ones marked as "conventional
> > > > > > +memory" in EFI memory map should be ignored.
> > > > > 
> > > > > Could you please replace this with:
> > > > > 
> > > > >   This property (arm64 only) holds a base address and size, describing a
> > > > >   limited region in which memory may be considered available for use by
> > > > >   the kernel. Memory outside of this range is not available for use.
> > > > >   
> > > > >   This property describes a limitation: memory within this range is only
> > > > >   valid when also described through another mechanism that the kernel
> > > > >   would otherwise use to determine available memory (e.g. memory nodes
> > > > >   or the EFI memory map). Valid memory may be sparse within the range.
> > > > 
> > > > Sure.
> > > 
> > > Cheers!
> > > 
> > > Thanks,
> > > Mark.
> > 
> > _______________________________________________
> > kexec mailing list
> > kexec at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2017-01-17 11:54           ` Mark Rutland
@ 2017-01-19  9:49             ` AKASHI Takahiro
  2017-01-19 11:28               ` Mark Rutland
  0 siblings, 1 reply; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-19  9:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 11:54:42AM +0000, Mark Rutland wrote:
> On Tue, Jan 17, 2017 at 05:20:44PM +0900, AKASHI Takahiro wrote:
> > On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote:
> > > Great! I think it would be better to follow the approach of
> > > mark_rodata_ro(), rather than opening up set_memory_*(), but otherwise,
> > > it looks like it should work.
> > 
> > I'm not quite sure what the approach of mark_rodata_ro() means, but
> > I found that using create_mapping_late() may cause two problems:
> > 
> > 1) it fails when PTE_CONT bits mismatch between an old and new mmu entry.
> >    This can happen, say, if the memory range for crash dump kernel
> >    starts in the mid of _continuous_ pages.
> 
> That should only happen if we try to remap a segment different to what
> we originally mapped.
> 
> I was intending that we'd explicitly map the reserved region separately
> in the boot path, like we do for kernel segments in map_kernel(). We
> would allow sections and/or CONT entires. 
> 
> Then, in __map_memblock() we'd then skip that range as we do for the
> linear map alias of the kernel image.
> 
> That way, we can later use create_mapping_late for that same region, and
> it should handle sections and/or CONT entries in the exact same way as
> it does for the kernel image segments in mark_rodata_ro().

I see.
Which one do you prefer, yours above or my (second) solution?
Either way, they do almost the same thing in terms of mapping.

> > 2) The control code page, of one-page size, is still written out in
> >    machine_kexec() which is called at a crash, and this means that
> >    the range must be writable even after kexec_load(), but
> >    create_mapping_late() does not handle a case of changing attributes
> >    for a single page which is in _section_ mapping.
> >    We cannot make single-page mapping for the control page since the address
> >    of that page is not determined at the boot time.
> 
> That is a problem. I'm not sure I follow how set_memory_*() helps here
> though?
> 
> > As for (1), we need to call memblock_isolate_range() to make the region
> > an independent one.
> > 
> > > Either way, this still leaves us with an RO alias on crashed cores (and
> > > potential cache attribute mismatches in future). Do we need to read from
> > > the region later,
> > 
> > I believe not, but the region must be _writable_ as I mentioned in (2) above.
> > To avoid this issue, we have to move copying the control code page
> > to machine_kexec_prepare() which is called in kexec_load() and so
> > the region is writable anyway then.
> > I want Geoff to affirm that this change is safe.
> > 
> > (See my second solution below.)
> 
> From a quick scan that looks ok.
> 
> > > or could we unmap it entirely?
> > 
> > given the change above, I think we can.

I'm now asking Geoff ...

> 
> Great!
>
> > Is there any code to re-use especially for unmapping?
> 
> I don't think we have much code useful for unmapping. We could re-use 
> create_mapping_late for this, passing a set of prot bits that means the
> entries are invalid (e.g. have a PAGE_KERNEL_INVALID).

Do you really think that we should totally invalidate mmu entries?
I guess that, given proper cache & TLB flush operations, RO attribute is
good enough for memory consistency, no?
(None accesses the region, as I said, except in the case of re-loading
crash dump kernel though.)

> We'd have to perform the TLB invalidation ourselves, but that shouldn't
> be too painful.

Do we need to invalidate TLBs not only before but also after changing
permission attributes as make_rodata_ro() does?

-Takahiro AKASHI

> Thanks,
> Mark.
> 
> > ===8<===
> > diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> > index c0fc3d458195..80a52e9aaf73 100644
> > --- a/arch/arm64/kernel/machine_kexec.c
> > +++ b/arch/arm64/kernel/machine_kexec.c
> > @@ -26,8 +26,6 @@
> >  extern const unsigned char arm64_relocate_new_kernel[];
> >  extern const unsigned long arm64_relocate_new_kernel_size;
> >  
> > -static unsigned long kimage_start;
> > -
> >  /**
> >   * kexec_image_info - For debugging output.
> >   */
> > @@ -68,7 +66,7 @@ void machine_kexec_cleanup(struct kimage *kimage)
> >   */
> >  int machine_kexec_prepare(struct kimage *kimage)
> >  {
> > -	kimage_start = kimage->start;
> > +	void *reboot_code_buffer;
> >  
> >  	kexec_image_info(kimage);
> >  
> > @@ -77,6 +75,21 @@ int machine_kexec_prepare(struct kimage *kimage)
> >  		return -EBUSY;
> >  	}
> >  
> > +	reboot_code_buffer =
> > +			phys_to_virt(page_to_phys(kimage->control_code_page));
> > +
> > +	/*
> > +	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
> > +	 * after the kernel is shut down.
> > +	 */
> > +	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
> > +		arm64_relocate_new_kernel_size);
> > +
> > +	/* Flush the reboot_code_buffer in preparation for its execution. */
> > +	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
> > +	flush_icache_range((uintptr_t)reboot_code_buffer,
> > +		arm64_relocate_new_kernel_size);
> > +
> >  	return 0;
> >  }
> >  
> > @@ -147,7 +160,6 @@ static void kexec_segment_flush(const struct kimage *kimage)
> >  void machine_kexec(struct kimage *kimage)
> >  {
> >  	phys_addr_t reboot_code_buffer_phys;
> > -	void *reboot_code_buffer;
> >  
> >  	/*
> >  	 * New cpus may have become stuck_in_kernel after we loaded the image.
> > @@ -156,7 +168,6 @@ void machine_kexec(struct kimage *kimage)
> >  			!WARN_ON(kimage == kexec_crash_image));
> >  
> >  	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
> > -	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
> >  
> >  	kexec_image_info(kimage);
> >  
> > @@ -164,26 +175,12 @@ void machine_kexec(struct kimage *kimage)
> >  		kimage->control_code_page);
> >  	pr_debug("%s:%d: reboot_code_buffer_phys:  %pa\n", __func__, __LINE__,
> >  		&reboot_code_buffer_phys);
> > -	pr_debug("%s:%d: reboot_code_buffer:       %p\n", __func__, __LINE__,
> > -		reboot_code_buffer);
> >  	pr_debug("%s:%d: relocate_new_kernel:      %p\n", __func__, __LINE__,
> >  		arm64_relocate_new_kernel);
> >  	pr_debug("%s:%d: relocate_new_kernel_size: 0x%lx(%lu) bytes\n",
> >  		__func__, __LINE__, arm64_relocate_new_kernel_size,
> >  		arm64_relocate_new_kernel_size);
> >  
> > -	/*
> > -	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
> > -	 * after the kernel is shut down.
> > -	 */
> > -	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
> > -		arm64_relocate_new_kernel_size);
> > -
> > -	/* Flush the reboot_code_buffer in preparation for its execution. */
> > -	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
> > -	flush_icache_range((uintptr_t)reboot_code_buffer,
> > -		arm64_relocate_new_kernel_size);
> > -
> >  	/* Flush the kimage list and its buffers. */
> >  	kexec_list_flush(kimage);
> >  
> > @@ -206,7 +203,7 @@ void machine_kexec(struct kimage *kimage)
> >  	 */
> >  
> >  	cpu_soft_restart(kimage != kexec_crash_image,
> > -		reboot_code_buffer_phys, kimage->head, kimage_start, 0);
> > +		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
> >  
> >  	BUG(); /* Should never get here. */
> >  }
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 569ec3325bc8..e4cc170edc0c 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -90,6 +90,7 @@ early_param("initrd", early_initrd);
> >  static void __init reserve_crashkernel(void)
> >  {
> >  	unsigned long long crash_size, crash_base;
> > +	int start_rgn, end_rgn;
> >  	int ret;
> >  
> >  	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > @@ -120,6 +121,8 @@ static void __init reserve_crashkernel(void)
> >  			return;
> >  		}
> >  	}
> > +	memblock_isolate_range(&memblock.memory, crash_base, crash_size,
> > +			&start_rgn, &end_rgn);
> >  	memblock_reserve(crash_base, crash_size);
> >  
> >  	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 17243e43184e..b7c75845407a 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -22,6 +22,8 @@
> >  #include <linux/kernel.h>
> >  #include <linux/errno.h>
> >  #include <linux/init.h>
> > +#include <linux/ioport.h>
> > +#include <linux/kexec.h>
> >  #include <linux/libfdt.h>
> >  #include <linux/mman.h>
> >  #include <linux/nodemask.h>
> > @@ -817,3 +819,27 @@ int pmd_clear_huge(pmd_t *pmd)
> >  	pmd_clear(pmd);
> >  	return 1;
> >  }
> > +
> > +#ifdef CONFIG_KEXEC_CORE
> > +void arch_kexec_protect_crashkres(void)
> > +{
> > +	flush_tlb_all();
> > +
> > +	create_mapping_late(crashk_res.start, __phys_to_virt(crashk_res.start),
> > +			    resource_size(&crashk_res), PAGE_KERNEL_RO);
> > +
> > +	/* flush the TLBs after updating live kernel mappings */
> > +	flush_tlb_all();
> > +}
> > +
> > +void arch_kexec_unprotect_crashkres(void)
> > +{
> > +	flush_tlb_all();
> > +
> > +	create_mapping_late(crashk_res.start, __phys_to_virt(crashk_res.start),
> > +			    resource_size(&crashk_res), PAGE_KERNEL);
> > +
> > +	/* flush the TLBs after updating live kernel mappings */
> > +	flush_tlb_all();
> > +}
> > +#endif
> > ===>8===

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2017-01-19  9:49             ` AKASHI Takahiro
@ 2017-01-19 11:28               ` Mark Rutland
  2017-01-23  9:51                 ` AKASHI Takahiro
  0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2017-01-19 11:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 19, 2017 at 06:49:42PM +0900, AKASHI Takahiro wrote:
> On Tue, Jan 17, 2017 at 11:54:42AM +0000, Mark Rutland wrote:
> > On Tue, Jan 17, 2017 at 05:20:44PM +0900, AKASHI Takahiro wrote:
> > > On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote:
> > > > Great! I think it would be better to follow the approach of
> > > > mark_rodata_ro(), rather than opening up set_memory_*(), but otherwise,
> > > > it looks like it should work.
> > > 
> > > I'm not quite sure what the approach of mark_rodata_ro() means, but
> > > I found that using create_mapping_late() may cause two problems:
> > > 
> > > 1) it fails when PTE_CONT bits mismatch between an old and new mmu entry.
> > >    This can happen, say, if the memory range for crash dump kernel
> > >    starts in the mid of _continuous_ pages.
> > 
> > That should only happen if we try to remap a segment different to what
> > we originally mapped.
> > 
> > I was intending that we'd explicitly map the reserved region separately
> > in the boot path, like we do for kernel segments in map_kernel(). We
> > would allow sections and/or CONT entires. 
> > 
> > Then, in __map_memblock() we'd then skip that range as we do for the
> > linear map alias of the kernel image.
> > 
> > That way, we can later use create_mapping_late for that same region, and
> > it should handle sections and/or CONT entries in the exact same way as
> > it does for the kernel image segments in mark_rodata_ro().
> 
> I see.
> Which one do you prefer, yours above or my (second) solution?
> Either way, they do almost the same thing in terms of mapping.

While both should work, I'd prefer to match the existing map_kernel()
logic, (i.e. my suggestion above), for consistency.

> > I don't think we have much code useful for unmapping. We could re-use 
> > create_mapping_late for this, passing a set of prot bits that means the
> > entries are invalid (e.g. have a PAGE_KERNEL_INVALID).
> 
> Do you really think that we should totally invalidate mmu entries?
> I guess that, given proper cache & TLB flush operations, RO attribute is
> good enough for memory consistency, no?
> (None accesses the region, as I said, except in the case of re-loading
> crash dump kernel though.)

My worry is that the first kernel and kdump kernel may map (portions of)
the region with potentially confliciting memory attributes. So it would
be necessary to completely unmap the region.

You raise a good point that this would also mean we need to perform some
cache maintenance, which makes that a little more painful. We'd need a
sequence like:

* Unmap the region
* TLB invalidation
* Remap the region with non-cacheable attributes
* Cache maintenance
* Unmap the region
* TLB invalidation

> > We'd have to perform the TLB invalidation ourselves, but that shouldn't
> > be too painful.
> 
> Do we need to invalidate TLBs not only before but also after changing
> permission attributes as make_rodata_ro() does?

I believe we'd only have to perform the TLB invalidation after the
change of attributes.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2017-01-19 11:28               ` Mark Rutland
@ 2017-01-23  9:51                 ` AKASHI Takahiro
  2017-01-23 10:23                   ` Mark Rutland
  0 siblings, 1 reply; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-23  9:51 UTC (permalink / raw)
  To: linux-arm-kernel

Mark,

On Thu, Jan 19, 2017 at 11:28:50AM +0000, Mark Rutland wrote:
> On Thu, Jan 19, 2017 at 06:49:42PM +0900, AKASHI Takahiro wrote:
> > On Tue, Jan 17, 2017 at 11:54:42AM +0000, Mark Rutland wrote:
> > > On Tue, Jan 17, 2017 at 05:20:44PM +0900, AKASHI Takahiro wrote:
> > > > On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote:
> > > > > Great! I think it would be better to follow the approach of
> > > > > mark_rodata_ro(), rather than opening up set_memory_*(), but otherwise,
> > > > > it looks like it should work.
> > > > 
> > > > I'm not quite sure what the approach of mark_rodata_ro() means, but
> > > > I found that using create_mapping_late() may cause two problems:
> > > > 
> > > > 1) it fails when PTE_CONT bits mismatch between an old and new mmu entry.
> > > >    This can happen, say, if the memory range for crash dump kernel
> > > >    starts in the mid of _continuous_ pages.
> > > 
> > > That should only happen if we try to remap a segment different to what
> > > we originally mapped.
> > > 
> > > I was intending that we'd explicitly map the reserved region separately
> > > in the boot path, like we do for kernel segments in map_kernel(). We
> > > would allow sections and/or CONT entires. 
> > > 
> > > Then, in __map_memblock() we'd then skip that range as we do for the
> > > linear map alias of the kernel image.
> > > 
> > > That way, we can later use create_mapping_late for that same region, and
> > > it should handle sections and/or CONT entries in the exact same way as
> > > it does for the kernel image segments in mark_rodata_ro().
> > 
> > I see.
> > Which one do you prefer, yours above or my (second) solution?
> > Either way, they do almost the same thing in terms of mapping.
> 
> While both should work, I'd prefer to match the existing map_kernel()
> logic, (i.e. my suggestion above), for consistency.

OK

> > > I don't think we have much code useful for unmapping. We could re-use 
> > > create_mapping_late for this, passing a set of prot bits that means the
> > > entries are invalid (e.g. have a PAGE_KERNEL_INVALID).
> > 
> > Do you really think that we should totally invalidate mmu entries?
> > I guess that, given proper cache & TLB flush operations, RO attribute is
> > good enough for memory consistency, no?
> > (None accesses the region, as I said, except in the case of re-loading
> > crash dump kernel though.)
> 
> My worry is that the first kernel and kdump kernel may map (portions of)
> the region with potentially confliciting memory attributes. So it would
> be necessary to completely unmap the region.

I think that this can happen only if the second kernel boots up,
leaving non-crashed cpus still running for some reason.

> You raise a good point that this would also mean we need to perform some
> cache maintenance, which makes that a little more painful. We'd need a
> sequence like:
> 
> * Unmap the region
> * TLB invalidation
> * Remap the region with non-cacheable attributes
> * Cache maintenance
> * Unmap the region
> * TLB invalidation

I don't get why we need to remap the region and do cache
maintenance here. Please elaborate a bit more?
My current implementation of arch_kexec_protect_crashkres() is:

        kexec_segment_flush(kexec_crash_image);
        create_mapping_late(crashk_res.start, ..., __pgprot(0));
                                                or PAGE_KERNEL_INVALID
        flush_tlb_all();

kexec_segment_flush() will eventually do dcache-flush for all the modified
data in crash dump kernel memory.

> > > We'd have to perform the TLB invalidation ourselves, but that shouldn't
> > > be too painful.
> > 
> > Do we need to invalidate TLBs not only before but also after changing
> > permission attributes as make_rodata_ro() does?
> 
> I believe we'd only have to perform the TLB invalidation after the
> change of attributes.

OK

Thanks,
-Takahiro AKASHI

> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2017-01-23  9:51                 ` AKASHI Takahiro
@ 2017-01-23 10:23                   ` Mark Rutland
  2017-01-24  7:55                     ` AKASHI Takahiro
  0 siblings, 1 reply; 39+ messages in thread
From: Mark Rutland @ 2017-01-23 10:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 23, 2017 at 06:51:46PM +0900, AKASHI Takahiro wrote:
> Mark,
> 
> On Thu, Jan 19, 2017 at 11:28:50AM +0000, Mark Rutland wrote:
> > On Thu, Jan 19, 2017 at 06:49:42PM +0900, AKASHI Takahiro wrote:
> > > On Tue, Jan 17, 2017 at 11:54:42AM +0000, Mark Rutland wrote:
> > > > On Tue, Jan 17, 2017 at 05:20:44PM +0900, AKASHI Takahiro wrote:
> > > > > On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote:

> > > > I don't think we have much code useful for unmapping. We could re-use 
> > > > create_mapping_late for this, passing a set of prot bits that means the
> > > > entries are invalid (e.g. have a PAGE_KERNEL_INVALID).
> > > 
> > > Do you really think that we should totally invalidate mmu entries?
> > > I guess that, given proper cache & TLB flush operations, RO attribute is
> > > good enough for memory consistency, no?
> > > (None accesses the region, as I said, except in the case of re-loading
> > > crash dump kernel though.)
> > 
> > My worry is that the first kernel and kdump kernel may map (portions of)
> > the region with potentially confliciting memory attributes. So it would
> > be necessary to completely unmap the region.
> 
> I think that this can happen only if the second kernel boots up,
> leaving non-crashed cpus still running for some reason.

Yes. I was considering a kdump case where a secondary was stuck in the
first kernel.

> > You raise a good point that this would also mean we need to perform some
> > cache maintenance, which makes that a little more painful. We'd need a
> > sequence like:
> > 
> > * Unmap the region
> > * TLB invalidation
> > * Remap the region with non-cacheable attributes
> > * Cache maintenance
> > * Unmap the region
> > * TLB invalidation
> 
> I don't get why we need to remap the region and do cache
> maintenance here. Please elaborate a bit more?

I think I was wrong, and we don't need to. Sorry about that.

My thought was that to ensure that there aren't stale lines with
differing attributes, we'd need to do a clean+invalidate while the
caches are guaranteed to not allocate anything furthe. Hence, we'd need
to use a non-cacheable mapping to perform the clean+invalidate.

However, I now think that so long as we unmap the range, this shouldn't
matter. The new kernel can perform the maintenance if it wishes to use
different attributes, similarly to what the first kernel must do per the
boot protocol.

> My current implementation of arch_kexec_protect_crashkres() is:
> 
>         kexec_segment_flush(kexec_crash_image);
>         create_mapping_late(crashk_res.start, ..., __pgprot(0));
>                                                 or PAGE_KERNEL_INVALID
>         flush_tlb_all();
> 
> kexec_segment_flush() will eventually do dcache-flush for all the modified
> data in crash dump kernel memory.

I now think this should be fine, per the above.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown()
  2017-01-12 12:01           ` Will Deacon
@ 2017-01-23 17:46             ` Will Deacon
  2017-01-24  7:52               ` AKASHI Takahiro
  0 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2017-01-23 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 12, 2017 at 12:01:11PM +0000, Will Deacon wrote:
> On Thu, Jan 12, 2017 at 01:21:44PM +0900, AKASHI Takahiro wrote:
> > On Wed, Jan 11, 2017 at 10:54:05AM +0000, Will Deacon wrote:
> > > On Wed, Jan 11, 2017 at 03:36:28PM +0900, AKASHI Takahiro wrote:
> > > > On Tue, Jan 10, 2017 at 11:32:48AM +0000, Will Deacon wrote:
> > > > > On Wed, Dec 28, 2016 at 01:36:01PM +0900, AKASHI Takahiro wrote:
> > > > > > @@ -22,6 +25,7 @@
> > > > > >  extern const unsigned char arm64_relocate_new_kernel[];
> > > > > >  extern const unsigned long arm64_relocate_new_kernel_size;
> > > > > >  
> > > > > > +static bool in_crash_kexec;
> > > > > 
> > > > > Do you actually need this bool? Why not call kexec_crash_loaded() instead?
> > > > 
> > > > The two have different meanings:
> > > > "in_crash_kexec" indicates that kdump is taking place, while
> > > > kexec_crash_loaded() tells us only whether crash dump kernel has been
> > > > loaded or not.
> > > > 
> > > > It is crucial to distinguish them especially for machine_kexec()
> > > > which can be called on normal kexec even if kdump has been set up.
> > > 
> > > Ah, I see. So how about just doing:
> > > 
> > >   if (kimage == kexec_crash_image)
> > > 
> > > in machine_kexec?
> > 
> > Yeah, it should work.
> > Do you want to merge the following hunk,
> > or expect that I will re-send the whole patch series
> > (with other changes if any)?
> 
> Thanks, I'll fold it in and shout if I run into any problems. My plan is
> to queue this for 4.11.

Given the DT discussion with Mark, I assume you'll post a new version with
this rolled in.

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown()
  2017-01-23 17:46             ` Will Deacon
@ 2017-01-24  7:52               ` AKASHI Takahiro
  0 siblings, 0 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-24  7:52 UTC (permalink / raw)
  To: linux-arm-kernel

Will,

On Mon, Jan 23, 2017 at 05:46:34PM +0000, Will Deacon wrote:
> On Thu, Jan 12, 2017 at 12:01:11PM +0000, Will Deacon wrote:
> > On Thu, Jan 12, 2017 at 01:21:44PM +0900, AKASHI Takahiro wrote:
> > > On Wed, Jan 11, 2017 at 10:54:05AM +0000, Will Deacon wrote:
> > > > On Wed, Jan 11, 2017 at 03:36:28PM +0900, AKASHI Takahiro wrote:
> > > > > On Tue, Jan 10, 2017 at 11:32:48AM +0000, Will Deacon wrote:
> > > > > > On Wed, Dec 28, 2016 at 01:36:01PM +0900, AKASHI Takahiro wrote:
> > > > > > > @@ -22,6 +25,7 @@
> > > > > > >  extern const unsigned char arm64_relocate_new_kernel[];
> > > > > > >  extern const unsigned long arm64_relocate_new_kernel_size;
> > > > > > >  
> > > > > > > +static bool in_crash_kexec;
> > > > > > 
> > > > > > Do you actually need this bool? Why not call kexec_crash_loaded() instead?
> > > > > 
> > > > > The two have different meanings:
> > > > > "in_crash_kexec" indicates that kdump is taking place, while
> > > > > kexec_crash_loaded() tells us only whether crash dump kernel has been
> > > > > loaded or not.
> > > > > 
> > > > > It is crucial to distinguish them especially for machine_kexec()
> > > > > which can be called on normal kexec even if kdump has been set up.
> > > > 
> > > > Ah, I see. So how about just doing:
> > > > 
> > > >   if (kimage == kexec_crash_image)
> > > > 
> > > > in machine_kexec?
> > > 
> > > Yeah, it should work.
> > > Do you want to merge the following hunk,
> > > or expect that I will re-send the whole patch series
> > > (with other changes if any)?
> > 
> > Thanks, I'll fold it in and shout if I run into any problems. My plan is
> > to queue this for 4.11.
> 
> Given the DT discussion with Mark, I assume you'll post a new version with
> this rolled in.

Yes, I will!

-Takahiro AKASHI

> Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
  2017-01-23 10:23                   ` Mark Rutland
@ 2017-01-24  7:55                     ` AKASHI Takahiro
  0 siblings, 0 replies; 39+ messages in thread
From: AKASHI Takahiro @ 2017-01-24  7:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 23, 2017 at 10:23:15AM +0000, Mark Rutland wrote:
> On Mon, Jan 23, 2017 at 06:51:46PM +0900, AKASHI Takahiro wrote:
> > Mark,
> > 
> > On Thu, Jan 19, 2017 at 11:28:50AM +0000, Mark Rutland wrote:
> > > On Thu, Jan 19, 2017 at 06:49:42PM +0900, AKASHI Takahiro wrote:
> > > > On Tue, Jan 17, 2017 at 11:54:42AM +0000, Mark Rutland wrote:
> > > > > On Tue, Jan 17, 2017 at 05:20:44PM +0900, AKASHI Takahiro wrote:
> > > > > > On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote:
> 
> > > > > I don't think we have much code useful for unmapping. We could re-use 
> > > > > create_mapping_late for this, passing a set of prot bits that means the
> > > > > entries are invalid (e.g. have a PAGE_KERNEL_INVALID).
> > > > 
> > > > Do you really think that we should totally invalidate mmu entries?
> > > > I guess that, given proper cache & TLB flush operations, RO attribute is
> > > > good enough for memory consistency, no?
> > > > (None accesses the region, as I said, except in the case of re-loading
> > > > crash dump kernel though.)
> > > 
> > > My worry is that the first kernel and kdump kernel may map (portions of)
> > > the region with potentially confliciting memory attributes. So it would
> > > be necessary to completely unmap the region.
> > 
> > I think that this can happen only if the second kernel boots up,
> > leaving non-crashed cpus still running for some reason.
> 
> Yes. I was considering a kdump case where a secondary was stuck in the
> first kernel.
> 
> > > You raise a good point that this would also mean we need to perform some
> > > cache maintenance, which makes that a little more painful. We'd need a
> > > sequence like:
> > > 
> > > * Unmap the region
> > > * TLB invalidation
> > > * Remap the region with non-cacheable attributes
> > > * Cache maintenance
> > > * Unmap the region
> > > * TLB invalidation
> > 
> > I don't get why we need to remap the region and do cache
> > maintenance here. Please elaborate a bit more?
> 
> I think I was wrong, and we don't need to. Sorry about that.
> 
> My thought was that to ensure that there aren't stale lines with
> differing attributes, we'd need to do a clean+invalidate while the
> caches are guaranteed to not allocate anything furthe. Hence, we'd need
> to use a non-cacheable mapping to perform the clean+invalidate.
> 
> However, I now think that so long as we unmap the range, this shouldn't
> matter. The new kernel can perform the maintenance if it wishes to use
> different attributes, similarly to what the first kernel must do per the
> boot protocol.
> 
> > My current implementation of arch_kexec_protect_crashkres() is:
> > 
> >         kexec_segment_flush(kexec_crash_image);
> >         create_mapping_late(crashk_res.start, ..., __pgprot(0));
> >                                                 or PAGE_KERNEL_INVALID
> >         flush_tlb_all();
> > 
> > kexec_segment_flush() will eventually do dcache-flush for all the modified
> > data in crash dump kernel memory.
> 
> I now think this should be fine, per the above.

OK.
I think that now I can see a light of the goal :)

-Takahiro AKASHI

> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2017-01-24  7:55 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-28  4:33 [PATCH v29 0/9] arm64: add kdump support AKASHI Takahiro
2016-12-28  4:35 ` [PATCH v29 1/9] memblock: add memblock_cap_memory_range() AKASHI Takahiro
2017-01-10 11:16   ` Will Deacon
2017-01-11  1:57     ` Dennis Chen
2016-12-28  4:35 ` [PATCH v29 2/9] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
2016-12-28  4:36 ` [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
2017-01-12 15:09   ` Mark Rutland
2017-01-13  8:16     ` AKASHI Takahiro
2017-01-13 11:39       ` Mark Rutland
2017-01-17  8:20         ` AKASHI Takahiro
2017-01-17 11:54           ` Mark Rutland
2017-01-19  9:49             ` AKASHI Takahiro
2017-01-19 11:28               ` Mark Rutland
2017-01-23  9:51                 ` AKASHI Takahiro
2017-01-23 10:23                   ` Mark Rutland
2017-01-24  7:55                     ` AKASHI Takahiro
2016-12-28  4:36 ` [PATCH v29 4/9] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
2017-01-10 11:32   ` Will Deacon
2017-01-11  6:36     ` AKASHI Takahiro
2017-01-11 10:54       ` Will Deacon
2017-01-12  4:21         ` AKASHI Takahiro
2017-01-12 12:01           ` Will Deacon
2017-01-23 17:46             ` Will Deacon
2017-01-24  7:52               ` AKASHI Takahiro
2016-12-28  4:36 ` [PATCH v29 5/9] arm64: kdump: add VMCOREINFO's for user-space tools AKASHI Takahiro
2016-12-28  4:36 ` [PATCH v29 6/9] arm64: kdump: provide /proc/vmcore file AKASHI Takahiro
2016-12-28  4:36 ` [PATCH v29 7/9] arm64: kdump: enable kdump in defconfig AKASHI Takahiro
2016-12-28  4:36 ` [PATCH v29 8/9] Documentation: kdump: describe arm64 port AKASHI Takahiro
2016-12-28  4:37 ` [PATCH v29 9/9] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro
2017-01-10 11:10   ` Will Deacon
2017-01-12 15:39   ` Mark Rutland
2017-01-13  9:13     ` AKASHI Takahiro
2017-01-13 11:17       ` Mark Rutland
2017-01-16  8:25         ` AKASHI Takahiro
2017-01-17  8:26           ` Dave Young
2017-01-19  9:01             ` AKASHI Takahiro
2017-01-17 11:13           ` Mark Rutland
2017-01-06  3:26 ` [PATCH v29 0/9] arm64: add kdump support Pratyush Anand
2017-01-06  4:34   ` AKASHI Takahiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).