* [PATCH v31 00/12] add kdump support
@ 2017-02-01 12:42 ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

This patch series adds kdump support on arm64.

To load a crash dump kernel onto a system, a series of patches to
kexec-tools [1] is also needed. Please use the latest version, v5 [2].
For your convenience, you can pick them up from:
   https://git.linaro.org/people/takahiro.akashi/linux-aarch64.git arm64/kdump
   https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/kdump

To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
  - crash utility (v7.1.8 or later, i.e. master for now) [3]
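
As a quick sanity check, /proc/vmcore presents the old kernel's memory
as an ELF core image. The stand-alone snippet below (illustrative only;
it is meaningful only when run inside the crash dump kernel) verifies
the ELF magic and class:

#include <elf.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	unsigned char ident[EI_NIDENT];
	FILE *f = fopen("/proc/vmcore", "rb");

	if (!f || fread(ident, 1, EI_NIDENT, f) != EI_NIDENT) {
		perror("/proc/vmcore");
		return 1;
	}
	/* /proc/vmcore starts with a standard ELF header. */
	printf("ELF magic: %s, class: %s\n",
	       memcmp(ident, ELFMAG, SELFMAG) ? "bad" : "ok",
	       ident[EI_CLASS] == ELFCLASS64 ? "64-bit" : "32-bit");
	fclose(f);
	return 0;
}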

I tested this patchset on a fast model and a HiKey board
(under various configurations, including 4KB pages with 39- and 48-bit VA,
and 64KB pages with 42-bit VA, with hibernate).

The previous versions were also:
Tested-by: Pratyush Anand <panand@redhat.com> (v29, mustang and seattle)
Tested-by: James Morse <james.morse@arm.com> (v27, Juno)

Changes for v31 (Feb 1, 2017)
  o add/use remove_pgd_mapping() instead of modifying (__)create_pgd_mapping()
    to protect crash dump kernel memory (patch #4,5)
  o fix an issue in isolating crash dump kernel memory in
    map_mem()/__map_memblock(), adding map_crashkernel() (patch #5)
  o preserve the contents of crash dump kernel memory around hibernation
    (patch #6)

Changes for v30 (Jan 24, 2017)
  o rebased to Linux-v4.10-rc5
  o remove "linux,crashkernel-base/size" from exported device tree
  o protect memory region for crash-dump kernel (adding patch#4,5)
  o remove "in_crash_kexec" variable
  o and other trivial changes

Changes for v29 (Dec 28, 2016)
  o rebased to Linux-v4.10-rc1
  o change asm constraints in crash_setup_regs() per Catalin

Changes for v28 (Nov 22, 2016)
  o rebased to Linux-v4.9-rc6
  o revamp patch #1 and merge memblock_cap_memory_range() with
    memblock_mem_limit_remove_map()

Changes for v27 (Nov 1, 2016)
  o rebased to Linux-v4.9-rc3
  o revert v26 change, i.e. revive "linux,usable-memory-range" property
    (patch #2/#3, updating patch #9)
  o minor fixes per review comments (patch #3/#4/#6/#8)
  o re-order patches and improve commit messages for readability

Changes for v26 (Sep 7, 2016):
  o Use /reserved-memory instead of "linux,usable-memory-range" property
    (dropping v25's patch#2 and #3, updating ex-patch#9.)

Changes for v25 (Aug 29, 2016):
  o Rebase to Linux-4.8-rc4
  o Use memremap() instead of ioremap_cache() [patch#5]

Changes for v24 (Aug 9, 2016):
  o Rebase to Linux-4.8-rc1
  o Update descriptions about newly added DT properties

Changes for v23 (July 26, 2016):

  o Move memblock_reserve() to a single place in reserve_crashkernel()
  o Use cpu_park_loop() in ipi_cpu_crash_stop()
  o Always enforce ARCH_LOW_ADDRESS_LIMIT on the memory range of the crash kernel
  o Re-implement fdt_enforce_memory_region() to remove non-reserved regions
    (for ACPI) from usable memory in the crash kernel

Changes for v22 (July 12, 2016):

  o Export "crashkernel-base" and "crashkernel-size" via device-tree,
    and add some descriptions about them in chosen.txt
  o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
    with powerpc's "usable-memory"
  o Make cosmetic changes regarding "ifdef" usage
  o Correct some wordings in kdump.txt

Changes for v21 (July 6, 2016):

  o Remove kexec patches.
  o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
  o Clarify the description about kvm in kdump.txt.

See the link [4] for older changes.


[1] https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
[2] http://lists.infradead.org/pipermail/kexec/2017-January/018002.html
[3] https://github.com/crash-utility/crash.git
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html

AKASHI Takahiro (11):
  memblock: add memblock_cap_memory_range()
  arm64: limit memory regions based on DT property, usable-memory-range
  arm64: kdump: reserve memory for crash dump kernel
  arm64: mm: allow for unmapping part of kernel mapping
  arm64: kdump: protect crash dump kernel memory
  arm64: hibernate: preserve kdump image around hibernation
  arm64: kdump: implement machine_crash_shutdown()
  arm64: kdump: add VMCOREINFO's for user-space tools
  arm64: kdump: provide /proc/vmcore file
  arm64: kdump: enable kdump in defconfig
  Documentation: kdump: describe arm64 port

James Morse (1):
  Documentation: dt: chosen properties for arm64 kdump

 Documentation/devicetree/bindings/chosen.txt |  37 ++++++
 Documentation/kdump/kdump.txt                |  16 ++-
 arch/arm64/Kconfig                           |  11 ++
 arch/arm64/configs/defconfig                 |   1 +
 arch/arm64/include/asm/hardirq.h             |   2 +-
 arch/arm64/include/asm/kexec.h               |  42 ++++++-
 arch/arm64/include/asm/mmu.h                 |   2 +
 arch/arm64/include/asm/smp.h                 |   2 +
 arch/arm64/kernel/Makefile                   |   1 +
 arch/arm64/kernel/crash_dump.c               |  71 ++++++++++++
 arch/arm64/kernel/hibernate.c                |  13 +++
 arch/arm64/kernel/machine_kexec.c            | 129 +++++++++++++++++----
 arch/arm64/kernel/setup.c                    |   7 +-
 arch/arm64/kernel/smp.c                      |  63 ++++++++++
 arch/arm64/mm/init.c                         | 150 ++++++++++++++++++++++++
 arch/arm64/mm/mmu.c                          | 165 ++++++++++++++++++++++++++-
 include/linux/memblock.h                     |   1 +
 mm/memblock.c                                |  44 ++++---
 18 files changed, 709 insertions(+), 48 deletions(-)
 create mode 100644 arch/arm64/kernel/crash_dump.c

-- 
2.11.0


* [PATCH v31 01/12] memblock: add memblock_cap_memory_range()
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:45   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:45 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: james.morse, geoff, bauerman, dyoung, mark.rutland, kexec,
	linux-arm-kernel, linux-mm, AKASHI Takahiro

Add memblock_cap_memory_range(), which removes all the memblock regions
except those within the memory range specified in the arguments. In
addition, rework memblock_mem_limit_remove_map() to re-implement it on
top of memblock_cap_memory_range().

This function, like memblock_mem_limit_remove_map(), will not remove
memblocks with the MEMBLOCK_NOMAP attribute as they may be mapped and
accessed later as "device memory."
See commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
address the mem limit issue").

This function is used, in a succeeding patch of this arm64 kdump series,
to limit the range of usable memory, i.e. System RAM, on the crash dump
kernel.
(Please note that the "mem=" parameter is of little use for this purpose.)
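
For illustration, the capping semantics can be modeled in user space as
follows. This is only a sketch: the region array and the nomap flag are
simplified stand-ins for memblock's bookkeeping (which isolates and
removes whole regions rather than clamping them in place), and
cap_memory_range() is a hypothetical name.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct region {
	uint64_t base, size;
	bool nomap;			/* stand-in for MEMBLOCK_NOMAP */
};

/* Keep only [base, base + size); nomap regions always survive. */
static int cap_memory_range(struct region *r, int cnt,
			    uint64_t base, uint64_t size)
{
	int i, n = 0;

	for (i = 0; i < cnt; i++) {
		uint64_t start = r[i].base, end = r[i].base + r[i].size;

		if (!r[i].nomap) {
			/* clamp the region to the allowed window */
			if (start < base)
				start = base;
			if (end > base + size)
				end = base + size;
			if (start >= end)
				continue;	/* entirely outside: drop */
		}
		r[n].base = start;
		r[n].size = end - start;
		r[n].nomap = r[i].nomap;
		n++;
	}
	return n;			/* new region count */
}

int main(void)
{
	struct region mem[] = {
		{ 0x40000000, 0x40000000, false },	/* 1GB of RAM */
		{ 0x80000000, 0x00010000, true },	/* nomap firmware region */
	};
	int i, cnt = cap_memory_range(mem, 2, 0x60000000, 0x10000000);

	for (i = 0; i < cnt; i++)
		printf("0x%llx-0x%llx%s\n",
		       (unsigned long long)mem[i].base,
		       (unsigned long long)(mem[i].base + mem[i].size - 1),
		       mem[i].nomap ? " (nomap)" : "");
	return 0;
}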

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Dennis Chen <dennis.chen@arm.com>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/memblock.h |  1 +
 mm/memblock.c            | 44 +++++++++++++++++++++++++++++---------------
 2 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5b759c9acf97..fbfcacc50c29 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -333,6 +333,7 @@ phys_addr_t memblock_mem_size(unsigned long limit_pfn);
 phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
 void memblock_enforce_memory_limit(phys_addr_t memory_limit);
+void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
 void memblock_mem_limit_remove_map(phys_addr_t limit);
 bool memblock_is_memory(phys_addr_t addr);
 int memblock_is_map_memory(phys_addr_t addr);
diff --git a/mm/memblock.c b/mm/memblock.c
index 7608bc305936..fea1688fef60 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1514,11 +1514,37 @@ void __init memblock_enforce_memory_limit(phys_addr_t limit)
 			      (phys_addr_t)ULLONG_MAX);
 }
 
+void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
+{
+	int start_rgn, end_rgn;
+	int i, ret;
+
+	if (!size)
+		return;
+
+	ret = memblock_isolate_range(&memblock.memory, base, size,
+						&start_rgn, &end_rgn);
+	if (ret)
+		return;
+
+	/* remove all the MAP regions */
+	for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
+		if (!memblock_is_nomap(&memblock.memory.regions[i]))
+			memblock_remove_region(&memblock.memory, i);
+
+	for (i = start_rgn - 1; i >= 0; i--)
+		if (!memblock_is_nomap(&memblock.memory.regions[i]))
+			memblock_remove_region(&memblock.memory, i);
+
+	/* truncate the reserved regions */
+	memblock_remove_range(&memblock.reserved, 0, base);
+	memblock_remove_range(&memblock.reserved,
+			base + size, (phys_addr_t)ULLONG_MAX);
+}
+
 void __init memblock_mem_limit_remove_map(phys_addr_t limit)
 {
-	struct memblock_type *type = &memblock.memory;
 	phys_addr_t max_addr;
-	int i, ret, start_rgn, end_rgn;
 
 	if (!limit)
 		return;
@@ -1529,19 +1555,7 @@ void __init memblock_mem_limit_remove_map(phys_addr_t limit)
 	if (max_addr == (phys_addr_t)ULLONG_MAX)
 		return;
 
-	ret = memblock_isolate_range(type, max_addr, (phys_addr_t)ULLONG_MAX,
-				&start_rgn, &end_rgn);
-	if (ret)
-		return;
-
-	/* remove all the MAP regions above the limit */
-	for (i = end_rgn - 1; i >= start_rgn; i--) {
-		if (!memblock_is_nomap(&type->regions[i]))
-			memblock_remove_region(type, i);
-	}
-	/* truncate the reserved regions */
-	memblock_remove_range(&memblock.reserved, max_addr,
-			      (phys_addr_t)ULLONG_MAX);
+	memblock_cap_memory_range(0, max_addr);
 }
 
 static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
-- 
2.11.0


* [PATCH v31 02/12] arm64: limit memory regions based on DT property, usable-memory-range
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

The crash dump kernel utilizes only a subset of available memory as
System RAM. On arm64 kdump, this memory range is advertised to the crash
dump kernel via a device-tree property under /chosen,
   linux,usable-memory-range = <BASE SIZE>

The crash dump kernel reads this property at boot time and calls
memblock_cap_memory_range() to limit the usable memory ranges, which are
otherwise described as entries in the UEFI memory map table or "memory"
nodes in a device-tree blob.
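
For illustration, the property value is a sequence of big-endian 32-bit
cells. The sketch below decodes a <BASE SIZE> pair the way
dt_mem_next_cell() does, assuming #address-cells = #size-cells = 2 (as
is typical on arm64) and a little-endian host; it is not the kernel's
FDT code.

#include <stdint.h>
#include <stdio.h>

/* Decode one value made of 'cells' big-endian 32-bit cells. */
static uint64_t dt_next_cell(int cells, const uint32_t **cellp)
{
	uint64_t v = 0;

	while (cells--)
		v = (v << 32) | __builtin_bswap32(*(*cellp)++);
	return v;
}

int main(void)
{
	/* linux,usable-memory-range = <0x0 0x60000000 0x0 0x10000000>; */
	const uint32_t prop[] = {
		__builtin_bswap32(0x00000000), __builtin_bswap32(0x60000000),
		__builtin_bswap32(0x00000000), __builtin_bswap32(0x10000000),
	};
	const uint32_t *p = prop;
	uint64_t base = dt_next_cell(2, &p);	/* #address-cells = 2 */
	uint64_t size = dt_next_cell(2, &p);	/* #size-cells = 2 */

	printf("base=0x%llx size=0x%llx\n",
	       (unsigned long long)base, (unsigned long long)size);
	return 0;
}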

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Geoff Levand <geoff@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/mm/init.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 380ebe705093..6cddb566eb21 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -187,10 +187,45 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+static int __init early_init_dt_scan_usablemem(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	struct memblock_region *usablemem = (struct memblock_region *)data;
+	const __be32 *reg;
+	int len;
+
+	usablemem->size = 0;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
+static void __init fdt_enforce_memory_region(void)
+{
+	struct memblock_region reg;
+
+	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
+
+	if (reg.size)
+		memblock_cap_memory_range(reg.base, reg.size);
+}
+
 void __init arm64_memblock_init(void)
 {
 	const s64 linear_region_size = -(s64)PAGE_OFFSET;
 
+	/* Handle linux,usable-memory-range property */
+	fdt_enforce_memory_region();
+
 	/*
 	 * Ensure that the linear region takes up exactly half of the kernel
 	 * virtual address space. This way, we can distinguish a linear address
-- 
2.11.0


* [PATCH v31 03/12] arm64: kdump: reserve memory for crash dump kernel
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

"crashkernel=" kernel parameter specifies the size (and optionally
the start address) of the system ram used by crash dump kernel.
reserve_crashkernel() will allocate and reserve the memory at the startup
of primary kernel.

This memory range will be exported to userspace via an entry named
"Crash kernel" in /proc/iomem.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Pratyush Anand <panand@redhat.com>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/kernel/setup.c |  7 +++++-
 arch/arm64/mm/init.c      | 62 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index b051367e2149..515e9c6696df 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
 #include <linux/screen_info.h>
 #include <linux/init.h>
 #include <linux/kexec.h>
-#include <linux/crash_dump.h>
 #include <linux/root_dev.h>
 #include <linux/cpu.h>
 #include <linux/interrupt.h>
@@ -224,6 +223,12 @@ static void __init request_standard_resources(void)
 		if (kernel_data.start >= res->start &&
 		    kernel_data.end <= res->end)
 			request_resource(res, &kernel_data);
+#ifdef CONFIG_KEXEC_CORE
+		/* Userspace will find "Crash kernel" region in /proc/iomem. */
+		if (crashk_res.end && crashk_res.start >= res->start &&
+		    crashk_res.end <= res->end)
+			request_resource(res, &crashk_res);
+#endif
 	}
 }
 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6cddb566eb21..2aba75dc7720 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -30,12 +30,14 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
 #include <linux/vmalloc.h>
+#include <linux/kexec.h>
 
 #include <asm/boot.h>
 #include <asm/fixmap.h>
@@ -76,6 +78,63 @@ static int __init early_initrd(char *p)
 early_param("initrd", early_initrd);
 #endif
 
+#ifdef CONFIG_KEXEC_CORE
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+	unsigned long long crash_base, crash_size;
+	int ret;
+
+	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+				&crash_size, &crash_base);
+	/* no crashkernel= or invalid value specified */
+	if (ret || !crash_size)
+		return;
+
+	crash_size = PAGE_ALIGN(crash_size);
+
+	if (crash_base == 0) {
+		/* Current arm64 boot protocol requires 2MB alignment */
+		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+				crash_size, SZ_2M);
+		if (crash_base == 0) {
+			pr_warn("Unable to allocate crashkernel (size:%llx)\n",
+				crash_size);
+			return;
+		}
+	} else {
+		/* User specifies base address explicitly. */
+		if (!memblock_is_region_memory(crash_base, crash_size) ||
+			memblock_is_region_reserved(crash_base, crash_size)) {
+			pr_warn("crashkernel has wrong address or size\n");
+			return;
+		}
+
+		if (!IS_ALIGNED(crash_base, SZ_2M)) {
+			pr_warn("crashkernel base address is not 2MB aligned\n");
+			return;
+		}
+	}
+	memblock_reserve(crash_base, crash_size);
+
+	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
+		crash_size >> 20, crash_base >> 20);
+
+	crashk_res.start = crash_base;
+	crashk_res.end = crash_base + crash_size - 1;
+}
+#else
+static void __init reserve_crashkernel(void)
+{
+}
+#endif /* CONFIG_KEXEC_CORE */
+
 /*
  * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
  * currently assumes that for memory starting above 4G, 32-bit devices will
@@ -331,6 +390,9 @@ void __init arm64_memblock_init(void)
 		arm64_dma_phys_limit = max_zone_dma_phys();
 	else
 		arm64_dma_phys_limit = PHYS_MASK + 1;
+
+	reserve_crashkernel();
+
 	dma_contiguous_reserve(arm64_dma_phys_limit);
 
 	memblock_allow_resize();
-- 
2.11.0


* [PATCH v31 04/12] arm64: mm: allow for unmapping part of kernel mapping
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

A new function, remove_pgd_mapping(), is added.
It allows us to unmap a specific portion of the kernel mapping later, as
long as the mapping was made using create_pgd_mapping() and we do not try
to free a subset of a memory range within a section mapping.

This function will be used in a later kdump patch to implement protection
against corrupting the crash dump kernel memory that is to be set aside.
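
The key rule of the teardown is: clear the entries covering the range,
then free an intermediate table page once all of its entries are
cleared. Below is a toy two-level model of that rule, with heap
allocations standing in for page-table pages; it is not the actual
pgd/pud/pmd/pte walkers added by this patch.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define ENTRIES 4

struct table {
	struct table *entry[ENTRIES];	/* non-NULL = "valid descriptor" */
};

static bool table_is_empty(const struct table *t)
{
	int i;

	for (i = 0; i < ENTRIES; i++)
		if (t->entry[i])
			return false;
	return true;
}

/* Clear leaf slot 'lo' under top-level slot 'hi'; free empty leaf tables. */
static void clear_mapping(struct table *top, int hi, int lo)
{
	struct table *leaf = top->entry[hi];

	if (!leaf)
		return;
	leaf->entry[lo] = NULL;		/* pte_clear() analogue */
	if (table_is_empty(leaf)) {	/* pgtable_is_cleared() analogue */
		free(leaf);		/* __free_page() analogue */
		top->entry[hi] = NULL;	/* pmd_clear() analogue */
	}
}

int main(void)
{
	struct table *top = calloc(1, sizeof(*top));
	struct table *leaf = calloc(1, sizeof(*leaf));

	top->entry[0] = leaf;
	leaf->entry[1] = (struct table *)0x1;	/* fake leaf mapping */

	clear_mapping(top, 0, 1);
	printf("top slot 0 is %s\n", top->entry[0] ? "still set" : "cleared");
	free(top);
	return 0;
}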

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/include/asm/mmu.h |   2 +
 arch/arm64/mm/mmu.c          | 130 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 47619411f0ff..04eb240736d8 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -36,6 +36,8 @@ extern void init_mem_pgprot(void);
 extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       unsigned long virt, phys_addr_t size,
 			       pgprot_t prot, bool page_mappings_only);
+extern void remove_pgd_mapping(struct mm_struct *mm, unsigned long virt,
+			       phys_addr_t size);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
 
 #endif
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 17243e43184e..9d3cea1db3b4 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -334,12 +334,10 @@ static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
 	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
 }
 
-void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
-			       unsigned long virt, phys_addr_t size,
-			       pgprot_t prot, bool page_mappings_only)
+void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
+			unsigned long virt, phys_addr_t size,
+			pgprot_t prot, bool page_mappings_only)
 {
-	BUG_ON(mm == &init_mm);
-
 	__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
 			     pgd_pgtable_alloc, page_mappings_only);
 }
@@ -357,6 +355,128 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
 			     NULL, debug_pagealloc_enabled());
 }
 
+static bool pgtable_is_cleared(void *pgtable)
+{
+	unsigned long *desc, *end;
+
+	for (desc = pgtable, end = (unsigned long *)(pgtable + PAGE_SIZE);
+	     desc < end; desc++)
+		if (*desc)
+			return false;
+
+	return true;
+}
+
+static void clear_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
+{
+	pte_t *pte;
+
+	if (WARN_ON(pmd_bad(*pmd)))
+		return;
+
+	pte = pte_set_fixmap_offset(pmd, addr);
+
+	do {
+		pte_clear(NULL, NULL, pte);
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+
+	pte_clear_fixmap();
+}
+
+static void clear_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	if (WARN_ON(pud_bad(*pud)))
+		return;
+
+	pmd = pmd_set_fixmap_offset(pud, addr);
+
+	do {
+		next = pmd_addr_end(addr, end);
+
+		if (pmd_table(*pmd)) {
+			clear_pte_range(pmd, addr, next);
+			if (((next - addr) == PMD_SIZE) ||
+			    pgtable_is_cleared(__va(pmd_page_paddr(*pmd)))) {
+				__free_page(pmd_page(*pmd));
+				pmd_clear(pmd);
+			}
+		} else {
+			if (WARN_ON((next - addr) != PMD_SIZE))
+				return;
+			pmd_clear(pmd);
+		}
+	} while (pmd++, addr = next, addr != end);
+
+	pmd_clear_fixmap();
+}
+
+static void clear_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	if (WARN_ON(pgd_bad(*pgd)))
+		return;
+
+	pud = pud_set_fixmap_offset(pgd, addr);
+
+	do {
+		next = pud_addr_end(addr, end);
+
+		if (pud_table(*pud)) {
+			clear_pmd_range(pud, addr, next);
+#if CONFIG_PGTABLE_LEVELS > 2
+			if (((next - addr) == PUD_SIZE) ||
+			    pgtable_is_cleared(__va(pud_page_paddr(*pud)))) {
+				__free_page(pud_page(*pud));
+				pud_clear(pud);
+			}
+#endif
+		} else {
+#if CONFIG_PGTABLE_LEVELS > 2
+			if (WARN_ON((next - addr) != PUD_SIZE))
+				return;
+			pud_clear(pud);
+#endif
+		}
+	} while (pud++, addr = next, addr != end);
+
+	pud_clear_fixmap();
+}
+
+static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long virt,
+				 phys_addr_t size)
+{
+	unsigned long addr, length, end, next;
+	pgd_t *pgd = pgd_offset_raw(pgdir, virt);
+
+	addr = virt & PAGE_MASK;
+	length = PAGE_ALIGN(size + (virt & ~PAGE_MASK));
+
+	end = addr + length;
+	do {
+		next = pgd_addr_end(addr, end);
+		clear_pud_range(pgd, addr, next);
+
+#if CONFIG_PGTABLE_LEVELS > 3
+		if (((next - addr) == PGD_SIZE) ||
+		    pgtable_is_cleared(__va(pgd_page_paddr(*pgd)))) {
+			__free_page(pgd_page(*pgd));
+			pgd_clear(pgd);
+		}
+#endif
+	} while (pgd++, addr = next, addr != end);
+}
+
+void remove_pgd_mapping(struct mm_struct *mm, unsigned long virt,
+			phys_addr_t size)
+{
+	__remove_pgd_mapping(mm->pgd, virt, size);
+}
+
 static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
 {
 	unsigned long kernel_start = __pa(_text);
-- 
2.11.0


* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called around kexec_load() in order to protect
the memory allocated for the crash dump kernel once it has been loaded.

The protection is implemented here by unmapping the region rather than
making it read-only.
To make things work correctly, we also have to
- put the region in an isolated, page-level mapping initially, and
- move the copying of kexec's control_code_page to machine_kexec_prepare()

Note that the page-level mapping is also required to allow for shrinking
the region, through /sys/kernel/kexec_crash_size, by any number of pages.
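
The effect of the protection (a stray access to the loaded image faults
instead of silently corrupting it) can be demonstrated with a user-space
analogue based on mprotect(); the patch itself achieves this in the
kernel by removing the linear-map entries for the region instead.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2 * 1024 * 1024;	/* stand-in for the crashkernel region */
	char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (region == MAP_FAILED)
		return 1;

	memset(region, 0xAA, len);	/* "load" the crash dump kernel */

	/* arch_kexec_protect_crashkres() analogue: no access at all */
	mprotect(region, len, PROT_NONE);

	/* any stray access to 'region' would now fault instead of
	 * silently corrupting the loaded image */

	/* arch_kexec_unprotect_crashkres() analogue: restore access */
	mprotect(region, len, PROT_READ | PROT_WRITE);
	printf("first byte: 0x%02x\n", (unsigned char)region[0]);

	munmap(region, len);
	return 0;
}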

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/kernel/machine_kexec.c | 65 ++++++++++++++++++++++++++-------------
 arch/arm64/mm/mmu.c               | 35 +++++++++++++++++++++
 2 files changed, 79 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a7fc79..016f2dd693aa 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -14,6 +14,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/cpu_ops.h>
+#include <asm/mmu.h>
 #include <asm/mmu_context.h>
 
 #include "cpu-reset.h"
@@ -22,8 +23,6 @@
 extern const unsigned char arm64_relocate_new_kernel[];
 extern const unsigned long arm64_relocate_new_kernel_size;
 
-static unsigned long kimage_start;
-
 /**
  * kexec_image_info - For debugging output.
  */
@@ -64,7 +63,7 @@ void machine_kexec_cleanup(struct kimage *kimage)
  */
 int machine_kexec_prepare(struct kimage *kimage)
 {
-	kimage_start = kimage->start;
+	void *reboot_code_buffer;
 
 	kexec_image_info(kimage);
 
@@ -73,6 +72,21 @@ int machine_kexec_prepare(struct kimage *kimage)
 		return -EBUSY;
 	}
 
+	reboot_code_buffer =
+			phys_to_virt(page_to_phys(kimage->control_code_page));
+
+	/*
+	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
+	 * after the kernel is shut down.
+	 */
+	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
+		arm64_relocate_new_kernel_size);
+
+	/* Flush the reboot_code_buffer in preparation for its execution. */
+	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
+	flush_icache_range((uintptr_t)reboot_code_buffer,
+		arm64_relocate_new_kernel_size);
+
 	return 0;
 }
 
@@ -143,7 +157,6 @@ static void kexec_segment_flush(const struct kimage *kimage)
 void machine_kexec(struct kimage *kimage)
 {
 	phys_addr_t reboot_code_buffer_phys;
-	void *reboot_code_buffer;
 
 	/*
 	 * New cpus may have become stuck_in_kernel after we loaded the image.
@@ -151,7 +164,6 @@ void machine_kexec(struct kimage *kimage)
 	BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
 
 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
-	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
 
 	kexec_image_info(kimage);
 
@@ -159,31 +171,17 @@ void machine_kexec(struct kimage *kimage)
 		kimage->control_code_page);
 	pr_debug("%s:%d: reboot_code_buffer_phys:  %pa\n", __func__, __LINE__,
 		&reboot_code_buffer_phys);
-	pr_debug("%s:%d: reboot_code_buffer:       %p\n", __func__, __LINE__,
-		reboot_code_buffer);
 	pr_debug("%s:%d: relocate_new_kernel:      %p\n", __func__, __LINE__,
 		arm64_relocate_new_kernel);
 	pr_debug("%s:%d: relocate_new_kernel_size: 0x%lx(%lu) bytes\n",
 		__func__, __LINE__, arm64_relocate_new_kernel_size,
 		arm64_relocate_new_kernel_size);
 
-	/*
-	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
-	 * after the kernel is shut down.
-	 */
-	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
-		arm64_relocate_new_kernel_size);
-
-	/* Flush the reboot_code_buffer in preparation for its execution. */
-	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
-	flush_icache_range((uintptr_t)reboot_code_buffer,
-		arm64_relocate_new_kernel_size);
-
 	/* Flush the kimage list and its buffers. */
 	kexec_list_flush(kimage);
 
 	/* Flush the new image if already in place. */
-	if (kimage->head & IND_DONE)
+	if ((kimage != kexec_crash_image) && (kimage->head & IND_DONE))
 		kexec_segment_flush(kimage);
 
 	pr_info("Bye!\n");
@@ -201,7 +199,7 @@ void machine_kexec(struct kimage *kimage)
 	 */
 
 	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
-		kimage_start, 0);
+		kimage->start, 0);
 
 	BUG(); /* Should never get here. */
 }
@@ -210,3 +208,28 @@ void machine_crash_shutdown(struct pt_regs *regs)
 {
 	/* Empty routine needed to avoid build errors. */
 }
+
+void arch_kexec_protect_crashkres(void)
+{
+	kexec_segment_flush(kexec_crash_image);
+
+	remove_pgd_mapping(&init_mm, __phys_to_virt(crashk_res.start),
+			resource_size(&crashk_res));
+
+	flush_tlb_all();
+}
+
+void arch_kexec_unprotect_crashkres(void)
+{
+	/*
+	 * We don't have to make page-level mappings here because
+	 * the crash dump kernel memory is not allowed to be shrunk
+	 * once the kernel is loaded.
+	 */
+	create_pgd_mapping(&init_mm, crashk_res.start,
+			__phys_to_virt(crashk_res.start),
+			resource_size(&crashk_res), PAGE_KERNEL,
+			debug_pagealloc_enabled());
+
+	flush_tlb_all();
+}
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9d3cea1db3b4..87861e62316a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -22,6 +22,8 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/kexec.h>
 #include <linux/libfdt.h>
 #include <linux/mman.h>
 #include <linux/nodemask.h>
@@ -538,6 +540,24 @@ static void __init map_mem(pgd_t *pgd)
 		if (memblock_is_nomap(reg))
 			continue;
 
+#ifdef CONFIG_KEXEC_CORE
+		/*
+		 * While crash dump kernel memory is contained in a single
+		 * memblock for now, it should appear in an isolated mapping
+		 * so that we can independently unmap the region later.
+		 */
+		if (crashk_res.end &&
+		    (start <= crashk_res.start) &&
+		    ((crashk_res.end + 1) < end)) {
+			if (crashk_res.start != start)
+				__map_memblock(pgd, start, crashk_res.start);
+
+			if ((crashk_res.end + 1) < end)
+				__map_memblock(pgd, crashk_res.end + 1, end);
+
+			continue;
+		}
+#endif
 		__map_memblock(pgd, start, end);
 	}
 }
@@ -623,6 +643,21 @@ static void __init map_kernel(pgd_t *pgd)
 	kasan_copy_shadow(pgd);
 }
 
+#ifdef CONFIG_KEXEC_CORE
+static int __init map_crashkernel(void)
+{
+	/* page-level mapping only to allow for shrinking */
+	if (crashk_res.end)
+		create_pgd_mapping(&init_mm, crashk_res.start,
+				   __phys_to_virt(crashk_res.start),
+				   resource_size(&crashk_res), PAGE_KERNEL,
+				   true);
+
+	return 0;
+}
+subsys_initcall(map_crashkernel);
+#endif
+
 /*
  * paging_init() sets up the page tables, initialises the zone memory
  * maps and sets up the zero page.
-- 
2.11.0


* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: mark.rutland, geoff, kexec, AKASHI Takahiro, james.morse,
	bauerman, dyoung, linux-arm-kernel

arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called around kexec_load() in order to protect
the memory allocated for the crash dump kernel once it has been loaded.

The protection is implemented here by unmapping the region rather than
making it read-only.
To make this work correctly, we also have to
- put the region in an isolated, page-level mapping initially, and
- move the copying of kexec's control_code_page to machine_kexec_prepare()

Note that a page-level mapping is also required to allow for shrinking
the size of the region, through /sys/kernel/kexec_crash_size, by any
number of pages.
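
For reference, a minimal sketch of how the generic loader side is
expected to drive these hooks (the exact call sites in kernel/kexec.c
are an assumption here, not part of this patch):

/* sketch only: generic kexec_load() path around a crash kernel image */
static long do_kexec_load_sketch(unsigned long flags)
{
	/* the region must be mapped while the segments are copied in */
	if ((flags & KEXEC_ON_CRASH) && kexec_crash_image)
		arch_kexec_unprotect_crashkres();

	/* ... allocate the kimage and copy in its segments here ... */

	/* once loaded, unmap the region again to protect it */
	if ((flags & KEXEC_ON_CRASH) && kexec_crash_image)
		arch_kexec_protect_crashkres();

	return 0;
}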

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/kernel/machine_kexec.c | 65 ++++++++++++++++++++++++++-------------
 arch/arm64/mm/mmu.c               | 35 +++++++++++++++++++++
 2 files changed, 79 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a7fc79..016f2dd693aa 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -14,6 +14,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/cpu_ops.h>
+#include <asm/mmu.h>
 #include <asm/mmu_context.h>
 
 #include "cpu-reset.h"
@@ -22,8 +23,6 @@
 extern const unsigned char arm64_relocate_new_kernel[];
 extern const unsigned long arm64_relocate_new_kernel_size;
 
-static unsigned long kimage_start;
-
 /**
  * kexec_image_info - For debugging output.
  */
@@ -64,7 +63,7 @@ void machine_kexec_cleanup(struct kimage *kimage)
  */
 int machine_kexec_prepare(struct kimage *kimage)
 {
-	kimage_start = kimage->start;
+	void *reboot_code_buffer;
 
 	kexec_image_info(kimage);
 
@@ -73,6 +72,21 @@ int machine_kexec_prepare(struct kimage *kimage)
 		return -EBUSY;
 	}
 
+	reboot_code_buffer =
+			phys_to_virt(page_to_phys(kimage->control_code_page));
+
+	/*
+	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
+	 * after the kernel is shut down.
+	 */
+	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
+		arm64_relocate_new_kernel_size);
+
+	/* Flush the reboot_code_buffer in preparation for its execution. */
+	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
+	flush_icache_range((uintptr_t)reboot_code_buffer,
+		arm64_relocate_new_kernel_size);
+
 	return 0;
 }
 
@@ -143,7 +157,6 @@ static void kexec_segment_flush(const struct kimage *kimage)
 void machine_kexec(struct kimage *kimage)
 {
 	phys_addr_t reboot_code_buffer_phys;
-	void *reboot_code_buffer;
 
 	/*
 	 * New cpus may have become stuck_in_kernel after we loaded the image.
@@ -151,7 +164,6 @@ void machine_kexec(struct kimage *kimage)
 	BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
 
 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
-	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
 
 	kexec_image_info(kimage);
 
@@ -159,31 +171,17 @@ void machine_kexec(struct kimage *kimage)
 		kimage->control_code_page);
 	pr_debug("%s:%d: reboot_code_buffer_phys:  %pa\n", __func__, __LINE__,
 		&reboot_code_buffer_phys);
-	pr_debug("%s:%d: reboot_code_buffer:       %p\n", __func__, __LINE__,
-		reboot_code_buffer);
 	pr_debug("%s:%d: relocate_new_kernel:      %p\n", __func__, __LINE__,
 		arm64_relocate_new_kernel);
 	pr_debug("%s:%d: relocate_new_kernel_size: 0x%lx(%lu) bytes\n",
 		__func__, __LINE__, arm64_relocate_new_kernel_size,
 		arm64_relocate_new_kernel_size);
 
-	/*
-	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
-	 * after the kernel is shut down.
-	 */
-	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
-		arm64_relocate_new_kernel_size);
-
-	/* Flush the reboot_code_buffer in preparation for its execution. */
-	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
-	flush_icache_range((uintptr_t)reboot_code_buffer,
-		arm64_relocate_new_kernel_size);
-
 	/* Flush the kimage list and its buffers. */
 	kexec_list_flush(kimage);
 
 	/* Flush the new image if already in place. */
-	if (kimage->head & IND_DONE)
+	if ((kimage != kexec_crash_image) && (kimage->head & IND_DONE))
 		kexec_segment_flush(kimage);
 
 	pr_info("Bye!\n");
@@ -201,7 +199,7 @@ void machine_kexec(struct kimage *kimage)
 	 */
 
 	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
-		kimage_start, 0);
+		kimage->start, 0);
 
 	BUG(); /* Should never get here. */
 }
@@ -210,3 +208,28 @@ void machine_crash_shutdown(struct pt_regs *regs)
 {
 	/* Empty routine needed to avoid build errors. */
 }
+
+void arch_kexec_protect_crashkres(void)
+{
+	kexec_segment_flush(kexec_crash_image);
+
+	remove_pgd_mapping(&init_mm, __phys_to_virt(crashk_res.start),
+			resource_size(&crashk_res));
+
+	flush_tlb_all();
+}
+
+void arch_kexec_unprotect_crashkres(void)
+{
+	/*
+	 * We don't have to make page-level mappings here because
+	 * the crash dump kernel memory is not allowed to be shrunk
+	 * once the kernel is loaded.
+	 */
+	create_pgd_mapping(&init_mm, crashk_res.start,
+			__phys_to_virt(crashk_res.start),
+			resource_size(&crashk_res), PAGE_KERNEL,
+			debug_pagealloc_enabled());
+
+	flush_tlb_all();
+}
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9d3cea1db3b4..87861e62316a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -22,6 +22,8 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/kexec.h>
 #include <linux/libfdt.h>
 #include <linux/mman.h>
 #include <linux/nodemask.h>
@@ -538,6 +540,24 @@ static void __init map_mem(pgd_t *pgd)
 		if (memblock_is_nomap(reg))
 			continue;
 
+#ifdef CONFIG_KEXEC_CORE
+		/*
+		 * While crash dump kernel memory is contained in a single
+		 * memblock for now, it should appear in an isolated mapping
+		 * so that we can independently unmap the region later.
+		 */
+		if (crashk_res.end &&
+		    (start <= crashk_res.start) &&
+		    ((crashk_res.end + 1) < end)) {
+			if (crashk_res.start != start)
+				__map_memblock(pgd, start, crashk_res.start);
+
+			if ((crashk_res.end + 1) < end)
+				__map_memblock(pgd, crashk_res.end + 1, end);
+
+			continue;
+		}
+#endif
 		__map_memblock(pgd, start, end);
 	}
 }
@@ -623,6 +643,21 @@ static void __init map_kernel(pgd_t *pgd)
 	kasan_copy_shadow(pgd);
 }
 
+#ifdef CONFIG_KEXEC_CORE
+static int __init map_crashkernel(void)
+{
+	/* page-level mapping only to allow for shrinking */
+	if (crashk_res.end)
+		create_pgd_mapping(&init_mm, crashk_res.start,
+				   __phys_to_virt(crashk_res.start),
+				   resource_size(&crashk_res), PAGE_KERNEL,
+				   true);
+
+	return 0;
+}
+subsys_initcall(map_crashkernel);
+#endif
+
 /*
  * paging_init() sets up the page tables, initialises the zone memory
  * maps and sets up the zero page.
-- 
2.11.0



* [PATCH v31 06/12] arm64: hibernate: preserve kdump image around hibernation
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:46   ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

Since arch_kexec_protect_crashkres() removes the mapping for crash dump
kernel memory, the loaded contents would not otherwise be preserved
around hibernation.

In this patch, arch_kexec_(un)protect_crashkres() are additionally called
before/after hibernation so that the relevant region is mapped again and
restored just as the other memory regions are.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/kernel/hibernate.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index fe301cbcb442..111a849333ee 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -16,6 +16,7 @@
  */
 #define pr_fmt(x) "hibernate: " x
 #include <linux/cpu.h>
+#include <linux/kexec.h>
 #include <linux/kvm_host.h>
 #include <linux/mm.h>
 #include <linux/pm.h>
@@ -289,6 +290,12 @@ int swsusp_arch_suspend(void)
 	local_dbg_save(flags);
 
 	if (__cpu_suspend_enter(&state)) {
+#ifdef CONFIG_KEXEC_CORE
+		/* make the crash dump kernel region mapped */
+		if (kexec_crash_image)
+			arch_kexec_unprotect_crashkres();
+#endif
+
 		sleep_cpu = smp_processor_id();
 		ret = swsusp_save();
 	} else {
@@ -300,6 +307,12 @@ int swsusp_arch_suspend(void)
 		if (el2_reset_needed())
 			dcache_clean_range(__hyp_idmap_text_start, __hyp_idmap_text_end);
 
+#ifdef CONFIG_KEXEC_CORE
+		/* make the crash dump kernel region unmapped */
+		if (kexec_crash_image)
+			arch_kexec_protect_crashkres();
+#endif
+
 		/*
 		 * Tell the hibernation core that we've just restored
 		 * the memory
-- 
2.11.0


* [PATCH v31 06/12] arm64: hibernate: preserve kdump image around hibernation
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: mark.rutland, geoff, kexec, AKASHI Takahiro, james.morse,
	bauerman, dyoung, linux-arm-kernel

Since arch_kexec_protect_crashkres() removes the mapping for crash dump
kernel memory, the loaded contents would not otherwise be preserved
around hibernation.

In this patch, arch_kexec_(un)protect_crashkres() are additionally called
before/after hibernation so that the relevant region is mapped again and
restored just as the other memory regions are.
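
As a possible cleanup (sketched here as an assumption, not what this
patch does), the CONFIG_KEXEC_CORE checks could be hidden behind small
helpers so that the hibernate code stays free of #ifdef blocks:

#ifdef CONFIG_KEXEC_CORE
static inline void crash_prepare_suspend(void)
{
	if (kexec_crash_image)
		arch_kexec_unprotect_crashkres();
}

static inline void crash_post_resume(void)
{
	if (kexec_crash_image)
		arch_kexec_protect_crashkres();
}
#else
static inline void crash_prepare_suspend(void) {}
static inline void crash_post_resume(void) {}
#endif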

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/kernel/hibernate.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index fe301cbcb442..111a849333ee 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -16,6 +16,7 @@
  */
 #define pr_fmt(x) "hibernate: " x
 #include <linux/cpu.h>
+#include <linux/kexec.h>
 #include <linux/kvm_host.h>
 #include <linux/mm.h>
 #include <linux/pm.h>
@@ -289,6 +290,12 @@ int swsusp_arch_suspend(void)
 	local_dbg_save(flags);
 
 	if (__cpu_suspend_enter(&state)) {
+#ifdef CONFIG_KEXEC_CORE
+		/* make the crash dump kernel region mapped */
+		if (kexec_crash_image)
+			arch_kexec_unprotect_crashkres();
+#endif
+
 		sleep_cpu = smp_processor_id();
 		ret = swsusp_save();
 	} else {
@@ -300,6 +307,12 @@ int swsusp_arch_suspend(void)
 		if (el2_reset_needed())
 			dcache_clean_range(__hyp_idmap_text_start, __hyp_idmap_text_end);
 
+#ifdef CONFIG_KEXEC_CORE
+		/* make the crash dump kernel region unmapped */
+		if (kexec_crash_image)
+			arch_kexec_protect_crashkres();
+#endif
+
 		/*
 		 * Tell the hibernation core that we've just restored
 		 * the memory
-- 
2.11.0



* [PATCH v31 07/12] arm64: kdump: implement machine_crash_shutdown()
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:46   ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

The primary kernel calls machine_crash_shutdown() to shut down non-boot
cpus and save the registers' status in per-cpu ELF notes before starting
the crash dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we proceed with kdump
anyway.

Since we should not mark the non-boot (crashed) cpus offline (so as to
preserve the correct status of the cpus in the crash dump) when shutting
them down, this patch also adds a variant of smp_send_stop().

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/hardirq.h  |  2 +-
 arch/arm64/include/asm/kexec.h    | 42 +++++++++++++++++++++++++-
 arch/arm64/include/asm/smp.h      |  2 ++
 arch/arm64/kernel/machine_kexec.c | 55 +++++++++++++++++++++++++++++++---
 arch/arm64/kernel/smp.c           | 63 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 158 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 8740297dac77..1473fc2f7ab7 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
 #include <linux/threads.h>
 #include <asm/irq.h>
 
-#define NR_IPI	6
+#define NR_IPI	7
 
 typedef struct {
 	unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc5fb61..f40ace1fa21a 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,47 @@
 static inline void crash_setup_regs(struct pt_regs *newregs,
 				    struct pt_regs *oldregs)
 {
-	/* Empty routine needed to avoid build errors. */
+	if (oldregs) {
+		memcpy(newregs, oldregs, sizeof(*newregs));
+	} else {
+		u64 tmp1, tmp2;
+
+		__asm__ __volatile__ (
+			"stp	 x0,   x1, [%2, #16 *  0]\n"
+			"stp	 x2,   x3, [%2, #16 *  1]\n"
+			"stp	 x4,   x5, [%2, #16 *  2]\n"
+			"stp	 x6,   x7, [%2, #16 *  3]\n"
+			"stp	 x8,   x9, [%2, #16 *  4]\n"
+			"stp	x10,  x11, [%2, #16 *  5]\n"
+			"stp	x12,  x13, [%2, #16 *  6]\n"
+			"stp	x14,  x15, [%2, #16 *  7]\n"
+			"stp	x16,  x17, [%2, #16 *  8]\n"
+			"stp	x18,  x19, [%2, #16 *  9]\n"
+			"stp	x20,  x21, [%2, #16 * 10]\n"
+			"stp	x22,  x23, [%2, #16 * 11]\n"
+			"stp	x24,  x25, [%2, #16 * 12]\n"
+			"stp	x26,  x27, [%2, #16 * 13]\n"
+			"stp	x28,  x29, [%2, #16 * 14]\n"
+			"mov	 %0,  sp\n"
+			"stp	x30,  %0,  [%2, #16 * 15]\n"
+
+			"/* faked current PSTATE */\n"
+			"mrs	 %0, CurrentEL\n"
+			"mrs	 %1, SPSEL\n"
+			"orr	 %0, %0, %1\n"
+			"mrs	 %1, DAIF\n"
+			"orr	 %0, %0, %1\n"
+			"mrs	 %1, NZCV\n"
+			"orr	 %0, %0, %1\n"
+			/* pc */
+			"adr	 %1, 1f\n"
+		"1:\n"
+			"stp	 %1, %0,   [%2, #16 * 16]\n"
+			: "=&r" (tmp1), "=&r" (tmp2)
+			: "r" (newregs)
+			: "memory"
+		);
+	}
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index d050d720a1b4..cea009f2657d 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -148,6 +148,8 @@ static inline void cpu_panic_kernel(void)
  */
 bool cpus_are_stuck_in_kernel(void);
 
+extern void smp_send_crash_stop(void);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 016f2dd693aa..c61dd7b7dca0 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,9 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
 #include <linux/kexec.h>
 #include <linux/smp.h>
 
@@ -161,7 +164,8 @@ void machine_kexec(struct kimage *kimage)
 	/*
 	 * New cpus may have become stuck_in_kernel after we loaded the image.
 	 */
-	BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
+	BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
+			!WARN_ON(kimage == kexec_crash_image));
 
 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
 
@@ -198,15 +202,58 @@ void machine_kexec(struct kimage *kimage)
 	 * relocation is complete.
 	 */
 
-	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
-		kimage->start, 0);
+	cpu_soft_restart(kimage != kexec_crash_image,
+		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
 
 	BUG(); /* Should never get here. */
 }
 
+static void machine_kexec_mask_interrupts(void)
+{
+	unsigned int i;
+	struct irq_desc *desc;
+
+	for_each_irq_desc(i, desc) {
+		struct irq_chip *chip;
+		int ret;
+
+		chip = irq_desc_get_chip(desc);
+		if (!chip)
+			continue;
+
+		/*
+		 * First try to remove the active state. If this
+		 * fails, try to EOI the interrupt.
+		 */
+		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+		    chip->irq_eoi)
+			chip->irq_eoi(&desc->irq_data);
+
+		if (chip->irq_mask)
+			chip->irq_mask(&desc->irq_data);
+
+		if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+			chip->irq_disable(&desc->irq_data);
+	}
+}
+
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	/* Empty routine needed to avoid build errors. */
+	local_irq_disable();
+
+	/* shutdown non-crashing cpus */
+	smp_send_crash_stop();
+
+	/* for crashing cpu */
+	crash_save_cpu(regs, smp_processor_id());
+	machine_kexec_mask_interrupts();
+
+	pr_info("Starting crashdump kernel...\n");
 }
 
 void arch_kexec_protect_crashkres(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index cb87234cfcf2..446c6d48f8ec 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
 #include <linux/completion.h>
 #include <linux/of.h>
 #include <linux/irq_work.h>
+#include <linux/kexec.h>
 
 #include <asm/alternative.h>
 #include <asm/atomic.h>
@@ -74,6 +75,7 @@ enum ipi_msg_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_CPU_CRASH_STOP,
 	IPI_TIMER,
 	IPI_IRQ_WORK,
 	IPI_WAKEUP
@@ -753,6 +755,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
 	S(IPI_RESCHEDULE, "Rescheduling interrupts"),
 	S(IPI_CALL_FUNC, "Function call interrupts"),
 	S(IPI_CPU_STOP, "CPU stop interrupts"),
+	S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
 	S(IPI_TIMER, "Timer broadcast interrupts"),
 	S(IPI_IRQ_WORK, "IRQ work interrupts"),
 	S(IPI_WAKEUP, "CPU wake-up interrupts"),
@@ -827,6 +830,29 @@ static void ipi_cpu_stop(unsigned int cpu)
 		cpu_relax();
 }
 
+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi;
+#endif
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+	crash_save_cpu(regs, cpu);
+
+	atomic_dec(&waiting_for_crash_ipi);
+
+	local_irq_disable();
+
+#ifdef CONFIG_HOTPLUG_CPU
+	if (cpu_ops[cpu]->cpu_die)
+		cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+	/* just in case */
+	cpu_park_loop();
+#endif
+}
+
 /*
  * Main handler for inter-processor interrupts
  */
@@ -857,6 +883,15 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 		irq_exit();
 		break;
 
+	case IPI_CPU_CRASH_STOP:
+		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+			irq_enter();
+			ipi_cpu_crash_stop(cpu, regs);
+
+			unreachable();
+		}
+		break;
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 	case IPI_TIMER:
 		irq_enter();
@@ -929,6 +964,34 @@ void smp_send_stop(void)
 			   cpumask_pr_args(cpu_online_mask));
 }
 
+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+	cpumask_t mask;
+	unsigned long timeout;
+
+	if (num_online_cpus() == 1)
+		return;
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &mask);
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+	pr_crit("SMP: stopping secondary CPUs\n");
+	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+	/* Wait up to one second for other CPUs to stop */
+	timeout = USEC_PER_SEC;
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+		udelay(1);
+
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+			   cpumask_pr_args(cpu_online_mask));
+}
+#endif
+
 /*
  * not supported here
  */
-- 
2.11.0


* [PATCH v31 07/12] arm64: kdump: implement machine_crash_shutdown()
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: mark.rutland, geoff, kexec, AKASHI Takahiro, james.morse,
	bauerman, dyoung, linux-arm-kernel

The primary kernel calls machine_crash_shutdown() to shut down non-boot
cpus and save the registers' status in per-cpu ELF notes before starting
the crash dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we proceed with kdump
anyway.

Since we should not mark the non-boot (crashed) cpus offline (so as to
preserve the correct status of the cpus in the crash dump) when shutting
them down, this patch also adds a variant of smp_send_stop().
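
For context, a rough sketch of the generic panic-time path that ends up
here (modeled on __crash_kexec() in kernel/kexec_core.c; treat the exact
flow as an assumption, it is not part of this patch):

/* sketch: the generic crash path that invokes the arch code below */
static void crash_kexec_sketch(struct pt_regs *regs)
{
	struct pt_regs fixed_regs;

	if (!kexec_crash_image)
		return;

	crash_setup_regs(&fixed_regs, regs);  /* snapshot the registers */
	crash_save_vmcoreinfo();
	machine_crash_shutdown(&fixed_regs);  /* stop secondaries, save notes */
	machine_kexec(kexec_crash_image);     /* boot the crash dump kernel */
}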

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/hardirq.h  |  2 +-
 arch/arm64/include/asm/kexec.h    | 42 +++++++++++++++++++++++++-
 arch/arm64/include/asm/smp.h      |  2 ++
 arch/arm64/kernel/machine_kexec.c | 55 +++++++++++++++++++++++++++++++---
 arch/arm64/kernel/smp.c           | 63 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 158 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 8740297dac77..1473fc2f7ab7 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
 #include <linux/threads.h>
 #include <asm/irq.h>
 
-#define NR_IPI	6
+#define NR_IPI	7
 
 typedef struct {
 	unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc5fb61..f40ace1fa21a 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,47 @@
 static inline void crash_setup_regs(struct pt_regs *newregs,
 				    struct pt_regs *oldregs)
 {
-	/* Empty routine needed to avoid build errors. */
+	if (oldregs) {
+		memcpy(newregs, oldregs, sizeof(*newregs));
+	} else {
+		u64 tmp1, tmp2;
+
+		__asm__ __volatile__ (
+			"stp	 x0,   x1, [%2, #16 *  0]\n"
+			"stp	 x2,   x3, [%2, #16 *  1]\n"
+			"stp	 x4,   x5, [%2, #16 *  2]\n"
+			"stp	 x6,   x7, [%2, #16 *  3]\n"
+			"stp	 x8,   x9, [%2, #16 *  4]\n"
+			"stp	x10,  x11, [%2, #16 *  5]\n"
+			"stp	x12,  x13, [%2, #16 *  6]\n"
+			"stp	x14,  x15, [%2, #16 *  7]\n"
+			"stp	x16,  x17, [%2, #16 *  8]\n"
+			"stp	x18,  x19, [%2, #16 *  9]\n"
+			"stp	x20,  x21, [%2, #16 * 10]\n"
+			"stp	x22,  x23, [%2, #16 * 11]\n"
+			"stp	x24,  x25, [%2, #16 * 12]\n"
+			"stp	x26,  x27, [%2, #16 * 13]\n"
+			"stp	x28,  x29, [%2, #16 * 14]\n"
+			"mov	 %0,  sp\n"
+			"stp	x30,  %0,  [%2, #16 * 15]\n"
+
+			"/* faked current PSTATE */\n"
+			"mrs	 %0, CurrentEL\n"
+			"mrs	 %1, SPSEL\n"
+			"orr	 %0, %0, %1\n"
+			"mrs	 %1, DAIF\n"
+			"orr	 %0, %0, %1\n"
+			"mrs	 %1, NZCV\n"
+			"orr	 %0, %0, %1\n"
+			/* pc */
+			"adr	 %1, 1f\n"
+		"1:\n"
+			"stp	 %1, %0,   [%2, #16 * 16]\n"
+			: "=&r" (tmp1), "=&r" (tmp2)
+			: "r" (newregs)
+			: "memory"
+		);
+	}
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index d050d720a1b4..cea009f2657d 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -148,6 +148,8 @@ static inline void cpu_panic_kernel(void)
  */
 bool cpus_are_stuck_in_kernel(void);
 
+extern void smp_send_crash_stop(void);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 016f2dd693aa..c61dd7b7dca0 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,9 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
 #include <linux/kexec.h>
 #include <linux/smp.h>
 
@@ -161,7 +164,8 @@ void machine_kexec(struct kimage *kimage)
 	/*
 	 * New cpus may have become stuck_in_kernel after we loaded the image.
 	 */
-	BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
+	BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
+			!WARN_ON(kimage == kexec_crash_image));
 
 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
 
@@ -198,15 +202,58 @@ void machine_kexec(struct kimage *kimage)
 	 * relocation is complete.
 	 */
 
-	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
-		kimage->start, 0);
+	cpu_soft_restart(kimage != kexec_crash_image,
+		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
 
 	BUG(); /* Should never get here. */
 }
 
+static void machine_kexec_mask_interrupts(void)
+{
+	unsigned int i;
+	struct irq_desc *desc;
+
+	for_each_irq_desc(i, desc) {
+		struct irq_chip *chip;
+		int ret;
+
+		chip = irq_desc_get_chip(desc);
+		if (!chip)
+			continue;
+
+		/*
+		 * First try to remove the active state. If this
+		 * fails, try to EOI the interrupt.
+		 */
+		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+		    chip->irq_eoi)
+			chip->irq_eoi(&desc->irq_data);
+
+		if (chip->irq_mask)
+			chip->irq_mask(&desc->irq_data);
+
+		if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+			chip->irq_disable(&desc->irq_data);
+	}
+}
+
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	/* Empty routine needed to avoid build errors. */
+	local_irq_disable();
+
+	/* shutdown non-crashing cpus */
+	smp_send_crash_stop();
+
+	/* for crashing cpu */
+	crash_save_cpu(regs, smp_processor_id());
+	machine_kexec_mask_interrupts();
+
+	pr_info("Starting crashdump kernel...\n");
 }
 
 void arch_kexec_protect_crashkres(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index cb87234cfcf2..446c6d48f8ec 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
 #include <linux/completion.h>
 #include <linux/of.h>
 #include <linux/irq_work.h>
+#include <linux/kexec.h>
 
 #include <asm/alternative.h>
 #include <asm/atomic.h>
@@ -74,6 +75,7 @@ enum ipi_msg_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_CPU_CRASH_STOP,
 	IPI_TIMER,
 	IPI_IRQ_WORK,
 	IPI_WAKEUP
@@ -753,6 +755,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
 	S(IPI_RESCHEDULE, "Rescheduling interrupts"),
 	S(IPI_CALL_FUNC, "Function call interrupts"),
 	S(IPI_CPU_STOP, "CPU stop interrupts"),
+	S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
 	S(IPI_TIMER, "Timer broadcast interrupts"),
 	S(IPI_IRQ_WORK, "IRQ work interrupts"),
 	S(IPI_WAKEUP, "CPU wake-up interrupts"),
@@ -827,6 +830,29 @@ static void ipi_cpu_stop(unsigned int cpu)
 		cpu_relax();
 }
 
+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi;
+#endif
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+	crash_save_cpu(regs, cpu);
+
+	atomic_dec(&waiting_for_crash_ipi);
+
+	local_irq_disable();
+
+#ifdef CONFIG_HOTPLUG_CPU
+	if (cpu_ops[cpu]->cpu_die)
+		cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+	/* just in case */
+	cpu_park_loop();
+#endif
+}
+
 /*
  * Main handler for inter-processor interrupts
  */
@@ -857,6 +883,15 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 		irq_exit();
 		break;
 
+	case IPI_CPU_CRASH_STOP:
+		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+			irq_enter();
+			ipi_cpu_crash_stop(cpu, regs);
+
+			unreachable();
+		}
+		break;
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 	case IPI_TIMER:
 		irq_enter();
@@ -929,6 +964,34 @@ void smp_send_stop(void)
 			   cpumask_pr_args(cpu_online_mask));
 }
 
+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+	cpumask_t mask;
+	unsigned long timeout;
+
+	if (num_online_cpus() == 1)
+		return;
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &mask);
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+	pr_crit("SMP: stopping secondary CPUs\n");
+	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+	/* Wait up to one second for other CPUs to stop */
+	timeout = USEC_PER_SEC;
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+		udelay(1);
+
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+			   cpumask_pr_args(cpu_online_mask));
+}
+#endif
+
 /*
  * not supported here
  */
-- 
2.11.0



* [PATCH v31 08/12] arm64: kdump: add VMCOREINFO's for user-space tools
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:46   ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

In addition to the common VMCOREINFO entries defined in
crash_save_vmcoreinfo_init(), the crash utility needs to know
  - kimage_voffset
  - PHYS_OFFSET
to examine the contents of a dump file (/proc/vmcore) correctly,
due to the introduction of KASLR (CONFIG_RANDOMIZE_BASE) in v4.6.

  - VA_BITS
is also required for the makedumpfile command.

arch_crash_save_vmcoreinfo() appends these values to the dump file.
More VMCOREINFO entries may be added later.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/kernel/machine_kexec.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index c61dd7b7dca0..1fcd5483c1bb 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -17,6 +17,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/cpu_ops.h>
+#include <asm/memory.h>
 #include <asm/mmu.h>
 #include <asm/mmu_context.h>
 
@@ -280,3 +281,13 @@ void arch_kexec_unprotect_crashkres(void)
 
 	flush_tlb_all();
 }
+
+void arch_crash_save_vmcoreinfo(void)
+{
+	VMCOREINFO_NUMBER(VA_BITS);
+	/* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
+	vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
+						kimage_voffset);
+	vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
+						PHYS_OFFSET);
+}
-- 
2.11.0


* [PATCH v31 08/12] arm64: kdump: add VMCOREINFO's for user-space tools
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: mark.rutland, geoff, kexec, AKASHI Takahiro, james.morse,
	bauerman, dyoung, linux-arm-kernel

In addition to the common VMCOREINFO entries defined in
crash_save_vmcoreinfo_init(), the crash utility needs to know
  - kimage_voffset
  - PHYS_OFFSET
to examine the contents of a dump file (/proc/vmcore) correctly,
due to the introduction of KASLR (CONFIG_RANDOMIZE_BASE) in v4.6.

  - VA_BITS
is also required for the makedumpfile command.

arch_crash_save_vmcoreinfo() appends these values to the dump file.
More VMCOREINFO entries may be added later.
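
To illustrate how a user-space tool might consume these values, here is
a hypothetical sketch of the address translation a vmcore analyzer could
perform (the helper and its parameters are illustrative only, not any
tool's real API):

#include <stdint.h>

/*
 * Sketch: translate a kernel VA from the dump to a PA using the values
 * exported by this patch. page_offset itself can be derived from
 * VA_BITS as ~0ULL - (1ULL << (VA_BITS - 1)) + 1 on this VA layout.
 */
static uint64_t kva_to_pa(uint64_t va, uint64_t kimage_voffset,
			  uint64_t phys_offset, uint64_t page_offset)
{
	if (va >= page_offset)			/* linear-map address */
		return (va - page_offset) + phys_offset;
	return va - kimage_voffset;		/* kernel-image address */
}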

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/kernel/machine_kexec.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index c61dd7b7dca0..1fcd5483c1bb 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -17,6 +17,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/cpu_ops.h>
+#include <asm/memory.h>
 #include <asm/mmu.h>
 #include <asm/mmu_context.h>
 
@@ -280,3 +281,13 @@ void arch_kexec_unprotect_crashkres(void)
 
 	flush_tlb_all();
 }
+
+void arch_crash_save_vmcoreinfo(void)
+{
+	VMCOREINFO_NUMBER(VA_BITS);
+	/* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
+	vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
+						kimage_voffset);
+	vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
+						PHYS_OFFSET);
+}
-- 
2.11.0



* [PATCH v31 09/12] arm64: kdump: provide /proc/vmcore file
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:46   ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

Add arch-specific functions to provide a dump file, /proc/vmcore.

This file is in ELF format and its ELF header needs to be prepared by
userspace tools, like kexec-tools, in advance. The primary kernel is
responsible for allocating the region with reserve_elfcorehdr() at boot
time and for advertising its location to the crash dump kernel via a new
device-tree property, "linux,elfcorehdr".

The crash dump kernel then accesses the primary kernel's memory with
copy_oldmem_page(), which feeds the data page by page by memremap'ing
it, since that memory does not reside in the linear mapping of the crash
dump kernel.

We also need our own elfcorehdr_read() here since the header is placed
within the crash dump kernel's usable memory.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/Kconfig             | 11 +++++++
 arch/arm64/kernel/Makefile     |  1 +
 arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
 arch/arm64/mm/init.c           | 53 +++++++++++++++++++++++++++++++
 4 files changed, 136 insertions(+)
 create mode 100644 arch/arm64/kernel/crash_dump.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 111742126897..2bd6a1a062b9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -693,6 +693,17 @@ config KEXEC
 	  but it is independent of the system firmware.   And like a reboot
 	  you can start any kernel with it, not just Linux.
 
+config CRASH_DUMP
+	bool "Build kdump crash kernel"
+	help
+	  Generate crash dump after being started by kexec. This should
+	  be normally only set in special crash dump kernels which are
+	  loaded in the main kernel with kexec-tools into a specially
+	  reserved region and then later executed after a crash by
+	  kdump/kexec.
+
+	  For more details see Documentation/kdump/kdump.txt
+
 config XEN_DOM0
 	def_bool y
 	depends on XEN
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 7d66bbaafc0c..6a7384eee08d 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -50,6 +50,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
 arm64-obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
 					   cpu-reset.o
+arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
 
 obj-y					+= $(arm64-obj-y) vdso/ probes/
 obj-m					+= $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 000000000000..c3d5a21c081e
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <asm/memory.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+			 size_t csize, unsigned long offset,
+			 int userbuf)
+{
+	void *vaddr;
+
+	if (!csize)
+		return 0;
+
+	vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+	if (!vaddr)
+		return -ENOMEM;
+
+	if (userbuf) {
+		if (copy_to_user((char __user *)buf, vaddr + offset, csize)) {
+			memunmap(vaddr);
+			return -EFAULT;
+		}
+	} else {
+		memcpy(buf, vaddr + offset, csize);
+	}
+
+	memunmap(vaddr);
+
+	return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ * @buf: buffer where the data is placed
+ * @csize: number of bytes to read
+ * @ppos: address in the memory
+ *
+ * This function reads @count bytes from elf core header which exists
+ * on crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+	memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+	return count;
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 2aba75dc7720..323b87197e18 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -38,6 +38,7 @@
 #include <linux/swiotlb.h>
 #include <linux/vmalloc.h>
 #include <linux/kexec.h>
+#include <linux/crash_dump.h>
 
 #include <asm/boot.h>
 #include <asm/fixmap.h>
@@ -135,6 +136,56 @@ static void __init reserve_crashkernel(void)
 }
 #endif /* CONFIG_KEXEC_CORE */
 
+#ifdef CONFIG_CRASH_DUMP
+static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	const __be32 *reg;
+	int len;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
+/*
+ * reserve_elfcorehdr() - reserves memory for elf core header
+ *
+ * This function reserves elf core header given in "elfcorehdr=" kernel
+ * command line parameter. This region contains all the information about
+ * primary kernel's core image and is used by a dump capture kernel to
+ * access the system memory on primary kernel.
+ */
+static void __init reserve_elfcorehdr(void)
+{
+	of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
+
+	if (!elfcorehdr_size)
+		return;
+
+	if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
+		pr_warn("elfcorehdr is overlapped\n");
+		return;
+	}
+
+	memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
+
+	pr_info("Reserving %lldKB of memory at 0x%llx for elfcorehdr\n",
+		elfcorehdr_size >> 10, elfcorehdr_addr);
+}
+#else
+static void __init reserve_elfcorehdr(void)
+{
+}
+#endif /* CONFIG_CRASH_DUMP */
 /*
  * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
  * currently assumes that for memory starting above 4G, 32-bit devices will
@@ -393,6 +444,8 @@ void __init arm64_memblock_init(void)
 
 	reserve_crashkernel();
 
+	reserve_elfcorehdr();
+
 	dma_contiguous_reserve(arm64_dma_phys_limit);
 
 	memblock_allow_resize();
-- 
2.11.0


* [PATCH v31 09/12] arm64: kdump: provide /proc/vmcore file
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: mark.rutland, geoff, kexec, AKASHI Takahiro, james.morse,
	bauerman, dyoung, linux-arm-kernel

Add arch-specific functions to provide a dump file, /proc/vmcore.

This file is in ELF format and its ELF header needs to be prepared by
userspace tools, like kexec-tools, in advance. The primary kernel is
responsible for allocating the region with reserve_elfcorehdr() at boot
time and for advertising its location to the crash dump kernel via a new
device-tree property, "linux,elfcorehdr".

The crash dump kernel then accesses the primary kernel's memory with
copy_oldmem_page(), which feeds the data page by page by memremap'ing
it, since that memory does not reside in the linear mapping of the crash
dump kernel.

We also need our own elfcorehdr_read() here since the header is placed
within the crash dump kernel's usable memory.
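
Once this is in place, the result can be sanity-checked from user space
on the capture kernel, for example with a plain user-space sketch like
the following (nothing arm64-specific is assumed):

#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* sketch: check that /proc/vmcore starts with a valid ELF64 ident */
int main(void)
{
	unsigned char ident[EI_NIDENT];
	int fd = open("/proc/vmcore", O_RDONLY);

	if (fd < 0 || read(fd, ident, sizeof(ident)) != sizeof(ident)) {
		perror("/proc/vmcore");
		return 1;
	}
	printf("magic: %s, class: %s\n",
	       memcmp(ident, ELFMAG, SELFMAG) ? "bad" : "ok",
	       ident[EI_CLASS] == ELFCLASS64 ? "ELF64" : "other");
	close(fd);
	return 0;
}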

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/Kconfig             | 11 +++++++
 arch/arm64/kernel/Makefile     |  1 +
 arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
 arch/arm64/mm/init.c           | 53 +++++++++++++++++++++++++++++++
 4 files changed, 136 insertions(+)
 create mode 100644 arch/arm64/kernel/crash_dump.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 111742126897..2bd6a1a062b9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -693,6 +693,17 @@ config KEXEC
 	  but it is independent of the system firmware.   And like a reboot
 	  you can start any kernel with it, not just Linux.
 
+config CRASH_DUMP
+	bool "Build kdump crash kernel"
+	help
+	  Generate crash dump after being started by kexec. This should
+	  be normally only set in special crash dump kernels which are
+	  loaded in the main kernel with kexec-tools into a specially
+	  reserved region and then later executed after a crash by
+	  kdump/kexec.
+
+	  For more details see Documentation/kdump/kdump.txt
+
 config XEN_DOM0
 	def_bool y
 	depends on XEN
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 7d66bbaafc0c..6a7384eee08d 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -50,6 +50,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
 arm64-obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
 					   cpu-reset.o
+arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
 
 obj-y					+= $(arm64-obj-y) vdso/ probes/
 obj-m					+= $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 000000000000..c3d5a21c081e
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <asm/memory.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+			 size_t csize, unsigned long offset,
+			 int userbuf)
+{
+	void *vaddr;
+
+	if (!csize)
+		return 0;
+
+	vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+	if (!vaddr)
+		return -ENOMEM;
+
+	if (userbuf) {
+		if (copy_to_user((char __user *)buf, vaddr + offset, csize)) {
+			memunmap(vaddr);
+			return -EFAULT;
+		}
+	} else {
+		memcpy(buf, vaddr + offset, csize);
+	}
+
+	memunmap(vaddr);
+
+	return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ * @buf: buffer where the data is placed
+ * @csize: number of bytes to read
+ * @ppos: address in the memory
+ *
+ * This function reads @count bytes from elf core header which exists
+ * on crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+	memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+	return count;
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 2aba75dc7720..323b87197e18 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -38,6 +38,7 @@
 #include <linux/swiotlb.h>
 #include <linux/vmalloc.h>
 #include <linux/kexec.h>
+#include <linux/crash_dump.h>
 
 #include <asm/boot.h>
 #include <asm/fixmap.h>
@@ -135,6 +136,56 @@ static void __init reserve_crashkernel(void)
 }
 #endif /* CONFIG_KEXEC_CORE */
 
+#ifdef CONFIG_CRASH_DUMP
+static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	const __be32 *reg;
+	int len;
+
+	if (depth != 1 || strcmp(uname, "chosen") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
+	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+		return 1;
+
+	elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
+	elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+	return 1;
+}
+
+/*
+ * reserve_elfcorehdr() - reserves memory for elf core header
+ *
+ * This function reserves elf core header given in "elfcorehdr=" kernel
+ * command line parameter. This region contains all the information about
+ * primary kernel's core image and is used by a dump capture kernel to
+ * access the system memory on primary kernel.
+ */
+static void __init reserve_elfcorehdr(void)
+{
+	of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
+
+	if (!elfcorehdr_size)
+		return;
+
+	if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
+		pr_warn("elfcorehdr is overlapped\n");
+		return;
+	}
+
+	memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
+
+	pr_info("Reserving %lldKB of memory at 0x%llx for elfcorehdr\n",
+		elfcorehdr_size >> 10, elfcorehdr_addr);
+}
+#else
+static void __init reserve_elfcorehdr(void)
+{
+}
+#endif /* CONFIG_CRASH_DUMP */
 /*
  * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
  * currently assumes that for memory starting above 4G, 32-bit devices will
@@ -393,6 +444,8 @@ void __init arm64_memblock_init(void)
 
 	reserve_crashkernel();
 
+	reserve_elfcorehdr();
+
 	dma_contiguous_reserve(arm64_dma_phys_limit);
 
 	memblock_allow_resize();
-- 
2.11.0



* [PATCH v31 10/12] arm64: kdump: enable kdump in defconfig
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:46   ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

Kdump is enabled in the defconfig by default, just as kexec already is.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 33b744d54739..94c2ea523a8a 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -79,6 +79,7 @@ CONFIG_CMA=y
 CONFIG_SECCOMP=y
 CONFIG_XEN=y
 CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_COMPAT=y
 CONFIG_CPU_IDLE=y
-- 
2.11.0


* [PATCH v31 10/12] arm64: kdump: enable kdump in defconfig
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: mark.rutland, geoff, kexec, AKASHI Takahiro, james.morse,
	bauerman, dyoung, linux-arm-kernel

Kdump is enabled in the defconfig by default, just as kexec already is.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 33b744d54739..94c2ea523a8a 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -79,6 +79,7 @@ CONFIG_CMA=y
 CONFIG_SECCOMP=y
 CONFIG_XEN=y
 CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_COMPAT=y
 CONFIG_CPU_IDLE=y
-- 
2.11.0



* [PATCH v31 11/12] Documentation: kdump: describe arm64 port
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:46   ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

Add arch-specific descriptions of kdump usage on arm64 to kdump.txt.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 Documentation/kdump/kdump.txt | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index b0eb27b956d9..615434d81108 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
 a remote system.
 
 Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x and arm architectures.
+s390x, arm and arm64 architectures.
 
 When the system kernel boots, it reserves a small section of memory for
 the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)
 
     AUTO_ZRELADDR=y
 
+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+  on non-VHE systems even if it is configured. This is because the CPU
+  will not be reset to EL2 on panic.
+
 Extended crashkernel syntax
 ===========================
 
@@ -305,6 +312,8 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
+   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
 
 Load the Dump-capture Kernel
 ============================
@@ -327,6 +336,8 @@ For s390x:
 	- Use image or bzImage
 For arm:
 	- Use zImage
+For arm64:
+	- Use vmlinux or Image
 
 If you are using a uncompressed vmlinux image then use following command
 to load dump-capture kernel.
@@ -370,6 +381,9 @@ For s390x:
 For arm:
 	"1 maxcpus=1 reset_devices"
 
+For arm64:
+	"1 maxcpus=1 reset_devices"
+
 Notes on loading the dump-capture kernel:
 
 * By default, the ELF headers are stored in ELF64 format to support
-- 
2.11.0


* [PATCH v31 11/12] Documentation: kdump: describe arm64 port
@ 2017-02-01 12:46   ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:46 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: mark.rutland, geoff, kexec, AKASHI Takahiro, james.morse,
	bauerman, dyoung, linux-arm-kernel

Add arch-specific descriptions of kdump usage on arm64 to kdump.txt.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 Documentation/kdump/kdump.txt | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index b0eb27b956d9..615434d81108 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
 a remote system.
 
 Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x and arm architectures.
+s390x, arm and arm64 architectures.
 
 When the system kernel boots, it reserves a small section of memory for
 the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)
 
     AUTO_ZRELADDR=y
 
+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+  on non-VHE systems even if it is configured. This is because the CPU
+  will not be reset to EL2 on panic.
+
 Extended crashkernel syntax
 ===========================
 
@@ -305,6 +312,8 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
+   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
 
 Load the Dump-capture Kernel
 ============================
@@ -327,6 +336,8 @@ For s390x:
 	- Use image or bzImage
 For arm:
 	- Use zImage
+For arm64:
+	- Use vmlinux or Image
 
 If you are using a uncompressed vmlinux image then use following command
 to load dump-capture kernel.
@@ -370,6 +381,9 @@ For s390x:
 For arm:
 	"1 maxcpus=1 reset_devices"
 
+For arm64:
+	"1 maxcpus=1 reset_devices"
+
 Notes on loading the dump-capture kernel:
 
 * By default, the ELF headers are stored in ELF64 format to support
-- 
2.11.0



* [PATCH v31 12/12] Documentation: dt: chosen properties for arm64 kdump
  2017-02-01 12:42 ` AKASHI Takahiro
@ 2017-02-01 12:48     ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:48 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, robh+dt, mark.rutland
  Cc: james.morse, geoff, bauerman, dyoung, kexec,
	linux-arm-kernel, devicetree, AKASHI Takahiro

From: James Morse <james.morse@arm.com>

Add documentation for DT properties:
	linux,usable-memory-range
	linux,elfcorehdr
used by arm64 kdump. These describe the usable memory range for the crash
dump kernel and the elfcorehdr's location within it, respectively.

Signed-off-by: James Morse <james.morse@arm.com>
[takahiro.akashi@linaro.org: added "linux,crashkernel-base" and "-size" ]
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: devicetree@vger.kernel.org
Cc: Rob Herring <robh+dt@kernel.org>
---
 Documentation/devicetree/bindings/chosen.txt | 37 ++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 6ae9d82d4c37..8dc82431acc1 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -52,3 +52,40 @@ This property is set (currently only on PowerPC, and only needed on
 book3e) by some versions of kexec-tools to tell the new kernel that it
 is being booted by kexec, as the booting environment may differ (e.g.
 a different secondary CPU release mechanism)
+
+linux,usable-memory-range
+-------------------------
+
+This property (arm64 only) holds a base address and size, describing a
+limited region in which memory may be considered available for use by
+the kernel. Memory outside of this range is not available for use.
+
+This property describes a limitation: memory within this range is only
+valid when also described through another mechanism that the kernel
+would otherwise use to determine available memory (e.g. memory nodes
+or the EFI memory map). Valid memory may be sparse within the range.
+e.g.
+
+/ {
+	chosen {
+		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
+	};
+};
+
+The main usage is for crash dump kernel to identify its own usable
+memory and exclude, at its boot time, any other memory areas that are
+part of the panicked kernel's memory.
+
+linux,elfcorehdr
+----------------
+
+This property (currently used only on arm64) holds the memory range,
+the address and the size, of the elf core header which mainly describes
+the panicked kernel's memory layout as PT_LOAD segments of elf format.
+e.g.
+
+/ {
+	chosen {
+		linux,elfcorehdr = <0x9 0xfffff000 0x0 0x800>;
+	};
+};
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v31 12/12] Documentation: dt: chosen properties for arm64 kdump
@ 2017-02-01 12:48     ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-01 12:48 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, robh+dt, mark.rutland
  Cc: devicetree, geoff, kexec, AKASHI Takahiro, james.morse, bauerman,
	dyoung, linux-arm-kernel

From: James Morse <james.morse@arm.com>

Add documentation for DT properties:
	linux,usable-memory-range
	linux,elfcorehdr
used by arm64 kdump. These describe the usable memory range for the crash dump
kernel and the elfcorehdr's location within it, respectively.

Signed-off-by: James Morse <james.morse@arm.com>
[takahiro.akashi@linaro.org: added "linux,crashkernel-base" and "-size" ]
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: devicetree@vger.kernel.org
Cc: Rob Herring <robh+dt@kernel.org>
---
 Documentation/devicetree/bindings/chosen.txt | 37 ++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 6ae9d82d4c37..8dc82431acc1 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -52,3 +52,40 @@ This property is set (currently only on PowerPC, and only needed on
 book3e) by some versions of kexec-tools to tell the new kernel that it
 is being booted by kexec, as the booting environment may differ (e.g.
 a different secondary CPU release mechanism)
+
+linux,usable-memory-range
+-------------------------
+
+This property (arm64 only) holds a base address and size, describing a
+limited region in which memory may be considered available for use by
+the kernel. Memory outside of this range is not available for use.
+
+This property describes a limitation: memory within this range is only
+valid when also described through another mechanism that the kernel
+would otherwise use to determine available memory (e.g. memory nodes
+or the EFI memory map). Valid memory may be sparse within the range.
+e.g.
+
+/ {
+	chosen {
+		linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
+	};
+};
+
+The main usage is for crash dump kernel to identify its own usable
+memory and exclude, at its boot time, any other memory areas that are
+part of the panicked kernel's memory.
+
+linux,elfcorehdr
+----------------
+
+This property (currently used only on arm64) holds the memory range,
+the address and the size, of the elf core header which mainly describes
+the panicked kernel's memory layout as PT_LOAD segments of elf format.
+e.g.
+
+/ {
+	chosen {
+		linux,elfcorehdr = <0x9 0xfffff000 0x0 0x800>;
+	};
+};
-- 
2.11.0
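
As an aside, for illustration only: a loader such as kexec-tools could
populate the linux,usable-memory-range property with libfdt along these
lines. The helper name is made up here, and two address cells plus two
size cells are assumed, as in the example above.

#include <libfdt.h>
#include <stdint.h>

/* Hypothetical helper: write a base/size pair into /chosen, assuming
 * #address-cells = <2> and #size-cells = <2>. */
static int set_usable_memory_range(void *fdt, uint64_t base, uint64_t size)
{
	fdt64_t cells[2] = { cpu_to_fdt64(base), cpu_to_fdt64(size) };
	int node = fdt_path_offset(fdt, "/chosen");

	if (node < 0)
		return node;

	return fdt_setprop(fdt, node, "linux,usable-memory-range",
			   cells, sizeof(cells));
}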



* Re: [PATCH v31 02/12] arm64: limit memory regions based on DT property, usable-memory-range
@ 2017-02-01 15:07     ` Mark Rutland
  0 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-01 15:07 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: geoff, catalin.marinas, will.deacon, james.morse, bauerman,
	dyoung, kexec, linux-arm-kernel

Hi,

On Wed, Feb 01, 2017 at 09:46:21PM +0900, AKASHI Takahiro wrote:
> +static int __init early_init_dt_scan_usablemem(unsigned long node,
> +		const char *uname, int depth, void *data)
> +{
> +	struct memblock_region *usablemem = (struct memblock_region *)data;

Nit: unnecessary cast.

> +	const __be32 *reg;
> +	int len;
> +
> +	usablemem->size = 0;

Could we please lift this assignment/initialisation into the caller...

> +
> +	if (depth != 1 || strcmp(uname, "chosen") != 0)
> +		return 0;
> +
> +	reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> +	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> +		return 1;
> +
> +	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> +	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> +
> +	return 1;
> +}
> +
> +static void __init fdt_enforce_memory_region(void)
> +{
> +	struct memblock_region reg;

... e.g. have:

	struct memblock_region reg = {
		.size = 0,
	};

That saves us from making unnecessary assignments to the size field, and
makes it clear that reg.size has definitely been initialised, regardless
of what of_scan_flat_dt() happens to do.

With that:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> +
> +	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> +
> +	if (reg.size)
> +		memblock_cap_memory_range(reg.base, reg.size);
> +}
> +
>  void __init arm64_memblock_init(void)
>  {
>  	const s64 linear_region_size = -(s64)PAGE_OFFSET;
>  
> +	/* Handle linux,usable-memory-range property */
> +	fdt_enforce_memory_region();
> +
>  	/*
>  	 * Ensure that the linear region takes up exactly half of the kernel
>  	 * virtual address space. This way, we can distinguish a linear address
> -- 
> 2.11.0
> 
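
Putting the two suggestions together, the helpers might end up looking
like the sketch below (illustrative only, not the final patch):

static int __init early_init_dt_scan_usablemem(unsigned long node,
		const char *uname, int depth, void *data)
{
	struct memblock_region *usablemem = data;	/* cast dropped */
	const __be32 *reg;
	int len;

	if (depth != 1 || strcmp(uname, "chosen") != 0)
		return 0;

	reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
		return 1;

	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);

	return 1;
}

static void __init fdt_enforce_memory_region(void)
{
	struct memblock_region reg = {
		.size = 0,	/* initialised in the caller, per review */
	};

	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);

	if (reg.size)
		memblock_cap_memory_range(reg.base, reg.size);
}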


* Re: [PATCH v31 03/12] arm64: kdump: reserve memory for crash dump kernel
@ 2017-02-01 15:26     ` Mark Rutland
  0 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-01 15:26 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Pratyush Anand, geoff, catalin.marinas, will.deacon, james.morse,
	Mark Salter, bauerman, dyoung, kexec, linux-arm-kernel

Hi,

On Wed, Feb 01, 2017 at 09:46:22PM +0900, AKASHI Takahiro wrote:
> +#ifdef CONFIG_KEXEC_CORE
> +/*
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +static void __init reserve_crashkernel(void)
> +{
> +	unsigned long long crash_base, crash_size;
> +	int ret;
> +
> +	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> +				&crash_size, &crash_base);
> +	/* no crashkernel= or invalid value specified */
> +	if (ret || !crash_size)
> +		return;
> +
> +	crash_size = PAGE_ALIGN(crash_size);
> +
> +	if (crash_base == 0) {
> +		/* Current arm64 boot protocol requires 2MB alignment */
> +		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
> +				crash_size, SZ_2M);
> +		if (crash_base == 0) {
> +			pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> +				crash_size);

Nit: can we please make that "size: 0x%llx", so that it's always clearly
a hex number? e.g.

	pr_warn("cannot allocate 0x%llx bytes for crashkernel\n",
		crash_size);

> +			return;
> +		}
> +	} else {
> +		/* User specifies base address explicitly. */
> +		if (!memblock_is_region_memory(crash_base, crash_size) ||
> +			memblock_is_region_reserved(crash_base, crash_size)) {
> +			pr_warn("crashkernel has wrong address or size\n");
> +			return;
> +		}

To better report the error, can we please split these up:

		if (!memblock_is_region_memory(crash_base, crash_size)) {
			pr_warn("cannot reserve crashkernel: region is not memory\n");
			return;
		}

		if (memblock_is_region_reserved(crash_base, crash_size)) {
			pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
			return;
		}

> +		if (!IS_ALIGNED(crash_base, SZ_2M)) {
> +			pr_warn("crashkernel base address is not 2MB aligned\n");
> +			return;
> +		}
> +	}
> +	memblock_reserve(crash_base, crash_size);
> +
> +	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> +		crash_size >> 20, crash_base >> 20);

We only page-align the size, so the MB will be a little off, but that's
probably OK. However, it would also be nicer to log the base as an
address.

Could we dump this as we do for the kernel memory layout? e.g.

	pr_info("crashkernel reserved: 0x%016lx - 0x%016lx (%lld MB)\n",
		crash_base, crash_base + crash_size, crash_size >> 20);

With those:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.
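
Folding these suggestions together, the explicit-base validation path
might read as follows (a sketch using the names from the patch under
review):

	} else {
		/* User specifies base address explicitly. */
		if (!memblock_is_region_memory(crash_base, crash_size)) {
			pr_warn("cannot reserve crashkernel: region is not memory\n");
			return;
		}

		if (memblock_is_region_reserved(crash_base, crash_size)) {
			pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
			return;
		}

		if (!IS_ALIGNED(crash_base, SZ_2M)) {
			pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
			return;
		}
	}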


* Re: [PATCH v31 04/12] arm64: mm: allow for unmapping part of kernel mapping
@ 2017-02-01 16:03     ` Mark Rutland
  0 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-01 16:03 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: geoff, catalin.marinas, will.deacon, james.morse, bauerman,
	dyoung, kexec, linux-arm-kernel

Hi,

On Wed, Feb 01, 2017 at 09:46:23PM +0900, AKASHI Takahiro wrote:
> A new function, remove_pgd_mapping(), is added.
> It allows us to unmap a specific portion of the kernel mapping later as long as
> the mapping is made using create_pgd_mapping() and unless we try to free
> a sub-set of memory range within a section mapping.

I'm not keen on adding more page table modification code. It was painful
enough to ensure that those worked in all configurations.

Why can't we reuse create_pgd_mapping()? If we pass page_mappings_only,
and use an invalid prot (i.e. 0), what is the problem?

I can see that we wouldn't free/reallocate the pud or pmd entries, but
the "wasted" memory should be small. If anything, I'd argue that it's
preferable to keep that around so that we don't have to allocate memory
when we need to map the crashkernel region.

Is there another problem I'm missing?

> +static bool pgtable_is_cleared(void *pgtable)
> +{
> +	unsigned long *desc, *end;
> +
> +	for (desc = pgtable, end = (unsigned long *)(pgtable + PAGE_SIZE);
> +	     desc < end; desc++)
> +		if (*desc)
> +			return false;
> +
> +	return true;
> +}

If nothing else, we should be using the normal accessors for this, e.g.
{pte,pmd,pud}_none().

As above, I'd very much like to reuse our existing page table routines
if we can.

Thanks,
Mark.
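
As an illustration of the accessor-based check (a sketch; the function
name is made up, and pmd/pud variants would follow the same shape with
pmd_none()/pud_none()):

/* Return true if no entry in this PTE table is in use. */
static bool pte_table_is_empty(pte_t *ptep)
{
	int i;

	for (i = 0; i < PTRS_PER_PTE; i++)
		if (!pte_none(ptep[i]))
			return false;

	return true;
}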


* Re: [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
@ 2017-02-01 18:00     ` Mark Rutland
  0 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-01 18:00 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: geoff, catalin.marinas, will.deacon, james.morse, bauerman,
	dyoung, kexec, linux-arm-kernel

On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
> arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> are meant to be called around kexec_load() in order to protect
> the memory allocated for crash dump kernel once after it's loaded.
> 
> The protection is implemented here by unmapping the region rather than
> making it read-only.
> To make the things work correctly, we also have to
> - put the region in an isolated, page-level mapping initially, and
> - move copying kexec's control_code_page to machine_kexec_prepare()
> 
> Note that page-level mapping is also required to allow for shrinking
> the size of memory, through /sys/kernel/kexec_crash_size, by any number
> of multiple pages.

Looking at kexec_crash_size_store(), I don't see where memory returned
to the OS is mapped. AFAICT, if the region is protected when the user
shrinks the region, the memory will not be mapped, yet handed over to
the kernel for general allocation.

Surely we need an arch-specific callback to handle that? e.g.

arch_crash_release_region(unsigned long base, unsigned long size)
{
	/*
	 * Ensure the region is part of the linear map before we return
	 * it to the OS. We won't unmap this again, so we can use block
	 * mappings.
	 */
	create_pgd_mapping(&init_mm, start, __phys_to_virt(start),
			   size, PAGE_KERNEL, false);
}

... which we'd call from crash_shrink_memory() before we freed the
reserved pages.

[...]

> +void arch_kexec_unprotect_crashkres(void)
> +{
> +	/*
> +	 * We don't have to make page-level mappings here because
> +	 * the crash dump kernel memory is not allowed to be shrunk
> +	 * once the kernel is loaded.
> +	 */
> +	create_pgd_mapping(&init_mm, crashk_res.start,
> +			__phys_to_virt(crashk_res.start),
> +			resource_size(&crashk_res), PAGE_KERNEL,
> +			debug_pagealloc_enabled());
> +
> +	flush_tlb_all();
> +}

We can lose the flush_tlb_all() here; TLBs aren't allowed to cache an
invalid entry, so there's nothing to remove from the TLBs.

[...]

> @@ -538,6 +540,24 @@ static void __init map_mem(pgd_t *pgd)
>  		if (memblock_is_nomap(reg))
>  			continue;
>  
> +#ifdef CONFIG_KEXEC_CORE
> +		/*
> +		 * While crash dump kernel memory is contained in a single
> +		 * memblock for now, it should appear in an isolated mapping
> +		 * so that we can independently unmap the region later.
> +		 */
> +		if (crashk_res.end &&
> +		    (start <= crashk_res.start) &&
> +		    ((crashk_res.end + 1) < end)) {
> +			if (crashk_res.start != start)
> +				__map_memblock(pgd, start, crashk_res.start);
> +
> +			if ((crashk_res.end + 1) < end)
> +				__map_memblock(pgd, crashk_res.end + 1, end);
> +
> +			continue;
> +		}
> +#endif

This wasn't quite what I had in mind. I had expected that here we would
isolate the ranges we wanted to avoid mapping (with a comment as to why
we couldn't move the memblock_isolate_range() calls earlier). In
map_memblock(), we'd skip those ranges entirely.

I believe the above isn't correct if we have a single memblock.memory
region covering both the crashkernel and kernel regions. In that case,
we'd erroneously map the portion which overlaps the kernel.

It seems there are a number of subtle problems here. :/

Thanks,
Mark.
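
One way to realize the isolation described above is to mark the crash
kernel region nomap around the loop and map it in its own call, roughly
as in this sketch (using the two-range __map_memblock() from the patch;
an illustration of the idea, not the eventual fix):

static void __init map_mem(pgd_t *pgd)
{
	struct memblock_region *reg;

#ifdef CONFIG_KEXEC_CORE
	/* Hide the crash kernel region from the loop below. */
	if (crashk_res.end)
		memblock_mark_nomap(crashk_res.start,
				    resource_size(&crashk_res));
#endif

	/* Map all usable memory; nomap regions are skipped. */
	for_each_memblock(memory, reg) {
		if (memblock_is_nomap(reg))
			continue;
		__map_memblock(pgd, reg->base, reg->base + reg->size);
	}

#ifdef CONFIG_KEXEC_CORE
	if (crashk_res.end) {
		/* Map the region in an isolated call so it can be
		 * unmapped independently later; page-level granularity
		 * would need an extra flag, elided here. */
		__map_memblock(pgd, crashk_res.start, crashk_res.end + 1);
		memblock_clear_nomap(crashk_res.start,
				     resource_size(&crashk_res));
	}
#endif
}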


* Re: [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
@ 2017-02-01 18:25       ` Mark Rutland
  0 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-01 18:25 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: geoff, catalin.marinas, will.deacon, james.morse, bauerman,
	dyoung, kexec, linux-arm-kernel

On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
> > arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> > are meant to be called around kexec_load() in order to protect
> > the memory allocated for crash dump kernel once after it's loaded.
> > 
> > The protection is implemented here by unmapping the region rather than
> > making it read-only.
> > To make the things work correctly, we also have to
> > - put the region in an isolated, page-level mapping initially, and
> > - move copying kexec's control_code_page to machine_kexec_prepare()
> > 
> > Note that page-level mapping is also required to allow for shrinking
> > the size of memory, through /sys/kernel/kexec_crash_size, by any number
> > of multiple pages.
> 
> Looking at kexec_crash_size_store(), I don't see where memory returned
> to the OS is mapped. AFAICT, if the region is protected when the user
> shrinks the region, the memory will not be mapped, yet handed over to
> the kernel for general allocation.
> 
> Surely we need an arch-specific callback to handle that? e.g.
> 
> arch_crash_release_region(unsigned long base, unsigned long size)
> {
> 	/*
> 	 * Ensure the region is part of the linear map before we return
> 	 * it to the OS. We won't unmap this again, so we can use block
> 	 * mappings.
> 	 */
> 	create_pgd_mapping(&init_mm, start, __phys_to_virt(start),
> 			   size, PAGE_KERNEL, false);
> }
> 
> ... which we'd call from crash_shrink_memory() before we freed the
> reserved pages.

Another question is (how) does hyp map this region?

Thanks,
Mark.


* Re: [PATCH v31 09/12] arm64: kdump: provide /proc/vmcore file
@ 2017-02-01 19:21     ` Mark Rutland
  0 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-01 19:21 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: geoff, catalin.marinas, will.deacon, james.morse, bauerman,
	dyoung, kexec, linux-arm-kernel

On Wed, Feb 01, 2017 at 09:46:28PM +0900, AKASHI Takahiro wrote:
> Add arch-specific functions to provide a dump file, /proc/vmcore.
> 
> This file is in ELF format and its ELF header needs to be prepared by
> userspace tools, like kexec-tools, in advance. The primary kernel is
> responsible to allocate the region with reserve_elfcorehdr() at boot time
> and advertise its location to the crash dump kernel via a new device-tree
> property, "linux,elfcorehdr".

> +static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
> +		const char *uname, int depth, void *data)
> +{
> +	const __be32 *reg;
> +	int len;
> +
> +	if (depth != 1 || strcmp(uname, "chosen") != 0)
> +		return 0;
> +
> +	reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
> +	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> +		return 1;
> +
> +	elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
> +	elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
> +
> +	return 1;
> +}
> +
> +/*
> + * reserve_elfcorehdr() - reserves memory for elf core header
> + *
> + * This function reserves elf core header given in "elfcorehdr=" kernel
> + * command line parameter. This region contains all the information about
> + * primary kernel's core image and is used by a dump capture kernel to
> + * access the system memory on primary kernel.
> + */
> +static void __init reserve_elfcorehdr(void)
> +{
> +	of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
> +
> +	if (!elfcorehdr_size)
> +		return;
> +
> +	if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
> +		pr_warn("elfcorehdr is overlapped\n");
> +		return;
> +	}
> +
> +	memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
> +
> +	pr_info("Reserving %lldKB of memory at 0x%llx for elfcorehdr\n",
> +		elfcorehdr_size >> 10, elfcorehdr_addr);
> +}

This doesn't seem right to me.  The logic here doesn't match the commit
message, the comment above reserve_elfcorehdr() doesn't match the
implementation, and this doesn't match my understanding of how this was
intended to be used from the DT binding.

I had assumed that we'd treat this in much the same way as the
linux,reserved-memory-region property, with the primary kernel either
dynamically allocating the region or using a command line option, and
the base being exposed to userspace via /sys/ or /proc/ somehow.

Why is that not the case?

Thanks,
Mark.


* Re: [PATCH v31 02/12] arm64: limit memory regions based on DT property, usable-memory-range
@ 2017-02-02  4:21       ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02  4:21 UTC (permalink / raw)
  To: Mark Rutland
  Cc: geoff, catalin.marinas, will.deacon, james.morse, bauerman,
	dyoung, kexec, linux-arm-kernel

On Wed, Feb 01, 2017 at 03:07:00PM +0000, Mark Rutland wrote:
> Hi,
> 
> On Wed, Feb 01, 2017 at 09:46:21PM +0900, AKASHI Takahiro wrote:
> > +static int __init early_init_dt_scan_usablemem(unsigned long node,
> > +		const char *uname, int depth, void *data)
> > +{
> > +	struct memblock_region *usablemem = (struct memblock_region *)data;
> 
> Nit: unnecessary cast.

OK

> > +	const __be32 *reg;
> > +	int len;
> > +
> > +	usablemem->size = 0;
> 
> Could we please lift this assignment/initialisation into the caller...
> 
> > +
> > +	if (depth != 1 || strcmp(uname, "chosen") != 0)
> > +		return 0;
> > +
> > +	reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> > +	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > +		return 1;
> > +
> > +	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > +	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > +
> > +	return 1;
> > +}
> > +
> > +static void __init fdt_enforce_memory_region(void)
> > +{
> > +	struct memblock_region reg;
> 
> ... e.g. have:
> 
> 	struct memblock_region reg = {
> 		.size = 0,
> 	};

Sure

> That saves us from making unnecessary assignments to the size field, and
> makes it clear that reg.size has definitely been initialised, regardless
> of what of_scan_flat_dt() happens to do.
>
> With that:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
-Takahiro AKASHI

> Thanks,
> Mark.
> 
> > +
> > +	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > +
> > +	if (reg.size)
> > +		memblock_cap_memory_range(reg.base, reg.size);
> > +}
> > +
> >  void __init arm64_memblock_init(void)
> >  {
> >  	const s64 linear_region_size = -(s64)PAGE_OFFSET;
> >  
> > +	/* Handle linux,usable-memory-range property */
> > +	fdt_enforce_memory_region();
> > +
> >  	/*
> >  	 * Ensure that the linear region takes up exactly half of the kernel
> >  	 * virtual address space. This way, we can distinguish a linear address
> > -- 
> > 2.11.0
> > 


* Re: [PATCH v31 03/12] arm64: kdump: reserve memory for crash dump kernel
@ 2017-02-02  4:52       ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02  4:52 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Pratyush Anand, geoff, catalin.marinas, will.deacon, james.morse,
	Mark Salter, bauerman, dyoung, kexec, linux-arm-kernel

On Wed, Feb 01, 2017 at 03:26:09PM +0000, Mark Rutland wrote:
> Hi,
> 
> On Wed, Feb 01, 2017 at 09:46:22PM +0900, AKASHI Takahiro wrote:
> > +#ifdef CONFIG_KEXEC_CORE
> > +/*
> > + * reserve_crashkernel() - reserves memory for crash kernel
> > + *
> > + * This function reserves memory area given in "crashkernel=" kernel command
> > + * line parameter. The memory reserved is used by dump capture kernel when
> > + * primary kernel is crashing.
> > + */
> > +static void __init reserve_crashkernel(void)
> > +{
> > +	unsigned long long crash_base, crash_size;
> > +	int ret;
> > +
> > +	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > +				&crash_size, &crash_base);
> > +	/* no crashkernel= or invalid value specified */
> > +	if (ret || !crash_size)
> > +		return;
> > +
> > +	crash_size = PAGE_ALIGN(crash_size);
> > +
> > +	if (crash_base == 0) {
> > +		/* Current arm64 boot protocol requires 2MB alignment */
> > +		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
> > +				crash_size, SZ_2M);
> > +		if (crash_base == 0) {
> > +			pr_warn("Unable to allocate crashkernel (size:%llx)\n",
> > +				crash_size);
> 
> Nit: can we please make that "size: 0x%llx", so that it's always clearly
> a hex number? e.g.
> 
> 	pr_warn("cannot allocate 0x%llx bytes for crashkernel\n",
> 		crash_size);

OK

> > +			return;
> > +		}
> > +	} else {
> > +		/* User specifies base address explicitly. */
> > +		if (!memblock_is_region_memory(crash_base, crash_size) ||
> > +			memblock_is_region_reserved(crash_base, crash_size)) {
> > +			pr_warn("crashkernel has wrong address or size\n");
> > +			return;
> > +		}
> 
> To better report the error, can we please split these up:
> 
> 		if (!memblock_is_region_memory(crash_base, crash_size)) {
> 			pr_warn("cannot reserve crashkernel: region is not memory\n");
> 			return;
> 		}
> 
> 		if (memblock_is_region_reserved(crash_base, crash_size)) {
> 			pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
> 			return;
> 		}

OK, and

> > +		if (!IS_ALIGNED(crash_base, SZ_2M)) {
> > +			pr_warn("crashkernel base address is not 2MB aligned\n");


			pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");

> > +			return;
> > +		}
> > +	}
> > +	memblock_reserve(crash_base, crash_size);
> > +
> > +	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> > +		crash_size >> 20, crash_base >> 20);
> 
> We only page-align the size, so the MB will be a little off, but that's
> probably OK. However, it would also be nicer to log the base as an
> address.

You might notice that the exact same message is used by all the other
architectures, but

> Could we dump this as we do for the kernel memory layout? e.g.
> 
> 	pr_info("crashkernel reserved: 0x%016lx - 0x%016lx (%lld MB)\n",
> 		crash_base, crash_base + crash_size, crash_size >> 20);

We can go either way.

> With those:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
-Takahiro AKASHI

> Thanks,
> Mark.
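
For a concrete feel of the two formats, with hypothetical values
crash_base = 0x60000000 and crash_size = 0x10000000, the layout-style
format would print:

	crashkernel reserved: 0x0000000060000000 - 0x0000000070000000 (256 MB)

while the message shared with other architectures prints:

	Reserving 256MB of memory at 1536MB for crashkernel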


* Re: [PATCH v31 09/12] arm64: kdump: provide /proc/vmcore file
@ 2017-02-02  6:24       ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02  6:24 UTC (permalink / raw)
  To: Mark Rutland
  Cc: geoff, catalin.marinas, will.deacon, james.morse, bauerman,
	dyoung, kexec, linux-arm-kernel

On Wed, Feb 01, 2017 at 07:21:22PM +0000, Mark Rutland wrote:
> On Wed, Feb 01, 2017 at 09:46:28PM +0900, AKASHI Takahiro wrote:
> > Add arch-specific functions to provide a dump file, /proc/vmcore.
> > 
> > This file is in ELF format and its ELF header needs to be prepared by
> > userspace tools, like kexec-tools, in advance. The primary kernel is
> > responsible to allocate the region with reserve_elfcorehdr() at boot time
> > and advertise its location to the crash dump kernel via a new device-tree
> > property, "linux,elfcorehdr".
> 
> > +static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
> > +		const char *uname, int depth, void *data)
> > +{
> > +	const __be32 *reg;
> > +	int len;
> > +
> > +	if (depth != 1 || strcmp(uname, "chosen") != 0)
> > +		return 0;
> > +
> > +	reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
> > +	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > +		return 1;
> > +
> > +	elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > +	elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > +
> > +	return 1;
> > +}
> > +
> > +/*
> > + * reserve_elfcorehdr() - reserves memory for elf core header
> > + *
> > + * This function reserves elf core header given in "elfcorehdr=" kernel
> > + * command line parameter. This region contains all the information about
> > + * primary kernel's core image and is used by a dump capture kernel to
> > + * access the system memory on primary kernel.
> > + */
> > +static void __init reserve_elfcorehdr(void)
> > +{
> > +	of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
> > +
> > +	if (!elfcorehdr_size)
> > +		return;
> > +
> > +	if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
> > +		pr_warn("elfcorehdr is overlapped\n");
> > +		return;
> > +	}
> > +
> > +	memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
> > +
> > +	pr_info("Reserving %lldKB of memory at 0x%llx for elfcorehdr\n",
> > +		elfcorehdr_size >> 10, elfcorehdr_addr);
> > +}
> 
> This doesn't seem right to me.  The logic here doesn't match the commit
> message, the comment above reserve_elfcorehdr() doesn't match the
> implementation, and this doesn't match my understanding of how this was
> intended to be used from the DT binding.

Surely the commit message was wrong/misleading; It should say

===8<===
Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as a ELF file.

A user space tool, like kexec-tools, is responsible for allocating
a separate region for the core's ELF header within crash kdump kernel 
memory and filling it in when executing kexec_load().

Then, its location will be advertised to crash dump kernel via a new
device-tree property, "linux,elfcorehdr", and crash dump kernel preserves
the region for later use with reserve_elfcorehdr() at boot time.

On crash dump kernel, /proc/vmcore will access the primary kernel's memory
with copy_oldmem_page(), which feeds the data page-by-page by ioremap'ing
it since it does not reside in linear mapping on crash dump kernel.

Meanwhile, elfcorehdr_read() is simple as the region is always mapped.
===>8===

Does this make things clear?

> I had assumed that we'd treat this in much the same way as the
> linux,reserved-memory-region property, with the primary kernel either
> dynamically allocating the region or using a command line option, and
> the base being exposed to userspace via /sys/ or /proc/ somehow.

I didn't get the point here, but please note that the data in ELF core header
is produced by kexec-tools (who knows its location, too), and consumed solely
by the crash dump kernel.

Thanks,
-Takahiro AKASHI

> Why is that not the case?
> 
> Thanks,
> Mark.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v31 04/12] arm64: mm: allow for unmapping part of kernel mapping
  2017-02-01 16:03     ` Mark Rutland
@ 2017-02-02 10:21       ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02 10:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 01, 2017 at 04:03:54PM +0000, Mark Rutland wrote:
> Hi,
> 
> On Wed, Feb 01, 2017 at 09:46:23PM +0900, AKASHI Takahiro wrote:
> > A new function, remove_pgd_mapping(), is added.
> > It allows us to unmap a specific portion of the kernel mapping later, as long
> > as the mapping was made using create_pgd_mapping() and we do not try to free
> > a subset of a memory range within a section mapping.
> 
> I'm not keen on adding more page table modification code. It was painful
> enough to ensure that those worked in all configurations.
> 
> Why can't we reuse create_pgd_mapping()? If we pass page_mappings_only,
> and use an invalid prot (i.e. 0), what is the problem?

As I did in v30?
(though my implementation in v30 should be improved.)

> I can see that we wouldn't free/reallocate the pud or pmd entries, but
> the "wasted" memory should be small. If anything, I'd argue that it's
> preferable to keep that around so that we don't have to allocate memory
> when we need to map the crashkernel region.
> 
> Is there another problem I'm missing?

If we don't need to free unused page tables, that would make things
much simpler. There are still some minor problems on the merge, but
we can sort them out.
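
(For concreteness, the reuse being discussed might look roughly like the
sketch below: prot 0 makes the entries invalid and page_mappings_only
avoids block mappings. Illustrative only; the call shape just follows
the v31 create_pgd_mapping() uses quoted elsewhere in this thread.)

void arch_kexec_protect_crashkres(void)
{
	/*
	 * "Unmap" the region by rewriting its page-level entries with
	 * an invalid prot (0), then flush the stale valid entries.
	 */
	create_pgd_mapping(&init_mm, crashk_res.start,
			   __phys_to_virt(crashk_res.start),
			   resource_size(&crashk_res), __pgprot(0),
			   true /* page_mappings_only */);
	flush_tlb_kernel_range(__phys_to_virt(crashk_res.start),
			       __phys_to_virt(crashk_res.end) + 1);
}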

Thanks,
-Takahiro AKASHI

> > +static bool pgtable_is_cleared(void *pgtable)
> > +{
> > +	unsigned long *desc, *end;
> > +
> > +	for (desc = pgtable, end = (unsigned long *)(pgtable + PAGE_SIZE);
> > +	     desc < end; desc++)
> > +		if (*desc)
> > +			return false;
> > +
> > +	return true;
> > +}
> 
> If nothing else, we should be using the normal accessors for this, e.g.
> {pte,pmd,pud}_none().
> 
> As above, I'd very much like to reuse our existing page table routines
> if we can.
> 
> Thanks,
> Mark.
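
(As an aside, the emptiness check above rewritten with the standard
accessors might look like the sketch below; the pte and pud variants
would be analogous, using pte_none()/pud_none() and
PTRS_PER_PTE/PTRS_PER_PUD. The function name is made up.)

/* Sketch only: scan a pmd table with pmd_none() instead of raw compares. */
static bool pmd_table_is_empty(pmd_t *pmdp)
{
	int i;

	for (i = 0; i < PTRS_PER_PMD; i++)
		if (!pmd_none(pmdp[i]))
			return false;

	return true;
}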

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-01 18:00     ` Mark Rutland
@ 2017-02-02 10:31       ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02 10:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
> > arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> > are meant to be called around kexec_load() in order to protect
> > the memory allocated for the crash dump kernel once it has been loaded.
> > 
> > The protection is implemented here by unmapping the region rather than
> > making it read-only.
> > To make things work correctly, we also have to
> > - put the region in an isolated, page-level mapping initially, and
> > - move copying kexec's control_code_page to machine_kexec_prepare()
> > 
> > Note that page-level mapping is also required to allow for shrinking
> > the size of the memory, through /sys/kernel/kexec_crash_size, by any
> > number of pages.
> 
> Looking at kexec_crash_size_store(), I don't see where memory returned
> to the OS is mapped. AFAICT, if the region is protected when the user
> shrinks the region, the memory will not be mapped, yet handed over to
> the kernel for general allocation.

The region is protected only when the crash dump kernel is loaded,
and after that, we are no longer able to shrink the region.

> Surely we need an arch-specific callback to handle that? e.g.
> 
> arch_crash_release_region(unsigned long base, unsigned long size)
> {
> 	/*
> 	 * Ensure the region is part of the linear map before we return
> 	 * it to the OS. We won't unmap this again, so we can use block
> 	 * mappings.
> 	 */
> 	create_pgd_mapping(&init_mm, start, __phys_to_virt(start),
> 			   size, PAGE_KERNEL, false);
> }
> 
> ... which we'd call from crash_shrink_memory() before we freed the
> reserved pages.

All the memory is mapped by my map_crashkernel() at boot time.
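
(A hypothetical shape of that map_crashkernel(), for reference; the v31
patch has the authoritative version. The point is to map the reserved
region at page granularity so it can later be unmapped independently:)

static void __init map_crashkernel(void)
{
	if (!crashk_res.end)
		return;

	/* page-level mapping of the reserved crashkernel region */
	create_pgd_mapping(&init_mm, crashk_res.start,
			   __phys_to_virt(crashk_res.start),
			   resource_size(&crashk_res), PAGE_KERNEL,
			   true /* page mappings only */);
}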

> [...]
> 
> > +void arch_kexec_unprotect_crashkres(void)
> > +{
> > +	/*
> > +	 * We don't have to make page-level mappings here because
> > +	 * the crash dump kernel memory is not allowed to be shrunk
> > +	 * once the kernel is loaded.
> > +	 */
> > +	create_pgd_mapping(&init_mm, crashk_res.start,
> > +			__phys_to_virt(crashk_res.start),
> > +			resource_size(&crashk_res), PAGE_KERNEL,
> > +			debug_pagealloc_enabled());
> > +
> > +	flush_tlb_all();
> > +}
> 
> We can lose the flush_tlb_all() here; TLBs aren't allowed to cache an
> invalid entry, so there's nothing to remove from the TLBs.

Ah, yes!

> [...]
> 
> > @@ -538,6 +540,24 @@ static void __init map_mem(pgd_t *pgd)
> >  		if (memblock_is_nomap(reg))
> >  			continue;
> >  
> > +#ifdef CONFIG_KEXEC_CORE
> > +		/*
> > +		 * While crash dump kernel memory is contained in a single
> > +		 * memblock for now, it should appear in an isolated mapping
> > +		 * so that we can independently unmap the region later.
> > +		 */
> > +		if (crashk_res.end &&
> > +		    (start <= crashk_res.start) &&
> > +		    ((crashk_res.end + 1) < end)) {
> > +			if (crashk_res.start != start)
> > +				__map_memblock(pgd, start, crashk_res.start);
> > +
> > +			if ((crashk_res.end + 1) < end)
> > +				__map_memblock(pgd, crashk_res.end + 1, end);
> > +
> > +			continue;
> > +		}
> > +#endif
> 
> This wasn't quite what I had in mind. I had expected that here we would
> isolate the ranges we wanted to avoid mapping (with a comment as to why
> we couldn't move the memblock_isolate_range() calls earlier). In
> __map_memblock(), we'd skip those ranges entirely.
> 
> I believe the above isn't correct if we have a single memblock.memory
> region covering both the crashkernel and kernel regions. In that case,
> we'd erroneously map the portion which overlaps the kernel.
> 
> It seems there are a number of subtle problems here. :/

I didn't see any problems, but I will go back to using
memblock_isolate_range() here in map_mem().

Thanks,
-Takahiro AKASHI

> Thanks,
> Mark.

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-01 18:25       ` Mark Rutland
@ 2017-02-02 10:39         ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02 10:39 UTC (permalink / raw)
  To: linux-arm-kernel

Mark,

On Wed, Feb 01, 2017 at 06:25:06PM +0000, Mark Rutland wrote:
> On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> > On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
> > > arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> > > are meant to be called around kexec_load() in order to protect
> > > the memory allocated for the crash dump kernel once it has been loaded.
> > > 
> > > The protection is implemented here by unmapping the region rather than
> > > making it read-only.
> > > To make things work correctly, we also have to
> > > - put the region in an isolated, page-level mapping initially, and
> > > - move copying kexec's control_code_page to machine_kexec_prepare()
> > > 
> > > Note that page-level mapping is also required to allow for shrinking
> > > the size of the memory, through /sys/kernel/kexec_crash_size, by any
> > > number of pages.
> > 
> > Looking at kexec_crash_size_store(), I don't see where memory returned
> > to the OS is mapped. AFAICT, if the region is protected when the user
> > shrinks the region, the memory will not be mapped, yet handed over to
> > the kernel for general allocation.
> > 
> > Surely we need an arch-specific callback to handle that? e.g.
> > 
> > arch_crash_release_region(unsigned long base, unsigned long size)
> > {
> > 	/*
> > 	 * Ensure the region is part of the linear map before we return
> > 	 * it to the OS. We won't unmap this again, so we can use block
> > 	 * mappings.
> > 	 */
> > 	create_pgd_mapping(&init_mm, start, __phys_to_virt(start),
> > 			   size, PAGE_KERNEL, false);
> > }
> > 
> > ... which we'd call from crash_shrink_memory() before we freed the
> > reserved pages.
> 
> Another question is (how) does hyp map this region?

I don't get your point here.
Hyp mode cares only about physical memory at the intermediate physical
address level, doesn't it?

If this is not an issue for now, I will post v32 tomorrow :)

-Takahiro AKASHI


> Thanks,
> Mark.

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-01 18:25       ` Mark Rutland
@ 2017-02-02 10:45         ` James Morse
  -1 siblings, 0 replies; 92+ messages in thread
From: James Morse @ 2017-02-02 10:45 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Akashi, Mark,

On 01/02/17 18:25, Mark Rutland wrote:
> On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
>> On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
>>> arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
>>> are meant to be called around kexec_load() in order to protect
>>> the memory allocated for the crash dump kernel once it has been loaded.
>>>
>>> The protection is implemented here by unmapping the region rather than
>>> making it read-only.
>>> To make things work correctly, we also have to
>>> - put the region in an isolated, page-level mapping initially, and
>>> - move copying kexec's control_code_page to machine_kexec_prepare()
>>>
>>> Note that page-level mapping is also required to allow for shrinking
>>> the size of the memory, through /sys/kernel/kexec_crash_size, by any
>>> number of pages.

>> Looking at kexec_crash_size_store(), I don't see where memory returned
>> to the OS is mapped. AFAICT, if the region is protected when the user
>> shrinks the region, the memory will not be mapped, yet handed over to
>> the kernel for general allocation.

kernel/kexec_core.c:crash_shrink_memory() will bail out:
>	if (kexec_crash_image) {
>		ret = -ENOENT;
>		goto unlock;
>	}

So it should only be possible to return memory to the allocators when there is
no crash image loaded, in which case the area is mapped.

What happens when we unload the crash image? It looks like an unload is a call
to do_kexec_load() with nr_segments == 0; do_kexec_load() has:
>	if (flags & KEXEC_ON_CRASH) {
>		dest_image = &kexec_crash_image;
>		if (kexec_crash_image)
>			arch_kexec_unprotect_crashkres();

So we unprotect the region when we unload the kernel, causing it to be remapped.
Provided the load/protect and {load,unload}/unprotect are kept in sync, I think
this is safe.
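
(For reference, the matching re-protect sits at the tail of
do_kexec_load(), roughly sketched from the core code:)

out:
	/* re-protect only if a crash image is (still) loaded */
	if ((flags & KEXEC_ON_CRASH) && kexec_crash_image)
		arch_kexec_protect_crashkres();

So after an unload kexec_crash_image is NULL, the region simply stays
unprotected, and the load/protect pairing holds.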


Given the core code can unload a crash image, this hunk has me worried:
+void arch_kexec_unprotect_crashkres(void)
+{
+	/*
+	 * We don't have to make page-level mappings here because
+	 * the crash dump kernel memory is not allowed to be shrunk
+	 * once the kernel is loaded.
+	 */
+	create_pgd_mapping(&init_mm, crashk_res.start,
+			__phys_to_virt(crashk_res.start),
+			resource_size(&crashk_res), PAGE_KERNEL,
+			debug_pagealloc_enabled());


I don't think this is true if the order is: load -> protect, unload ->
unprotect, shrink. The shrink will happen with potentially non-page-size mappings.


Thanks,

James

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-02 10:31       ` AKASHI Takahiro
@ 2017-02-02 11:16         ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-02 11:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, Feb 02, 2017 at 07:31:30PM +0900, AKASHI Takahiro wrote:
> On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> > On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
> > > arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> > > are meant to be called around kexec_load() in order to protect
> > > the memory allocated for the crash dump kernel once it has been loaded.
> > > 
> > > The protection is implemented here by unmapping the region rather than
> > > making it read-only.
> > > To make things work correctly, we also have to
> > > - put the region in an isolated, page-level mapping initially, and
> > > - move copying kexec's control_code_page to machine_kexec_prepare()
> > > 
> > > Note that page-level mapping is also required to allow for shrinking
> > > the size of the memory, through /sys/kernel/kexec_crash_size, by any
> > > number of pages.
> > 
> > Looking at kexec_crash_size_store(), I don't see where memory returned
> > to the OS is mapped. AFAICT, if the region is protected when the user
> > shrinks the region, the memory will not be mapped, yet handed over to
> > the kernel for general allocation.
> 
> The region is protected only when the crash dump kernel is loaded,
> and after that, we are no longer able to shrink the region.

Ah, sorry. My misunderstanding strikes again. That should be fine; sorry
for the noise, and thanks for explaining.

> > > @@ -538,6 +540,24 @@ static void __init map_mem(pgd_t *pgd)
> > >  		if (memblock_is_nomap(reg))
> > >  			continue;
> > >  
> > > +#ifdef CONFIG_KEXEC_CORE
> > > +		/*
> > > +		 * While crash dump kernel memory is contained in a single
> > > +		 * memblock for now, it should appear in an isolated mapping
> > > +		 * so that we can independently unmap the region later.
> > > +		 */
> > > +		if (crashk_res.end &&
> > > +		    (start <= crashk_res.start) &&
> > > +		    ((crashk_res.end + 1) < end)) {
> > > +			if (crashk_res.start != start)
> > > +				__map_memblock(pgd, start, crashk_res.start);
> > > +
> > > +			if ((crashk_res.end + 1) < end)
> > > +				__map_memblock(pgd, crashk_res.end + 1, end);
> > > +
> > > +			continue;
> > > +		}
> > > +#endif
> > 
> > This wasn't quite what I had in mind. I had expected that here we would
> > isolate the ranges we wanted to avoid mapping (with a comment as to why
> > we couldn't move the memblock_isolate_range() calls earlier). In
> > __map_memblock(), we'd skip those ranges entirely.
> > 
> > I believe the above isn't correct if we have a single memblock.memory
> > region covering both the crashkernel and kernel regions. In that case,
> > we'd erroneously map the portion which overlaps the kernel.
> > 
> > It seems there are a number of subtle problems here. :/
> 
> I didn't see any problems, but I will go back to using
> memblock_isolate_range() here in map_mem().

Imagine we have physical memory:

single RAM bank:   |---------------------------------------------------|
kernel image:              |---|
crashkernel:                                      |------|

... we reserve the image and crashkernel region, but these would still
remain part of memblock.memory, and we'd have a memblock layout
like:

memblock.memory:   |---------------------------------------------------|
memblock.reserved:         |---|                  |------|

... in map_mem() we iterate over memblock.memory, so we only have a
single entry to handle in this case. With the code above, we'd find that
it overlaps the crashk_res, and we'd map the parts which don't overlap,
e.g.

memblock.memory:   |---------------------------------------------------|
crashkernel:                                      |------|
mapped regions:    |-----------------------------|        |------------|

... however, this means we've mapped the portion which overlaps with the
kernel's linear alias (i.e. the case that we try to handle in
__map_memblock()). What we actually wanted was:

memblock.memory:   |---------------------------------------------------|
kernel image:              |---|
crashkernel:                                      |------|
mapped regions:    |------|     |----------------|        |------------|



To handle all cases I think we have to isolate *both* the image and
crashkernel in map_mem(). That would leave us with:

memblock.memory:   |------||---||----------------||------||------------|
memblock.reserved:         |---|                  |------|

... so then we can check for overlap with either the kernel or
crashkernel in __map_memblock(), and return early, e.g.

__map_memblock(...)
{
	if (overlaps_with_kernel(...))
		return;
	if (overlaps_with_crashkernel(...))
		return;

	__create_pgd_mapping(...);
}

We can pull the kernel alias mapping out of __map_memblock() and put it
at the end of map_mem().
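
(A rough sketch of that shape follows. memblock_mark_nomap() stands in
for memblock_isolate_range(), which is static to mm/memblock.c: marking
a sub-range NOMAP both isolates it into its own memblock regions and
makes the loop below skip it. map_kernel_alias(), map_crashkernel() and
memblock_clear_nomap() are placeholders here.)

static void __init map_mem(pgd_t *pgd)
{
	phys_addr_t kernel_start = __pa(_text);
	phys_addr_t kernel_end = __pa(__init_begin);
	struct memblock_region *reg;

	/* isolate and hide the kernel alias and crashkernel regions */
	memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
	if (crashk_res.end)
		memblock_mark_nomap(crashk_res.start,
				    resource_size(&crashk_res));

	/* map everything else */
	for_each_memblock(memory, reg) {
		if (memblock_is_nomap(reg))
			continue;
		__map_memblock(pgd, reg->base, reg->base + reg->size);
	}

	/*
	 * Now map the two special regions with the attributes and
	 * granularity they need, and drop the NOMAP flags again.
	 */
	map_kernel_alias(pgd);
	map_crashkernel(pgd);
	memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
	if (crashk_res.end)
		memblock_clear_nomap(crashk_res.start,
				     resource_size(&crashk_res));
}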

Does that make sense?

Thanks,
Mark.

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-02 10:45         ` James Morse
@ 2017-02-02 11:19           ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02 11:19 UTC (permalink / raw)
  To: linux-arm-kernel

James,

On Thu, Feb 02, 2017 at 10:45:58AM +0000, James Morse wrote:
> Hi Akashi, Mark,
> 
> On 01/02/17 18:25, Mark Rutland wrote:
> > On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> >> On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
> >>> arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> >>> are meant to be called around kexec_load() in order to protect
> >>> the memory allocated for the crash dump kernel once it has been loaded.
> >>>
> >>> The protection is implemented here by unmapping the region rather than
> >>> making it read-only.
> >>> To make things work correctly, we also have to
> >>> - put the region in an isolated, page-level mapping initially, and
> >>> - move copying kexec's control_code_page to machine_kexec_prepare()
> >>>
> >>> Note that page-level mapping is also required to allow for shrinking
> >>> the size of the memory, through /sys/kernel/kexec_crash_size, by any
> >>> number of pages.
> 
> >> Looking at kexec_crash_size_store(), I don't see where memory returned
> >> to the OS is mapped. AFAICT, if the region is protected when the user
> >> shrinks the region, the memory will not be mapped, yet handed over to
> >> the kernel for general allocation.
> 
> kernel/kexec_core.c:crash_shrink_memory() will bailout:
> >	if (kexec_crash_image) {
> >		ret = -ENOENT;
> >		goto unlock;
> >	}
> 
> So it should only be possible to return memory to the allocators when there is
> no crash image loaded, so the area is mapped.
> 
> What happens when we unload the crash image? It looks like an unload is a call

Thank you for this heads-up!
I had almost forgotten about this feature.

> to do_kexec_load() with nr_segments == 0; do_kexec_load() has:
> >	if (flags & KEXEC_ON_CRASH) {
> >		dest_image = &kexec_crash_image;
> >		if (kexec_crash_image)
> >			arch_kexec_unprotect_crashkres();
> 
> So we unprotect the region when we unload the kernel, causing it to be remapped.
> Provided the load/protect and {load,unload}/unprotect are kept in sync, I think
> this is safe.
> 
> 
> Given the core code can unload a crash image, this hunk has me worried:
> +void arch_kexec_unprotect_crashkres(void)
> +{
> +	/*
> +	 * We don't have to make page-level mappings here because
> +	 * the crash dump kernel memory is not allowed to be shrunk
> +	 * once the kernel is loaded.
> +	 */
> +	create_pgd_mapping(&init_mm, crashk_res.start,
> +			__phys_to_virt(crashk_res.start),
> +			resource_size(&crashk_res), PAGE_KERNEL,
> +			debug_pagealloc_enabled());
> 
> 
> I don't think this is true if the order is: load -> protect, unload ->
> unprotect, shrink. The shrink will happen with potentially non-page-size mappings.

So we always have to use page-level mappings here,
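
i.e. something like this (sketch):

void arch_kexec_unprotect_crashkres(void)
{
	/*
	 * Always remap at page granularity: after an unload/unprotect
	 * the region may still be shrunk page by page, so we must not
	 * reintroduce block mappings here.
	 */
	create_pgd_mapping(&init_mm, crashk_res.start,
			   __phys_to_virt(crashk_res.start),
			   resource_size(&crashk_res), PAGE_KERNEL,
			   true);
}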

-Takahiro AKASHI

> 
> Thanks,
> 
> James
> 

* [PATCH v31 03/12] arm64: kdump: reserve memory for crash dump kernel
  2017-02-02  4:52       ` AKASHI Takahiro
@ 2017-02-02 11:26         ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-02 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 01:52:36PM +0900, AKASHI Takahiro wrote:
> On Wed, Feb 01, 2017 at 03:26:09PM +0000, Mark Rutland wrote:
> > On Wed, Feb 01, 2017 at 09:46:22PM +0900, AKASHI Takahiro wrote:
> > > +	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> > > +		crash_size >> 20, crash_base >> 20);
> > 
> > We only page-align the size, so the MB will be a little off, but that's
> > probably OK. However, it would also be nicer to log the base as an
> > address.
> 
> You might notice that the exact same message is used by all the other
> architectures, but

Almost all; I see arch/sh prints the address with %08x. ;)

> > Could we dump this as we do for the kernel memory layout? e.g.
> > 
> > 	pr_info("crashkernel reserved: 0x%016lx - 0x%016lx (%lld MB)\n",
> > 		crash_base, crash_base + crash_size, crash_size >> 20);
> 
> We can go either way.

Even if it's different from other architectures, I'd prefer to log as
above, with the range in hex.
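
(With that format, the boot log line would come out something like the
following; the values here are illustrative only:)

  crashkernel reserved: 0x0000000060000000 - 0x0000000070000000 (256 MB)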

Thanks,
Mark.

* [PATCH v31 04/12] arm64: mm: allow for unmapping part of kernel mapping
  2017-02-02 10:21       ` AKASHI Takahiro
@ 2017-02-02 11:44         ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-02 11:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 07:21:32PM +0900, AKASHI Takahiro wrote:
> On Wed, Feb 01, 2017 at 04:03:54PM +0000, Mark Rutland wrote:
> > Hi,
> > 
> > On Wed, Feb 01, 2017 at 09:46:23PM +0900, AKASHI Takahiro wrote:
> > > A new function, remove_pgd_mapping(), is added.
> > > It allows us to unmap a specific portion of kernel mapping later as far as
> > > the mapping is made using create_pgd_mapping() and unless we try to free
> > > a sub-set of memory range within a section mapping.
> > 
> > I'm not keen on adding more page table modification code. It was painful
> > enough to ensure that those worked in all configurations.
> > 
> > Why can't we reuse create_pgd_mapping()? If we pass page_mappings_only,
> > and use an invalid prot (i.e. 0), what is the problem?
> 
> As I did in v30?
> (though my implementation in v30 should be improved.)

Something like that. I wasn't entirely sure why we needed to change
those functions so much, so I'm clearly missing something there. I'll go
have another look.

> > I can see that we wouldn't free/reallocate the pud or pmd entries, but
> > the "wasted" memory should be small. If anything, I'd argue that it's
> > preferable to keep that around so that we don't have to allocate memory
> > when we need to map the crashkernel region.
> > 
> > Is there another problem I'm missing?
> 
> If we don't need to free unused page tables, that would make things
> much simpler. There are still some minor problems on the merge, but
> we can sort them out.

I'm not sure I follow what you mean by 'on merge' here. Could you
elaborate?

Thanks,
Mark.

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-02 10:45         ` James Morse
@ 2017-02-02 11:48           ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-02 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, Feb 02, 2017 at 10:45:58AM +0000, James Morse wrote:
> On 01/02/17 18:25, Mark Rutland wrote:
> > On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> >> On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
> >>> arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> >>> are meant to be called around kexec_load() in order to protect
> >>> the memory allocated for the crash dump kernel once it has been loaded.
> >>>
> >>> The protection is implemented here by unmapping the region rather than
> >>> making it read-only.
> >>> To make things work correctly, we also have to
> >>> - put the region in an isolated, page-level mapping initially, and
> >>> - move copying kexec's control_code_page to machine_kexec_prepare()
> >>>
> >>> Note that page-level mapping is also required to allow for shrinking
> >>> the size of the memory, through /sys/kernel/kexec_crash_size, by any
> >>> number of pages.
> 
> >> Looking at kexec_crash_size_store(), I don't see where memory returned
> >> to the OS is mapped. AFAICT, if the region is protected when the user
> >> shrinks the region, the memory will not be mapped, yet handed over to
> >> the kernel for general allocation.
> 
> kernel/kexec_core.c:crash_shrink_memory() will bail out:
> >	if (kexec_crash_image) {
> >		ret = -ENOENT;
> >		goto unlock;
> >	}
> 
> So it should only be possible to return memory to the allocators when there is
> no crash image loaded, in which case the area is mapped.

Ah. That being the case, there should be no problem.

Any part given back to the linear map will be at page granularity, but
that won't result in any functional problem.

> Given the core code can unload a crash image, this hunk has me worried:
> +void arch_kexec_unprotect_crashkres(void)
> +{
> +	/*
> +	 * We don't have to make page-level mappings here because
> +	 * the crash dump kernel memory is not allowed to be shrunk
> +	 * once the kernel is loaded.
> +	 */
> +	create_pgd_mapping(&init_mm, crashk_res.start,
> +			__phys_to_virt(crashk_res.start),
> +			resource_size(&crashk_res), PAGE_KERNEL,
> +			debug_pagealloc_enabled());
> 
> 
> I don't think this is true if the order is: load -> protect, unload ->
> unprotect, shrink. The shrink will happen with potentially non-page-size mappings.

Let's consistently use page mappings for the crashkernel region.

Thanks,
Mark.

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-02 10:39         ` AKASHI Takahiro
@ 2017-02-02 11:54           ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-02 11:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 07:39:06PM +0900, AKASHI Takahiro wrote:
> Mark,
> 
> On Wed, Feb 01, 2017 at 06:25:06PM +0000, Mark Rutland wrote:
> > On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> > > On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
> > > > arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> > > > are meant to be called around kexec_load() in order to protect
> > > > the memory allocated for the crash dump kernel once it has been loaded.
> > > > 
> > > > The protection is implemented here by unmapping the region rather than
> > > > making it read-only.
> > > > To make things work correctly, we also have to
> > > > - put the region in an isolated, page-level mapping initially, and
> > > > - move copying kexec's control_code_page to machine_kexec_prepare()
> > > > 
> > > > Note that page-level mapping is also required to allow for shrinking
> > > > the size of the memory, through /sys/kernel/kexec_crash_size, by any
> > > > number of pages.
> > > 
> > > Looking at kexec_crash_size_store(), I don't see where memory returned
> > > to the OS is mapped. AFAICT, if the region is protected when the user
> > > shrinks the region, the memory will not be mapped, yet handed over to
> > > the kernel for general allocation.
> > > 
> > > Surely we need an arch-specific callback to handle that? e.g.
> > > 
> > > arch_crash_release_region(unsigned long base, unsigned long size)
> > > {
> > > 	/*
> > > 	 * Ensure the region is part of the linear map before we return
> > > 	 * it to the OS. We won't unmap this again, so we can use block
> > > 	 * mappings.
> > > 	 */
> > > 	create_pgd_mapping(&init_mm, start, __phys_to_virt(start),
> > > 			   size, PAGE_KERNEL, false);
> > > }
> > > 
> > > ... which we'd call from crash_shrink_memory() before we freed the
> > > reserved pages.
> > 
> > Another question is (how) does hyp map this region?
> 
> I don't get your point here.
> Hyp mode only deals with physical memory at the intermediate physical
> address level, doesn't it?

My concern was that hyp may map the region; and thus buggy code at hyp
can corrupt the region (and/or hyp may conflict w.r.t. attributes).

We might have to ensure hyp doesn't map the crashkernel region, and, to
cause us pain, disallow freeing of any part of the region.

I'll dig into this.

Thanks,
Mark.

> If this is not an issue for now, I will post v32 tomorrow :)
> 
> -Takahiro AKASHI
> 
> 
> > Thanks,
> > Mark.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v31 09/12] arm64: kdump: provide /proc/vmcore file
  2017-02-02  6:24       ` AKASHI Takahiro
@ 2017-02-02 12:03         ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-02 12:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 03:24:09PM +0900, AKASHI Takahiro wrote:
> On Wed, Feb 01, 2017 at 07:21:22PM +0000, Mark Rutland wrote:
> > On Wed, Feb 01, 2017 at 09:46:28PM +0900, AKASHI Takahiro wrote:
> > > Add arch-specific functions to provide a dump file, /proc/vmcore.
> > > 
> > > This file is in ELF format and its ELF header needs to be prepared by
> > > userspace tools, like kexec-tools, in advance. The primary kernel is
> > > responsible for allocating the region with reserve_elfcorehdr() at boot
> > > time and for advertising its location to the crash dump kernel via a new
> > > device-tree property, "linux,elfcorehdr".
> > 
> > > +static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
> > > +		const char *uname, int depth, void *data)
> > > +{
> > > +	const __be32 *reg;
> > > +	int len;
> > > +
> > > +	if (depth != 1 || strcmp(uname, "chosen") != 0)
> > > +		return 0;
> > > +
> > > +	reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
> > > +	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> > > +		return 1;
> > > +
> > > +	elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
> > > +	elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
> > > +
> > > +	return 1;
> > > +}
> > > +
> > > +/*
> > > + * reserve_elfcorehdr() - reserves memory for elf core header
> > > + *
> > > + * This function reserves elf core header given in "elfcorehdr=" kernel
> > > + * command line parameter. This region contains all the information about
> > > + * primary kernel's core image and is used by a dump capture kernel to
> > > + * access the system memory on primary kernel.
> > > + */
> > > +static void __init reserve_elfcorehdr(void)
> > > +{
> > > +	of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
> > > +
> > > +	if (!elfcorehdr_size)
> > > +		return;
> > > +
> > > +	if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
> > > +		pr_warn("elfcorehdr is overlapped\n");
> > > +		return;
> > > +	}
> > > +
> > > +	memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
> > > +
> > > +	pr_info("Reserving %lldKB of memory at 0x%llx for elfcorehdr\n",
> > > +		elfcorehdr_size >> 10, elfcorehdr_addr);
> > > +}
> > 
> > This doesn't seem right to me.  The logic here doesn't match the commit
> > message, the comment above reserve_elfcorehdr() doesn't match the
> > implementation, and this doesn't match my understanding of how this was
> > intended to be used from the DT binding.
> 
> Surely the commit message was wrong/misleading; it should say:
> 
> ===8<===
> Arch-specific functions are added to allow for implementing a crash dump
> file interface, /proc/vmcore, which can be viewed as an ELF file.
> 
> A user space tool, like kexec-tools, is responsible for allocating
> a separate region for the core's ELF header within the crash dump kernel
> memory and for filling it in when executing kexec_load().
> 
> Then, its location will be advertised to the crash dump kernel via a new
> device-tree property, "linux,elfcorehdr", and the crash dump kernel
> preserves the region for later use with reserve_elfcorehdr() at boot time.
> 
> On the crash dump kernel, /proc/vmcore will access the primary kernel's
> memory with copy_oldmem_page(), which feeds the data page-by-page by
> ioremap'ing it, since it does not reside in the linear mapping on the
> crash dump kernel.
> 
> Meanwhile, elfcorehdr_read() is simple, as the region is always mapped.
> ===>8===
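> 
> For illustration only, such a copy_oldmem_page() could look roughly like
> the sketch below (shown with memremap(); the actual patch code may
> differ):
> 
> 	ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> 				 size_t csize, unsigned long offset,
> 				 int userbuf)
> 	{
> 		void *vaddr;
> 
> 		if (!csize)
> 			return 0;
> 
> 		/* the old kernel's memory is not in our linear mapping */
> 		vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
> 		if (!vaddr)
> 			return -ENOMEM;
> 
> 		if (userbuf) {
> 			if (copy_to_user((char __user *)buf, vaddr + offset,
> 					 csize)) {
> 				memunmap(vaddr);
> 				return -EFAULT;
> 			}
> 		} else {
> 			memcpy(buf, vaddr + offset, csize);
> 		}
> 
> 		memunmap(vaddr);
> 		return csize;
> 	}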
> 
> Does this make things clear?

Yes. That sounds roughly like what I was expecting. Thanks for
clarifying that!

Can you also fix the comment above reserve_elfcorehdr()? It refers to a
non-existent kernel command line option.

... at the same time, can we change the print there to something like:

	pr_info("elfcorehdr found and reserved at 0x%016llx - 0x%016llx (%lld KB)\n",
		elfcorehdr_addr, elfcorehdr_addr + elfcorehdr_size, 
		elfcorehdr_size >> 10);

... that makes it clear that we've found an existing elfcorehdr, rather
than reserving space to be filled with an elfcorehdr later, and
printing the region base and size in that way aligns with what we do
elsewhere.

With all that:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v31 09/12] arm64: kdump: provide /proc/vmcore file
  2017-02-02 12:03         ` Mark Rutland
@ 2017-02-02 12:08           ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-02 12:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 12:03:20PM +0000, Mark Rutland wrote:
> On Thu, Feb 02, 2017 at 03:24:09PM +0900, AKASHI Takahiro wrote:
> > On Wed, Feb 01, 2017 at 07:21:22PM +0000, Mark Rutland wrote:
> > > On Wed, Feb 01, 2017 at 09:46:28PM +0900, AKASHI Takahiro wrote:

> > > > +/*
> > > > + * reserve_elfcorehdr() - reserves memory for elf core header
> > > > + *
> > > > + * This function reserves elf core header given in "elfcorehdr=" kernel
> > > > + * command line parameter. This region contains all the information about
> > > > + * primary kernel's core image and is used by a dump capture kernel to
> > > > + * access the system memory on primary kernel.
> > > > + */

> Can you also fix the comment above reserve_elfcorehdr()? It refers to a
> non-existent kernel command line option.

... or rather, one that's handled by core code, not
reserve_elfcorehdr().

So more specifically, it would be nicer to have something like:

	/*
	 * reserve_elfcorehdr() - reserves memory for elf core header
	 *
	 * This function reserves the memory occupied by an elf core header
	 * described in the device tree. This region contains all the
	 * information about primary kernel's core image and is used by a dump
	 * capture kernel to access the system memory on primary kernel.
	 */

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v31 03/12] arm64: kdump: reserve memory for crash dump kernel
  2017-02-02 11:26         ` Mark Rutland
@ 2017-02-02 13:44           ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02 13:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 11:26:20AM +0000, Mark Rutland wrote:
> On Thu, Feb 02, 2017 at 01:52:36PM +0900, AKASHI Takahiro wrote:
> > On Wed, Feb 01, 2017 at 03:26:09PM +0000, Mark Rutland wrote:
> > > On Wed, Feb 01, 2017 at 09:46:22PM +0900, AKASHI Takahiro wrote:
> > > > +	pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
> > > > +		crash_size >> 20, crash_base >> 20);
> > > 
> > > We only page-align the size, so the MB will be a little off, but that's
> > > probably OK. However, it would also be nicer to log the base as an
> > > address.
> > 
> > You might notice that the exact same message is used by all the other
> > architectures, but
> 
> Almost all; I see arch/sh prints the address with %08x. ;)
> 
> > > Could we dump this as we do for the kernel memory layout? e.g.
> > > 
> > > 	pr_info("crashkernel reserved: 0x%016lx - 0x%016lx (%lld MB)\n",
> > > 		crash_base, crash_base + crash_size, crash_size >> 20);
> > 
> > We can go either way.
> 
> Even if it's different from other architectures, I'd prefer to log as
> above, with the range in hex.

OK.

-Takahiro AKASHI

> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v31 04/12] arm64: mm: allow for unmapping part of kernel mapping
  2017-02-02 11:44         ` Mark Rutland
@ 2017-02-02 14:01           ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02 14:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 11:44:38AM +0000, Mark Rutland wrote:
> On Thu, Feb 02, 2017 at 07:21:32PM +0900, AKASHI Takahiro wrote:
> > On Wed, Feb 01, 2017 at 04:03:54PM +0000, Mark Rutland wrote:
> > > Hi,
> > > 
> > > On Wed, Feb 01, 2017 at 09:46:23PM +0900, AKASHI Takahiro wrote:
> > > > A new function, remove_pgd_mapping(), is added.
> > > > It allows us to unmap a specific portion of the kernel mapping later,
> > > > as long as the mapping was made using create_pgd_mapping() and we do
> > > > not try to free a sub-set of a memory range within a section mapping.
> > > 
> > > I'm not keen on adding more page table modification code. It was painful
> > > enough to ensure that those worked in all configurations.
> > > 
> > > Why can't we reuse create_pgd_mapping()? If we pass page_mappings_only,
> > > and use an invalid prot (i.e. 0), what is the problem?
> > 
> > As I did in v30?
> > (though my implementation in v30 should be improved.)
> 
> Something like that. I wasn't entirely sure why we needed to change
> those functions so much, so I'm clearly missing something there. I'll go
> have another look.

It would be much easier if you looked at my new code.

> > > I can see that we wouldn't free/reallocate the pud or pmd entries, but
> > > the "wasted" memory should be small. If anything, I'd argue that it's
> > > preferable to keep that around so that we don't have to allocate memory
> > > when we need to map the crashkernel region.
> > > 
> > > Is there another problem I'm missing?
> > 
> > If we don't need to free unused page tables, that would make things
> > much simpler. There are still some minor problems on the merge, but
> > we can sort it out.
> 
> I'm not sure I follow what you mean by 'on merge' here. Could you
> elaborate?

What I had in mind is some changes needed to handle "__prot(0)" properly
in alloc_init_pxx(). For example, p[mu]d_set_huge() doesn't make
a "zeroed" entry.

-Takahiro AKASHI

> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v31 04/12] arm64: mm: allow for unmapping part of kernel mapping
  2017-02-02 14:01           ` AKASHI Takahiro
@ 2017-02-02 14:35             ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-02 14:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 11:01:03PM +0900, AKASHI Takahiro wrote:
> On Thu, Feb 02, 2017 at 11:44:38AM +0000, Mark Rutland wrote:
> > On Thu, Feb 02, 2017 at 07:21:32PM +0900, AKASHI Takahiro wrote:
> > > On Wed, Feb 01, 2017 at 04:03:54PM +0000, Mark Rutland wrote:
> > > > Hi,
> > > > 
> > > > On Wed, Feb 01, 2017 at 09:46:23PM +0900, AKASHI Takahiro wrote:
> > > > > A new function, remove_pgd_mapping(), is added.
> > > > > It allows us to unmap a specific portion of the kernel mapping later,
> > > > > as long as the mapping was made using create_pgd_mapping() and we do
> > > > > not try to free a sub-set of a memory range within a section mapping.
> > > > 
> > > > I'm not keen on adding more page table modification code. It was painful
> > > > enough to ensure that those worked in all configurations.
> > > > 
> > > > Why can't we reuse create_pgd_mapping()? If we pass page_mappings_only,
> > > > and use an invalid prot (i.e. 0), what is the problem?
> > > 
> > > As I did in v30?
> > > (though my implementation in v30 should be improved.)
> > 
> > Something like that. I wasn't entirely sure why we needed to change
> > those functions so much, so I'm clearly missing something there. I'll go
> > have another look.
> 
> It would be much easier if you looked at my new code.

Sure. FWIW, I took a look, and I understand why those changes were
necessary.

> > > If we don't need to free unused page tables, that would make things
> > > much simpler. There are still some minor problems on the merge, but
> > > we can sort it out.
> > 
> > I'm not sure I follow what you mean by 'on merge' here. Could you
> > elaborate?
> 
> What I had in mind is some changes needed to handle "__prot(0)" properly
> in alloc_init_pxx(). For example, p[mu]d_set_huge() doesn't make
> a "zeroed" entry.

I think that if we only allow ourselves to make PTEs invalid, we don't
have to handle that case. If we use page_mappings_only, we should only
check pgattr_change_is_safe() for the pte level, and the {pmd,pud,pgd}
entries shouldn't change.

Is the below sufficient to allow that, or have I missed something?

Thanks,
Mark.

---->8----
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 17243e4..05bf7bf 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -105,6 +105,22 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
        return old  == 0 || new  == 0 || ((old ^ new) & ~mask) == 0;
 }

+static bool pte_change_is_valid(pte_t old, pte_t new)
+{
+       /*
+        * So long as we subsequently perform TLB invalidation, it is safe to
+        * change a PTE to an invalid, but non-zero value. We only allow this
+        * for PTEs since there's no complicated allocation/free issues to deal
+        * with.
+        *
+        * Otherwise, the usual attribute change rules apply.
+        */
+       if (!pte_valid(old) || !pte_valid(new))
+               return true;
+
+       return pgattr_change_is_safe(pte_val(old), pte_val(new));
+}
+
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
                                  unsigned long end, unsigned long pfn,
                                  pgprot_t prot,
@@ -143,11 +159,7 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
                set_pte(pte, pfn_pte(pfn, __prot));
                pfn++;

-               /*
-                * After the PTE entry has been populated once, we
-                * only allow updates to the permission attributes.
-                */
-               BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
+               BUG_ON(!pte_change_is_valid(old_pte, *pte));

        } while (pte++, addr += PAGE_SIZE, addr != end);

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-02 11:16         ` Mark Rutland
@ 2017-02-02 14:36           ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02 14:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 11:16:37AM +0000, Mark Rutland wrote:
> Hi,
> 
> On Thu, Feb 02, 2017 at 07:31:30PM +0900, AKASHI Takahiro wrote:
> > On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> > > On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
> > > > arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> > > > are meant to be called around kexec_load() in order to protect
> > > > the memory allocated for the crash dump kernel once it has been loaded.
> > > > 
> > > > The protection is implemented here by unmapping the region rather than
> > > > making it read-only.
> > > > To make things work correctly, we also have to
> > > > - put the region in an isolated, page-level mapping initially, and
> > > > - move copying kexec's control_code_page to machine_kexec_prepare()
> > > > 
> > > > Note that page-level mapping is also required to allow for shrinking
> > > > the size of memory, through /sys/kernel/kexec_crash_size, by any
> > > > number of pages.
> > > 
> > > Looking at kexec_crash_size_store(), I don't see where memory returned
> > > to the OS is mapped. AFAICT, if the region is protected when the user
> > > shrinks the region, the memory will not be mapped, yet handed over to
> > > the kernel for general allocation.
> > 
> > The region is protected only when the crash dump kernel is loaded,
> > and after that, we are no longer able to shrink the region.
> 
> Ah, sorry. My misunderstanding strikes again. That should be fine; sorry
> for the noise, and thanks for explaining.
> 
> > > > @@ -538,6 +540,24 @@ static void __init map_mem(pgd_t *pgd)
> > > >  		if (memblock_is_nomap(reg))
> > > >  			continue;
> > > >  
> > > > +#ifdef CONFIG_KEXEC_CORE
> > > > +		/*
> > > > +		 * While crash dump kernel memory is contained in a single
> > > > +		 * memblock for now, it should appear in an isolated mapping
> > > > +		 * so that we can independently unmap the region later.
> > > > +		 */
> > > > +		if (crashk_res.end &&
> > > > +		    (start <= crashk_res.start) &&
> > > > +		    ((crashk_res.end + 1) < end)) {
> > > > +			if (crashk_res.start != start)
> > > > +				__map_memblock(pgd, start, crashk_res.start);
> > > > +
> > > > +			if ((crashk_res.end + 1) < end)
> > > > +				__map_memblock(pgd, crashk_res.end + 1, end);
> > > > +
> > > > +			continue;
> > > > +		}
> > > > +#endif
> > > 
> > > This wasn't quite what I had in mind. I had expected that here we would
> > > isolate the ranges we wanted to avoid mapping (with a comment as to why
> > > we couldn't move the memblock_isolate_range() calls earlier). In
> > > map_memblock(), we'd skip those ranges entirely.
> > > 
> > > I believe the above isn't correct if we have a single memblock.memory
> > > region covering both the crashkernel and kernel regions. In that case,
> > > we'd erroneously map the portion which overlaps the kernel.
> > > 
> > > It seems there are a number of subtle problems here. :/
> > 
> > I didn't see any problems, but I will go back with memblock_isolate_range()
> > here in map_mem().
> 
> Imagine we have physical memory:
> 
> single RAM bank:   |---------------------------------------------------|
> kernel image:              |---|
> crashkernel:                                      |------|
> 
> ... we reserve the image and crashkernel region, but these would still
> remain part of the memory memblock, and we'd have a memblock layout
> like:
> 
> memblock.memory:   |---------------------------------------------------|
> memblock.reserved:         |---|                  |------|
> 
> ... in map_mem() we iterate over memblock.memory, so we only have a
> single entry to handle in this case. With the code above, we'd find that
> it overlaps the crashk_res, and we'd map the parts which don't overlap,
> e.g.
> 
> memblock.memory:   |---------------------------------------------------|
> crashkernel:                                      |------|
> mapped regions:    |-----------------------------|        |------------|

I'm afraid that you might be talking about my v30.
The code in v31 was modified a bit, and now

> ... however, this means we've mapped the portion which overlaps with the
> kernel's linear alias (i.e. the case that we try to handle in
> __map_memblock()). What we actually wanted was:
> 
> memblock.memory:   |---------------------------------------------------|
> kernel image:              |---|
> crashkernel:                                      |------|

                     |-----------(A)---------------|        |----(B)-----|

__map_memblock() is called against each of (A) and (B),
so I think we will get

> mapped regions:    |------|     |----------------|        |------------|

this mapping.

> 
> 
> To handle all cases I think we have to isolate *both* the image and
> crashkernel in map_mem(). That would leave us with:
> 
> memblock.memory:   |------||---||----------------||------||------------|
> memblock.reserved:         |---|                  |------|
> 
> ... so then we can check for overlap with either the kernel or
> crashkernel in __map_memblock(), and return early, e.g.
> 
> __map_memblock(...)
> 	if (overlaps_with_kernel(...))
> 		return;
> 	if (overlaps_with_crashkernel(...))
> 		return;
> 	
> 	__create_pgd_mapping(...);
> }
> 
> We can pull the kernel alias mapping out of __map_memblock() and put it
> at the end of map_mem().
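> 
> Concretely, a sketch of the isolation step (assuming
> memblock_isolate_range() is made non-static, as this series needs; it
> only splits the region list, nothing is unmapped or freed):
> 
> 	/* in map_mem(), before iterating over memblock.memory */
> 	phys_addr_t kernel_start = __pa(_text);
> 	phys_addr_t kernel_end = __pa(__init_begin);
> 	int start_rgn, end_rgn;
> 
> 	memblock_isolate_range(&memblock.memory, kernel_start,
> 			       kernel_end - kernel_start,
> 			       &start_rgn, &end_rgn);
> 	if (crashk_res.end)
> 		memblock_isolate_range(&memblock.memory, crashk_res.start,
> 				       resource_size(&crashk_res),
> 				       &start_rgn, &end_rgn);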
> 
> Does that make sense?

OK, I now see what you have in mind.

Thanks,
-Takahiro AKASHI


> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v31 09/12] arm64: kdump: provide /proc/vmcore file
  2017-02-02 12:08           ` Mark Rutland
@ 2017-02-02 14:39             ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02 14:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 12:08:50PM +0000, Mark Rutland wrote:
> On Thu, Feb 02, 2017 at 12:03:20PM +0000, Mark Rutland wrote:
> > On Thu, Feb 02, 2017 at 03:24:09PM +0900, AKASHI Takahiro wrote:
> > > On Wed, Feb 01, 2017 at 07:21:22PM +0000, Mark Rutland wrote:
> > > > On Wed, Feb 01, 2017 at 09:46:28PM +0900, AKASHI Takahiro wrote:
> 
> > > > > +/*
> > > > > + * reserve_elfcorehdr() - reserves memory for elf core header
> > > > > + *
> > > > > + * This function reserves elf core header given in "elfcorehdr=" kernel
> > > > > + * command line parameter. This region contains all the information about
> > > > > + * primary kernel's core image and is used by a dump capture kernel to
> > > > > + * access the system memory on primary kernel.
> > > > > + */
> 
> > Can you also fix the comment above reserve_elfcorehdr()? It refers to a
> > non-existent kernel command line option.
> 
> ... or rather, one that's handled by core code, not
> reserve_elfcorehdr().
> 
> So more specifically, it would be nicer to have something like:
> 
> 	/*
> 	 * reserve_elfcorehdr() - reserves memory for elf core header
> 	 *
> 	 * This function reserves the memory occupied by an elf core header
> 	 * described in the device tree. This region contains all the
> 	 * information about primary kernel's core image and is used by a dump
> 	 * capture kernel to access the system memory on primary kernel.
> 	 */

Perfect!

Thanks a lot,
-Takahiro AKASHI


> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v31 04/12] arm64: mm: allow for unmapping part of kernel mapping
  2017-02-02 14:35             ` Mark Rutland
@ 2017-02-02 14:55               ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02 14:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 02:35:35PM +0000, Mark Rutland wrote:
> On Thu, Feb 02, 2017 at 11:01:03PM +0900, AKASHI Takahiro wrote:
> > On Thu, Feb 02, 2017 at 11:44:38AM +0000, Mark Rutland wrote:
> > > On Thu, Feb 02, 2017 at 07:21:32PM +0900, AKASHI Takahiro wrote:
> > > > On Wed, Feb 01, 2017 at 04:03:54PM +0000, Mark Rutland wrote:
> > > > > Hi,
> > > > > 
> > > > > On Wed, Feb 01, 2017 at 09:46:23PM +0900, AKASHI Takahiro wrote:
> > > > > > A new function, remove_pgd_mapping(), is added.
> > > > > > It allows us to unmap a specific portion of the kernel mapping later,
> > > > > > as long as the mapping was made using create_pgd_mapping() and we do
> > > > > > not try to free a sub-set of a memory range within a section mapping.
> > > > > 
> > > > > I'm not keen on adding more page table modification code. It was painful
> > > > > enough to ensure that those worked in all configurations.
> > > > > 
> > > > > Why can't we reuse create_pgd_mapping()? If we pass page_mappings_only,
> > > > > and use an invalid prot (i.e. 0), what is the problem?
> > > > 
> > > > As I did in v30?
> > > > (though my implementation in v30 should be improved.)
> > > 
> > > Something like that. I wasn't entirely sure why we needed to change
> > > those functions so much, so I'm clearly missing something there. I'll go
> > > have another look.
> > 
> > It would be much easier if you looked at my new code.
> 
> Sure. FWIW, I took a look, and I understand why those changes were
> necessary.
> 
> > > > If we don't need to free unused page tables, that would make things
> > > > much simpler. There are still some minor problems on the merge, but
> > > > we can sort it out.
> > > 
> > > I'm not sure I follow what you mean by 'on merge' here. Could you
> > > elaborate?
> > 
> > What I had in mind is some changes needed to handle "__prot(0)" properly
> > in alloc_init_pxx(). For example, p[mu]d_set_huge() doesn't make
> > a "zeroed" entry.
> 
> I think that if we only allow ourselves to make PTEs invalid, we don't
> have to handle that case. If we use page_mappings_only, we should only
> check pgattr_change_is_safe() for the pte level, and the {pmd,pud,pgd}
> entries shouldn't change.
> 
> Is the below sufficient to allow that, or have I missed something?

I think it will be OK, but I will double-check tomorrow.
However, is it acceptable that create_pgd_mapping(__prot(0)) can
only handle the page-mappings-only case?
That would be fine for kdump, but is it in general?
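
(For context, a rough sketch of why a section entry can't be invalidated
this way; sketch_pmd_set_huge() below is a hypothetical stand-in that
only approximates what the real p[mu]d_set_huge() does:)

	static void sketch_pmd_set_huge(pmd_t *pmd, phys_addr_t phys,
					pgprot_t prot)
	{
		/*
		 * PMD_TYPE_SECT has bit 0 (the valid bit) set, so even
		 * prot == __pgprot(0) yields a valid section entry rather
		 * than a zeroed one.
		 */
		set_pmd(pmd, __pmd(phys | PMD_TYPE_SECT | pgprot_val(prot)));
	}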

-Takahiro AKASHI

> Thanks,
> Mark.
> 
> ---->8----
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 17243e4..05bf7bf 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -105,6 +105,22 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
>         return old  == 0 || new  == 0 || ((old ^ new) & ~mask) == 0;
>  }
> 
> +static bool pte_change_is_valid(pte_t old, pte_t new)
> +{
> +       /*
> +        * So long as we subsequently perform TLB invalidation, it is safe to
> +        * change a PTE to an invalid, but non-zero value. We only allow this
> +        * for PTEs since there's no complicated allocation/free issues to deal
> +        * with.
> +        *
> +        * Otherwise, the usual attribute change rules apply.
> +        */
> +       if (!pte_valid(old) || !pte_valid(new))
> +               return true;
> +
> +       return pgattr_change_is_safe(pte_val(old), pte_val(new));
> +}
> +
>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>                                   unsigned long end, unsigned long pfn,
>                                   pgprot_t prot,
> @@ -143,11 +159,7 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>                 set_pte(pte, pfn_pte(pfn, __prot));
>                 pfn++;
> 
> -               /*
> -                * After the PTE entry has been populated once, we
> -                * only allow updates to the permission attributes.
> -                */
> -               BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
> +               BUG_ON(!pte_change_is_valid(old_pte, *pte));
> 
>         } while (pte++, addr += PAGE_SIZE, addr != end);
> 
> 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v31 04/12] arm64: mm: allow for unmapping part of kernel mapping
@ 2017-02-02 14:55               ` AKASHI Takahiro
  0 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-02 14:55 UTC (permalink / raw)
  To: Mark Rutland
  Cc: geoff, catalin.marinas, will.deacon, james.morse, bauerman,
	dyoung, kexec, linux-arm-kernel

On Thu, Feb 02, 2017 at 02:35:35PM +0000, Mark Rutland wrote:
> On Thu, Feb 02, 2017 at 11:01:03PM +0900, AKASHI Takahiro wrote:
> > On Thu, Feb 02, 2017 at 11:44:38AM +0000, Mark Rutland wrote:
> > > On Thu, Feb 02, 2017 at 07:21:32PM +0900, AKASHI Takahiro wrote:
> > > > On Wed, Feb 01, 2017 at 04:03:54PM +0000, Mark Rutland wrote:
> > > > > Hi,
> > > > > 
> > > > > On Wed, Feb 01, 2017 at 09:46:23PM +0900, AKASHI Takahiro wrote:
> > > > > > A new function, remove_pgd_mapping(), is added.
> > > > > > It allows us to unmap a specific portion of kernel mapping later as far as
> > > > > > the mapping is made using create_pgd_mapping() and unless we try to free
> > > > > > a sub-set of memory range within a section mapping.
> > > > > 
> > > > > I'm not keen on adding more page table modification code. It was painful
> > > > > enough to ensure that those worked in all configurations.
> > > > > 
> > > > > Why can't we reuse create_pgd_mapping()? If we pass page_mappings_only,
> > > > > and use an invalid prot (i.e. 0), what is the problem?
> > > > 
> > > > As I did in v30?
> > > > (though my implementation in v30 should be improved.)
> > > 
> > > Something like that. I wasn't entirely sure why we needed to change
> > > those functions so much, so I'm clearly missing something there. I'll go
> > > have another look.
> > 
> > I would be much easier if you see my new code.
> 
> Sure. FWIW, I took a look, and I understand why those changes were
> necessary.
> 
> > > > If we don't need to free unused page tables, that would make things
> > > > much simple. There are still some minor problems on the merge, but
> > > > we can sort it out.
> > > 
> > > I'm not sure I follow what you mean by 'on merge' here. Could you
> > > elaborate?
> > 
> > What I had in mind is some changes needed to handle "__prot(0)" properly
> > in alloc_init_pxx(). For example, p[mu]d_set_huge() doesn't make
> > a "zeroed" entry.
> 
> I think that if we only allow ourselves to make PTEs invalid, we don't
> have to handle that case. If we use page_mappings_only, we should only
> check pgattr_change_is_safe() for the pte level, and the {pmd,pud,pgd}
> entries shouldn't change.
> 
> Is the below sufficient to allow that, or have I missed something?

I think it will be OK, but will double-check tomorrow.
However, is is acceptable that create_pgd_mapping( __prot(0) ) can
only handle the cases of page-mapping-only?
That would be fine to kdump, but in general?

-Takahiro AKASHI

> Thanks,
> Mark.
> 
> ---->8----
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 17243e4..05bf7bf 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -105,6 +105,22 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
>         return old  == 0 || new  == 0 || ((old ^ new) & ~mask) == 0;
>  }
> 
> +static bool pte_change_is_valid(pte old, pte new)
> +{
> +       /*
> +        * So long as we subsequently perform TLB invalidation, it is safe to
> +        * change a PTE to an invalid, but non-zero value. We only allow this
> +        * for PTEs since there's no complicated allocation/free issues to deal
> +        * with.
> +        *
> +        * Otherwise, the usual attribute change rules apply.
> +        */
> +       if (!pte_valid(old) || !pte_valid(new))
> +               return true;
> +
> +       return pgattr_change_is_safe(pte_val(old), pte_val(new));
> +}
> +
>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>                                   unsigned long end, unsigned long pfn,
>                                   pgprot_t prot,
> @@ -143,11 +159,7 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>                 set_pte(pte, pfn_pte(pfn, __prot));
>                 pfn++;
> 
> -               /*
> -                * After the PTE entry has been populated once, we
> -                * only allow updates to the permission attributes.
> -                */
> -               BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
> +               BUG_ON(!pte_change_is_valid(old_pte, *pte));
> 
>         } while (pte++, addr += PAGE_SIZE, addr != end);
> 
> 

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-02 14:36           ` AKASHI Takahiro
@ 2017-02-02 15:36             ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-02 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 02, 2017 at 11:36:04PM +0900, AKASHI Takahiro wrote:
> On Thu, Feb 02, 2017 at 11:16:37AM +0000, Mark Rutland wrote:
> > On Thu, Feb 02, 2017 at 07:31:30PM +0900, AKASHI Takahiro wrote:
> > > On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> > > > On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:

> > Imagine we have physical memory:
> > 
> > single RAM bank:   |---------------------------------------------------|
> > kernel image:              |---|
> > crashkernel:                                      |------|
> > 
> > ... we reserve the image and crashkernel region, but these would still
> > remain part of the memory memblock, and we'd have a memblock layout
> > like:
> > 
> > memblock.memory:   |---------------------------------------------------|
> > memblock.reserved:         |---|                  |------|
> > 
> > ... in map_mem() we iterate over memblock.memory, so we only have a
> > single entry to handle in this case. With the code above, we'd find that
> > it overlaps the crashk_res, and we'd map the parts which don't overlap,
> > e.g.
> > 
> > memblock.memory:   |---------------------------------------------------|
> > crashkernel:                                      |------|
> > mapped regions:    |-----------------------------|        |------------|
> 
> I'm afraid that you might be talking about my v30.
> The code was modified a bit in v31, and now
> 
> > ... however, this means we've mapped the portion which overlaps with the
> > kernel's linear alias (i.e. the case that we try to handle in
> > __map_memblock()). What we actually wanted was:
> > 
> > memblock.memory:   |---------------------------------------------------|
> > kernel image:              |---|
> > crashkernel:                                      |------|
> 
>                      |-----------(A)---------------|        |----(B)-----|
> 
> __map_memblock() is called against each of (A) and (B),
> so I think we will get
> 
> > mapped regions:    |------|     |----------------|        |------------|
> 
> this mapping.

Ah, I see. You are correct; this will work.

Clearly I needed more coffee before considering this. :/

> > To handle all cases I think we have to isolate *both* the image and
> > crashkernel in map_mem(). That would leave us with:
> > 
> > memblock.memory:   |------||---||----------------||------||------------|
> > memblock.reserved:         |---|                  |------|
> > 
> > ... so then we can check for overlap with either the kernel or
> > crashkernel in __map_memblock(), and return early, e.g.
> > 
> > __map_memblock(...)
> > 	if (overlaps_with_kernel(...))
> > 		return;
> > 	if (overlaps_with_crashkernel(...))
> > 		return;
> > 	
> > 	__create_pgd_mapping(...);
> > }
> > 
> > We can pull the kernel alias mapping out of __map_memblock() and put it
> > at the end of map_mem().
> > 
> > Does that make sense?
> 
> OK, I now understand your expectation.

Thinking about it, we can do this consistently in one place if we use nomap
(temporarily), which would allow us to simplify the existing code. We'd have to
introduce memblock_clear_nomap(), but that's less of an internal detail than
memblock_isolate_range(), so that seems reasonable.
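
The helper itself should be trivial; as a sketch (assuming it can mirror
memblock_mark_nomap() and reuse the existing memblock_setclr_flag()
machinery in mm/memblock.c):

int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
{
	/* Clear MEMBLOCK_NOMAP so the region is considered mappable again. */
	return memblock_setclr_flag(base, size, 0, MEMBLOCK_NOMAP);
}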

I think we can handle this like:

/*
 * Create a linear mapping for all memblock memory regions which are mappable.
 */
static void __init __map_mem(pgd_t *pgd)
{
        struct memblock_region *reg;

        /* map all the memory banks */
        for_each_memblock(memory, reg) {
                phys_addr_t start = reg->base;
                phys_addr_t end = start + reg->size;

                if (start >= end)
                        break;
                if (memblock_is_nomap(reg))
                        continue;

                __create_pgd_mapping(pgd, start, __phys_to_virt(start),
                                     end - start, PAGE_KERNEL,
                                     early_pgtable_alloc,
                                     debug_pagealloc_enabled());
        }   
}

/*
 * Create a linear mapping for mappable memory, handling regions which have
 * special mapping requirements.
 */
static void __init map_mem(pgd_t *pgd)
{
        unsigned long kernel_start = __pa(_text);
        unsigned long kernel_end = __pa(__init_begin);

        memblock_mark_nomap(kernel_start, kernel_end);

        // mark crashkernel nomap here

        __map_mem(pgd);

        /*  
         * Map the linear alias of the [_text, __init_begin) interval as
         * read-only/non-executable. This makes the contents of the
         * region accessible to subsystems such as hibernate, but
         * protects it from inadvertent modification or execution.
         */
        __create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
                             kernel_end - kernel_start, PAGE_KERNEL_RO,
                             early_pgtable_alloc, debug_pagealloc_enabled());
        memblock_clear_nomap(kernel_start, kernel_end);

        // map crashkernel here
        // clear nomap for crashkernel
}

Thoughts?

Thanks,
Mark.

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-02 11:54           ` Mark Rutland
@ 2017-02-03  1:45             ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-03  1:45 UTC (permalink / raw)
  To: linux-arm-kernel

Mark,

On Thu, Feb 02, 2017 at 11:54:25AM +0000, Mark Rutland wrote:
> On Thu, Feb 02, 2017 at 07:39:06PM +0900, AKASHI Takahiro wrote:
> > Mark,
> > 
> > On Wed, Feb 01, 2017 at 06:25:06PM +0000, Mark Rutland wrote:
> > > On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> > > > On Wed, Feb 01, 2017 at 09:46:24PM +0900, AKASHI Takahiro wrote:
> > > > > arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> > > > > are meant to be called around kexec_load() in order to protect
> > > > > the memory allocated for the crash dump kernel once it has been loaded.
> > > > > 
> > > > > The protection is implemented here by unmapping the region rather than
> > > > > making it read-only.
> > > > > To make things work correctly, we also have to
> > > > > - put the region in an isolated, page-level mapping initially, and
> > > > > - move copying kexec's control_code_page to machine_kexec_prepare()
> > > > > 
> > > > > Note that a page-level mapping is also required to allow for shrinking
> > > > > the region, through /sys/kernel/kexec_crash_size, by any multiple of
> > > > > the page size.
> > > > 
> > > > Looking at kexec_crash_size_store(), I don't see where memory returned
> > > > to the OS is mapped. AFAICT, if the region is protected when the user
> > > > shrinks the region, the memory will not be mapped, yet handed over to
> > > > the kernel for general allocation.
> > > > 
> > > > Surely we need an arch-specific callback to handle that? e.g.
> > > > 
> > > > arch_crash_release_region(unsigned long base, unsigned long size)
> > > > {
> > > > 	/*
> > > > 	 * Ensure the region is part of the linear map before we return
> > > > 	 * it to the OS. We won't unmap this again, so we can use block
> > > > 	 * mappings.
> > > > 	 */
> > > > 	create_pgd_mapping(&init_mm, base, __phys_to_virt(base),
> > > > 			   size, PAGE_KERNEL, false);
> > > > }
> > > > 
> > > > ... which we'd call from crash_shrink_memory() before we freed the
> > > > reserved pages.
> > > 
> > > Another question is (how) does hyp map this region?
> > 
> > I don't get your point here.
> > Hyp mode only deals with physical memory at the intermediate physical
> > address (IPA) level, doesn't it?
> 
> My concern was that hyp may map the region, and thus buggy code at hyp
> can corrupt the region (and/or hyp may conflict w.r.t. attributes).

Grep'ing create_hyp_mappings() under arch/arm(64)/kvm shows that
we have only a few small regions of memory mapped in hyp mode.
I also confirmed that there is no active mapping for crash dump kernel
memory by checking the MMU tables with the DS-5 debugger.

> We might have to ensure hyp doesn't map the crashkernel region, and, to
> cause us pain, disallow freeing of any part of the region.

So I don't believe we need to worry about such a case.

Thanks,
-Takahiro AKASHI

> I'll dig into this.
> 
> Thanks,
> Mark.
> 
> > If this is not an issue for now, I will post v32 tomorrow :)
> > 
> > -Takahiro AKASHI
> > 
> > 
> > > Thanks,
> > > Mark.

* [PATCH v31 04/12] arm64: mm: allow for unmapping part of kernel mapping
  2017-02-02 14:55               ` AKASHI Takahiro
@ 2017-02-03  6:13                 ` AKASHI Takahiro
  -1 siblings, 0 replies; 92+ messages in thread
From: AKASHI Takahiro @ 2017-02-03  6:13 UTC (permalink / raw)
  To: linux-arm-kernel

Mark,

On Thu, Feb 02, 2017 at 11:55:54PM +0900, AKASHI Takahiro wrote:
> On Thu, Feb 02, 2017 at 02:35:35PM +0000, Mark Rutland wrote:
> > On Thu, Feb 02, 2017 at 11:01:03PM +0900, AKASHI Takahiro wrote:
> > > On Thu, Feb 02, 2017 at 11:44:38AM +0000, Mark Rutland wrote:
> > > > On Thu, Feb 02, 2017 at 07:21:32PM +0900, AKASHI Takahiro wrote:
> > > > > On Wed, Feb 01, 2017 at 04:03:54PM +0000, Mark Rutland wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On Wed, Feb 01, 2017 at 09:46:23PM +0900, AKASHI Takahiro wrote:
> > > > > > > A new function, remove_pgd_mapping(), is added.
> > > > > > > It allows us to unmap a specific portion of the kernel mapping later,
> > > > > > > as long as the mapping was made using create_pgd_mapping() and we do
> > > > > > > not try to free a sub-set of a memory range within a section mapping.
> > > > > > 
> > > > > > I'm not keen on adding more page table modification code. It was painful
> > > > > > enough to ensure that those worked in all configurations.
> > > > > > 
> > > > > > Why can't we reuse create_pgd_mapping()? If we pass page_mappings_only,
> > > > > > and use an invalid prot (i.e. 0), what is the problem?
> > > > > 
> > > > > As I did in v30?
> > > > > (though my implementation in v30 should be improved.)
> > > > 
> > > > Something like that. I wasn't entirely sure why we needed to change
> > > > those functions so much, so I'm clearly missing something there. I'll go
> > > > have another look.
> > > 
> > > It would be much easier if you saw my new code.
> > 
> > Sure. FWIW, I took a look, and I understand why those changes were
> > necessary.
> > 
> > > > > If we don't need to free unused page tables, that would make things
> > > > > much simpler. There are still some minor problems on the merge, but
> > > > > we can sort it out.
> > > > 
> > > > I'm not sure I follow what you mean by 'on merge' here. Could you
> > > > elaborate?
> > > 
> > > What I had in mind is some changes needed to handle "__prot(0)" properly
> > > in alloc_init_pxx(). For example, p[mu]d_set_huge() doesn't make
> > > a "zeroed" entry.
> > 
> > I think that if we only allow ourselves to make PTEs invalid, we don't
> > have to handle that case. If we use page_mappings_only, we should only
> > check pgattr_change_is_safe() for the pte level, and the {pmd,pud,pgd}
> > entries shouldn't change.
> > 
> > Is the below sufficient to allow that, or have I missed something?
> 
> I think it will be OK, but will double-check tomorrow.
> However, is it acceptable that create_pgd_mapping(__prot(0)) can
> only handle the page-mappings-only case?
> That would be fine for kdump, but in general?

My proposed code is attached below.
I think that the changes are quite trivial, and it works even if
there is a section mapping, as long as we refrain from reclaiming
unused p[ug]d tables.

(Of course, we can't merely unmap a subset of a section mapping here.)

Thanks,
-Takahiro AKASHI

===8<===
arch/arm64/include/asm/pgtable-prot.h |  1 +
arch/arm64/mm/mmu.c                   | 36 +++++++++++++++++++++++++----------
2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 2142c7726e76..945d84cd5df7 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -54,6 +54,7 @@
#define PAGE_KERNEL_ROX		__pgprot(_PAGE_DEFAULT | PTE_UXN | PTE_DIRTY | PTE_RDONLY)
#define PAGE_KERNEL_EXEC	__pgprot(_PAGE_DEFAULT | PTE_UXN | PTE_DIRTY | PTE_WRITE)
#define PAGE_KERNEL_EXEC_CONT	__pgprot(_PAGE_DEFAULT | PTE_UXN | PTE_DIRTY | PTE_WRITE | PTE_CONT)
+#define PAGE_KERNEL_INVALID	__pgprot(0)

#define PAGE_HYP		__pgprot(_PAGE_DEFAULT | PTE_HYP | PTE_HYP_XN)
#define PAGE_HYP_EXEC		__pgprot(_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 17243e43184e..7f96eabc99d7 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -140,7 +140,11 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
                        __prot = prot;
        }

-		set_pte(pte, pfn_pte(pfn, __prot));
+		if (pgprot_val(prot) & PTE_VALID)
+			set_pte(pte, pfn_pte(pfn, __prot));
+		else
+			pte_clear(null, null, pte);
+
        pfn++;

        /*
@@ -185,7 +189,8 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,

        /* try section mapping first */
        if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
-		      !page_mappings_only) {
+		    !page_mappings_only &&
+		    (pmd_none(old_pmd) || pmd_sect(old_pmd))) {
                /*
                 * Set the contiguous bit for the subsequent group of
                 * PMDs if its size and alignment are appropriate.
@@ -256,7 +261,8 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
        /*
         * For 4K granule only, attempt to put down a 1GB block
         */
-		if (use_1G_block(addr, next, phys) && !page_mappings_only) {
+		if (use_1G_block(addr, next, phys) && !page_mappings_only &&
+		    (pud_none(old_pud) || pud_sect(old_pud))) {
                pud_set_huge(pud, phys, prot);

                /*
@@ -334,12 +340,10 @@ static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
}

-void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
-			       unsigned long virt, phys_addr_t size,
-			       pgprot_t prot, bool page_mappings_only)
+void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
+			unsigned long virt, phys_addr_t size,
+			pgprot_t prot, bool page_mappings_only)
{
-	BUG_ON(mm == &init_mm);
-
	__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
			     pgd_pgtable_alloc, page_mappings_only);
}
@@ -791,14 +795,26 @@ int __init arch_ioremap_pmd_supported(void)
int pud_set_huge(pud_t *pud, phys_addr_t phys, pgprot_t prot)
{
	BUG_ON(phys & ~PUD_MASK);
-	set_pud(pud, __pud(phys | PUD_TYPE_SECT | pgprot_val(mk_sect_prot(prot))));
+
+	if (pgprot_val(prot) & PTE_VALID)
+		set_pud(pud, __pud(phys | PUD_TYPE_SECT |
+				   pgprot_val(mk_sect_prot(prot))));
+	else
+		pud_clear(pud);
+
	return 1;
}

int pmd_set_huge(pmd_t *pmd, phys_addr_t phys, pgprot_t prot)
{
	BUG_ON(phys & ~PMD_MASK);
-	set_pmd(pmd, __pmd(phys | PMD_TYPE_SECT | pgprot_val(mk_sect_prot(prot))));
+
+	if (pgprot_val(prot) & PTE_VALID)
+		set_pmd(pmd, __pmd(phys | PMD_TYPE_SECT |
+				   pgprot_val(mk_sect_prot(prot))));
+	else
+		pmd_clear(pmd);
+
	return 1;
}
===>8===
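
For context, a sketch of how kdump could then use this to protect and
unprotect the region around kexec_load() (illustrative only, not part of
the patch; it assumes the PAGE_KERNEL_INVALID definition and the
non-__init create_pgd_mapping() from the hunks above):

void arch_kexec_protect_crashkres(void)
{
	/* Unmap the crash dump kernel memory; page mappings only. */
	create_pgd_mapping(&init_mm, crashk_res.start,
			   __phys_to_virt(crashk_res.start),
			   resource_size(&crashk_res),
			   PAGE_KERNEL_INVALID, true);
	flush_tlb_kernel_range(__phys_to_virt(crashk_res.start),
			       __phys_to_virt(crashk_res.end + 1));
}

void arch_kexec_unprotect_crashkres(void)
{
	/* Map the region back with default attributes. */
	create_pgd_mapping(&init_mm, crashk_res.start,
			   __phys_to_virt(crashk_res.start),
			   resource_size(&crashk_res),
			   PAGE_KERNEL, true);
}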

* [PATCH v31 05/12] arm64: kdump: protect crash dump kernel memory
  2017-02-03  1:45             ` AKASHI Takahiro
@ 2017-02-03 11:51               ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-03 11:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Fri, Feb 03, 2017 at 10:45:39AM +0900, AKASHI Takahiro wrote:
> On Thu, Feb 02, 2017 at 11:54:25AM +0000, Mark Rutland wrote:
> > On Thu, Feb 02, 2017 at 07:39:06PM +0900, AKASHI Takahiro wrote:
> > > On Wed, Feb 01, 2017 at 06:25:06PM +0000, Mark Rutland wrote:
> > > > On Wed, Feb 01, 2017 at 06:00:08PM +0000, Mark Rutland wrote:
> > > > Another question is (how) does hyp map this region?
> > > 
> > > I don't get your point here.
> > > Hyp mode only deals with physical memory at the intermediate physical
> > > address (IPA) level, doesn't it?
> > 
> > My concern was that hyp may map the region, and thus buggy code at hyp
> > can corrupt the region (and/or hyp may conflict w.r.t. attributes).
> 
> Grep'ing create_hyp_mappings() under arch/arm(64)/kvm shows that
> we have only a few small regions of memory mapped in hyp mode.
> I also confirmed that there is no active mapping for crash dump kernel
> memory by checking the MMU tables with the DS-5 debugger.

Ok, that sounds fine.

I was under the mistaken impression that hyp had a copy of the whole
linear mapping, but it seems it lazily maps that in (at page
granularity) as required. So there should be no alias for the
crashkernel region.

> > We might have to ensure hyp doesn't map the crashkernel region, and, to
> > cause us pain, disallow freeing of any part of the region.
> 
> So I don't believe we need to worry about such a case.

I agree.

Thanks,
Mark.

* [PATCH v31 04/12] arm64: mm: allow for unmapping part of kernel mapping
  2017-02-03  6:13                 ` AKASHI Takahiro
@ 2017-02-03 14:22                   ` Mark Rutland
  -1 siblings, 0 replies; 92+ messages in thread
From: Mark Rutland @ 2017-02-03 14:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Fri, Feb 03, 2017 at 03:13:18PM +0900, AKASHI Takahiro wrote:
> On Thu, Feb 02, 2017 at 11:55:54PM +0900, AKASHI Takahiro wrote:
> > On Thu, Feb 02, 2017 at 02:35:35PM +0000, Mark Rutland wrote:

> > > I think that if we only allow ourselves to make PTEs invalid, we don't
> > > have to handle that case. If we use page_mappings_only, we should only
> > > check pgattr_change_is_safe() for the pte level, and the {pmd,pud,pgd}
> > > entries shouldn't change.
> > > 
> > > Is the below sufficient to allow that, or have I missed something?
> > 
> > I think it will be OK, but will double-check tomorrow.
> > However, is it acceptable that create_pgd_mapping(__prot(0)) can
> > only handle the page-mappings-only case?
> > That would be fine for kdump, but in general?

Given we're only going to use this for page mappings, I think it's fine
(and preferable) to restrict it to page mappings for now. Until we need
to do this for the pmd/pud/pgd levels, those won't see any testing, and
it will be very easy for us to accidentally break this.

> My proposed code is attached below.
> I think that the changes are quite trivial, and it works even if
> there is a section mapping, as long as we refrain from reclaiming
> unused p[ug]d tables.
> 
> (Of course, we can't merely unmap a subset of a section mapping here.)

Sure. We have a similar restriction when changing permissions, so I
think that's fine.

> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 17243e43184e..7f96eabc99d7 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -140,7 +140,11 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>                         __prot = prot;
>         }
> 
> -		set_pte(pte, pfn_pte(pfn, __prot));
> +		if (pgprot_val(prot) & PTE_VALID)
> +			set_pte(pte, pfn_pte(pfn, __prot));
> +		else
> +			pte_clear(null, null, pte);

It took me a moment to figure out how this line could compile. ;)

I'm happy to take this approach in alloc_init_pte(), but as above I'd
prefer that we only handled it there, and left the pmd/pud/pgd code
as-is for now.

We can/should add a comment to make that clear.
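
Something like the following, perhaps (wording is only a suggestion):

	/*
	 * Clearing entries (prot == __prot(0)) is only supported at the
	 * pte level for now, so callers that want to unmap must pass
	 * page_mappings_only == true; the pmd/pud/pgd paths still assume
	 * a valid prot.
	 */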

Thanks,
Mark.
