linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/5] arm64: kexec,kdump: fix boot failures on acpi-only system
@ 2018-07-23  1:57 AKASHI Takahiro
  2018-07-23  1:57 ` [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem AKASHI Takahiro
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: AKASHI Takahiro @ 2018-07-23  1:57 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, rjw, lenb, ard.biesheuvel
  Cc: tbaicar, bhsharma, dyoung, james.morse, mark.rutland, al.stone,
	graeme.gregory, hanjun.guo, lorenzo.pieralisi, sudeep.holla,
	linux-arm-kernel, linux-kernel, kexec, AKASHI Takahiro


This patch series is a set of bug fixes to address kexec/kdump
failures which are sometimes observed on ACPI-only system and reported
in LAK-ML before.

In short, the phenomena are:
1. kexec'ed kernel can fail to boot because some ACPI table is corrupted
   by a new kernel (or other data) being loaded into System RAM. Currently
   kexec may possibly allocate space ignoring such "reserved" regions.
   We will see no messages after "Bye!"

2. crash dump (kdump) kernel can fail to boot and get into panic due to
   an alignment fault when accessing ACPI tables. This can happen because
   those tables are not always properly aligned while they are mapped
   non-cacheable (ioremap'ed) as they are not recognized as part of System
   RAM under the current implementation.

After discussing several possibilities to address those issues,
the agreed approach, in my understanding, is
* to add resource entries for every "reserved", i.e. memblock_reserve(),
  regions to /proc/iomem.
  (NOMAP regions, also marked as "reserved," remains at top-level for
  backward compatibility. User-space can tell the difference between
  reserved-system-ram and reserved-address-space.)
* For case (1), user space (kexec-tools) should rule out such regions
  in searching for free space for loaded data.
* For case (2), the kernel should access ACPI tables by mapping
  them with appropriate memory attributes described in UEFI memory map.
  (This means that it doesn't require any changes in /proc/iomem, and
  hence user space.)

Please find past discussions about /proc/iomem in [1].
--- more words from James ---
Our attempts to fix this just in the kernel reached a dead end, because Kdump
needs to include reserved-system-ram, whereas kexec has to avoid it. User-space
needs to be able to tell reserved-system-ram and reserved-address-space apart.
Hence we need to expose that information, and pick it up in user-space.

Patched-kernel and unpatch-user-space will work the same way it does today, as
the additional reserved regions are ignored by user-space.

Unpatched-kernel and patched-user-space will also work the same way it does
today as the additional reserved regions are missing.
--->8---

Patch#1 addresses kexec case, for which you are also required to update
user space. See necessary patches in [2]. If you want to review Patch#1,
please also take a look at and review [2].

Patch#2, #3, #4 and #5 address the kdump case above.


Changes in v4 (2018, July 23, 2018)
* correct configuration dependency for ACPI (patch#2)

Changes in v3.1 (2018, July 10, 2018)
* add Ard's patch[4] to this patch set

Changes in v3 (2018, July 9, 2018)
* drop the v2's patch#3, preferring [4]

Changes in v2 (2018, June 19, 2018)
* re-organise v1's patch#2 and #3 into v2's #2, #3 and #4
  not to break bisect

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-March/565980.html
[2] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/resv_mem
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/573655.html
[4] https://marc.info/?l=linux-efi&m=152930773507524&w=2

AKASHI Takahiro (3):
  drivers: acpi: add dependency of EFI for arm64
  efi/arm: map UEFI memory map even w/o runtime services enabled
  arm64: acpi: fix alignment fault in accessing ACPI

Ard Biesheuvel (1):
  efi/arm: preserve early mapping of UEFI memory map longer for BGRT

James Morse (1):
  arm64: export memblock_reserve()d regions via /proc/iomem

 arch/arm64/include/asm/acpi.h      | 23 ++++++++++++------
 arch/arm64/kernel/acpi.c           | 11 +++------
 arch/arm64/kernel/setup.c          | 38 ++++++++++++++++++++++++++++++
 drivers/acpi/Kconfig               |  2 +-
 drivers/firmware/efi/arm-init.c    |  1 -
 drivers/firmware/efi/arm-runtime.c | 16 +++++++------
 6 files changed, 67 insertions(+), 24 deletions(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem
  2018-07-23  1:57 [PATCH v4 0/5] arm64: kexec,kdump: fix boot failures on acpi-only system AKASHI Takahiro
@ 2018-07-23  1:57 ` AKASHI Takahiro
  2018-08-21  4:39   ` John Stultz
  2018-07-23  1:57 ` [PATCH v4 2/5] drivers: acpi: add dependency of EFI for arm64 AKASHI Takahiro
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: AKASHI Takahiro @ 2018-07-23  1:57 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, rjw, lenb, ard.biesheuvel
  Cc: tbaicar, bhsharma, dyoung, james.morse, mark.rutland, al.stone,
	graeme.gregory, hanjun.guo, lorenzo.pieralisi, sudeep.holla,
	linux-arm-kernel, linux-kernel, kexec

From: James Morse <james.morse@arm.com>

There has been some confusion around what is necessary to prevent kexec
overwriting important memory regions. memblock: reserve, or nomap?
Only memblock nomap regions are reported via /proc/iomem, kexec's
user-space doesn't know about memblock_reserve()d regions.

Until commit f56ab9a5b73ca ("efi/arm: Don't mark ACPI reclaim memory
as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
and thus possible for kexec to overwrite with the new kernel or initrd.
But this was always broken, as the UEFI memory map is also reserved
and not marked as nomap.

Exporting both nomap and reserved memblock types is a nuisance as
they live in different memblock structures which we can't walk at
the same time.

Take a second walk over memblock.reserved and add new 'reserved'
subnodes for the memblock_reserved() regions that aren't already
described by the existing code. (e.g. Kernel Code)

We use reserve_region_with_split() to find the gaps in existing named
regions. This handles the gap between 'kernel code' and 'kernel data'
which is memblock_reserve()d, but already partially described by
request_standard_resources(). e.g.:
| 80000000-dfffffff : System RAM
|   80080000-80ffffff : Kernel code
|   81000000-8158ffff : reserved
|   81590000-8237efff : Kernel data
|   a0000000-dfffffff : Crash kernel
| e00f0000-f949ffff : System RAM

reserve_region_with_split needs kzalloc() which isn't available when
request_standard_resources() is called, use an initcall.

Reported-by: Bhupesh Sharma <bhsharma@redhat.com>
Reported-by: Tyler Baicar <tbaicar@codeaurora.org>
Suggested-by: Akashi Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: James Morse <james.morse@arm.com>
Fixes: d28f6df1305a ("arm64/kexec: Add core kexec support")
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
CC: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/kernel/setup.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 30ad2f085d1f..5b4fac434c84 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -241,6 +241,44 @@ static void __init request_standard_resources(void)
 	}
 }
 
+static int __init reserve_memblock_reserved_regions(void)
+{
+	phys_addr_t start, end, roundup_end = 0;
+	struct resource *mem, *res;
+	u64 i;
+
+	for_each_reserved_mem_region(i, &start, &end) {
+		if (end <= roundup_end)
+			continue; /* done already */
+
+		start = __pfn_to_phys(PFN_DOWN(start));
+		end = __pfn_to_phys(PFN_UP(end)) - 1;
+		roundup_end = end;
+
+		res = kzalloc(sizeof(*res), GFP_ATOMIC);
+		if (WARN_ON(!res))
+			return -ENOMEM;
+		res->start = start;
+		res->end = end;
+		res->name  = "reserved";
+		res->flags = IORESOURCE_MEM;
+
+		mem = request_resource_conflict(&iomem_resource, res);
+		/*
+		 * We expected memblock_reserve() regions to conflict with
+		 * memory created by request_standard_resources().
+		 */
+		if (WARN_ON_ONCE(!mem))
+			continue;
+		kfree(res);
+
+		reserve_region_with_split(mem, start, end, "reserved");
+	}
+
+	return 0;
+}
+arch_initcall(reserve_memblock_reserved_regions);
+
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
 
 void __init setup_arch(char **cmdline_p)
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 2/5] drivers: acpi: add dependency of EFI for arm64
  2018-07-23  1:57 [PATCH v4 0/5] arm64: kexec,kdump: fix boot failures on acpi-only system AKASHI Takahiro
  2018-07-23  1:57 ` [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem AKASHI Takahiro
@ 2018-07-23  1:57 ` AKASHI Takahiro
  2018-07-23  1:57 ` [PATCH v4 3/5] efi/arm: preserve early mapping of UEFI memory map longer for BGRT AKASHI Takahiro
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: AKASHI Takahiro @ 2018-07-23  1:57 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, rjw, lenb, ard.biesheuvel
  Cc: tbaicar, bhsharma, dyoung, james.morse, mark.rutland, al.stone,
	graeme.gregory, hanjun.guo, lorenzo.pieralisi, sudeep.holla,
	linux-arm-kernel, linux-kernel, kexec, AKASHI Takahiro

As Ard suggested, CONFIG_ACPI && !CONFIG_EFI doesn't make sense on arm64,
while CONFIG_ACPI and CONFIG_CPU_BIG_ENDIAN doesn't make sense either.

As CONFIG_EFI already has a dependency of !CONFIG_CPU_BIG_ENDIAN, it is
good enough to add a dependency of CONFIG_EFI to avoid any useless
combination of configuration.

This bug, reported by Will, will be revealed when my patch series,
"arm64: kexec,kdump: fix boot failures on acpi-only system," is applied
and the kernel is built under allmodconfig.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 drivers/acpi/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index b533eeb6139d..15ab1daaa808 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -6,7 +6,7 @@
 menuconfig ACPI
 	bool "ACPI (Advanced Configuration and Power Interface) Support"
 	depends on !IA64_HP_SIM
-	depends on IA64 || X86 || ARM64
+	depends on IA64 || X86 || (ARM64 && EFI)
 	depends on PCI
 	select PNP
 	default y if (IA64 || X86)
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 3/5] efi/arm: preserve early mapping of UEFI memory map longer for BGRT
  2018-07-23  1:57 [PATCH v4 0/5] arm64: kexec,kdump: fix boot failures on acpi-only system AKASHI Takahiro
  2018-07-23  1:57 ` [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem AKASHI Takahiro
  2018-07-23  1:57 ` [PATCH v4 2/5] drivers: acpi: add dependency of EFI for arm64 AKASHI Takahiro
@ 2018-07-23  1:57 ` AKASHI Takahiro
  2018-07-23  1:57 ` [PATCH v4 4/5] efi/arm: map UEFI memory map even w/o runtime services enabled AKASHI Takahiro
  2018-07-23  1:57 ` [PATCH v4 5/5] arm64: acpi: fix alignment fault in accessing ACPI AKASHI Takahiro
  4 siblings, 0 replies; 12+ messages in thread
From: AKASHI Takahiro @ 2018-07-23  1:57 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, rjw, lenb, ard.biesheuvel
  Cc: tbaicar, bhsharma, dyoung, james.morse, mark.rutland, al.stone,
	graeme.gregory, hanjun.guo, lorenzo.pieralisi, sudeep.holla,
	linux-arm-kernel, linux-kernel, kexec

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>

The BGRT code validates the contents of the table against the UEFI
memory map, and so it expects it to be mapped when the code runs.

On ARM, this is currently not the case, since we tear down the early
mapping after efi_init() completes, and only create the permanent
mapping in arm_enable_runtime_services(), which executes as an early
initcall, but still leaves a window where the UEFI memory map is not
mapped.

So move the call to efi_memmap_unmap() from efi_init() to
arm_enable_runtime_services().

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 drivers/firmware/efi/arm-init.c    | 1 -
 drivers/firmware/efi/arm-runtime.c | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index b5214c143fee..388a929baf95 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -259,7 +259,6 @@ void __init efi_init(void)
 
 	reserve_regions();
 	efi_esrt_init();
-	efi_memmap_unmap();
 
 	memblock_reserve(params.mmap & PAGE_MASK,
 			 PAGE_ALIGN(params.mmap_size +
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 5889cbea60b8..59a8c0ec94d5 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -115,6 +115,8 @@ static int __init arm_enable_runtime_services(void)
 		return 0;
 	}
 
+	efi_memmap_unmap();
+
 	if (efi_runtime_disabled()) {
 		pr_info("EFI runtime services will be disabled.\n");
 		return 0;
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 4/5] efi/arm: map UEFI memory map even w/o runtime services enabled
  2018-07-23  1:57 [PATCH v4 0/5] arm64: kexec,kdump: fix boot failures on acpi-only system AKASHI Takahiro
                   ` (2 preceding siblings ...)
  2018-07-23  1:57 ` [PATCH v4 3/5] efi/arm: preserve early mapping of UEFI memory map longer for BGRT AKASHI Takahiro
@ 2018-07-23  1:57 ` AKASHI Takahiro
  2018-07-23  1:57 ` [PATCH v4 5/5] arm64: acpi: fix alignment fault in accessing ACPI AKASHI Takahiro
  4 siblings, 0 replies; 12+ messages in thread
From: AKASHI Takahiro @ 2018-07-23  1:57 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, rjw, lenb, ard.biesheuvel
  Cc: tbaicar, bhsharma, dyoung, james.morse, mark.rutland, al.stone,
	graeme.gregory, hanjun.guo, lorenzo.pieralisi, sudeep.holla,
	linux-arm-kernel, linux-kernel, kexec, AKASHI Takahiro

Under the current implementation, UEFI memory map will be mapped and made
available in virtual mappings only if runtime services are enabled.
But in a later patch, we want to use UEFI memory map in acpi_os_ioremap()
to create mappings of ACPI tables using memory attributes described in
UEFI memory map.
See the following commit:
    arm64: acpi: fix alignment fault in accessing ACPI tables

So, as a first step, arm_enter_runtime_services() is modified, alongside
Ard's patch[1], so that UEFI memory map will not be freed even if
efi=noruntime.

[1] https://marc.info/?l=linux-efi&m=152930773507524&w=2

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 drivers/firmware/efi/arm-runtime.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 59a8c0ec94d5..a00934d263c5 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -117,6 +117,13 @@ static int __init arm_enable_runtime_services(void)
 
 	efi_memmap_unmap();
 
+	mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
+
+	if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
+		pr_err("Failed to remap EFI memory map\n");
+		return 0;
+	}
+
 	if (efi_runtime_disabled()) {
 		pr_info("EFI runtime services will be disabled.\n");
 		return 0;
@@ -129,13 +136,6 @@ static int __init arm_enable_runtime_services(void)
 
 	pr_info("Remapping and enabling EFI services.\n");
 
-	mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
-
-	if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
-		pr_err("Failed to remap EFI memory map\n");
-		return -ENOMEM;
-	}
-
 	if (!efi_virtmap_init()) {
 		pr_err("UEFI virtual mapping missing or invalid -- runtime services will not be available\n");
 		return -ENOMEM;
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 5/5] arm64: acpi: fix alignment fault in accessing ACPI
  2018-07-23  1:57 [PATCH v4 0/5] arm64: kexec,kdump: fix boot failures on acpi-only system AKASHI Takahiro
                   ` (3 preceding siblings ...)
  2018-07-23  1:57 ` [PATCH v4 4/5] efi/arm: map UEFI memory map even w/o runtime services enabled AKASHI Takahiro
@ 2018-07-23  1:57 ` AKASHI Takahiro
  4 siblings, 0 replies; 12+ messages in thread
From: AKASHI Takahiro @ 2018-07-23  1:57 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, rjw, lenb, ard.biesheuvel
  Cc: tbaicar, bhsharma, dyoung, james.morse, mark.rutland, al.stone,
	graeme.gregory, hanjun.guo, lorenzo.pieralisi, sudeep.holla,
	linux-arm-kernel, linux-kernel, kexec, AKASHI Takahiro

This is a fix against the issue that crash dump kernel may hang up
during booting, which can happen on any ACPI-based system with "ACPI
Reclaim Memory."

(kernel messages after panic kicked off kdump)
	   (snip...)
	Bye!
	   (snip...)
	ACPI: Core revision 20170728
	pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
	Internal error: Oops: 96000021 [#1] SMP
	Modules linked in:
	CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
	task: ffff000008d05180 task.stack: ffff000008cc0000
	PC is at acpi_ns_lookup+0x25c/0x3c0
	LR is at acpi_ds_load1_begin_op+0xa4/0x294
	   (snip...)
	Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
	Call trace:
	   (snip...)
	[<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
	[<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
	[<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
	[<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
	[<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
	[<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
	[<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
	[<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
	[<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
	[<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
	[<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
	[<ffff000008badc20>] acpi_early_init+0x9c/0xd0
	[<ffff000008b70d50>] start_kernel+0x3b4/0x43c
	Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
	---[ end trace c46ed37f9651c58e ]---
	Kernel panic - not syncing: Fatal exception
	Rebooting in 10 seconds..

(diagnosis)
* This fault is a data abort, alignment fault (ESR=0x96000021)
  during reading out ACPI table.
* Initial ACPI tables are normally stored in system ram and marked as
  "ACPI Reclaim memory" by the firmware.
* After the commit f56ab9a5b73c ("efi/arm: Don't mark ACPI reclaim
  memory as MEMBLOCK_NOMAP"), those regions are differently handled
  as they are "memblock-reserved", without NOMAP bit.
* So they are now excluded from device tree's "usable-memory-range"
  which kexec-tools determines based on a current view of /proc/iomem.
* When crash dump kernel boots up, it tries to accesses ACPI tables by
  mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
  since they are no longer part of mapped system ram.
* Given that ACPI accessor/helper functions are compiled in without
  unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
  any unaligned access to ACPI tables can cause a fatal panic.

With this patch, acpi_os_ioremap() always honors memory attribute
information provided by the firmware (EFI) and retaining cacheability
allows the kernel safe access to ACPI tables.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by and Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
---
 arch/arm64/include/asm/acpi.h | 23 ++++++++++++++++-------
 arch/arm64/kernel/acpi.c      | 11 +++--------
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 0db62a4cbce2..68bc18cb2b85 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -12,10 +12,12 @@
 #ifndef _ASM_ACPI_H
 #define _ASM_ACPI_H
 
+#include <linux/efi.h>
 #include <linux/memblock.h>
 #include <linux/psci.h>
 
 #include <asm/cputype.h>
+#include <asm/io.h>
 #include <asm/smp_plat.h>
 #include <asm/tlbflush.h>
 
@@ -29,18 +31,22 @@
 
 /* Basic configuration for ACPI */
 #ifdef	CONFIG_ACPI
+pgprot_t __acpi_get_mem_attribute(phys_addr_t addr);
+
 /* ACPI table mapping after acpi_permanent_mmap is set */
 static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
 					    acpi_size size)
 {
+	/* For normal memory we already have a cacheable mapping. */
+	if (memblock_is_map_memory(phys))
+		return (void __iomem *)__phys_to_virt(phys);
+
 	/*
-	 * EFI's reserve_regions() call adds memory with the WB attribute
-	 * to memblock via early_init_dt_add_memory_arch().
+	 * We should still honor the memory's attribute here because
+	 * crash dump kernel possibly excludes some ACPI (reclaim)
+	 * regions from memblock list.
 	 */
-	if (!memblock_is_memory(phys))
-		return ioremap(phys, size);
-
-	return ioremap_cache(phys, size);
+	return __ioremap(phys, size, __acpi_get_mem_attribute(phys));
 }
 #define acpi_os_ioremap acpi_os_ioremap
 
@@ -129,7 +135,10 @@ static inline const char *acpi_get_enable_method(int cpu)
  * for compatibility.
  */
 #define acpi_disable_cmcff 1
-pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);
+static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+{
+	return __acpi_get_mem_attribute(addr);
+}
 #endif /* CONFIG_ACPI_APEI */
 
 #ifdef CONFIG_ACPI_NUMA
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 7b09487ff8fb..ed46dc188b22 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -18,6 +18,7 @@
 #include <linux/acpi.h>
 #include <linux/bootmem.h>
 #include <linux/cpumask.h>
+#include <linux/efi.h>
 #include <linux/efi-bgrt.h>
 #include <linux/init.h>
 #include <linux/irq.h>
@@ -29,13 +30,9 @@
 
 #include <asm/cputype.h>
 #include <asm/cpu_ops.h>
+#include <asm/pgtable.h>
 #include <asm/smp_plat.h>
 
-#ifdef CONFIG_ACPI_APEI
-# include <linux/efi.h>
-# include <asm/pgtable.h>
-#endif
-
 int acpi_noirq = 1;		/* skip ACPI IRQ initialization */
 int acpi_disabled = 1;
 EXPORT_SYMBOL(acpi_disabled);
@@ -239,8 +236,7 @@ void __init acpi_boot_table_init(void)
 	}
 }
 
-#ifdef CONFIG_ACPI_APEI
-pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+pgprot_t __acpi_get_mem_attribute(phys_addr_t addr)
 {
 	/*
 	 * According to "Table 8 Map: EFI memory types to AArch64 memory
@@ -261,4 +257,3 @@ pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
 		return __pgprot(PROT_NORMAL_NC);
 	return __pgprot(PROT_DEVICE_nGnRnE);
 }
-#endif
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem
  2018-07-23  1:57 ` [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem AKASHI Takahiro
@ 2018-08-21  4:39   ` John Stultz
  2018-08-21  6:07     ` AKASHI Takahiro
  2018-08-21 10:22     ` James Morse
  0 siblings, 2 replies; 12+ messages in thread
From: John Stultz @ 2018-08-21  4:39 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Catalin Marinas, Will Deacon, Rafael J. Wysocki, Len Brown,
	Ard Biesheuvel, Mark Rutland, Lorenzo Pieralisi, G Gregory,
	al.stone, bhsharma, tbaicar, kexec, lkml, james.morse,
	hanjun.guo, Sudeep Holla, dyoung, linux-arm-kernel

On Sun, Jul 22, 2018 at 6:57 PM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> From: James Morse <james.morse@arm.com>
>
> There has been some confusion around what is necessary to prevent kexec
> overwriting important memory regions. memblock: reserve, or nomap?
> Only memblock nomap regions are reported via /proc/iomem, kexec's
> user-space doesn't know about memblock_reserve()d regions.
>
> Until commit f56ab9a5b73ca ("efi/arm: Don't mark ACPI reclaim memory
> as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
> and thus possible for kexec to overwrite with the new kernel or initrd.
> But this was always broken, as the UEFI memory map is also reserved
> and not marked as nomap.
>
> Exporting both nomap and reserved memblock types is a nuisance as
> they live in different memblock structures which we can't walk at
> the same time.
>
> Take a second walk over memblock.reserved and add new 'reserved'
> subnodes for the memblock_reserved() regions that aren't already
> described by the existing code. (e.g. Kernel Code)
>
> We use reserve_region_with_split() to find the gaps in existing named
> regions. This handles the gap between 'kernel code' and 'kernel data'
> which is memblock_reserve()d, but already partially described by
> request_standard_resources(). e.g.:
> | 80000000-dfffffff : System RAM
> |   80080000-80ffffff : Kernel code
> |   81000000-8158ffff : reserved
> |   81590000-8237efff : Kernel data
> |   a0000000-dfffffff : Crash kernel
> | e00f0000-f949ffff : System RAM
>
> reserve_region_with_split needs kzalloc() which isn't available when
> request_standard_resources() is called, use an initcall.
>
> Reported-by: Bhupesh Sharma <bhsharma@redhat.com>
> Reported-by: Tyler Baicar <tbaicar@codeaurora.org>
> Suggested-by: Akashi Takahiro <takahiro.akashi@linaro.org>
> Signed-off-by: James Morse <james.morse@arm.com>
> Fixes: d28f6df1305a ("arm64/kexec: Add core kexec support")
> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> CC: Mark Rutland <mark.rutland@arm.com>
> ---
>  arch/arm64/kernel/setup.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 30ad2f085d1f..5b4fac434c84 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -241,6 +241,44 @@ static void __init request_standard_resources(void)
>         }
>  }
>
> +static int __init reserve_memblock_reserved_regions(void)
> +{
> +       phys_addr_t start, end, roundup_end = 0;
> +       struct resource *mem, *res;
> +       u64 i;
> +
> +       for_each_reserved_mem_region(i, &start, &end) {
> +               if (end <= roundup_end)
> +                       continue; /* done already */
> +
> +               start = __pfn_to_phys(PFN_DOWN(start));
> +               end = __pfn_to_phys(PFN_UP(end)) - 1;
> +               roundup_end = end;
> +
> +               res = kzalloc(sizeof(*res), GFP_ATOMIC);
> +               if (WARN_ON(!res))
> +                       return -ENOMEM;
> +               res->start = start;
> +               res->end = end;
> +               res->name  = "reserved";
> +               res->flags = IORESOURCE_MEM;
> +
> +               mem = request_resource_conflict(&iomem_resource, res);
> +               /*
> +                * We expected memblock_reserve() regions to conflict with
> +                * memory created by request_standard_resources().
> +                */
> +               if (WARN_ON_ONCE(!mem))
> +                       continue;
> +               kfree(res);
> +
> +               reserve_region_with_split(mem, start, end, "reserved");
> +       }
> +
> +       return 0;
> +}
> +arch_initcall(reserve_memblock_reserved_regions);
> +

Since this patch landed, on the HiKey board at bootup I'm seeing:

[    0.451884] WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/setup.c:271
reserve_memblock_reserved_regions+0xd4/0x13c
[    0.451896] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
4.18.0-10758-ga534dc3 #709
[    0.451903] Hardware name: HiKey Development Board (DT)
[    0.451913] pstate: 80400005 (Nzcv daif +PAN -UAO)
[    0.451922] pc : reserve_memblock_reserved_regions+0xd4/0x13c
[    0.451931] lr : reserve_memblock_reserved_regions+0xcc/0x13c
[    0.451938] sp : ffffff8008053d30
[    0.451945] x29: ffffff8008053d30 x28: ffffff8008ebe650
[    0.451957] x27: ffffff8008ead060 x26: ffffff8008e113b0
[    0.451969] x25: 0000000000000000 x24: 0000000000488020
[    0.451981] x23: 0000000021ffffff x22: ffffff8008e0d860
[    0.451993] x21: ffffff8008d74370 x20: ffffff8009019000
[    0.452005] x19: ffffffc07507a400 x18: ffffff8009019a48
[    0.452017] x17: 0000000000000000 x16: 0000000000000000
[    0.452028] x15: ffffff80890e973f x14: 0000000000000006
[    0.452040] x13: 0000000000000000 x12: 0000000000000000
[    0.452051] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[    0.452063] x9 : 0000000000000000 x8 : ffffffc07507a480
[    0.452074] x7 : 0000000000000000 x6 : ffffffc07ffffc30
[    0.452086] x5 : 0000000000000000 x4 : 0000000021ffffff
[    0.452097] x3 : 0000000000000001 x2 : 0000000000000001
[    0.452109] x1 : 0000000000000000 x0 : 0000000000000000
[    0.452121] Call trace:
[    0.452130]  reserve_memblock_reserved_regions+0xd4/0x13c
[    0.452140]  do_one_initcall+0x78/0x150
[    0.452148]  kernel_init_freeable+0x198/0x258
[    0.452159]  kernel_init+0x10/0x108
[    0.452170]  ret_from_fork+0x10/0x18
[    0.452181] ---[ end trace b4b78c443df3a750 ]---

From skimming the patch, it seems this is maybe expected? Or should
this warning raise eyebrows? I can't quite figure it out.

It seems to trigger on the pstore memory at 0x21f00000-0x21ffffff.

/proc/iomem now has:
...
07410000-21efffff : System RAM
  11000000-1113cfff : reserved
21f00000-21ffffff : reserved
  21f00000-21f1ffff : persistent_ram
  21f20000-21f3ffff : persistent_ram
  21f40000-21f5ffff : persistent_ram
  21f60000-21f7ffff : persistent_ram
  21f80000-21f9ffff : persistent_ram
  21fa0000-21fbffff : persistent_ram
  21fc0000-21fdffff : persistent_ram
  21fe0000-21ffffff : persistent_ram
22000000-34ffffff : System RAM
...

Where previously it had:
...
07410000-21efffff : System RAM
21f00000-21f1ffff : persistent_ram
21f20000-21f3ffff : persistent_ram
21f40000-21f5ffff : persistent_ram
21f60000-21f7ffff : persistent_ram
21f80000-21f9ffff : persistent_ram
21fa0000-21fbffff : persistent_ram
21fc0000-21fdffff : persistent_ram
21fe0000-21ffffff : persistent_ram
22000000-34ffffff : System RAM
...



thanks
-john

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem
  2018-08-21  4:39   ` John Stultz
@ 2018-08-21  6:07     ` AKASHI Takahiro
  2018-08-21 10:22     ` James Morse
  1 sibling, 0 replies; 12+ messages in thread
From: AKASHI Takahiro @ 2018-08-21  6:07 UTC (permalink / raw)
  To: John Stultz
  Cc: Catalin Marinas, Will Deacon, Rafael J. Wysocki, Len Brown,
	Ard Biesheuvel, Mark Rutland, Lorenzo Pieralisi, G Gregory,
	al.stone, bhsharma, tbaicar, kexec, lkml, james.morse,
	hanjun.guo, Sudeep Holla, dyoung, linux-arm-kernel

Hi John,

On Mon, Aug 20, 2018 at 09:39:01PM -0700, John Stultz wrote:
> On Sun, Jul 22, 2018 at 6:57 PM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > From: James Morse <james.morse@arm.com>
> >
> > There has been some confusion around what is necessary to prevent kexec
> > overwriting important memory regions. memblock: reserve, or nomap?
> > Only memblock nomap regions are reported via /proc/iomem, kexec's
> > user-space doesn't know about memblock_reserve()d regions.
> >
> > Until commit f56ab9a5b73ca ("efi/arm: Don't mark ACPI reclaim memory
> > as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
> > and thus possible for kexec to overwrite with the new kernel or initrd.
> > But this was always broken, as the UEFI memory map is also reserved
> > and not marked as nomap.
> >
> > Exporting both nomap and reserved memblock types is a nuisance as
> > they live in different memblock structures which we can't walk at
> > the same time.
> >
> > Take a second walk over memblock.reserved and add new 'reserved'
> > subnodes for the memblock_reserved() regions that aren't already
> > described by the existing code. (e.g. Kernel Code)
> >
> > We use reserve_region_with_split() to find the gaps in existing named
> > regions. This handles the gap between 'kernel code' and 'kernel data'
> > which is memblock_reserve()d, but already partially described by
> > request_standard_resources(). e.g.:
> > | 80000000-dfffffff : System RAM
> > |   80080000-80ffffff : Kernel code
> > |   81000000-8158ffff : reserved
> > |   81590000-8237efff : Kernel data
> > |   a0000000-dfffffff : Crash kernel
> > | e00f0000-f949ffff : System RAM
> >
> > reserve_region_with_split needs kzalloc() which isn't available when
> > request_standard_resources() is called, use an initcall.
> >
> > Reported-by: Bhupesh Sharma <bhsharma@redhat.com>
> > Reported-by: Tyler Baicar <tbaicar@codeaurora.org>
> > Suggested-by: Akashi Takahiro <takahiro.akashi@linaro.org>
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Fixes: d28f6df1305a ("arm64/kexec: Add core kexec support")
> > Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > CC: Mark Rutland <mark.rutland@arm.com>
> > ---
> >  arch/arm64/kernel/setup.c | 38 ++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 38 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 30ad2f085d1f..5b4fac434c84 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -241,6 +241,44 @@ static void __init request_standard_resources(void)
> >         }
> >  }
> >
> > +static int __init reserve_memblock_reserved_regions(void)
> > +{
> > +       phys_addr_t start, end, roundup_end = 0;
> > +       struct resource *mem, *res;
> > +       u64 i;
> > +
> > +       for_each_reserved_mem_region(i, &start, &end) {
> > +               if (end <= roundup_end)
> > +                       continue; /* done already */
> > +
> > +               start = __pfn_to_phys(PFN_DOWN(start));
> > +               end = __pfn_to_phys(PFN_UP(end)) - 1;
> > +               roundup_end = end;
> > +
> > +               res = kzalloc(sizeof(*res), GFP_ATOMIC);
> > +               if (WARN_ON(!res))
> > +                       return -ENOMEM;
> > +               res->start = start;
> > +               res->end = end;
> > +               res->name  = "reserved";
> > +               res->flags = IORESOURCE_MEM;
> > +
> > +               mem = request_resource_conflict(&iomem_resource, res);
> > +               /*
> > +                * We expected memblock_reserve() regions to conflict with
> > +                * memory created by request_standard_resources().
> > +                */
> > +               if (WARN_ON_ONCE(!mem))
> > +                       continue;
> > +               kfree(res);
> > +
> > +               reserve_region_with_split(mem, start, end, "reserved");
> > +       }
> > +
> > +       return 0;
> > +}
> > +arch_initcall(reserve_memblock_reserved_regions);
> > +
> 
> Since this patch landed, on the HiKey board at bootup I'm seeing:
> 
> [    0.451884] WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/setup.c:271
> reserve_memblock_reserved_regions+0xd4/0x13c
> [    0.451896] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> 4.18.0-10758-ga534dc3 #709
> [    0.451903] Hardware name: HiKey Development Board (DT)
> [    0.451913] pstate: 80400005 (Nzcv daif +PAN -UAO)
> [    0.451922] pc : reserve_memblock_reserved_regions+0xd4/0x13c
> [    0.451931] lr : reserve_memblock_reserved_regions+0xcc/0x13c
> [    0.451938] sp : ffffff8008053d30
> [    0.451945] x29: ffffff8008053d30 x28: ffffff8008ebe650
> [    0.451957] x27: ffffff8008ead060 x26: ffffff8008e113b0
> [    0.451969] x25: 0000000000000000 x24: 0000000000488020
> [    0.451981] x23: 0000000021ffffff x22: ffffff8008e0d860
> [    0.451993] x21: ffffff8008d74370 x20: ffffff8009019000
> [    0.452005] x19: ffffffc07507a400 x18: ffffff8009019a48
> [    0.452017] x17: 0000000000000000 x16: 0000000000000000
> [    0.452028] x15: ffffff80890e973f x14: 0000000000000006
> [    0.452040] x13: 0000000000000000 x12: 0000000000000000
> [    0.452051] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> [    0.452063] x9 : 0000000000000000 x8 : ffffffc07507a480
> [    0.452074] x7 : 0000000000000000 x6 : ffffffc07ffffc30
> [    0.452086] x5 : 0000000000000000 x4 : 0000000021ffffff
> [    0.452097] x3 : 0000000000000001 x2 : 0000000000000001
> [    0.452109] x1 : 0000000000000000 x0 : 0000000000000000
> [    0.452121] Call trace:
> [    0.452130]  reserve_memblock_reserved_regions+0xd4/0x13c
> [    0.452140]  do_one_initcall+0x78/0x150
> [    0.452148]  kernel_init_freeable+0x198/0x258
> [    0.452159]  kernel_init+0x10/0x108
> [    0.452170]  ret_from_fork+0x10/0x18
> [    0.452181] ---[ end trace b4b78c443df3a750 ]---
> 
> From skimming the patch, it seems this is maybe expected? Or should
> this warning raise eyebrows? I can't quite figure it out.

Yeah, somehow.
This is the case where an inserted region has NOMAP attribute,
which means that it has no use for system memory, but is yet
"memblock_reserved." (doesn't make sense.)

> 
> It seems to trigger on the pstore memory at 0x21f00000-0x21ffffff.
> 
> /proc/iomem now has:
> ...
> 07410000-21efffff : System RAM
>   11000000-1113cfff : reserved
> 21f00000-21ffffff : reserved
>   21f00000-21f1ffff : persistent_ram
>   21f20000-21f3ffff : persistent_ram
>   21f40000-21f5ffff : persistent_ram
>   21f60000-21f7ffff : persistent_ram
>   21f80000-21f9ffff : persistent_ram
>   21fa0000-21fbffff : persistent_ram
>   21fc0000-21fdffff : persistent_ram
>   21fe0000-21ffffff : persistent_ram
> 22000000-34ffffff : System RAM
> ...

Please note that, after my patch set, there can appear a "reserved"
entry at the top level, which implies that the whole range is NOMAP'ed.

Thanks,
-Takahiro AKASHI

> 
> Where previously it had:
> ...
> 07410000-21efffff : System RAM
> 21f00000-21f1ffff : persistent_ram
> 21f20000-21f3ffff : persistent_ram
> 21f40000-21f5ffff : persistent_ram
> 21f60000-21f7ffff : persistent_ram
> 21f80000-21f9ffff : persistent_ram
> 21fa0000-21fbffff : persistent_ram
> 21fc0000-21fdffff : persistent_ram
> 21fe0000-21ffffff : persistent_ram
> 22000000-34ffffff : System RAM
> ...
> 
> 
> thanks
> -john

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem
  2018-08-21  4:39   ` John Stultz
  2018-08-21  6:07     ` AKASHI Takahiro
@ 2018-08-21 10:22     ` James Morse
  2018-08-21 16:48       ` James Morse
  2018-08-21 19:38       ` John Stultz
  1 sibling, 2 replies; 12+ messages in thread
From: James Morse @ 2018-08-21 10:22 UTC (permalink / raw)
  To: John Stultz, AKASHI Takahiro
  Cc: Catalin Marinas, Will Deacon, Rafael J. Wysocki, Len Brown,
	Ard Biesheuvel, Mark Rutland, Lorenzo Pieralisi, G Gregory,
	al.stone, bhsharma, tbaicar, kexec, lkml, hanjun.guo,
	Sudeep Holla, dyoung, linux-arm-kernel

Hi John,

On 08/21/2018 05:39 AM, John Stultz wrote:
> On Sun, Jul 22, 2018 at 6:57 PM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> There has been some confusion around what is necessary to prevent kexec
>> overwriting important memory regions. memblock: reserve, or nomap?
>> Only memblock nomap regions are reported via /proc/iomem, kexec's
>> user-space doesn't know about memblock_reserve()d regions.
>>
>> Until commit f56ab9a5b73ca ("efi/arm: Don't mark ACPI reclaim memory
>> as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
>> and thus possible for kexec to overwrite with the new kernel or initrd.
>> But this was always broken, as the UEFI memory map is also reserved
>> and not marked as nomap.
>>
>> Exporting both nomap and reserved memblock types is a nuisance as
>> they live in different memblock structures which we can't walk at
>> the same time.
>>
>> Take a second walk over memblock.reserved and add new 'reserved'
>> subnodes for the memblock_reserved() regions that aren't already
>> described by the existing code. (e.g. Kernel Code)
>>
>> We use reserve_region_with_split() to find the gaps in existing named
>> regions. This handles the gap between 'kernel code' and 'kernel data'
>> which is memblock_reserve()d, but already partially described by
>> request_standard_resources(). e.g.:
>> | 80000000-dfffffff : System RAM
>> |   80080000-80ffffff : Kernel code
>> |   81000000-8158ffff : reserved
>> |   81590000-8237efff : Kernel data
>> |   a0000000-dfffffff : Crash kernel
>> | e00f0000-f949ffff : System RAM
>>
>> reserve_region_with_split needs kzalloc() which isn't available when
>> request_standard_resources() is called, use an initcall.

>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 30ad2f085d1f..5b4fac434c84 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -241,6 +241,44 @@ static void __init request_standard_resources(void)

>> +static int __init reserve_memblock_reserved_regions(void)

>> +       for_each_reserved_mem_region(i, &start, &end) {
>> +               if (end <= roundup_end)
>> +                       continue; /* done already */
>> +
>> +               start = __pfn_to_phys(PFN_DOWN(start));
>> +               end = __pfn_to_phys(PFN_UP(end)) - 1;
>> +               roundup_end = end;
>> +
>> +               res = kzalloc(sizeof(*res), GFP_ATOMIC);
>> +               if (WARN_ON(!res))
>> +                       return -ENOMEM;
>> +               res->start = start;
>> +               res->end = end;
>> +               res->name  = "reserved";
>> +               res->flags = IORESOURCE_MEM;
>> +
>> +               mem = request_resource_conflict(&iomem_resource, res);
>> +               /*
>> +                * We expected memblock_reserve() regions to conflict with
>> +                * memory created by request_standard_resources().
>> +                */
>> +               if (WARN_ON_ONCE(!mem))
>> +                       continue;
>> +               kfree(res);
>> +
>> +               reserve_region_with_split(mem, start, end, "reserved");
>> +       }
>> +
>> +       return 0;
>> +}
>> +arch_initcall(reserve_memblock_reserved_regions);
>> +
> 
> Since this patch landed, on the HiKey board at bootup I'm seeing:
> 
> [    0.451884] WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/setup.c:271
> reserve_memblock_reserved_regions+0xd4/0x13c
> [    0.451896] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> 4.18.0-10758-ga534dc3 #709
> [    0.451903] Hardware name: HiKey Development Board (DT)
> [    0.451913] pstate: 80400005 (Nzcv daif +PAN -UAO)
> [    0.451922] pc : reserve_memblock_reserved_regions+0xd4/0x13c
> [    0.451931] lr : reserve_memblock_reserved_regions+0xcc/0x13c
> [    0.451938] sp : ffffff8008053d30
> [    0.451945] x29: ffffff8008053d30 x28: ffffff8008ebe650
> [    0.451957] x27: ffffff8008ead060 x26: ffffff8008e113b0
> [    0.451969] x25: 0000000000000000 x24: 0000000000488020
> [    0.451981] x23: 0000000021ffffff x22: ffffff8008e0d860
> [    0.451993] x21: ffffff8008d74370 x20: ffffff8009019000
> [    0.452005] x19: ffffffc07507a400 x18: ffffff8009019a48
> [    0.452017] x17: 0000000000000000 x16: 0000000000000000
> [    0.452028] x15: ffffff80890e973f x14: 0000000000000006
> [    0.452040] x13: 0000000000000000 x12: 0000000000000000
> [    0.452051] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> [    0.452063] x9 : 0000000000000000 x8 : ffffffc07507a480
> [    0.452074] x7 : 0000000000000000 x6 : ffffffc07ffffc30
> [    0.452086] x5 : 0000000000000000 x4 : 0000000021ffffff
> [    0.452097] x3 : 0000000000000001 x2 : 0000000000000001
> [    0.452109] x1 : 0000000000000000 x0 : 0000000000000000
> [    0.452121] Call trace:
> [    0.452130]  reserve_memblock_reserved_regions+0xd4/0x13c
> [    0.452140]  do_one_initcall+0x78/0x150
> [    0.452148]  kernel_init_freeable+0x198/0x258
> [    0.452159]  kernel_init+0x10/0x108
> [    0.452170]  ret_from_fork+0x10/0x18
> [    0.452181] ---[ end trace b4b78c443df3a750 ]---
> 
>  From skimming the patch, it seems this is maybe expected? Or should
> this warning raise eyebrows? I can't quite figure it out.

Ugh, sorry for the noise! This is warning that there is something wrong
with our assumptions about what types of memory exist.


> It seems to trigger on the pstore memory at 0x21f00000-0x21ffffff.

... pmem ...

> 
> /proc/iomem now has:
> ...
> 07410000-21efffff : System RAM
>    11000000-1113cfff : reserved

> 21f00000-21ffffff : reserved

^ This entry is what triggered the warning.

It expects that meblock_reserved() memory is also described as memory.
(memblock keeps them as separate lists, so its possible to reserve
memory that doesn't exist... which it looks like your system is doing)

If this region was described as memory, request_standard_resources()
would have put down a "System RAM" resource to cover it. This code is
trying to add a "reserved" underneath that, to discover the top level
doesn't exist. The reserved entry gets left in place as this region is
described as reserved...

Akashi and I figured this would only happen if a region is described as
nomap-memory and memblock_reserved() at the same time, which we don't
think is valid, hence the warning.

This avoided walking each reserved region, to work out which parts of it
were memory, nomap-memory (and now, not memory at all!)


>    21f00000-21f1ffff : persistent_ram
>    21f20000-21f3ffff : persistent_ram
>    21f40000-21f5ffff : persistent_ram
>    21f60000-21f7ffff : persistent_ram
>    21f80000-21f9ffff : persistent_ram
>    21fa0000-21fbffff : persistent_ram
>    21fc0000-21fdffff : persistent_ram
>    21fe0000-21ffffff : persistent_ram
> 22000000-34ffffff : System RAM
> ...
> 
> Where previously it had:
> ...
> 07410000-21efffff : System RAM
> 21f00000-21f1ffff : persistent_ram
> 21f20000-21f3ffff : persistent_ram
> 21f40000-21f5ffff : persistent_ram
> 21f60000-21f7ffff : persistent_ram
> 21f80000-21f9ffff : persistent_ram
> 21fa0000-21fbffff : persistent_ram
> 21fc0000-21fdffff : persistent_ram
> 21fe0000-21ffffff : persistent_ram
> 22000000-34ffffff : System RAM

So, this is a memblock_reserved() range that isn't actually memory.

This happens because your DT has carved these regions out of the memory
node, but added a named 'reserved-memory' region for them, just in case?
I'm not sure what it means if 'reserved-memory' is not also described as
memory....

You do need the carve-out, otherwise this gets covered by the linear
map, and when you throw in that 'unbuffered' property you get both a
cacheable and uncacheable mapping of the same page.

... and if you have the carve-out, you don't need the reserved-memory as
this is effectively a device which the driver knows the
memory-attributes for...

I think the problem here is making 'persistent memory' e.g. byte-addressable flash 
and 'reserved memory' (RAM) mean the same thing.

Something like [0] should mute the warning, but this assumes your
memblock_reserved() region didn't get merged with another reserved region
that really is memory, so just doing this makes it more fragile...

I will see if I can come up with something better.


Thanks,

James


[0] not even build tested
----------------------%<----------------------
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 5b4fac4..c64541b 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -266,10 +266,15 @@ static int __init reserve_memblock_reserved_regions(void)
                 mem = request_resource_conflict(&iomem_resource, res);
                 /*
                  * We expected memblock_reserve() regions to conflict with
-                * memory created by request_standard_resources().
+                * memory created by request_standard_resources(). This doesn't
+                * happen if we found a region that is memblock_reserved(), but
+                * not actually memory.
                  */
-               if (WARN_ON_ONCE(!mem))
+               if (!mem) {
+                       remove_resource(res);
+                       kfree(res);
                         continue;
+               }
                 kfree(res);

                 reserve_region_with_split(mem, start, end, "reserved");
----------------------%<----------------------




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem
  2018-08-21 10:22     ` James Morse
@ 2018-08-21 16:48       ` James Morse
  2018-08-21 19:38       ` John Stultz
  1 sibling, 0 replies; 12+ messages in thread
From: James Morse @ 2018-08-21 16:48 UTC (permalink / raw)
  To: John Stultz, AKASHI Takahiro
  Cc: Catalin Marinas, Will Deacon, Rafael J. Wysocki, Len Brown,
	Ard Biesheuvel, Mark Rutland, Lorenzo Pieralisi, G Gregory,
	al.stone, bhsharma, tbaicar, kexec, lkml, hanjun.guo,
	Sudeep Holla, dyoung, linux-arm-kernel

On 08/21/2018 11:22 AM, James Morse wrote:
> On 08/21/2018 05:39 AM, John Stultz wrote:
>> On Sun, Jul 22, 2018 at 6:57 PM, AKASHI Takahiro
>> <takahiro.akashi@linaro.org> wrote:
>>> From: James Morse <james.morse@arm.com>
>>>
>>> There has been some confusion around what is necessary to prevent kexec
>>> overwriting important memory regions. memblock: reserve, or nomap?
>>> Only memblock nomap regions are reported via /proc/iomem, kexec's
>>> user-space doesn't know about memblock_reserve()d regions.
>>>
>>> Until commit f56ab9a5b73ca ("efi/arm: Don't mark ACPI reclaim memory
>>> as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
>>> and thus possible for kexec to overwrite with the new kernel or initrd.
>>> But this was always broken, as the UEFI memory map is also reserved
>>> and not marked as nomap.
>>>
>>> Exporting both nomap and reserved memblock types is a nuisance as
>>> they live in different memblock structures which we can't walk at
>>> the same time.
>>>
>>> Take a second walk over memblock.reserved and add new 'reserved'
>>> subnodes for the memblock_reserved() regions that aren't already
>>> described by the existing code. (e.g. Kernel Code)
>>>
>>> We use reserve_region_with_split() to find the gaps in existing named
>>> regions. This handles the gap between 'kernel code' and 'kernel data'
>>> which is memblock_reserve()d, but already partially described by
>>> request_standard_resources(). e.g.:
>>> | 80000000-dfffffff : System RAM
>>> |   80080000-80ffffff : Kernel code
>>> |   81000000-8158ffff : reserved
>>> |   81590000-8237efff : Kernel data
>>> |   a0000000-dfffffff : Crash kernel
>>> | e00f0000-f949ffff : System RAM
>>>
>>> reserve_region_with_split needs kzalloc() which isn't available when
>>> request_standard_resources() is called, use an initcall.
> 
>>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>> index 30ad2f085d1f..5b4fac434c84 100644
>>> --- a/arch/arm64/kernel/setup.c
>>> +++ b/arch/arm64/kernel/setup.c
>>> @@ -241,6 +241,44 @@ static void __init request_standard_resources(void)
> 
>>> +static int __init reserve_memblock_reserved_regions(void)
> 
>>> +       for_each_reserved_mem_region(i, &start, &end) {
>>> +               if (end <= roundup_end)
>>> +                       continue; /* done already */
>>> +
>>> +               start = __pfn_to_phys(PFN_DOWN(start));
>>> +               end = __pfn_to_phys(PFN_UP(end)) - 1;
>>> +               roundup_end = end;
>>> +
>>> +               res = kzalloc(sizeof(*res), GFP_ATOMIC);
>>> +               if (WARN_ON(!res))
>>> +                       return -ENOMEM;
>>> +               res->start = start;
>>> +               res->end = end;
>>> +               res->name  = "reserved";
>>> +               res->flags = IORESOURCE_MEM;
>>> +
>>> +               mem = request_resource_conflict(&iomem_resource, res);
>>> +               /*
>>> +                * We expected memblock_reserve() regions to conflict with
>>> +                * memory created by request_standard_resources().
>>> +                */
>>> +               if (WARN_ON_ONCE(!mem))


>> Since this patch landed, on the HiKey board at bootup I'm seeing:
>>
>> [    0.451884] WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/setup.c:271
>> reserve_memblock_reserved_regions+0xd4/0x13c

>>  From skimming the patch, it seems this is maybe expected? Or should
>> this warning raise eyebrows? I can't quite figure it out.
> 
> Ugh, sorry for the noise! This is warning that there is something wrong
> with our assumptions about what types of memory exist.
> 
> 
>> It seems to trigger on the pstore memory at 0x21f00000-0x21ffffff.
> 
> ... pmem ...


> So, this is a memblock_reserved() range that isn't actually memory.
> 
> This happens because your DT has carved these regions out of the memory
> node, but added a named 'reserved-memory' region for them, just in case?
> I'm not sure what it means if 'reserved-memory' is not also described as
> memory....
> 
> You do need the carve-out, otherwise this gets covered by the linear
> map, and when you throw in that 'unbuffered' property you get both a
> cacheable and uncacheable mapping of the same page.


Hmm, looks like its (even) more complicated than I thought, of_reserved_mem.c's 
definition of 'nomap' is memblock_remove(), not memblock_mark_nomap().

This might be more common to all users of DTs memreserve.


Thanks,

James

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem
  2018-08-21 10:22     ` James Morse
  2018-08-21 16:48       ` James Morse
@ 2018-08-21 19:38       ` John Stultz
  2018-08-23 12:35         ` James Morse
  1 sibling, 1 reply; 12+ messages in thread
From: John Stultz @ 2018-08-21 19:38 UTC (permalink / raw)
  To: James Morse
  Cc: AKASHI Takahiro, Catalin Marinas, Will Deacon, Rafael J. Wysocki,
	Len Brown, Ard Biesheuvel, Mark Rutland, Lorenzo Pieralisi,
	G Gregory, al.stone, bhsharma, tbaicar, kexec, lkml, hanjun.guo,
	Sudeep Holla, dyoung, linux-arm-kernel

On Tue, Aug 21, 2018 at 3:22 AM, James Morse <james.morse@arm.com> wrote:
> On 08/21/2018 05:39 AM, John Stultz wrote:
>>
>> Since this patch landed, on the HiKey board at bootup I'm seeing:
>>
>> [    0.451884] WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/setup.c:271
>> reserve_memblock_reserved_regions+0xd4/0x13c
...
>>  From skimming the patch, it seems this is maybe expected? Or should
>> this warning raise eyebrows? I can't quite figure it out.
>
> Ugh, sorry for the noise! This is warning that there is something wrong
> with our assumptions about what types of memory exist.
>
>
>> It seems to trigger on the pstore memory at 0x21f00000-0x21ffffff.
>
>
> ... pmem ...
>
>>
>> /proc/iomem now has:
>> ...
>> 07410000-21efffff : System RAM
>>    11000000-1113cfff : reserved
>
>
>> 21f00000-21ffffff : reserved
>
>
> ^ This entry is what triggered the warning.
>
> It expects that meblock_reserved() memory is also described as memory.
> (memblock keeps them as separate lists, so its possible to reserve
> memory that doesn't exist... which it looks like your system is doing)

So yea, I suspect the hikey dts isn't quite right here then. I've
always thought how it setup the memory was a bit strange:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts#n30

So from your mail, I suspect the hole in the memory map for the pstore
buffer isn't correct, and we should rework that.

I'll give that a shot here and make sure pstore still works properly.

thanks
-john

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem
  2018-08-21 19:38       ` John Stultz
@ 2018-08-23 12:35         ` James Morse
  0 siblings, 0 replies; 12+ messages in thread
From: James Morse @ 2018-08-23 12:35 UTC (permalink / raw)
  To: John Stultz
  Cc: AKASHI Takahiro, Catalin Marinas, Will Deacon, Rafael J. Wysocki,
	Len Brown, Ard Biesheuvel, Mark Rutland, Lorenzo Pieralisi,
	G Gregory, al.stone, bhsharma, tbaicar, kexec, lkml, hanjun.guo,
	Sudeep Holla, dyoung, linux-arm-kernel

Hi John,

On 21/08/18 20:38, John Stultz wrote:
> On Tue, Aug 21, 2018 at 3:22 AM, James Morse <james.morse@arm.com> wrote:
>> On 08/21/2018 05:39 AM, John Stultz wrote:
>>>
>>> Since this patch landed, on the HiKey board at bootup I'm seeing:
>>>
>>> [    0.451884] WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/setup.c:271
>>> reserve_memblock_reserved_regions+0xd4/0x13c
> ...
>>>  From skimming the patch, it seems this is maybe expected? Or should
>>> this warning raise eyebrows? I can't quite figure it out.
>>

>>> /proc/iomem now has:
>>> ...
>>> 07410000-21efffff : System RAM
>>>    11000000-1113cfff : reserved
>>
>>
>>> 21f00000-21ffffff : reserved
>>
>>
>> ^ This entry is what triggered the warning.
>>
>> It expects that meblock_reserved() memory is also described as memory.
>> (memblock keeps them as separate lists, so its possible to reserve
>> memory that doesn't exist... which it looks like your system is doing)
> 
> So yea, I suspect the hikey dts isn't quite right here then.

The DT mem-reserve code goes and memblock_removes() some regions, instead of
marking them nomap.

Given memblock has a for_each_resv_unavail_range() that explicitly walks
reserved && !memory, it looks like this is expected, and its just me thinking
this is strange.

I will come up with a version of this patch that walks the 'System RAM'
resources that were created during boot, and adds the memblock_reserved()
regions to them, which should stop this happening.


Thanks,

James

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-08-23 12:35 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-23  1:57 [PATCH v4 0/5] arm64: kexec,kdump: fix boot failures on acpi-only system AKASHI Takahiro
2018-07-23  1:57 ` [PATCH v4 1/5] arm64: export memblock_reserve()d regions via /proc/iomem AKASHI Takahiro
2018-08-21  4:39   ` John Stultz
2018-08-21  6:07     ` AKASHI Takahiro
2018-08-21 10:22     ` James Morse
2018-08-21 16:48       ` James Morse
2018-08-21 19:38       ` John Stultz
2018-08-23 12:35         ` James Morse
2018-07-23  1:57 ` [PATCH v4 2/5] drivers: acpi: add dependency of EFI for arm64 AKASHI Takahiro
2018-07-23  1:57 ` [PATCH v4 3/5] efi/arm: preserve early mapping of UEFI memory map longer for BGRT AKASHI Takahiro
2018-07-23  1:57 ` [PATCH v4 4/5] efi/arm: map UEFI memory map even w/o runtime services enabled AKASHI Takahiro
2018-07-23  1:57 ` [PATCH v4 5/5] arm64: acpi: fix alignment fault in accessing ACPI AKASHI Takahiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).