All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump.
@ 2018-04-02  6:29 Mahesh J Salgaonkar
  2018-04-02  6:29 ` [PATCH v3 1/7] powerpc/fadump: Move the metadata region to start of the reserved area Mahesh J Salgaonkar
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-02  6:29 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. This reserved
memory is used for saving the contents of old crashed kernel's memory before
fadump capture kernel uses old kernel's memory area to boot. However, This
reserved memory area stays unused until system crash and isn't available
for production kernel to use.

Instead of setting aside a significant chunk of memory that nobody can use,
take advantage Linux kernel's Contiguous Memory Allocator (CMA) feature,
to reserve a significant chunk of memory that the kernel is prevented from
using , but applications are free to use it.

Patch 1 moves the metadata region to the start of the reserved area for easy
handling/detection of metadata region in second kernel.
Patch 3 implements the usage of CMA region to allow production kernel to
use that memory for applications usage, making fadump reservationless.
Patch 4-7 fixes various fadump issues and bugs.

Changes in V3:
- patch 1 & 2: move metadata region and documentation update.
- patch 7: Un-register the faudmp on kexec path


---

Mahesh Salgaonkar (7):
      powerpc/fadump: Move the metadata region to start of the reserved area.
      powerpc/fadump: Update documentation to reflect the metadata region movement.
      powerpc/fadump: Reservationless firmware assisted dump
      powerpc/fadump: exclude memory holes while reserving memory in second kernel.
      powerpc/fadump: throw proper error message on fadump registration failure.
      powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
      powerpc/fadump: un-register fadump on kexec path.


 Documentation/powerpc/firmware-assisted-dump.txt |   31 ++
 arch/powerpc/include/asm/fadump.h                |    6 
 arch/powerpc/kernel/fadump.c                     |  275 +++++++++++++++++++---
 arch/powerpc/platforms/pseries/hotplug-memory.c  |    7 -
 4 files changed, 271 insertions(+), 48 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/7] powerpc/fadump: Move the metadata region to start of the reserved area.
  2018-04-02  6:29 [PATCH v3 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
@ 2018-04-02  6:29 ` Mahesh J Salgaonkar
  2018-04-03 19:26   ` Hari Bathini
  2018-04-02  6:30 ` [PATCH v3 2/7] powerpc/fadump: Update documentation to reflect the metadata region movement Mahesh J Salgaonkar
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-02  6:29 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Currently the metadata region that holds crash info structure and ELF core
header is placed towards the end of reserved memory area. This patch places
it at the beginning of the reserved memory area.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |    4 ++
 arch/powerpc/kernel/fadump.c      |   92 ++++++++++++++++++++++++++++++++-----
 2 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 5a23010af600..44b6ebfe9be6 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -61,6 +61,9 @@
 #define FADUMP_UNREGISTER	2
 #define FADUMP_INVALIDATE	3
 
+/* Number of dump sections requested by kernel */
+#define FADUMP_NUM_SECTIONS	4
+
 /* Dump status flag */
 #define FADUMP_ERROR_FLAG	0x2000
 
@@ -119,6 +122,7 @@ struct fadump_mem_struct {
 	struct fadump_section		cpu_state_data;
 	struct fadump_section		hpte_region;
 	struct fadump_section		rmr_region;
+	struct fadump_section		metadata_region;
 };
 
 /* Firmware-assisted dump configuration details. */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 3c2c2688918f..80552fd7d5f8 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -188,17 +188,48 @@ static void fadump_show_config(void)
 	pr_debug("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
 }
 
+static unsigned long get_fadump_metadata_size(void)
+{
+	unsigned long size = 0;
+
+	/*
+	 * If fadump is active then look into fdm_active to get metadata
+	 * size set by previous kernel.
+	 */
+	if (fw_dump.dump_active) {
+		size = fdm_active->cpu_state_data.destination_address -
+				fdm_active->metadata_region.source_address;
+		goto out;
+	}
+	size += sizeof(struct fadump_crash_info_header);
+	size += sizeof(struct elfhdr); /* ELF core header.*/
+	size += sizeof(struct elf_phdr); /* place holder for cpu notes */
+	/* Program headers for crash memory regions. */
+	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
+
+	size = PAGE_ALIGN(size);
+out:
+	pr_debug("fadump Metadata size is %ld\n", size);
+	return size;
+}
+
 static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
 				unsigned long addr)
 {
+	uint16_t num_sections = 0;
+	unsigned long metadata_base = 0;
+
 	if (!fdm)
 		return 0;
 
+	/* Skip the fadump metadata area. */
+	metadata_base = addr;
+	addr += get_fadump_metadata_size();
+
 	memset(fdm, 0, sizeof(struct fadump_mem_struct));
 	addr = addr & PAGE_MASK;
 
 	fdm->header.dump_format_version = cpu_to_be32(0x00000001);
-	fdm->header.dump_num_sections = cpu_to_be16(3);
 	fdm->header.dump_status_flag = 0;
 	fdm->header.offset_first_dump_section =
 		cpu_to_be32((u32)offsetof(struct fadump_mem_struct, cpu_state_data));
@@ -222,6 +253,7 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
 	fdm->cpu_state_data.source_address = 0;
 	fdm->cpu_state_data.source_len = cpu_to_be64(fw_dump.cpu_state_data_size);
 	fdm->cpu_state_data.destination_address = cpu_to_be64(addr);
+	num_sections++;
 	addr += fw_dump.cpu_state_data_size;
 
 	/* hpte region section */
@@ -230,6 +262,7 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
 	fdm->hpte_region.source_address = 0;
 	fdm->hpte_region.source_len = cpu_to_be64(fw_dump.hpte_region_size);
 	fdm->hpte_region.destination_address = cpu_to_be64(addr);
+	num_sections++;
 	addr += fw_dump.hpte_region_size;
 
 	/* RMA region section */
@@ -238,8 +271,25 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
 	fdm->rmr_region.source_address = cpu_to_be64(RMA_START);
 	fdm->rmr_region.source_len = cpu_to_be64(fw_dump.boot_memory_size);
 	fdm->rmr_region.destination_address = cpu_to_be64(addr);
+	num_sections++;
 	addr += fw_dump.boot_memory_size;
 
+	/*
+	 * fadump metadata section.
+	 * Add an entry with source len a zero. We are only intereseted in
+	 * source address which will help us to detect the location of
+	 * metadata area where faump header and elf core header is placed.
+	 */
+	fdm->metadata_region.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG);
+	fdm->metadata_region.source_data_type =
+					cpu_to_be16(FADUMP_REAL_MODE_REGION);
+	fdm->metadata_region.source_address = cpu_to_be64(metadata_base);
+	fdm->metadata_region.source_len = 0;
+	fdm->metadata_region.destination_address = cpu_to_be64(addr);
+	num_sections++;
+
+	fdm->header.dump_num_sections = cpu_to_be16(num_sections);
+	BUILD_BUG_ON(num_sections != FADUMP_NUM_SECTIONS);
 	return addr;
 }
 
@@ -325,16 +375,17 @@ static unsigned long get_fadump_area_size(void)
 	size += fw_dump.cpu_state_data_size;
 	size += fw_dump.hpte_region_size;
 	size += fw_dump.boot_memory_size;
-	size += sizeof(struct fadump_crash_info_header);
-	size += sizeof(struct elfhdr); /* ELF core header.*/
-	size += sizeof(struct elf_phdr); /* place holder for cpu notes */
-	/* Program headers for crash memory regions. */
-	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
-
+	size += get_fadump_metadata_size();
 	size = PAGE_ALIGN(size);
 	return size;
 }
 
+static inline unsigned long get_fadump_metadata_base(
+			const struct fadump_mem_struct *fdm)
+{
+	return be64_to_cpu(fdm->metadata_region.source_address);
+}
+
 int __init fadump_reserve_mem(void)
 {
 	unsigned long base, size, memory_boundary;
@@ -395,9 +446,9 @@ int __init fadump_reserve_mem(void)
 				(unsigned long)(size >> 20),
 				(unsigned long)(base >> 20));
 
-		fw_dump.fadumphdr_addr =
-				be64_to_cpu(fdm_active->rmr_region.destination_address) +
-				be64_to_cpu(fdm_active->rmr_region.source_len);
+		pr_info("Number of kernel Dump sections: %d\n",
+			be16_to_cpu(fdm_active->header.dump_num_sections));
+		fw_dump.fadumphdr_addr = get_fadump_metadata_base(fdm_active);
 		pr_debug("fadumphdr_addr = %p\n",
 				(void *) fw_dump.fadumphdr_addr);
 	} else {
@@ -803,14 +854,24 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
 static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
 {
 	struct fadump_crash_info_header *fdh;
+	uint16_t num_sections = 0;
 	int rc = 0;
 
 	if (!fdm_active || !fw_dump.fadumphdr_addr)
 		return -EINVAL;
 
+	/* Check if platform has honored all 4 dump sections requested. */
+	num_sections = be16_to_cpu(fdm_active->header.dump_num_sections);
+	if (num_sections < FADUMP_NUM_SECTIONS) {
+		printk(KERN_ERR "Dump taken by platform does not "
+				"contain all requested sections\n");
+		return -EINVAL;
+	}
+
 	/* Check if the dump data is valid. */
 	if ((be16_to_cpu(fdm_active->header.dump_status_flag) == FADUMP_ERROR_FLAG) ||
 			(fdm_active->cpu_state_data.error_flags != 0) ||
+			(fdm_active->metadata_region.error_flags != 0) ||
 			(fdm_active->rmr_region.error_flags != 0)) {
 		printk(KERN_ERR "Dump taken by platform is not valid\n");
 		return -EINVAL;
@@ -821,6 +882,11 @@ static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
 		printk(KERN_ERR "Dump taken by platform is incomplete\n");
 		return -EINVAL;
 	}
+	if ((fdm_active->metadata_region.bytes_dumped !=
+			fdm_active->metadata_region.source_len)) {
+		printk(KERN_ERR "Dump taken by platform is incomplete\n");
+		return -EINVAL;
+	}
 
 	/* Validate the fadump crash info header */
 	fdh = __va(fw_dump.fadumphdr_addr);
@@ -1082,7 +1148,7 @@ static int register_fadump(void)
 
 	fadump_setup_crash_memory_ranges();
 
-	addr = be64_to_cpu(fdm.rmr_region.destination_address) + be64_to_cpu(fdm.rmr_region.source_len);
+	addr = get_fadump_metadata_base(&fdm);
 	/* Initialize fadump crash info header. */
 	addr = init_fadump_header(addr);
 	vaddr = __va(addr);
@@ -1153,7 +1219,7 @@ void fadump_cleanup(void)
 	/* Invalidate the registration only if dump is active. */
 	if (fw_dump.dump_active) {
 		init_fadump_mem_struct(&fdm,
-			be64_to_cpu(fdm_active->cpu_state_data.destination_address));
+				get_fadump_metadata_base(fdm_active));
 		fadump_invalidate_dump(&fdm);
 	}
 }
@@ -1236,7 +1302,7 @@ static void fadump_invalidate_release_mem(void)
 		return;
 	}
 
-	destination_address = be64_to_cpu(fdm_active->cpu_state_data.destination_address);
+	destination_address = get_fadump_metadata_base(fdm_active);
 	fadump_cleanup();
 	mutex_unlock(&fadump_mutex);
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/7] powerpc/fadump: Update documentation to reflect the metadata region movement.
  2018-04-02  6:29 [PATCH v3 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
  2018-04-02  6:29 ` [PATCH v3 1/7] powerpc/fadump: Move the metadata region to start of the reserved area Mahesh J Salgaonkar
@ 2018-04-02  6:30 ` Mahesh J Salgaonkar
  2018-04-02  6:30 ` [PATCH v3 3/7] powerpc/fadump: Reservationless firmware assisted dump Mahesh J Salgaonkar
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-02  6:30 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Metadata region that holds fadump header and ELF header is now placed at
the start of reserved memory area. Update the documentation accordingly

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.txt |   31 +++++++++++++---------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
index bdd344aa18d9..c77f8dc9231f 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -115,17 +115,24 @@ be kept permanently reserved, so that it can act as a receptacle
 for a copy of the boot memory content in addition to CPU state
 and HPTE region, in the case a crash does occur.
 
+The first kernel, during fadump registration, prepares ELF core header
+that contains necessary information for the coredump of the 1st kernel
+which includes its physical memory layout. This ELF header and some more
+additional data is stored in the area called metadata region at the start
+of the reserved memory area.
+
   o Memory Reservation during first kernel
 
   Low memory                                         Top of memory
   0      boot memory size                                       |
   |           |                |<--Reserved dump area -->|      |
   V           V                |   Permanent Reservation |      V
-  +-----------+----------/ /---+---+----+-----------+----+------+
-  |           |                |CPU|HPTE|  DUMP     |ELF |      |
-  +-----------+----------/ /---+---+----+-----------+----+------+
-        |                                           ^
-        |                                           |
+  +-----------+----------/ /---+----+---+----+-----------+------+
+  |           |                |ELF |CPU|HPTE|  DUMP     |      |
+  +-----------+---------/ /----+----+---+----+-----------+------+
+        |                        ^                  ^
+        |                        |                  |
+        |                metadata region            |
         \                                           /
          -------------------------------------------
           Boot memory content gets transferred to
@@ -137,14 +144,14 @@ and HPTE region, in the case a crash does occur.
 
   Low memory                                        Top of memory
   0      boot memory size                                       |
-  |           |<------------- Reserved dump area ----------- -->|
+  |           |<------------- Reserved dump area -------------->|
   V           V                                                 V
-  +-----------+----------/ /---+---+----+-----------+----+------+
-  |           |                |CPU|HPTE|  DUMP     |ELF |      |
-  +-----------+----------/ /---+---+----+-----------+----+------+
-        |                                              |
-        V                                              V
-   Used by second                                /proc/vmcore
+  +-----------+----------/ /---+----+----+---------------+------+
+  |           |                |ELF |CPU|HPTE|  DUMP     |      |
+  +-----------+----------/ /---+----+----+---------------+------+
+        |                         |
+        V                         V
+   Used by second             /proc/vmcore
    kernel to boot
                    Fig. 2
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 3/7] powerpc/fadump: Reservationless firmware assisted dump
  2018-04-02  6:29 [PATCH v3 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
  2018-04-02  6:29 ` [PATCH v3 1/7] powerpc/fadump: Move the metadata region to start of the reserved area Mahesh J Salgaonkar
  2018-04-02  6:30 ` [PATCH v3 2/7] powerpc/fadump: Update documentation to reflect the metadata region movement Mahesh J Salgaonkar
@ 2018-04-02  6:30 ` Mahesh J Salgaonkar
  2018-04-02  6:30 ` [PATCH v3 4/7] powerpc/fadump: exclude memory holes while reserving memory in second kernel Mahesh J Salgaonkar
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-02  6:30 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Ananth N Mavinakayanahalli,
	Aneesh Kumar K.V, Ananth Narayan, Hari Bathini, Nathan Fontenot,
	Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. On large
systems with TeraBytes of memory, this reservation can be quite
significant.

In some cases, fadump fails if the memory reserved is insufficient, or
if the reserved memory was DLPAR hot-removed.

In the normal case, post reboot, the preserved memory is filtered to
extract only relevant areas of interest using the makedumpfile tool.
While the tool provides flexibility to determine what needs to be part
of the dump and what memory to filter out, all supported distributions
default this to "Capture only kernel data and nothing else".

We take advantage of this default and the Linux kernel's Contiguous
Memory Allocator (CMA) to fundamentally change the memory reservation
model for fadump. Fadump can now only capture kernel memory.

Instead of setting aside a significant chunk of memory nobody can use,
this patch uses CMA instead, to reserve a significant chunk of memory
that the kernel is prevented from using (due to MIGRATE_CMA), but
applications are free to use it.

Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7557         193        6822          12         541        6725
Swap:          4095           0        4095

With this patch:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           8133         194        7464          12         475        7338
Swap:          4095           0        4095

Changes made here are completely transparent to how fadump has
traditionally worked.

Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand
CMA and its usage.

TODO:
- Handle case where CMA reservation spans nodes.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |  122 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 104 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 80552fd7d5f8..011f1aa7abab 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -34,6 +34,7 @@
 #include <linux/crash_dump.h>
 #include <linux/kobject.h>
 #include <linux/sysfs.h>
+#include <linux/cma.h>
 
 #include <asm/debugfs.h>
 #include <asm/page.h>
@@ -45,11 +46,57 @@
 static struct fw_dump fw_dump;
 static struct fadump_mem_struct fdm;
 static const struct fadump_mem_struct *fdm_active;
+static struct cma *fadump_cma;
 
 static DEFINE_MUTEX(fadump_mutex);
 struct fad_crash_memory_ranges crash_memory_ranges[INIT_CRASHMEM_RANGES];
 int crash_mem_ranges;
 
+/*
+ * fadump_cma_reserve() - reserve area for fadump memory reservation
+ *
+ * This function reserves memory from early allocator. It should be
+ * called by arch specific code once the memblock allocator
+ * has been activated.
+ */
+int __init fadump_cma_reserve(void)
+{
+	unsigned long long base, size;
+	int rc;
+
+	if (!fw_dump.fadump_enabled)
+		return 0;
+
+	base = fw_dump.reserve_dump_area_start;
+	size = fw_dump.reserve_dump_area_size;
+	pr_debug("Original reserve area base %ld, size %ld\n",
+				(unsigned long)base >> 20,
+				(unsigned long)size >> 20);
+	if (!size)
+		return 0;
+
+	rc = cma_declare_contiguous(base, size, 0, 0, 0, false,
+						"fadump_cma", &fadump_cma);
+	if (rc) {
+		printk(KERN_ERR "fadump: Failed to reserve cma area for "
+				"firmware-assisted dump, %d\n", rc);
+		fw_dump.reserve_dump_area_size = 0;
+		return 0;
+	}
+	/*
+	 * So we now have cma area reserved for fadump. base may be different
+	 * from what we requested.
+	 */
+	fw_dump.reserve_dump_area_start = cma_get_base(fadump_cma);
+	fw_dump.reserve_dump_area_size = cma_get_size(fadump_cma);
+	printk("Reserved %ldMB cma area at %ldMB for firmware-assisted dump "
+			"(System RAM: %ldMB)\n",
+			cma_get_size(fadump_cma) >> 20,
+			(unsigned long)cma_get_base(fadump_cma) >> 20,
+			(unsigned long)(memblock_phys_mem_size() >> 20));
+	return 1;
+}
+
 /* Scan the Firmware Assisted dump configuration details. */
 int __init early_init_dt_scan_fw_dump(unsigned long node,
 			const char *uname, int depth, void *data)
@@ -381,7 +428,7 @@ static unsigned long get_fadump_area_size(void)
 }
 
 static inline unsigned long get_fadump_metadata_base(
-			const struct fadump_mem_struct *fdm)
+				const struct fadump_mem_struct *fdm)
 {
 	return be64_to_cpu(fdm->metadata_region.source_address);
 }
@@ -449,8 +496,9 @@ int __init fadump_reserve_mem(void)
 		pr_info("Number of kernel Dump sections: %d\n",
 			be16_to_cpu(fdm_active->header.dump_num_sections));
 		fw_dump.fadumphdr_addr = get_fadump_metadata_base(fdm_active);
-		pr_debug("fadumphdr_addr = %p\n",
-				(void *) fw_dump.fadumphdr_addr);
+		pr_debug("fadumphdr_addr = %pa\n", &fw_dump.fadumphdr_addr);
+		fw_dump.reserve_dump_area_start = base;
+		fw_dump.reserve_dump_area_size = size;
 	} else {
 		size = get_fadump_area_size();
 
@@ -467,21 +515,10 @@ int __init fadump_reserve_mem(void)
 			    !memblock_is_region_reserved(base, size))
 				break;
 		}
-		if ((base > (memory_boundary - size)) ||
-		    memblock_reserve(base, size)) {
-			pr_err("Failed to reserve memory\n");
-			return 0;
-		}
-
-		pr_info("Reserved %ldMB of memory at %ldMB for firmware-"
-			"assisted dump (System RAM: %ldMB)\n",
-			(unsigned long)(size >> 20),
-			(unsigned long)(base >> 20),
-			(unsigned long)(memblock_phys_mem_size() >> 20));
+		fw_dump.reserve_dump_area_start = base;
+		fw_dump.reserve_dump_area_size = size;
+		return fadump_cma_reserve();
 	}
-
-	fw_dump.reserve_dump_area_start = base;
-	fw_dump.reserve_dump_area_size = size;
 	return 1;
 }
 
@@ -1134,6 +1171,39 @@ static unsigned long init_fadump_header(unsigned long addr)
 	return addr;
 }
 
+static unsigned long allocate_metadata_area(void)
+{
+	int nr_pages;
+	unsigned long size;
+	struct page *page = NULL;
+
+	/*
+	 * Check if fadump cma region is activated.
+	 * fadump_cma->count == 0 means cma activation has failed. This means
+	 * that the fadump reserved memory now will not be visible/available
+	 * for user applications to use. It will be as good as old fadump
+	 * behaviour of blocking this memory chunk from production system
+	 * use. CMA activation failure does not mean that fadump will not
+	 * work. Will continue to setup fadump.
+	 */
+	if (!fadump_cma || !cma_get_size(fadump_cma)) {
+		pr_warn("fadump cma region activation failed.\n");
+		return 0;
+	}
+
+	size = get_fadump_metadata_size();
+	nr_pages = ALIGN(size, PAGE_SIZE) >> PAGE_SHIFT;
+	pr_info("Fadump metadata size = %ld (nr_pages = %d)\n", size, nr_pages);
+
+	page = cma_alloc(fadump_cma, nr_pages, 0, GFP_KERNEL);
+	if (page) {
+		pr_debug("Allocated fadump metadata area at %ldMB (cma)\n",
+				(unsigned long)page_to_phys(page) >> 20);
+		return page_to_phys(page);
+	}
+	return 0;
+}
+
 static int register_fadump(void)
 {
 	unsigned long addr;
@@ -1556,8 +1626,24 @@ int __init setup_fadump(void)
 			fadump_invalidate_release_mem();
 	}
 	/* Initialize the kernel dump memory structure for FAD registration. */
-	else if (fw_dump.reserve_dump_area_size)
+	else if (fw_dump.reserve_dump_area_size) {
+		/*
+		 * By this time cma area has been activated. Allocate memory
+		 * for metadata from fadump cma region. Since this is very
+		 * early during boot we are guaranteed to get metadata cma
+		 * allocation at address fw_dump.reserve_dump_area_start.
+		 *
+		 * During fadump registration, metadata region is used
+		 * to setup fadump header and ELF core header. We don't want
+		 * this region to be touched by anyone. Allocating metadata
+		 * region memory from fadump cma will make sure that this
+		 * region will not given to any user space application.
+		 * However the rest of the fadump cma memory is still free
+		 * to be used by user applications.
+		 */
+		allocate_metadata_area();
 		init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start);
+	}
 	fadump_init_files();
 
 	return 1;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 4/7] powerpc/fadump: exclude memory holes while reserving memory in second kernel.
  2018-04-02  6:29 [PATCH v3 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
                   ` (2 preceding siblings ...)
  2018-04-02  6:30 ` [PATCH v3 3/7] powerpc/fadump: Reservationless firmware assisted dump Mahesh J Salgaonkar
@ 2018-04-02  6:30 ` Mahesh J Salgaonkar
  2018-04-03  9:51   ` Hari Bathini
  2018-04-02  6:30 ` [PATCH v3 5/7] powerpc/fadump: throw proper error message on fadump registration failure Mahesh J Salgaonkar
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-02  6:30 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

The second kernel, during early boot after the crash, reserves rest of
the memory above boot memory size to make sure it does not touch any of the
dump memory area. It uses memblock_reserve() that reserves the specified
memory region irrespective of memory holes present within that region.
There are chances where previous kernel would have hot removed some of
its memory leaving memory holes behind. In such cases fadump kernel reports
incorrect number of reserved pages through arch_reserved_kernel_pages()
hook causing kernel to hang or panic.

Fix this by excluding memory holes while reserving rest of the memory
above boot memory size during second kernel boot after crash.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 011f1aa7abab..a497e9fb93fe 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -433,6 +433,21 @@ static inline unsigned long get_fadump_metadata_base(
 	return be64_to_cpu(fdm->metadata_region.source_address);
 }
 
+static void fadump_memblock_reserve(unsigned long base, unsigned long size)
+{
+	struct memblock_region *reg;
+	unsigned long start, end;
+
+	for_each_memblock(memory, reg) {
+		start = max(base, (unsigned long)reg->base);
+		end = reg->base + reg->size;
+		end = min(base + size, end);
+
+		if (start < end)
+			memblock_reserve(start, end - start);
+	}
+}
+
 int __init fadump_reserve_mem(void)
 {
 	unsigned long base, size, memory_boundary;
@@ -487,7 +502,7 @@ int __init fadump_reserve_mem(void)
 		 */
 		base = fw_dump.boot_memory_size;
 		size = memory_boundary - base;
-		memblock_reserve(base, size);
+		fadump_memblock_reserve(base, size);
 		printk(KERN_INFO "Reserved %ldMB of memory at %ldMB "
 				"for saving crash dump\n",
 				(unsigned long)(size >> 20),

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 5/7] powerpc/fadump: throw proper error message on fadump registration failure.
  2018-04-02  6:29 [PATCH v3 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
                   ` (3 preceding siblings ...)
  2018-04-02  6:30 ` [PATCH v3 4/7] powerpc/fadump: exclude memory holes while reserving memory in second kernel Mahesh J Salgaonkar
@ 2018-04-02  6:30 ` Mahesh J Salgaonkar
  2018-04-02  6:30 ` [PATCH v3 6/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area Mahesh J Salgaonkar
  2018-04-02  6:30 ` [PATCH v3 7/7] powerpc/fadump: un-register fadump on kexec path Mahesh J Salgaonkar
  6 siblings, 0 replies; 14+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-02  6:30 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

fadump fails to register when there are holes in reserved memory area.
This can happen if user has hot-removed a memory that falls in the fadump
reserved memory area. Throw a meaningful error message to the user in
such case.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index a497e9fb93fe..59aaf0357a52 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -216,6 +216,36 @@ static int is_boot_memory_area_contiguous(void)
 	return ret;
 }
 
+/*
+ * Returns 1, if there are no holes in reserved memory area,
+ * 0 otherwise.
+ */
+static int is_reserved_memory_area_contiguous(void)
+{
+	struct memblock_region *reg;
+	unsigned long start, end;
+	unsigned long d_start = fw_dump.reserve_dump_area_start;
+	unsigned long d_end = d_start + fw_dump.reserve_dump_area_size;
+	int ret = 0;
+
+	for_each_memblock(memory, reg) {
+		start = max(d_start, (unsigned long)reg->base);
+		end = min(d_end, (unsigned long)(reg->base + reg->size));
+		if (d_start < end) {
+			/* Memory hole from d_start to start */
+			if (start > d_start)
+				break;
+
+			if (end == d_end) {
+				ret = 1;
+				break;
+			}
+			d_start = end + 1;
+		}
+	}
+	return ret;
+}
+
 /* Print firmware assisted dump configurations for debugging purpose. */
 static void fadump_show_config(void)
 {
@@ -602,6 +632,9 @@ static int register_fw_dump(struct fadump_mem_struct *fdm)
 		if (!is_boot_memory_area_contiguous())
 			pr_err("Can't have holes in boot memory area while "
 			       "registering fadump\n");
+		else if (!is_reserved_memory_area_contiguous())
+			pr_err("Can't have holes in reserved memory area while"
+			       " registering fadump\n");
 
 		printk(KERN_ERR "Failed to register firmware-assisted kernel"
 			" dump. Parameter Error(%d).\n", rc);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 6/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
  2018-04-02  6:29 [PATCH v3 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
                   ` (4 preceding siblings ...)
  2018-04-02  6:30 ` [PATCH v3 5/7] powerpc/fadump: throw proper error message on fadump registration failure Mahesh J Salgaonkar
@ 2018-04-02  6:30 ` Mahesh J Salgaonkar
  2018-04-03  3:18   ` Pingfan Liu
  2018-04-02  6:30 ` [PATCH v3 7/7] powerpc/fadump: un-register fadump on kexec path Mahesh J Salgaonkar
  6 siblings, 1 reply; 14+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-02  6:30 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

For fadump to work successfully there should not be any holes in reserved
memory ranges where kernel has asked firmware to move the content of old
kernel memory in event of crash. But this memory area is currently not
protected from hot-remove operations. Hence, fadump service can fail to
re-register after the hot-remove operation, if hot-removed memory belongs
to fadump reserved region. To avoid this make sure that memory from fadump
reserved area is not hot-removable if fadump is registered.

However, if user still wants to remove that memory, he can do so by
manually stopping fadump service before hot-remove operation.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h               |    2 +-
 arch/powerpc/kernel/fadump.c                    |   10 ++++++++--
 arch/powerpc/platforms/pseries/hotplug-memory.c |    7 +++++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 44b6ebfe9be6..d16dc77107a8 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -207,7 +207,7 @@ struct fad_crash_memory_ranges {
 	unsigned long long	size;
 };
 
-extern int is_fadump_boot_memory_area(u64 addr, ulong size);
+extern int is_fadump_memory_area(u64 addr, ulong size);
 extern int early_init_dt_scan_fw_dump(unsigned long node,
 		const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 59aaf0357a52..2c3c7e655eec 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -162,13 +162,19 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 
 /*
  * If fadump is registered, check if the memory provided
- * falls within boot memory area.
+ * falls within boot memory area and reserved memory area.
  */
-int is_fadump_boot_memory_area(u64 addr, ulong size)
+int is_fadump_memory_area(u64 addr, ulong size)
 {
+	u64 d_start = fw_dump.reserve_dump_area_start;
+	u64 d_end = d_start + fw_dump.reserve_dump_area_size;
+
 	if (!fw_dump.dump_registered)
 		return 0;
 
+	if (((addr + size) > d_start) && (addr <= d_end))
+		return 1;
+
 	return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
 }
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..e4c658cda3a7 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -389,8 +389,11 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
 	phys_addr = lmb->base_addr;
 
 #ifdef CONFIG_FA_DUMP
-	/* Don't hot-remove memory that falls in fadump boot memory area */
-	if (is_fadump_boot_memory_area(phys_addr, block_sz))
+	/*
+	 * Don't hot-remove memory that falls in fadump boot memory area
+	 * and memory that is reserved for capturing old kernel memory.
+	 */
+	if (is_fadump_memory_area(phys_addr, block_sz))
 		return false;
 #endif
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 7/7] powerpc/fadump: un-register fadump on kexec path.
  2018-04-02  6:29 [PATCH v3 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
                   ` (5 preceding siblings ...)
  2018-04-02  6:30 ` [PATCH v3 6/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area Mahesh J Salgaonkar
@ 2018-04-02  6:30 ` Mahesh J Salgaonkar
  6 siblings, 0 replies; 14+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-02  6:30 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

otherwise the fadump registration in new kexec-ed kernel complains that
fadump is already registered. This makes new kernel to continue using
fadump registered by previous kernel which may lead to invalid vmcore
generation. Hence this patch fixes this issue by un-registering fadump
in fadump_cleanup() which is called during kexec path so that new kernel
can register fadump with new valid values.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 2c3c7e655eec..cb496c8f40f3 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1345,6 +1345,9 @@ void fadump_cleanup(void)
 		init_fadump_mem_struct(&fdm,
 				get_fadump_metadata_base(fdm_active));
 		fadump_invalidate_dump(&fdm);
+	} else if (fw_dump.dump_registered) {
+		/* Un-register Firmware-assisted dump if it was registered. */
+		fadump_unregister_dump(&fdm);
 	}
 }
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 6/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
  2018-04-02  6:30 ` [PATCH v3 6/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area Mahesh J Salgaonkar
@ 2018-04-03  3:18   ` Pingfan Liu
  2018-04-03 11:37     ` Mahesh Jagannath Salgaonkar
  0 siblings, 1 reply; 14+ messages in thread
From: Pingfan Liu @ 2018-04-03  3:18 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Srikar Dronamraju, Aneesh Kumar K.V,
	Ananth Narayan, Hari Bathini, Nathan Fontenot, Anshuman Khandual

I think CMA has protected us from hot-remove, so this patch is not necessary.

Regards,
Pingfan

On Mon, Apr 2, 2018 at 2:30 PM, Mahesh J Salgaonkar
<mahesh@linux.vnet.ibm.com> wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> For fadump to work successfully there should not be any holes in reserved
> memory ranges where kernel has asked firmware to move the content of old
> kernel memory in event of crash. But this memory area is currently not
> protected from hot-remove operations. Hence, fadump service can fail to
> re-register after the hot-remove operation, if hot-removed memory belongs
> to fadump reserved region. To avoid this make sure that memory from fadump
> reserved area is not hot-removable if fadump is registered.
>
> However, if user still wants to remove that memory, he can do so by
> manually stopping fadump service before hot-remove operation.
>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/fadump.h               |    2 +-
>  arch/powerpc/kernel/fadump.c                    |   10 ++++++++--
>  arch/powerpc/platforms/pseries/hotplug-memory.c |    7 +++++--
>  3 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
> index 44b6ebfe9be6..d16dc77107a8 100644
> --- a/arch/powerpc/include/asm/fadump.h
> +++ b/arch/powerpc/include/asm/fadump.h
> @@ -207,7 +207,7 @@ struct fad_crash_memory_ranges {
>         unsigned long long      size;
>  };
>
> -extern int is_fadump_boot_memory_area(u64 addr, ulong size);
> +extern int is_fadump_memory_area(u64 addr, ulong size);
>  extern int early_init_dt_scan_fw_dump(unsigned long node,
>                 const char *uname, int depth, void *data);
>  extern int fadump_reserve_mem(void);
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 59aaf0357a52..2c3c7e655eec 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -162,13 +162,19 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
>
>  /*
>   * If fadump is registered, check if the memory provided
> - * falls within boot memory area.
> + * falls within boot memory area and reserved memory area.
>   */
> -int is_fadump_boot_memory_area(u64 addr, ulong size)
> +int is_fadump_memory_area(u64 addr, ulong size)
>  {
> +       u64 d_start = fw_dump.reserve_dump_area_start;
> +       u64 d_end = d_start + fw_dump.reserve_dump_area_size;
> +
>         if (!fw_dump.dump_registered)
>                 return 0;
>
> +       if (((addr + size) > d_start) && (addr <= d_end))
> +               return 1;
> +
>         return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
>  }
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index c1578f54c626..e4c658cda3a7 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -389,8 +389,11 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
>         phys_addr = lmb->base_addr;
>
>  #ifdef CONFIG_FA_DUMP
> -       /* Don't hot-remove memory that falls in fadump boot memory area */
> -       if (is_fadump_boot_memory_area(phys_addr, block_sz))
> +       /*
> +        * Don't hot-remove memory that falls in fadump boot memory area
> +        * and memory that is reserved for capturing old kernel memory.
> +        */
> +       if (is_fadump_memory_area(phys_addr, block_sz))
>                 return false;
>  #endif
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 4/7] powerpc/fadump: exclude memory holes while reserving memory in second kernel.
  2018-04-02  6:30 ` [PATCH v3 4/7] powerpc/fadump: exclude memory holes while reserving memory in second kernel Mahesh J Salgaonkar
@ 2018-04-03  9:51   ` Hari Bathini
  2018-04-03 11:39     ` Mahesh Jagannath Salgaonkar
  0 siblings, 1 reply; 14+ messages in thread
From: Hari Bathini @ 2018-04-03  9:51 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Nathan Fontenot, Anshuman Khandual



On Monday 02 April 2018 12:00 PM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> The second kernel, during early boot after the crash, reserves rest of
> the memory above boot memory size to make sure it does not touch any of the
> dump memory area. It uses memblock_reserve() that reserves the specified
> memory region irrespective of memory holes present within that region.
> There are chances where previous kernel would have hot removed some of
> its memory leaving memory holes behind. In such cases fadump kernel reports
> incorrect number of reserved pages through arch_reserved_kernel_pages()
> hook causing kernel to hang or panic.
>
> Fix this by excluding memory holes while reserving rest of the memory
> above boot memory size during second kernel boot after crash.
>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>   arch/powerpc/kernel/fadump.c |   17 ++++++++++++++++-
>   1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 011f1aa7abab..a497e9fb93fe 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -433,6 +433,21 @@ static inline unsigned long get_fadump_metadata_base(
>   	return be64_to_cpu(fdm->metadata_region.source_address);
>   }
>
> +static void fadump_memblock_reserve(unsigned long base, unsigned long size)
> +{
> +	struct memblock_region *reg;
> +	unsigned long start, end;
> +
> +	for_each_memblock(memory, reg) {
> +		start = max(base, (unsigned long)reg->base);
> +		end = reg->base + reg->size;
> +		end = min(base + size, end);
> +
> +		if (start < end)
> +			memblock_reserve(start, end - start);
> +	}
> +}
> +
>   int __init fadump_reserve_mem(void)
>   {
>   	unsigned long base, size, memory_boundary;
> @@ -487,7 +502,7 @@ int __init fadump_reserve_mem(void)
>   		 */
>   		base = fw_dump.boot_memory_size;
>   		size = memory_boundary - base;
> -		memblock_reserve(base, size);
> +		fadump_memblock_reserve(base, size);
>   		printk(KERN_INFO "Reserved %ldMB of memory at %ldMB "

Mahesh, you may want to change this print as well as it would be 
misleading in case of
holes in the memory.

Thanks
Hari

>   				"for saving crash dump\n",
>   				(unsigned long)(size >> 20),
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 6/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
  2018-04-03  3:18   ` Pingfan Liu
@ 2018-04-03 11:37     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 14+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-04-03 11:37 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linuxppc-dev, Srikar Dronamraju, Aneesh Kumar K.V,
	Ananth Narayan, Hari Bathini, Nathan Fontenot, Anshuman Khandual

On 04/03/2018 08:48 AM, Pingfan Liu wrote:
> I think CMA has protected us from hot-remove, so this patch is not necessary.

Yes, but only if the memory from declared CMA region is allocated using
cma_alloc(). The rest of the memory inside CMA region which hasn't been
cma_allocat-ed can still be hot-removed. Hence this patch is necessary.

fadump allocates only 1 page from fadump CMA region for metadata region.
Rest all memory is free to use by applications and vulnerable to hot-remove.

Thanks,
-Mahesh.

> 
> Regards,
> Pingfan
> 
> On Mon, Apr 2, 2018 at 2:30 PM, Mahesh J Salgaonkar
> <mahesh@linux.vnet.ibm.com> wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> For fadump to work successfully there should not be any holes in reserved
>> memory ranges where kernel has asked firmware to move the content of old
>> kernel memory in event of crash. But this memory area is currently not
>> protected from hot-remove operations. Hence, fadump service can fail to
>> re-register after the hot-remove operation, if hot-removed memory belongs
>> to fadump reserved region. To avoid this make sure that memory from fadump
>> reserved area is not hot-removable if fadump is registered.
>>
>> However, if user still wants to remove that memory, he can do so by
>> manually stopping fadump service before hot-remove operation.
>>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/fadump.h               |    2 +-
>>  arch/powerpc/kernel/fadump.c                    |   10 ++++++++--
>>  arch/powerpc/platforms/pseries/hotplug-memory.c |    7 +++++--
>>  3 files changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
>> index 44b6ebfe9be6..d16dc77107a8 100644
>> --- a/arch/powerpc/include/asm/fadump.h
>> +++ b/arch/powerpc/include/asm/fadump.h
>> @@ -207,7 +207,7 @@ struct fad_crash_memory_ranges {
>>         unsigned long long      size;
>>  };
>>
>> -extern int is_fadump_boot_memory_area(u64 addr, ulong size);
>> +extern int is_fadump_memory_area(u64 addr, ulong size);
>>  extern int early_init_dt_scan_fw_dump(unsigned long node,
>>                 const char *uname, int depth, void *data);
>>  extern int fadump_reserve_mem(void);
>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>> index 59aaf0357a52..2c3c7e655eec 100644
>> --- a/arch/powerpc/kernel/fadump.c
>> +++ b/arch/powerpc/kernel/fadump.c
>> @@ -162,13 +162,19 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
>>
>>  /*
>>   * If fadump is registered, check if the memory provided
>> - * falls within boot memory area.
>> + * falls within boot memory area and reserved memory area.
>>   */
>> -int is_fadump_boot_memory_area(u64 addr, ulong size)
>> +int is_fadump_memory_area(u64 addr, ulong size)
>>  {
>> +       u64 d_start = fw_dump.reserve_dump_area_start;
>> +       u64 d_end = d_start + fw_dump.reserve_dump_area_size;
>> +
>>         if (!fw_dump.dump_registered)
>>                 return 0;
>>
>> +       if (((addr + size) > d_start) && (addr <= d_end))
>> +               return 1;
>> +
>>         return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
>>  }
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index c1578f54c626..e4c658cda3a7 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -389,8 +389,11 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
>>         phys_addr = lmb->base_addr;
>>
>>  #ifdef CONFIG_FA_DUMP
>> -       /* Don't hot-remove memory that falls in fadump boot memory area */
>> -       if (is_fadump_boot_memory_area(phys_addr, block_sz))
>> +       /*
>> +        * Don't hot-remove memory that falls in fadump boot memory area
>> +        * and memory that is reserved for capturing old kernel memory.
>> +        */
>> +       if (is_fadump_memory_area(phys_addr, block_sz))
>>                 return false;
>>  #endif
>>
>>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 4/7] powerpc/fadump: exclude memory holes while reserving memory in second kernel.
  2018-04-03  9:51   ` Hari Bathini
@ 2018-04-03 11:39     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 14+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-04-03 11:39 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Nathan Fontenot, Anshuman Khandual

On 04/03/2018 03:21 PM, Hari Bathini wrote:
> 
> 
> On Monday 02 April 2018 12:00 PM, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> The second kernel, during early boot after the crash, reserves rest of
>> the memory above boot memory size to make sure it does not touch any
>> of the
>> dump memory area. It uses memblock_reserve() that reserves the specified
>> memory region irrespective of memory holes present within that region.
>> There are chances where previous kernel would have hot removed some of
>> its memory leaving memory holes behind. In such cases fadump kernel
>> reports
>> incorrect number of reserved pages through arch_reserved_kernel_pages()
>> hook causing kernel to hang or panic.
>>
>> Fix this by excluding memory holes while reserving rest of the memory
>> above boot memory size during second kernel boot after crash.
>>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/kernel/fadump.c |   17 ++++++++++++++++-
>>   1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>> index 011f1aa7abab..a497e9fb93fe 100644
>> --- a/arch/powerpc/kernel/fadump.c
>> +++ b/arch/powerpc/kernel/fadump.c
>> @@ -433,6 +433,21 @@ static inline unsigned long
>> get_fadump_metadata_base(
>>       return be64_to_cpu(fdm->metadata_region.source_address);
>>   }
>>
>> +static void fadump_memblock_reserve(unsigned long base, unsigned long
>> size)
>> +{
>> +    struct memblock_region *reg;
>> +    unsigned long start, end;
>> +
>> +    for_each_memblock(memory, reg) {
>> +        start = max(base, (unsigned long)reg->base);
>> +        end = reg->base + reg->size;
>> +        end = min(base + size, end);
>> +
>> +        if (start < end)
>> +            memblock_reserve(start, end - start);
>> +    }
>> +}
>> +
>>   int __init fadump_reserve_mem(void)
>>   {
>>       unsigned long base, size, memory_boundary;
>> @@ -487,7 +502,7 @@ int __init fadump_reserve_mem(void)
>>            */
>>           base = fw_dump.boot_memory_size;
>>           size = memory_boundary - base;
>> -        memblock_reserve(base, size);
>> +        fadump_memblock_reserve(base, size);
>>           printk(KERN_INFO "Reserved %ldMB of memory at %ldMB "
> 
> Mahesh, you may want to change this print as well as it would be
> misleading in case of
> holes in the memory.

Yeah, may be we can just move that printk also inside
fadump_memblock_reserve().

Thanks,
-Mahesh.

> 
> Thanks
> Hari
> 
>>                   "for saving crash dump\n",
>>                   (unsigned long)(size >> 20),
>>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/7] powerpc/fadump: Move the metadata region to start of the reserved area.
  2018-04-02  6:29 ` [PATCH v3 1/7] powerpc/fadump: Move the metadata region to start of the reserved area Mahesh J Salgaonkar
@ 2018-04-03 19:26   ` Hari Bathini
  2018-04-05  4:50     ` Mahesh Jagannath Salgaonkar
  0 siblings, 1 reply; 14+ messages in thread
From: Hari Bathini @ 2018-04-03 19:26 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, linuxppc-dev
  Cc: Ananth Narayan, kernelfans, Aneesh Kumar K.V, Nathan Fontenot,
	Anshuman Khandual, Srikar Dronamraju

Mahesh, I think we should explicitly document that production and 
capture kernel
versions should be same. For changes like below, older/newer production 
kernel vs
capture kernel is bound to fail. Of course, production and capture 
kernel versions
would be the same case usually but for the uninitiated, mentioning that 
explicitly
in documentation would help..?

Thanks
Hari


On Monday 02 April 2018 11:59 AM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> Currently the metadata region that holds crash info structure and ELF core
> header is placed towards the end of reserved memory area. This patch places
> it at the beginning of the reserved memory area.
>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/fadump.h |    4 ++
>   arch/powerpc/kernel/fadump.c      |   92 ++++++++++++++++++++++++++++++++-----
>   2 files changed, 83 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
> index 5a23010af600..44b6ebfe9be6 100644
> --- a/arch/powerpc/include/asm/fadump.h
> +++ b/arch/powerpc/include/asm/fadump.h
> @@ -61,6 +61,9 @@
>   #define FADUMP_UNREGISTER	2
>   #define FADUMP_INVALIDATE	3
>
> +/* Number of dump sections requested by kernel */
> +#define FADUMP_NUM_SECTIONS	4
> +
>   /* Dump status flag */
>   #define FADUMP_ERROR_FLAG	0x2000
>
> @@ -119,6 +122,7 @@ struct fadump_mem_struct {
>   	struct fadump_section		cpu_state_data;
>   	struct fadump_section		hpte_region;
>   	struct fadump_section		rmr_region;
> +	struct fadump_section		metadata_region;
>   };
>
>   /* Firmware-assisted dump configuration details. */
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 3c2c2688918f..80552fd7d5f8 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -188,17 +188,48 @@ static void fadump_show_config(void)
>   	pr_debug("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
>   }
>
> +static unsigned long get_fadump_metadata_size(void)
> +{
> +	unsigned long size = 0;
> +
> +	/*
> +	 * If fadump is active then look into fdm_active to get metadata
> +	 * size set by previous kernel.
> +	 */
> +	if (fw_dump.dump_active) {
> +		size = fdm_active->cpu_state_data.destination_address -
> +				fdm_active->metadata_region.source_address;
> +		goto out;
> +	}
> +	size += sizeof(struct fadump_crash_info_header);
> +	size += sizeof(struct elfhdr); /* ELF core header.*/
> +	size += sizeof(struct elf_phdr); /* place holder for cpu notes */
> +	/* Program headers for crash memory regions. */
> +	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
> +
> +	size = PAGE_ALIGN(size);
> +out:
> +	pr_debug("fadump Metadata size is %ld\n", size);
> +	return size;
> +}
> +
>   static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
>   				unsigned long addr)
>   {
> +	uint16_t num_sections = 0;
> +	unsigned long metadata_base = 0;
> +
>   	if (!fdm)
>   		return 0;
>
> +	/* Skip the fadump metadata area. */
> +	metadata_base = addr;
> +	addr += get_fadump_metadata_size();
> +
>   	memset(fdm, 0, sizeof(struct fadump_mem_struct));
>   	addr = addr & PAGE_MASK;
>
>   	fdm->header.dump_format_version = cpu_to_be32(0x00000001);
> -	fdm->header.dump_num_sections = cpu_to_be16(3);
>   	fdm->header.dump_status_flag = 0;
>   	fdm->header.offset_first_dump_section =
>   		cpu_to_be32((u32)offsetof(struct fadump_mem_struct, cpu_state_data));
> @@ -222,6 +253,7 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
>   	fdm->cpu_state_data.source_address = 0;
>   	fdm->cpu_state_data.source_len = cpu_to_be64(fw_dump.cpu_state_data_size);
>   	fdm->cpu_state_data.destination_address = cpu_to_be64(addr);
> +	num_sections++;
>   	addr += fw_dump.cpu_state_data_size;
>
>   	/* hpte region section */
> @@ -230,6 +262,7 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
>   	fdm->hpte_region.source_address = 0;
>   	fdm->hpte_region.source_len = cpu_to_be64(fw_dump.hpte_region_size);
>   	fdm->hpte_region.destination_address = cpu_to_be64(addr);
> +	num_sections++;
>   	addr += fw_dump.hpte_region_size;
>
>   	/* RMA region section */
> @@ -238,8 +271,25 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
>   	fdm->rmr_region.source_address = cpu_to_be64(RMA_START);
>   	fdm->rmr_region.source_len = cpu_to_be64(fw_dump.boot_memory_size);
>   	fdm->rmr_region.destination_address = cpu_to_be64(addr);
> +	num_sections++;
>   	addr += fw_dump.boot_memory_size;
>
> +	/*
> +	 * fadump metadata section.
> +	 * Add an entry with source len a zero. We are only intereseted in
> +	 * source address which will help us to detect the location of
> +	 * metadata area where faump header and elf core header is placed.
> +	 */
> +	fdm->metadata_region.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG);
> +	fdm->metadata_region.source_data_type =
> +					cpu_to_be16(FADUMP_REAL_MODE_REGION);
> +	fdm->metadata_region.source_address = cpu_to_be64(metadata_base);
> +	fdm->metadata_region.source_len = 0;
> +	fdm->metadata_region.destination_address = cpu_to_be64(addr);
> +	num_sections++;
> +
> +	fdm->header.dump_num_sections = cpu_to_be16(num_sections);
> +	BUILD_BUG_ON(num_sections != FADUMP_NUM_SECTIONS);
>   	return addr;
>   }
>
> @@ -325,16 +375,17 @@ static unsigned long get_fadump_area_size(void)
>   	size += fw_dump.cpu_state_data_size;
>   	size += fw_dump.hpte_region_size;
>   	size += fw_dump.boot_memory_size;
> -	size += sizeof(struct fadump_crash_info_header);
> -	size += sizeof(struct elfhdr); /* ELF core header.*/
> -	size += sizeof(struct elf_phdr); /* place holder for cpu notes */
> -	/* Program headers for crash memory regions. */
> -	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
> -
> +	size += get_fadump_metadata_size();
>   	size = PAGE_ALIGN(size);
>   	return size;
>   }
>
> +static inline unsigned long get_fadump_metadata_base(
> +			const struct fadump_mem_struct *fdm)
> +{
> +	return be64_to_cpu(fdm->metadata_region.source_address);
> +}
> +
>   int __init fadump_reserve_mem(void)
>   {
>   	unsigned long base, size, memory_boundary;
> @@ -395,9 +446,9 @@ int __init fadump_reserve_mem(void)
>   				(unsigned long)(size >> 20),
>   				(unsigned long)(base >> 20));
>
> -		fw_dump.fadumphdr_addr =
> -				be64_to_cpu(fdm_active->rmr_region.destination_address) +
> -				be64_to_cpu(fdm_active->rmr_region.source_len);
> +		pr_info("Number of kernel Dump sections: %d\n",
> +			be16_to_cpu(fdm_active->header.dump_num_sections));
> +		fw_dump.fadumphdr_addr = get_fadump_metadata_base(fdm_active);
>   		pr_debug("fadumphdr_addr = %p\n",
>   				(void *) fw_dump.fadumphdr_addr);
>   	} else {
> @@ -803,14 +854,24 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
>   static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
>   {
>   	struct fadump_crash_info_header *fdh;
> +	uint16_t num_sections = 0;
>   	int rc = 0;
>
>   	if (!fdm_active || !fw_dump.fadumphdr_addr)
>   		return -EINVAL;
>
> +	/* Check if platform has honored all 4 dump sections requested. */
> +	num_sections = be16_to_cpu(fdm_active->header.dump_num_sections);
> +	if (num_sections < FADUMP_NUM_SECTIONS) {
> +		printk(KERN_ERR "Dump taken by platform does not "
> +				"contain all requested sections\n");
> +		return -EINVAL;
> +	}
> +
>   	/* Check if the dump data is valid. */
>   	if ((be16_to_cpu(fdm_active->header.dump_status_flag) == FADUMP_ERROR_FLAG) ||
>   			(fdm_active->cpu_state_data.error_flags != 0) ||
> +			(fdm_active->metadata_region.error_flags != 0) ||
>   			(fdm_active->rmr_region.error_flags != 0)) {
>   		printk(KERN_ERR "Dump taken by platform is not valid\n");
>   		return -EINVAL;
> @@ -821,6 +882,11 @@ static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
>   		printk(KERN_ERR "Dump taken by platform is incomplete\n");
>   		return -EINVAL;
>   	}
> +	if ((fdm_active->metadata_region.bytes_dumped !=
> +			fdm_active->metadata_region.source_len)) {
> +		printk(KERN_ERR "Dump taken by platform is incomplete\n");
> +		return -EINVAL;
> +	}
>
>   	/* Validate the fadump crash info header */
>   	fdh = __va(fw_dump.fadumphdr_addr);
> @@ -1082,7 +1148,7 @@ static int register_fadump(void)
>
>   	fadump_setup_crash_memory_ranges();
>
> -	addr = be64_to_cpu(fdm.rmr_region.destination_address) + be64_to_cpu(fdm.rmr_region.source_len);
> +	addr = get_fadump_metadata_base(&fdm);
>   	/* Initialize fadump crash info header. */
>   	addr = init_fadump_header(addr);
>   	vaddr = __va(addr);
> @@ -1153,7 +1219,7 @@ void fadump_cleanup(void)
>   	/* Invalidate the registration only if dump is active. */
>   	if (fw_dump.dump_active) {
>   		init_fadump_mem_struct(&fdm,
> -			be64_to_cpu(fdm_active->cpu_state_data.destination_address));
> +				get_fadump_metadata_base(fdm_active));
>   		fadump_invalidate_dump(&fdm);
>   	}
>   }
> @@ -1236,7 +1302,7 @@ static void fadump_invalidate_release_mem(void)
>   		return;
>   	}
>
> -	destination_address = be64_to_cpu(fdm_active->cpu_state_data.destination_address);
> +	destination_address = get_fadump_metadata_base(fdm_active);
>   	fadump_cleanup();
>   	mutex_unlock(&fadump_mutex);
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/7] powerpc/fadump: Move the metadata region to start of the reserved area.
  2018-04-03 19:26   ` Hari Bathini
@ 2018-04-05  4:50     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 14+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-04-05  4:50 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth Narayan, kernelfans, Aneesh Kumar K.V, Nathan Fontenot,
	Anshuman Khandual, Srikar Dronamraju

On 04/04/2018 12:56 AM, Hari Bathini wrote:
> Mahesh, I think we should explicitly document that production and
> capture kernel
> versions should be same. For changes like below, older/newer production
> kernel vs
> capture kernel is bound to fail. Of course, production and capture
> kernel versions
> would be the same case usually but for the uninitiated, mentioning that
> explicitly
> in documentation would help..?

Yeah we could do that. In ideal case production and capture kernel will
be same. But yes, there can be cases where we may end up in older/newer
kernel scenario. My earlier versions had backward compatibility which I
broke in v3. I can fix that in v4. Thanks for catching it. Will also
update documentation mentioning working kernel combinations.

Thanks,
-Mahesh.

> 
> Thanks
> Hari
> 
> 
> On Monday 02 April 2018 11:59 AM, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> Currently the metadata region that holds crash info structure and ELF
>> core
>> header is placed towards the end of reserved memory area. This patch
>> places
>> it at the beginning of the reserved memory area.
>>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/include/asm/fadump.h |    4 ++
>>   arch/powerpc/kernel/fadump.c      |   92
>> ++++++++++++++++++++++++++++++++-----
>>   2 files changed, 83 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/fadump.h
>> b/arch/powerpc/include/asm/fadump.h
>> index 5a23010af600..44b6ebfe9be6 100644
>> --- a/arch/powerpc/include/asm/fadump.h
>> +++ b/arch/powerpc/include/asm/fadump.h
>> @@ -61,6 +61,9 @@
>>   #define FADUMP_UNREGISTER    2
>>   #define FADUMP_INVALIDATE    3
>>
>> +/* Number of dump sections requested by kernel */
>> +#define FADUMP_NUM_SECTIONS    4
>> +
>>   /* Dump status flag */
>>   #define FADUMP_ERROR_FLAG    0x2000
>>
>> @@ -119,6 +122,7 @@ struct fadump_mem_struct {
>>       struct fadump_section        cpu_state_data;
>>       struct fadump_section        hpte_region;
>>       struct fadump_section        rmr_region;
>> +    struct fadump_section        metadata_region;
>>   };
>>
>>   /* Firmware-assisted dump configuration details. */
>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>> index 3c2c2688918f..80552fd7d5f8 100644
>> --- a/arch/powerpc/kernel/fadump.c
>> +++ b/arch/powerpc/kernel/fadump.c
>> @@ -188,17 +188,48 @@ static void fadump_show_config(void)
>>       pr_debug("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
>>   }
>>
>> +static unsigned long get_fadump_metadata_size(void)
>> +{
>> +    unsigned long size = 0;
>> +
>> +    /*
>> +     * If fadump is active then look into fdm_active to get metadata
>> +     * size set by previous kernel.
>> +     */
>> +    if (fw_dump.dump_active) {
>> +        size = fdm_active->cpu_state_data.destination_address -
>> +                fdm_active->metadata_region.source_address;
>> +        goto out;
>> +    }
>> +    size += sizeof(struct fadump_crash_info_header);
>> +    size += sizeof(struct elfhdr); /* ELF core header.*/
>> +    size += sizeof(struct elf_phdr); /* place holder for cpu notes */
>> +    /* Program headers for crash memory regions. */
>> +    size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) +
>> 2);
>> +
>> +    size = PAGE_ALIGN(size);
>> +out:
>> +    pr_debug("fadump Metadata size is %ld\n", size);
>> +    return size;
>> +}
>> +
>>   static unsigned long init_fadump_mem_struct(struct fadump_mem_struct
>> *fdm,
>>                   unsigned long addr)
>>   {
>> +    uint16_t num_sections = 0;
>> +    unsigned long metadata_base = 0;
>> +
>>       if (!fdm)
>>           return 0;
>>
>> +    /* Skip the fadump metadata area. */
>> +    metadata_base = addr;
>> +    addr += get_fadump_metadata_size();
>> +
>>       memset(fdm, 0, sizeof(struct fadump_mem_struct));
>>       addr = addr & PAGE_MASK;
>>
>>       fdm->header.dump_format_version = cpu_to_be32(0x00000001);
>> -    fdm->header.dump_num_sections = cpu_to_be16(3);
>>       fdm->header.dump_status_flag = 0;
>>       fdm->header.offset_first_dump_section =
>>           cpu_to_be32((u32)offsetof(struct fadump_mem_struct,
>> cpu_state_data));
>> @@ -222,6 +253,7 @@ static unsigned long init_fadump_mem_struct(struct
>> fadump_mem_struct *fdm,
>>       fdm->cpu_state_data.source_address = 0;
>>       fdm->cpu_state_data.source_len =
>> cpu_to_be64(fw_dump.cpu_state_data_size);
>>       fdm->cpu_state_data.destination_address = cpu_to_be64(addr);
>> +    num_sections++;
>>       addr += fw_dump.cpu_state_data_size;
>>
>>       /* hpte region section */
>> @@ -230,6 +262,7 @@ static unsigned long init_fadump_mem_struct(struct
>> fadump_mem_struct *fdm,
>>       fdm->hpte_region.source_address = 0;
>>       fdm->hpte_region.source_len =
>> cpu_to_be64(fw_dump.hpte_region_size);
>>       fdm->hpte_region.destination_address = cpu_to_be64(addr);
>> +    num_sections++;
>>       addr += fw_dump.hpte_region_size;
>>
>>       /* RMA region section */
>> @@ -238,8 +271,25 @@ static unsigned long
>> init_fadump_mem_struct(struct fadump_mem_struct *fdm,
>>       fdm->rmr_region.source_address = cpu_to_be64(RMA_START);
>>       fdm->rmr_region.source_len = cpu_to_be64(fw_dump.boot_memory_size);
>>       fdm->rmr_region.destination_address = cpu_to_be64(addr);
>> +    num_sections++;
>>       addr += fw_dump.boot_memory_size;
>>
>> +    /*
>> +     * fadump metadata section.
>> +     * Add an entry with source len a zero. We are only intereseted in
>> +     * source address which will help us to detect the location of
>> +     * metadata area where faump header and elf core header is placed.
>> +     */
>> +    fdm->metadata_region.request_flag =
>> cpu_to_be32(FADUMP_REQUEST_FLAG);
>> +    fdm->metadata_region.source_data_type =
>> +                    cpu_to_be16(FADUMP_REAL_MODE_REGION);
>> +    fdm->metadata_region.source_address = cpu_to_be64(metadata_base);
>> +    fdm->metadata_region.source_len = 0;
>> +    fdm->metadata_region.destination_address = cpu_to_be64(addr);
>> +    num_sections++;
>> +
>> +    fdm->header.dump_num_sections = cpu_to_be16(num_sections);
>> +    BUILD_BUG_ON(num_sections != FADUMP_NUM_SECTIONS);
>>       return addr;
>>   }
>>
>> @@ -325,16 +375,17 @@ static unsigned long get_fadump_area_size(void)
>>       size += fw_dump.cpu_state_data_size;
>>       size += fw_dump.hpte_region_size;
>>       size += fw_dump.boot_memory_size;
>> -    size += sizeof(struct fadump_crash_info_header);
>> -    size += sizeof(struct elfhdr); /* ELF core header.*/
>> -    size += sizeof(struct elf_phdr); /* place holder for cpu notes */
>> -    /* Program headers for crash memory regions. */
>> -    size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) +
>> 2);
>> -
>> +    size += get_fadump_metadata_size();
>>       size = PAGE_ALIGN(size);
>>       return size;
>>   }
>>
>> +static inline unsigned long get_fadump_metadata_base(
>> +            const struct fadump_mem_struct *fdm)
>> +{
>> +    return be64_to_cpu(fdm->metadata_region.source_address);
>> +}
>> +
>>   int __init fadump_reserve_mem(void)
>>   {
>>       unsigned long base, size, memory_boundary;
>> @@ -395,9 +446,9 @@ int __init fadump_reserve_mem(void)
>>                   (unsigned long)(size >> 20),
>>                   (unsigned long)(base >> 20));
>>
>> -        fw_dump.fadumphdr_addr =
>> -               
>> be64_to_cpu(fdm_active->rmr_region.destination_address) +
>> -                be64_to_cpu(fdm_active->rmr_region.source_len);
>> +        pr_info("Number of kernel Dump sections: %d\n",
>> +            be16_to_cpu(fdm_active->header.dump_num_sections));
>> +        fw_dump.fadumphdr_addr = get_fadump_metadata_base(fdm_active);
>>           pr_debug("fadumphdr_addr = %p\n",
>>                   (void *) fw_dump.fadumphdr_addr);
>>       } else {
>> @@ -803,14 +854,24 @@ static int __init fadump_build_cpu_notes(const
>> struct fadump_mem_struct *fdm)
>>   static int __init process_fadump(const struct fadump_mem_struct
>> *fdm_active)
>>   {
>>       struct fadump_crash_info_header *fdh;
>> +    uint16_t num_sections = 0;
>>       int rc = 0;
>>
>>       if (!fdm_active || !fw_dump.fadumphdr_addr)
>>           return -EINVAL;
>>
>> +    /* Check if platform has honored all 4 dump sections requested. */
>> +    num_sections = be16_to_cpu(fdm_active->header.dump_num_sections);
>> +    if (num_sections < FADUMP_NUM_SECTIONS) {
>> +        printk(KERN_ERR "Dump taken by platform does not "
>> +                "contain all requested sections\n");
>> +        return -EINVAL;
>> +    }
>> +
>>       /* Check if the dump data is valid. */
>>       if ((be16_to_cpu(fdm_active->header.dump_status_flag) ==
>> FADUMP_ERROR_FLAG) ||
>>               (fdm_active->cpu_state_data.error_flags != 0) ||
>> +            (fdm_active->metadata_region.error_flags != 0) ||
>>               (fdm_active->rmr_region.error_flags != 0)) {
>>           printk(KERN_ERR "Dump taken by platform is not valid\n");
>>           return -EINVAL;
>> @@ -821,6 +882,11 @@ static int __init process_fadump(const struct
>> fadump_mem_struct *fdm_active)
>>           printk(KERN_ERR "Dump taken by platform is incomplete\n");
>>           return -EINVAL;
>>       }
>> +    if ((fdm_active->metadata_region.bytes_dumped !=
>> +            fdm_active->metadata_region.source_len)) {
>> +        printk(KERN_ERR "Dump taken by platform is incomplete\n");
>> +        return -EINVAL;
>> +    }
>>
>>       /* Validate the fadump crash info header */
>>       fdh = __va(fw_dump.fadumphdr_addr);
>> @@ -1082,7 +1148,7 @@ static int register_fadump(void)
>>
>>       fadump_setup_crash_memory_ranges();
>>
>> -    addr = be64_to_cpu(fdm.rmr_region.destination_address) +
>> be64_to_cpu(fdm.rmr_region.source_len);
>> +    addr = get_fadump_metadata_base(&fdm);
>>       /* Initialize fadump crash info header. */
>>       addr = init_fadump_header(addr);
>>       vaddr = __va(addr);
>> @@ -1153,7 +1219,7 @@ void fadump_cleanup(void)
>>       /* Invalidate the registration only if dump is active. */
>>       if (fw_dump.dump_active) {
>>           init_fadump_mem_struct(&fdm,
>> -           
>> be64_to_cpu(fdm_active->cpu_state_data.destination_address));
>> +                get_fadump_metadata_base(fdm_active));
>>           fadump_invalidate_dump(&fdm);
>>       }
>>   }
>> @@ -1236,7 +1302,7 @@ static void fadump_invalidate_release_mem(void)
>>           return;
>>       }
>>
>> -    destination_address =
>> be64_to_cpu(fdm_active->cpu_state_data.destination_address);
>> +    destination_address = get_fadump_metadata_base(fdm_active);
>>       fadump_cleanup();
>>       mutex_unlock(&fadump_mutex);
>>
>>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-04-05  4:50 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-02  6:29 [PATCH v3 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
2018-04-02  6:29 ` [PATCH v3 1/7] powerpc/fadump: Move the metadata region to start of the reserved area Mahesh J Salgaonkar
2018-04-03 19:26   ` Hari Bathini
2018-04-05  4:50     ` Mahesh Jagannath Salgaonkar
2018-04-02  6:30 ` [PATCH v3 2/7] powerpc/fadump: Update documentation to reflect the metadata region movement Mahesh J Salgaonkar
2018-04-02  6:30 ` [PATCH v3 3/7] powerpc/fadump: Reservationless firmware assisted dump Mahesh J Salgaonkar
2018-04-02  6:30 ` [PATCH v3 4/7] powerpc/fadump: exclude memory holes while reserving memory in second kernel Mahesh J Salgaonkar
2018-04-03  9:51   ` Hari Bathini
2018-04-03 11:39     ` Mahesh Jagannath Salgaonkar
2018-04-02  6:30 ` [PATCH v3 5/7] powerpc/fadump: throw proper error message on fadump registration failure Mahesh J Salgaonkar
2018-04-02  6:30 ` [PATCH v3 6/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area Mahesh J Salgaonkar
2018-04-03  3:18   ` Pingfan Liu
2018-04-03 11:37     ` Mahesh Jagannath Salgaonkar
2018-04-02  6:30 ` [PATCH v3 7/7] powerpc/fadump: un-register fadump on kexec path Mahesh J Salgaonkar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.