All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump.
@ 2018-04-20  5:04 Mahesh J Salgaonkar
  2018-04-20  5:04 ` [PATCH v4 1/7] powerpc/fadump: Move the metadata region to start of the reserved area Mahesh J Salgaonkar
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-20  5:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. This reserved
memory is used for saving the contents of old crashed kernel's memory before
fadump capture kernel uses old kernel's memory area to boot. However, This
reserved memory area stays unused until system crash and isn't available
for production kernel to use.

Instead of setting aside a significant chunk of memory that nobody can use,
take advantage Linux kernel's Contiguous Memory Allocator (CMA) feature,
to reserve a significant chunk of memory that the kernel is prevented from
using , but applications are free to use it.

Patch 1 moves the metadata region to the start of the reserved area for easy
handling/detection of metadata region in second kernel.
Patch 4 implements the usage of CMA region to allow production kernel to
use that memory for applications usage, making fadump reservationless.
Patch 3, 6 and 7 fixes various fadump issues and bugs.

Changes in V4:
- patch 1: Make fadump compatible irrespective of kernel versions.
- patch 4: moved out of the series and been posted seperatly at
  http://patchwork.ozlabs.org/patch/896716/
- Documentation update about CMA reservation.

Changes in V3:
- patch 1 & 2: move metadata region and documentation update.
- patch 7: Un-register the faudmp on kexec path

---

Mahesh Salgaonkar (7):
      powerpc/fadump: Move the metadata region to start of the reserved area.
      powerpc/fadump: Update documentation to reflect the metadata region movement.
      powerpc/fadump: un-register fadump on kexec path.
      powerpc/fadump: Reservationless firmware assisted dump
      powerpc/fadump: Update documentation to reflect CMA reservation.
      powerpc/fadump: throw proper error message on fadump registration failure.
      powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.


 Documentation/powerpc/firmware-assisted-dump.txt |   43 ++-
 arch/powerpc/include/asm/fadump.h                |   11 +
 arch/powerpc/kernel/fadump.c                     |  343 +++++++++++++++++++---
 arch/powerpc/platforms/pseries/hotplug-memory.c  |    7 
 4 files changed, 343 insertions(+), 61 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 1/7] powerpc/fadump: Move the metadata region to start of the reserved area.
  2018-04-20  5:04 [PATCH v4 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
@ 2018-04-20  5:04 ` Mahesh J Salgaonkar
  2018-04-22  1:58   ` Nicholas Piggin
  2018-04-20  5:04 ` [PATCH v4 2/7] powerpc/fadump: Update documentation to reflect the metadata region movement Mahesh J Salgaonkar
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-20  5:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Currently the metadata region that holds crash info structure and ELF core
header is placed towards the end of reserved memory area. This patch places
it at the beginning of the reserved memory area. It also introduces
additional dump section called metadata section to communicate location
of metadata region to 2nd kernel. This patch also maintains the
compatibility between production/capture kernels irrespective of their
kernel versions. Both combination older/newer and newer/older works fine.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |    9 ++
 arch/powerpc/kernel/fadump.c      |  177 ++++++++++++++++++++++++++++++++-----
 2 files changed, 160 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 5a23010af600..03d5140a6f0d 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -53,6 +53,11 @@
 #define FADUMP_HPTE_REGION	0x0002
 #define FADUMP_REAL_MODE_REGION	0x0011
 
+/* User define dump sections.
+ * Per PAPR: 0x0100 - 0xFFFF OS defined source types
+ */
+#define FADUMP_METADATA_REGION	0x0100
+
 /* Dump request flag */
 #define FADUMP_REQUEST_FLAG	0x00000001
 
@@ -61,6 +66,9 @@
 #define FADUMP_UNREGISTER	2
 #define FADUMP_INVALIDATE	3
 
+/* Number of dump sections requested by kernel */
+#define FADUMP_NUM_SECTIONS	4
+
 /* Dump status flag */
 #define FADUMP_ERROR_FLAG	0x2000
 
@@ -119,6 +127,7 @@ struct fadump_mem_struct {
 	struct fadump_section		cpu_state_data;
 	struct fadump_section		hpte_region;
 	struct fadump_section		rmr_region;
+	struct fadump_section		metadata_region;
 };
 
 /* Firmware-assisted dump configuration details. */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 8ceabef40d3d..43bfa535d0ea 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -188,17 +188,40 @@ static void fadump_show_config(void)
 	pr_debug("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
 }
 
+static unsigned long get_fadump_metadata_size(void)
+{
+	unsigned long size = 0;
+
+	size += sizeof(struct fadump_crash_info_header);
+	size += sizeof(struct elfhdr); /* ELF core header.*/
+	size += sizeof(struct elf_phdr); /* place holder for cpu notes */
+	/* Program headers for crash memory regions. */
+	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
+
+	size = PAGE_ALIGN(size);
+	pr_debug("fadump Metadata size is %ld\n", size);
+	return size;
+}
+
 static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
 				unsigned long addr)
 {
+	uint16_t num_sections = 0;
+	unsigned long metadata_base = 0;
+	unsigned long metadata_size = 0;
+
 	if (!fdm)
 		return 0;
 
+	/* Skip the fadump metadata area. */
+	metadata_base = addr;
+	metadata_size = get_fadump_metadata_size();
+	addr += metadata_size;
+
 	memset(fdm, 0, sizeof(struct fadump_mem_struct));
 	addr = addr & PAGE_MASK;
 
 	fdm->header.dump_format_version = cpu_to_be32(0x00000001);
-	fdm->header.dump_num_sections = cpu_to_be16(3);
 	fdm->header.dump_status_flag = 0;
 	fdm->header.offset_first_dump_section =
 		cpu_to_be32((u32)offsetof(struct fadump_mem_struct, cpu_state_data));
@@ -222,6 +245,7 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
 	fdm->cpu_state_data.source_address = 0;
 	fdm->cpu_state_data.source_len = cpu_to_be64(fw_dump.cpu_state_data_size);
 	fdm->cpu_state_data.destination_address = cpu_to_be64(addr);
+	num_sections++;
 	addr += fw_dump.cpu_state_data_size;
 
 	/* hpte region section */
@@ -230,6 +254,7 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
 	fdm->hpte_region.source_address = 0;
 	fdm->hpte_region.source_len = cpu_to_be64(fw_dump.hpte_region_size);
 	fdm->hpte_region.destination_address = cpu_to_be64(addr);
+	num_sections++;
 	addr += fw_dump.hpte_region_size;
 
 	/* RMA region section */
@@ -238,8 +263,27 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
 	fdm->rmr_region.source_address = cpu_to_be64(RMA_START);
 	fdm->rmr_region.source_len = cpu_to_be64(fw_dump.boot_memory_size);
 	fdm->rmr_region.destination_address = cpu_to_be64(addr);
+	num_sections++;
 	addr += fw_dump.boot_memory_size;
 
+	/*
+	 * fadump metadata section.
+	 * This section will help 2nd kernel to locate the metadata base
+	 * looking at source_address. This entry will also copy the metadata
+	 * content after the RMR region dump data, so that old kernels that
+	 * does not support this section can still find the metadata to
+	 * process fadump.
+	 */
+	fdm->metadata_region.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG);
+	fdm->metadata_region.source_data_type =
+					cpu_to_be16(FADUMP_METADATA_REGION);
+	fdm->metadata_region.source_address = cpu_to_be64(metadata_base);
+	fdm->metadata_region.source_len = cpu_to_be64(metadata_size);
+	fdm->metadata_region.destination_address = cpu_to_be64(addr);
+	num_sections++;
+
+	fdm->header.dump_num_sections = cpu_to_be16(num_sections);
+	BUILD_BUG_ON(num_sections != FADUMP_NUM_SECTIONS);
 	return addr;
 }
 
@@ -325,12 +369,12 @@ static unsigned long get_fadump_area_size(void)
 	size += fw_dump.cpu_state_data_size;
 	size += fw_dump.hpte_region_size;
 	size += fw_dump.boot_memory_size;
-	size += sizeof(struct fadump_crash_info_header);
-	size += sizeof(struct elfhdr); /* ELF core header.*/
-	size += sizeof(struct elf_phdr); /* place holder for cpu notes */
-	/* Program headers for crash memory regions. */
-	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
-
+	size += get_fadump_metadata_size();
+	/* Add it twice. We will ask platform to move metadata contents
+	 * at rmr_region.destination_address + source_len, so that old
+	 * kernels also can get metadata to process fadump successfully.
+	 */
+	size += get_fadump_metadata_size();
 	size = PAGE_ALIGN(size);
 	return size;
 }
@@ -355,6 +399,35 @@ static void __init fadump_reserve_crash_area(unsigned long base,
 	}
 }
 
+static inline unsigned long get_fadump_metadata_base(
+			const struct fadump_mem_struct *fdm)
+{
+	const struct fadump_section *dump_section;
+	uint16_t num_sections;
+
+	/* Scan through all dump sections to find out metadata section. */
+	dump_section = &fdm->cpu_state_data;
+	num_sections = be16_to_cpu(fdm->header.dump_num_sections);
+
+	while (num_sections--) {
+		uint16_t type = be16_to_cpu(dump_section->source_data_type);
+		if (type == FADUMP_METADATA_REGION)
+			return be64_to_cpu(dump_section->source_address);
+
+		dump_section++;
+	}
+
+	/*
+	 * Support the older kernel which does not contain metadata
+	 * dump section. We end up here only when fadump is active and
+	 * the previously crashed kernel is old kernel which had metadata
+	 * section at the top of the reserved memory.
+	 */
+	BUG_ON(fw_dump.dump_active == 0);
+	return be64_to_cpu(fdm->rmr_region.destination_address) +
+				be64_to_cpu(fdm->rmr_region.source_len);
+}
+
 int __init fadump_reserve_mem(void)
 {
 	unsigned long base, size, memory_boundary;
@@ -420,9 +493,9 @@ int __init fadump_reserve_mem(void)
 		size = memory_boundary - base;
 		fadump_reserve_crash_area(base, size);
 
-		fw_dump.fadumphdr_addr =
-				be64_to_cpu(fdm_active->rmr_region.destination_address) +
-				be64_to_cpu(fdm_active->rmr_region.source_len);
+		pr_info("Number of kernel Dump sections: %d\n",
+			be16_to_cpu(fdm_active->header.dump_num_sections));
+		fw_dump.fadumphdr_addr = get_fadump_metadata_base(fdm_active);
 		pr_debug("fadumphdr_addr = %p\n",
 				(void *) fw_dump.fadumphdr_addr);
 	} else {
@@ -821,6 +894,38 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
 
 }
 
+static int validate_dump(const struct fadump_mem_struct *fdm_active)
+{
+	const struct fadump_section *dump_section;
+	uint16_t num_sections = 0;
+	int rc = 0;
+
+	num_sections = be16_to_cpu(fdm_active->header.dump_num_sections);
+	dump_section = &fdm_active->cpu_state_data;
+
+	if ((be16_to_cpu(fdm_active->header.dump_status_flag)
+						== FADUMP_ERROR_FLAG)) {
+		pr_err("Error occured while taking a dump by platform\n");
+		rc = -EINVAL;
+	}
+
+	/* Print Section type and error flags for failed dump section. */
+	while (num_sections--) {
+		if (dump_section->error_flags != 0 ||
+		(dump_section->source_len != dump_section->bytes_dumped)) {
+			pr_err("Section: %04x, Error flag: %04x, "
+				"Bytes Req: %llx, Bytes Dumped: %llx\n",
+				be16_to_cpu(dump_section->source_data_type),
+				be16_to_cpu(dump_section->error_flags),
+				be64_to_cpu(dump_section->source_len),
+				be64_to_cpu(dump_section->bytes_dumped));
+			rc = -EINVAL;
+		}
+		dump_section++;
+	}
+	return rc;
+}
+
 /*
  * Validate and process the dump data stored by firmware before exporting
  * it through '/proc/vmcore'.
@@ -834,17 +939,10 @@ static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
 		return -EINVAL;
 
 	/* Check if the dump data is valid. */
-	if ((be16_to_cpu(fdm_active->header.dump_status_flag) == FADUMP_ERROR_FLAG) ||
-			(fdm_active->cpu_state_data.error_flags != 0) ||
-			(fdm_active->rmr_region.error_flags != 0)) {
+	rc = validate_dump(fdm_active);
+	if (rc) {
 		printk(KERN_ERR "Dump taken by platform is not valid\n");
-		return -EINVAL;
-	}
-	if ((fdm_active->rmr_region.bytes_dumped !=
-			fdm_active->rmr_region.source_len) ||
-			!fdm_active->cpu_state_data.bytes_dumped) {
-		printk(KERN_ERR "Dump taken by platform is incomplete\n");
-		return -EINVAL;
+		return rc;
 	}
 
 	/* Validate the fadump crash info header */
@@ -1107,7 +1205,7 @@ static int register_fadump(void)
 
 	fadump_setup_crash_memory_ranges();
 
-	addr = be64_to_cpu(fdm.rmr_region.destination_address) + be64_to_cpu(fdm.rmr_region.source_len);
+	addr = be64_to_cpu(fdm.metadata_region.source_address);
 	/* Initialize fadump crash info header. */
 	addr = init_fadump_header(addr);
 	vaddr = __va(addr);
@@ -1146,7 +1244,7 @@ static int fadump_unregister_dump(struct fadump_mem_struct *fdm)
 	return 0;
 }
 
-static int fadump_invalidate_dump(struct fadump_mem_struct *fdm)
+static int fadump_invalidate_dump(const struct fadump_mem_struct *fdm)
 {
 	int rc = 0;
 	unsigned int wait_time;
@@ -1177,9 +1275,7 @@ void fadump_cleanup(void)
 {
 	/* Invalidate the registration only if dump is active. */
 	if (fw_dump.dump_active) {
-		init_fadump_mem_struct(&fdm,
-			be64_to_cpu(fdm_active->cpu_state_data.destination_address));
-		fadump_invalidate_dump(&fdm);
+		fadump_invalidate_dump(fdm_active);
 	}
 }
 
@@ -1250,6 +1346,35 @@ static void fadump_release_memory(unsigned long begin, unsigned long end)
 		fadump_release_reserved_area(begin, end);
 }
 
+static inline unsigned long get_fadump_prev_mstart(
+			const struct fadump_mem_struct *fdm)
+{
+	const struct fadump_section *dump_section;
+	uint16_t num_sections;
+
+	/* This function is called only when fadump is active. */
+	BUG_ON(fw_dump.dump_active == 0);
+
+	/* Scan through all dump sections to find out metadata section. */
+	dump_section = &fdm->cpu_state_data;
+	num_sections = be16_to_cpu(fdm->header.dump_num_sections);
+
+	while (num_sections--) {
+		uint16_t type = be16_to_cpu(dump_section->source_data_type);
+		if (type == FADUMP_METADATA_REGION)
+			return be64_to_cpu(dump_section->source_address);
+
+		dump_section++;
+	}
+
+	/*
+	 * Support the older kernel which does not contain metadata
+	 * dump section. For older kernel reserved memory start is
+	 * at cpu_state_data.destination_address.
+	 */
+	return be64_to_cpu(fdm->cpu_state_data.destination_address);
+}
+
 static void fadump_invalidate_release_mem(void)
 {
 	unsigned long reserved_area_start, reserved_area_end;
@@ -1261,7 +1386,7 @@ static void fadump_invalidate_release_mem(void)
 		return;
 	}
 
-	destination_address = be64_to_cpu(fdm_active->cpu_state_data.destination_address);
+	destination_address = get_fadump_prev_mstart(fdm_active);
 	fadump_cleanup();
 	mutex_unlock(&fadump_mutex);
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 2/7] powerpc/fadump: Update documentation to reflect the metadata region movement.
  2018-04-20  5:04 [PATCH v4 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
  2018-04-20  5:04 ` [PATCH v4 1/7] powerpc/fadump: Move the metadata region to start of the reserved area Mahesh J Salgaonkar
@ 2018-04-20  5:04 ` Mahesh J Salgaonkar
  2018-04-20  5:04 ` [PATCH v4 3/7] powerpc/fadump: un-register fadump on kexec path Mahesh J Salgaonkar
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-20  5:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Metadata region that holds fadump header and ELF header is now placed at
the start of reserved memory area. Update the documentation accordingly

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.txt |   33 ++++++++++++++--------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
index bdd344aa18d9..30b32f879219 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -115,17 +115,26 @@ be kept permanently reserved, so that it can act as a receptacle
 for a copy of the boot memory content in addition to CPU state
 and HPTE region, in the case a crash does occur.
 
+The first kernel, during fadump registration, prepares ELF core header
+that contains necessary information for the coredump of the 1st kernel
+which includes its physical memory layout. This ELF header and some more
+additional data is stored in the area called metadata region at the start
+of the reserved memory area. The implementation maintains the compatibility
+between production/capture kernels irrespective of their kernel versions.
+Both combination older/newer and newer/older works fine.
+
   o Memory Reservation during first kernel
 
   Low memory                                         Top of memory
   0      boot memory size                                       |
   |           |                |<--Reserved dump area -->|      |
   V           V                |   Permanent Reservation |      V
-  +-----------+----------/ /---+---+----+-----------+----+------+
-  |           |                |CPU|HPTE|  DUMP     |ELF |      |
-  +-----------+----------/ /---+---+----+-----------+----+------+
-        |                                           ^
-        |                                           |
+  +-----------+----------/ /---+----+---+----+-----------+------+
+  |           |                |ELF |CPU|HPTE|  DUMP     |      |
+  +-----------+---------/ /----+----+---+----+-----------+------+
+        |                        ^                  ^
+        |                        |                  |
+        |                metadata region            |
         \                                           /
          -------------------------------------------
           Boot memory content gets transferred to
@@ -137,14 +146,14 @@ and HPTE region, in the case a crash does occur.
 
   Low memory                                        Top of memory
   0      boot memory size                                       |
-  |           |<------------- Reserved dump area ----------- -->|
+  |           |<------------- Reserved dump area -------------->|
   V           V                                                 V
-  +-----------+----------/ /---+---+----+-----------+----+------+
-  |           |                |CPU|HPTE|  DUMP     |ELF |      |
-  +-----------+----------/ /---+---+----+-----------+----+------+
-        |                                              |
-        V                                              V
-   Used by second                                /proc/vmcore
+  +-----------+----------/ /---+----+----+---------------+------+
+  |           |                |ELF |CPU|HPTE|  DUMP     |      |
+  +-----------+----------/ /---+----+----+---------------+------+
+        |                         |
+        V                         V
+   Used by second             /proc/vmcore
    kernel to boot
                    Fig. 2
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 3/7] powerpc/fadump: un-register fadump on kexec path.
  2018-04-20  5:04 [PATCH v4 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
  2018-04-20  5:04 ` [PATCH v4 1/7] powerpc/fadump: Move the metadata region to start of the reserved area Mahesh J Salgaonkar
  2018-04-20  5:04 ` [PATCH v4 2/7] powerpc/fadump: Update documentation to reflect the metadata region movement Mahesh J Salgaonkar
@ 2018-04-20  5:04 ` Mahesh J Salgaonkar
  2018-04-22  1:58   ` Nicholas Piggin
  2018-04-20  5:04 ` [PATCH v4 4/7] powerpc/fadump: Reservationless firmware assisted dump Mahesh J Salgaonkar
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-20  5:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

otherwise the fadump registration in new kexec-ed kernel complains that
fadump is already registered. This makes new kernel to continue using
fadump registered by previous kernel which may lead to invalid vmcore
generation. Hence this patch fixes this issue by un-registering fadump
in fadump_cleanup() which is called during kexec path so that new kernel
can register fadump with new valid values.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 43bfa535d0ea..16b3e8c5cae0 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1276,6 +1276,9 @@ void fadump_cleanup(void)
 	/* Invalidate the registration only if dump is active. */
 	if (fw_dump.dump_active) {
 		fadump_invalidate_dump(fdm_active);
+	} else if (fw_dump.dump_registered) {
+		/* Un-register Firmware-assisted dump if it was registered. */
+		fadump_unregister_dump(&fdm);
 	}
 }
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 4/7] powerpc/fadump: Reservationless firmware assisted dump
  2018-04-20  5:04 [PATCH v4 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
                   ` (2 preceding siblings ...)
  2018-04-20  5:04 ` [PATCH v4 3/7] powerpc/fadump: un-register fadump on kexec path Mahesh J Salgaonkar
@ 2018-04-20  5:04 ` Mahesh J Salgaonkar
  2018-04-23 12:53   ` Hari Bathini
  2018-04-20  5:04 ` [PATCH v4 5/7] powerpc/fadump: Update documentation to reflect CMA reservation Mahesh J Salgaonkar
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-20  5:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth Narayan, Srikar Dronamraju, kernelfans,
	Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Hari Bathini,
	Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. On large
systems with TeraBytes of memory, this reservation can be quite
significant.

In some cases, fadump fails if the memory reserved is insufficient, or
if the reserved memory was DLPAR hot-removed.

In the normal case, post reboot, the preserved memory is filtered to
extract only relevant areas of interest using the makedumpfile tool.
While the tool provides flexibility to determine what needs to be part
of the dump and what memory to filter out, all supported distributions
default this to "Capture only kernel data and nothing else".

We take advantage of this default and the Linux kernel's Contiguous
Memory Allocator (CMA) to fundamentally change the memory reservation
model for fadump.

Instead of setting aside a significant chunk of memory nobody can use,
this patch uses CMA instead, to reserve a significant chunk of memory
that the kernel is prevented from using (due to MIGRATE_CMA), but
applications are free to use it. With this fadump will still be able
to capture all of the kernel memory and most of the user space memory
except the user pages that were present in CMA region.

Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7557         193        6822          12         541        6725
Swap:          4095           0        4095

With this patch:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           8133         194        7464          12         475        7338
Swap:          4095           0        4095

Changes made here are completely transparent to how fadump has
traditionally worked.

Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand
CMA and its usage.

TODO:
- Handle case where CMA reservation spans nodes.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |  120 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 103 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 16b3e8c5cae0..7f76924ab190 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -34,6 +34,7 @@
 #include <linux/crash_dump.h>
 #include <linux/kobject.h>
 #include <linux/sysfs.h>
+#include <linux/cma.h>
 
 #include <asm/debugfs.h>
 #include <asm/page.h>
@@ -45,11 +46,57 @@
 static struct fw_dump fw_dump;
 static struct fadump_mem_struct fdm;
 static const struct fadump_mem_struct *fdm_active;
+static struct cma *fadump_cma;
 
 static DEFINE_MUTEX(fadump_mutex);
 struct fad_crash_memory_ranges crash_memory_ranges[INIT_CRASHMEM_RANGES];
 int crash_mem_ranges;
 
+/*
+ * fadump_cma_reserve() - reserve area for fadump memory reservation
+ *
+ * This function reserves memory from early allocator. It should be
+ * called by arch specific code once the memblock allocator
+ * has been activated.
+ */
+int __init fadump_cma_reserve(void)
+{
+	unsigned long long base, size;
+	int rc;
+
+	if (!fw_dump.fadump_enabled)
+		return 0;
+
+	base = fw_dump.reserve_dump_area_start;
+	size = fw_dump.reserve_dump_area_size;
+	pr_debug("Original reserve area base %ld, size %ld\n",
+				(unsigned long)base >> 20,
+				(unsigned long)size >> 20);
+	if (!size)
+		return 0;
+
+	rc = cma_declare_contiguous(base, size, 0, 0, 0, false,
+						"fadump_cma", &fadump_cma);
+	if (rc) {
+		printk(KERN_ERR "fadump: Failed to reserve cma area for "
+				"firmware-assisted dump, %d\n", rc);
+		fw_dump.reserve_dump_area_size = 0;
+		return 0;
+	}
+	/*
+	 * So we now have cma area reserved for fadump. base may be different
+	 * from what we requested.
+	 */
+	fw_dump.reserve_dump_area_start = cma_get_base(fadump_cma);
+	fw_dump.reserve_dump_area_size = cma_get_size(fadump_cma);
+	printk("Reserved %ldMB cma area at %ldMB for firmware-assisted dump "
+			"(System RAM: %ldMB)\n",
+			cma_get_size(fadump_cma) >> 20,
+			(unsigned long)cma_get_base(fadump_cma) >> 20,
+			(unsigned long)(memblock_phys_mem_size() >> 20));
+	return 1;
+}
+
 /* Scan the Firmware Assisted dump configuration details. */
 int __init early_init_dt_scan_fw_dump(unsigned long node,
 			const char *uname, int depth, void *data)
@@ -496,8 +543,9 @@ int __init fadump_reserve_mem(void)
 		pr_info("Number of kernel Dump sections: %d\n",
 			be16_to_cpu(fdm_active->header.dump_num_sections));
 		fw_dump.fadumphdr_addr = get_fadump_metadata_base(fdm_active);
-		pr_debug("fadumphdr_addr = %p\n",
-				(void *) fw_dump.fadumphdr_addr);
+		pr_debug("fadumphdr_addr = %pa\n", &fw_dump.fadumphdr_addr);
+		fw_dump.reserve_dump_area_start = base;
+		fw_dump.reserve_dump_area_size = size;
 	} else {
 		size = get_fadump_area_size();
 
@@ -514,21 +562,10 @@ int __init fadump_reserve_mem(void)
 			    !memblock_is_region_reserved(base, size))
 				break;
 		}
-		if ((base > (memory_boundary - size)) ||
-		    memblock_reserve(base, size)) {
-			pr_err("Failed to reserve memory\n");
-			return 0;
-		}
-
-		pr_info("Reserved %ldMB of memory at %ldMB for firmware-"
-			"assisted dump (System RAM: %ldMB)\n",
-			(unsigned long)(size >> 20),
-			(unsigned long)(base >> 20),
-			(unsigned long)(memblock_phys_mem_size() >> 20));
+		fw_dump.reserve_dump_area_start = base;
+		fw_dump.reserve_dump_area_size = size;
+		return fadump_cma_reserve();
 	}
-
-	fw_dump.reserve_dump_area_start = base;
-	fw_dump.reserve_dump_area_size = size;
 	return 1;
 }
 
@@ -1191,6 +1228,39 @@ static unsigned long init_fadump_header(unsigned long addr)
 	return addr;
 }
 
+static unsigned long allocate_metadata_area(void)
+{
+	int nr_pages;
+	unsigned long size;
+	struct page *page = NULL;
+
+	/*
+	 * Check if fadump cma region is activated.
+	 * fadump_cma->count == 0 means cma activation has failed. This means
+	 * that the fadump reserved memory now will not be visible/available
+	 * for user applications to use. It will be as good as old fadump
+	 * behaviour of blocking this memory chunk from production system
+	 * use. CMA activation failure does not mean that fadump will not
+	 * work. Will continue to setup fadump.
+	 */
+	if (!fadump_cma || !cma_get_size(fadump_cma)) {
+		pr_warn("fadump cma region activation failed.\n");
+		return 0;
+	}
+
+	size = get_fadump_metadata_size();
+	nr_pages = ALIGN(size, PAGE_SIZE) >> PAGE_SHIFT;
+	pr_info("Fadump metadata size = %ld (nr_pages = %d)\n", size, nr_pages);
+
+	page = cma_alloc(fadump_cma, nr_pages, 0, GFP_KERNEL);
+	if (page) {
+		pr_debug("Allocated fadump metadata area at %ldMB (cma)\n",
+				(unsigned long)page_to_phys(page) >> 20);
+		return page_to_phys(page);
+	}
+	return 0;
+}
+
 static int register_fadump(void)
 {
 	unsigned long addr;
@@ -1643,8 +1713,24 @@ int __init setup_fadump(void)
 			fadump_invalidate_release_mem();
 	}
 	/* Initialize the kernel dump memory structure for FAD registration. */
-	else if (fw_dump.reserve_dump_area_size)
+	else if (fw_dump.reserve_dump_area_size) {
+		/*
+		 * By this time cma area has been activated. Allocate memory
+		 * for metadata from fadump cma region. Since this is very
+		 * early during boot we are guaranteed to get metadata cma
+		 * allocation at address fw_dump.reserve_dump_area_start.
+		 *
+		 * During fadump registration, metadata region is used
+		 * to setup fadump header and ELF core header. We don't want
+		 * this region to be touched by anyone. Allocating metadata
+		 * region memory from fadump cma will make sure that this
+		 * region will not given to any user space application.
+		 * However the rest of the fadump cma memory is still free
+		 * to be used by user applications.
+		 */
+		allocate_metadata_area();
 		init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start);
+	}
 	fadump_init_files();
 
 	return 1;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 5/7] powerpc/fadump: Update documentation to reflect CMA reservation.
  2018-04-20  5:04 [PATCH v4 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
                   ` (3 preceding siblings ...)
  2018-04-20  5:04 ` [PATCH v4 4/7] powerpc/fadump: Reservationless firmware assisted dump Mahesh J Salgaonkar
@ 2018-04-20  5:04 ` Mahesh J Salgaonkar
  2018-04-20  5:05 ` [PATCH v4 6/7] powerpc/fadump: throw proper error message on fadump registration failure Mahesh J Salgaonkar
  2018-04-20  5:05 ` [PATCH v4 7/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area Mahesh J Salgaonkar
  6 siblings, 0 replies; 13+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-20  5:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.txt |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
index 30b32f879219..2ab13270b050 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -113,7 +113,15 @@ header, is usually reserved at an offset greater than boot memory
 size (see Fig. 1). This area is *not* released: this region will
 be kept permanently reserved, so that it can act as a receptacle
 for a copy of the boot memory content in addition to CPU state
-and HPTE region, in the case a crash does occur.
+and HPTE region, in the case a crash does occur. Since this reserved
+memory area is used only after the system crash, there is no point in
+blocking this significant chunk of memory from production kernel.
+Hence, the implementation uses the Linux kernel's Contiguous Memory
+Allocator (CMA) for memory reservation. With CMA reservation this memory
+will be available for applications to use it, while kernel is prevented
+from using it. With this fadump will still be able to capture all of
+the kernel memory and most of the user space memory except the user
+pages that were present in CMA region.
 
 The first kernel, during fadump registration, prepares ELF core header
 that contains necessary information for the coredump of the 1st kernel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 6/7] powerpc/fadump: throw proper error message on fadump registration failure.
  2018-04-20  5:04 [PATCH v4 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
                   ` (4 preceding siblings ...)
  2018-04-20  5:04 ` [PATCH v4 5/7] powerpc/fadump: Update documentation to reflect CMA reservation Mahesh J Salgaonkar
@ 2018-04-20  5:05 ` Mahesh J Salgaonkar
  2018-04-20  5:05 ` [PATCH v4 7/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area Mahesh J Salgaonkar
  6 siblings, 0 replies; 13+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-20  5:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

fadump fails to register when there are holes in reserved memory area.
This can happen if user has hot-removed a memory that falls in the fadump
reserved memory area. Throw a meaningful error message to the user in
such case.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 7f76924ab190..4884ab796f4a 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -216,6 +216,36 @@ static int is_boot_memory_area_contiguous(void)
 	return ret;
 }
 
+/*
+ * Returns 1, if there are no holes in reserved memory area,
+ * 0 otherwise.
+ */
+static int is_reserved_memory_area_contiguous(void)
+{
+	struct memblock_region *reg;
+	unsigned long start, end;
+	unsigned long d_start = fw_dump.reserve_dump_area_start;
+	unsigned long d_end = d_start + fw_dump.reserve_dump_area_size;
+	int ret = 0;
+
+	for_each_memblock(memory, reg) {
+		start = max(d_start, (unsigned long)reg->base);
+		end = min(d_end, (unsigned long)(reg->base + reg->size));
+		if (d_start < end) {
+			/* Memory hole from d_start to start */
+			if (start > d_start)
+				break;
+
+			if (end == d_end) {
+				ret = 1;
+				break;
+			}
+			d_start = end + 1;
+		}
+	}
+	return ret;
+}
+
 /* Print firmware assisted dump configurations for debugging purpose. */
 static void fadump_show_config(void)
 {
@@ -634,6 +664,9 @@ static int register_fw_dump(struct fadump_mem_struct *fdm)
 		if (!is_boot_memory_area_contiguous())
 			pr_err("Can't have holes in boot memory area while "
 			       "registering fadump\n");
+		else if (!is_reserved_memory_area_contiguous())
+			pr_err("Can't have holes in reserved memory area while"
+			       " registering fadump\n");
 
 		printk(KERN_ERR "Failed to register firmware-assisted kernel"
 			" dump. Parameter Error(%d).\n", rc);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 7/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
  2018-04-20  5:04 [PATCH v4 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
                   ` (5 preceding siblings ...)
  2018-04-20  5:05 ` [PATCH v4 6/7] powerpc/fadump: throw proper error message on fadump registration failure Mahesh J Salgaonkar
@ 2018-04-20  5:05 ` Mahesh J Salgaonkar
  6 siblings, 0 replies; 13+ messages in thread
From: Mahesh J Salgaonkar @ 2018-04-20  5:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Srikar Dronamraju, kernelfans, Aneesh Kumar K.V, Ananth Narayan,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

For fadump to work successfully there should not be any holes in reserved
memory ranges where kernel has asked firmware to move the content of old
kernel memory in event of crash. But this memory area is currently not
protected from hot-remove operations. Hence, fadump service can fail to
re-register after the hot-remove operation, if hot-removed memory belongs
to fadump reserved region. To avoid this make sure that memory from fadump
reserved area is not hot-removable if fadump is registered.

However, if user still wants to remove that memory, he can do so by
manually stopping fadump service before hot-remove operation.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h               |    2 +-
 arch/powerpc/kernel/fadump.c                    |   10 ++++++++--
 arch/powerpc/platforms/pseries/hotplug-memory.c |    7 +++++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 03d5140a6f0d..5c0a88821ae9 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -212,7 +212,7 @@ struct fad_crash_memory_ranges {
 	unsigned long long	size;
 };
 
-extern int is_fadump_boot_memory_area(u64 addr, ulong size);
+extern int is_fadump_memory_area(u64 addr, ulong size);
 extern int early_init_dt_scan_fw_dump(unsigned long node,
 		const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 4884ab796f4a..2fda1e4a6318 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -162,13 +162,19 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 
 /*
  * If fadump is registered, check if the memory provided
- * falls within boot memory area.
+ * falls within boot memory area and reserved memory area.
  */
-int is_fadump_boot_memory_area(u64 addr, ulong size)
+int is_fadump_memory_area(u64 addr, ulong size)
 {
+	u64 d_start = fw_dump.reserve_dump_area_start;
+	u64 d_end = d_start + fw_dump.reserve_dump_area_size;
+
 	if (!fw_dump.dump_registered)
 		return 0;
 
+	if (((addr + size) > d_start) && (addr <= d_end))
+		return 1;
+
 	return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
 }
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..e4c658cda3a7 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -389,8 +389,11 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
 	phys_addr = lmb->base_addr;
 
 #ifdef CONFIG_FA_DUMP
-	/* Don't hot-remove memory that falls in fadump boot memory area */
-	if (is_fadump_boot_memory_area(phys_addr, block_sz))
+	/*
+	 * Don't hot-remove memory that falls in fadump boot memory area
+	 * and memory that is reserved for capturing old kernel memory.
+	 */
+	if (is_fadump_memory_area(phys_addr, block_sz))
 		return false;
 #endif
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/7] powerpc/fadump: Move the metadata region to start of the reserved area.
  2018-04-20  5:04 ` [PATCH v4 1/7] powerpc/fadump: Move the metadata region to start of the reserved area Mahesh J Salgaonkar
@ 2018-04-22  1:58   ` Nicholas Piggin
  2018-04-23  5:16     ` Mahesh Jagannath Salgaonkar
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2018-04-22  1:58 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Ananth Narayan, kernelfans, Aneesh Kumar K.V,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual,
	Srikar Dronamraju

On Fri, 20 Apr 2018 10:34:18 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Currently the metadata region that holds crash info structure and ELF core
> header is placed towards the end of reserved memory area. This patch places
> it at the beginning of the reserved memory area. It also introduces
> additional dump section called metadata section to communicate location
> of metadata region to 2nd kernel. This patch also maintains the
> compatibility between production/capture kernels irrespective of their
> kernel versions. Both combination older/newer and newer/older works fine.

Trying to look at the patches it might help me if you document reasons
for why this change is made changelog, even if it may be obvious to
someone who knows the code better.

I thought you could include the documentation change in this patch as
well, but maybe that's a matter of preference.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 3/7] powerpc/fadump: un-register fadump on kexec path.
  2018-04-20  5:04 ` [PATCH v4 3/7] powerpc/fadump: un-register fadump on kexec path Mahesh J Salgaonkar
@ 2018-04-22  1:58   ` Nicholas Piggin
  2018-04-23  5:17     ` Mahesh Jagannath Salgaonkar
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2018-04-22  1:58 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Ananth Narayan, kernelfans, Aneesh Kumar K.V,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual,
	Srikar Dronamraju

On Fri, 20 Apr 2018 10:34:35 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> otherwise the fadump registration in new kexec-ed kernel complains that
> fadump is already registered. This makes new kernel to continue using
> fadump registered by previous kernel which may lead to invalid vmcore
> generation. Hence this patch fixes this issue by un-registering fadump
> in fadump_cleanup() which is called during kexec path so that new kernel
> can register fadump with new valid values.

Is this a bug fix that should go to previous kernels as well?

> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/fadump.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 43bfa535d0ea..16b3e8c5cae0 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -1276,6 +1276,9 @@ void fadump_cleanup(void)
>  	/* Invalidate the registration only if dump is active. */
>  	if (fw_dump.dump_active) {
>  		fadump_invalidate_dump(fdm_active);
> +	} else if (fw_dump.dump_registered) {
> +		/* Un-register Firmware-assisted dump if it was registered. */
> +		fadump_unregister_dump(&fdm);
>  	}
>  }
>  
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/7] powerpc/fadump: Move the metadata region to start of the reserved area.
  2018-04-22  1:58   ` Nicholas Piggin
@ 2018-04-23  5:16     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 13+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-04-23  5:16 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, Ananth Narayan, kernelfans, Aneesh Kumar K.V,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual,
	Srikar Dronamraju

On 04/22/2018 07:28 AM, Nicholas Piggin wrote:
> On Fri, 20 Apr 2018 10:34:18 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> Currently the metadata region that holds crash info structure and ELF core
>> header is placed towards the end of reserved memory area. This patch places
>> it at the beginning of the reserved memory area. It also introduces
>> additional dump section called metadata section to communicate location
>> of metadata region to 2nd kernel. This patch also maintains the
>> compatibility between production/capture kernels irrespective of their
>> kernel versions. Both combination older/newer and newer/older works fine.
> 
> Trying to look at the patches it might help me if you document reasons
> for why this change is made changelog, even if it may be obvious to
> someone who knows the code better.

Yeah, I should have mentioned that this patch provides the foundation
for CMA patch 4. With CMA reservation we now allocate metadata region
using cma_alloc() which always allocates metadata region at the start of
CMA reserved region. Earlier in v1, I had this change included along
with CMA reservation patch. But then to make things simpler for review I
did a logical split of movement of metadata region and CMA reservation
patch separately. I think I should order patch 1, 2 and 4 in a sequence
and Move patch3 to patch 1.

> 
> I thought you could include the documentation change in this patch as
> well, but maybe that's a matter of preference.

Yeah that's how I prefer it :-), but that just me. But if it helps in
review I can fold it into 1.

Thanks,
-Mahesh.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 3/7] powerpc/fadump: un-register fadump on kexec path.
  2018-04-22  1:58   ` Nicholas Piggin
@ 2018-04-23  5:17     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 13+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-04-23  5:17 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, Ananth Narayan, kernelfans, Aneesh Kumar K.V,
	Hari Bathini, Nathan Fontenot, Anshuman Khandual,
	Srikar Dronamraju

On 04/22/2018 07:28 AM, Nicholas Piggin wrote:
> On Fri, 20 Apr 2018 10:34:35 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> otherwise the fadump registration in new kexec-ed kernel complains that
>> fadump is already registered. This makes new kernel to continue using
>> fadump registered by previous kernel which may lead to invalid vmcore
>> generation. Hence this patch fixes this issue by un-registering fadump
>> in fadump_cleanup() which is called during kexec path so that new kernel
>> can register fadump with new valid values.
> 
> Is this a bug fix that should go to previous kernels as well?

Yes.

> 
>>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/fadump.c |    3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>> index 43bfa535d0ea..16b3e8c5cae0 100644
>> --- a/arch/powerpc/kernel/fadump.c
>> +++ b/arch/powerpc/kernel/fadump.c
>> @@ -1276,6 +1276,9 @@ void fadump_cleanup(void)
>>  	/* Invalidate the registration only if dump is active. */
>>  	if (fw_dump.dump_active) {
>>  		fadump_invalidate_dump(fdm_active);
>> +	} else if (fw_dump.dump_registered) {
>> +		/* Un-register Firmware-assisted dump if it was registered. */
>> +		fadump_unregister_dump(&fdm);
>>  	}
>>  }
>>  
>>
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/7] powerpc/fadump: Reservationless firmware assisted dump
  2018-04-20  5:04 ` [PATCH v4 4/7] powerpc/fadump: Reservationless firmware assisted dump Mahesh J Salgaonkar
@ 2018-04-23 12:53   ` Hari Bathini
  0 siblings, 0 replies; 13+ messages in thread
From: Hari Bathini @ 2018-04-23 12:53 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, linuxppc-dev
  Cc: Ananth Narayan, kernelfans, Aneesh Kumar K.V, Nathan Fontenot,
	Anshuman Khandual, Srikar Dronamraju



On Friday 20 April 2018 10:34 AM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> One of the primary issues with Firmware Assisted Dump (fadump) on Power
> is that it needs a large amount of memory to be reserved. On large
> systems with TeraBytes of memory, this reservation can be quite
> significant.
>
> In some cases, fadump fails if the memory reserved is insufficient, or
> if the reserved memory was DLPAR hot-removed.
>
> In the normal case, post reboot, the preserved memory is filtered to
> extract only relevant areas of interest using the makedumpfile tool.
> While the tool provides flexibility to determine what needs to be part
> of the dump and what memory to filter out, all supported distributions
> default this to "Capture only kernel data and nothing else".
>
> We take advantage of this default and the Linux kernel's Contiguous
> Memory Allocator (CMA) to fundamentally change the memory reservation
> model for fadump.
>
> Instead of setting aside a significant chunk of memory nobody can use,
> this patch uses CMA instead, to reserve a significant chunk of memory
> that the kernel is prevented from using (due to MIGRATE_CMA), but
> applications are free to use it. With this fadump will still be able
> to capture all of the kernel memory and most of the user space memory
> except the user pages that were present in CMA region.
>
> Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
> [root@zzxx-yy10 ~]# free -m
>                total        used        free      shared  buff/cache   available
> Mem:           7557         193        6822          12         541        6725
> Swap:          4095           0        4095
>
> With this patch:
> [root@zzxx-yy10 ~]# free -m
>                total        used        free      shared  buff/cache   available
> Mem:           8133         194        7464          12         475        7338
> Swap:          4095           0        4095
>
> Changes made here are completely transparent to how fadump has
> traditionally worked.
>
> Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand
> CMA and its usage.
>
> TODO:
> - Handle case where CMA reservation spans nodes.
>
> Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>   arch/powerpc/kernel/fadump.c |  120 ++++++++++++++++++++++++++++++++++++------
>   1 file changed, 103 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 16b3e8c5cae0..7f76924ab190 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -34,6 +34,7 @@
>   #include <linux/crash_dump.h>
>   #include <linux/kobject.h>
>   #include <linux/sysfs.h>
> +#include <linux/cma.h>
>
>   #include <asm/debugfs.h>
>   #include <asm/page.h>
> @@ -45,11 +46,57 @@
>   static struct fw_dump fw_dump;
>   static struct fadump_mem_struct fdm;
>   static const struct fadump_mem_struct *fdm_active;
> +static struct cma *fadump_cma;
>
>   static DEFINE_MUTEX(fadump_mutex);
>   struct fad_crash_memory_ranges crash_memory_ranges[INIT_CRASHMEM_RANGES];
>   int crash_mem_ranges;
>
> +/*
> + * fadump_cma_reserve() - reserve area for fadump memory reservation
> + *
> + * This function reserves memory from early allocator. It should be
> + * called by arch specific code once the memblock allocator
> + * has been activated.
> + */
> +int __init fadump_cma_reserve(void)
> +{
> +	unsigned long long base, size;
> +	int rc;
> +
> +	if (!fw_dump.fadump_enabled)
> +		return 0;
> +
> +	base = fw_dump.reserve_dump_area_start;
> +	size = fw_dump.reserve_dump_area_size;

Mahesh, How about moving sections around instead:

Old:
   1. cpu state data region
   2. hpte region
   3. real memory region

New:
   2. cpu state data region
   3. hpte region
   1. real memory region

and using only boot memory size for cma reserve. The other regions, 
crashinfo header
& elfcorehdrs can still use memblock_reserve.

This achieves two things. One, ensures we don't waste memory in alignment
as cma uses hugepage(16MB)/maxorder as default alignment (we need to 
ensure boot
memory size is aligned by hugepage(16MB)/maxorder though). Two, we don't 
have to
move around meta data from end to start (patch 1/7)

To differentiate the old and new section order, we can overload crash 
info magic
(FADUMPINF -> FADUMPIV2), I guess. That differentiation may be needed for
re-registering after dump capture..

> +	pr_debug("Original reserve area base %ld, size %ld\n",
> +				(unsigned long)base >> 20,
> +				(unsigned long)size >> 20);
> +	if (!size)
> +		return 0;
> +
> +	rc = cma_declare_contiguous(base, size, 0, 0, 0, false,
> +						"fadump_cma", &fadump_cma);

Compilation fails when CONFIG_CMA is not set. A fallback when CONFIG_CMA
is not set or dependency enforced for FA_DUMP config option seems to be 
missing..

Also, considering we already deduce the base by looking for holes in 
fadump code, we could
have a 'fixed' ('true' for 6th parameter) cma region? Again, we have to 
ensure CMA alignment
for boot memory size in fadump_calculate_reserve_size() for doing all 
this seamlessly..

> +	if (rc) {
> +		printk(KERN_ERR "fadump: Failed to reserve cma area for "
> +				"firmware-assisted dump, %d\n", rc);
> +		fw_dump.reserve_dump_area_size = 0;
> +		return 0;
> +	}
> +	/*
> +	 * So we now have cma area reserved for fadump. base may be different
> +	 * from what we requested.
> +	 */
> +	fw_dump.reserve_dump_area_start = cma_get_base(fadump_cma);
> +	fw_dump.reserve_dump_area_size = cma_get_size(fadump_cma);
> +	printk("Reserved %ldMB cma area at %ldMB for firmware-assisted dump "
> +			"(System RAM: %ldMB)\n",
> +			cma_get_size(fadump_cma) >> 20,
> +			(unsigned long)cma_get_base(fadump_cma) >> 20,
> +			(unsigned long)(memblock_phys_mem_size() >> 20));
> +	return 1;
> +}
> +
>   /* Scan the Firmware Assisted dump configuration details. */
>   int __init early_init_dt_scan_fw_dump(unsigned long node,
>   			const char *uname, int depth, void *data)
> @@ -496,8 +543,9 @@ int __init fadump_reserve_mem(void)
>   		pr_info("Number of kernel Dump sections: %d\n",
>   			be16_to_cpu(fdm_active->header.dump_num_sections));
>   		fw_dump.fadumphdr_addr = get_fadump_metadata_base(fdm_active);
> -		pr_debug("fadumphdr_addr = %p\n",
> -				(void *) fw_dump.fadumphdr_addr);
> +		pr_debug("fadumphdr_addr = %pa\n", &fw_dump.fadumphdr_addr);
> +		fw_dump.reserve_dump_area_start = base;
> +		fw_dump.reserve_dump_area_size = size;
>   	} else {
>   		size = get_fadump_area_size();
>
> @@ -514,21 +562,10 @@ int __init fadump_reserve_mem(void)
>   			    !memblock_is_region_reserved(base, size))
>   				break;
>   		}
> -		if ((base > (memory_boundary - size)) ||
> -		    memblock_reserve(base, size)) {
> -			pr_err("Failed to reserve memory\n");
> -			return 0;
> -		}
> -
> -		pr_info("Reserved %ldMB of memory at %ldMB for firmware-"
> -			"assisted dump (System RAM: %ldMB)\n",
> -			(unsigned long)(size >> 20),
> -			(unsigned long)(base >> 20),
> -			(unsigned long)(memblock_phys_mem_size() >> 20));
> +		fw_dump.reserve_dump_area_start = base;
> +		fw_dump.reserve_dump_area_size = size;
> +		return fadump_cma_reserve();
>   	}
> -
> -	fw_dump.reserve_dump_area_start = base;
> -	fw_dump.reserve_dump_area_size = size;
>   	return 1;
>   }
>
> @@ -1191,6 +1228,39 @@ static unsigned long init_fadump_header(unsigned long addr)
>   	return addr;
>   }
>
> +static unsigned long allocate_metadata_area(void)
> +{
> +	int nr_pages;
> +	unsigned long size;
> +	struct page *page = NULL;
> +
> +	/*
> +	 * Check if fadump cma region is activated.
> +	 * fadump_cma->count == 0 means cma activation has failed. This means
> +	 * that the fadump reserved memory now will not be visible/available
> +	 * for user applications to use. It will be as good as old fadump
> +	 * behaviour of blocking this memory chunk from production system
> +	 * use. CMA activation failure does not mean that fadump will not
> +	 * work. Will continue to setup fadump.
> +	 */
> +	if (!fadump_cma || !cma_get_size(fadump_cma)) {
> +		pr_warn("fadump cma region activation failed.\n");
> +		return 0;
> +	}
> +
> +	size = get_fadump_metadata_size();
> +	nr_pages = ALIGN(size, PAGE_SIZE) >> PAGE_SHIFT;
> +	pr_info("Fadump metadata size = %ld (nr_pages = %d)\n", size, nr_pages);
> +
> +	page = cma_alloc(fadump_cma, nr_pages, 0, GFP_KERNEL);
> +	if (page) {
> +		pr_debug("Allocated fadump metadata area at %ldMB (cma)\n",
> +				(unsigned long)page_to_phys(page) >> 20);
> +		return page_to_phys(page);
> +	}
> +	return 0;
> +}
> +

We shouldn't be needing this function with the above mentioned change..

Thanks
Hari

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-04-23 12:53 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-20  5:04 [PATCH v4 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
2018-04-20  5:04 ` [PATCH v4 1/7] powerpc/fadump: Move the metadata region to start of the reserved area Mahesh J Salgaonkar
2018-04-22  1:58   ` Nicholas Piggin
2018-04-23  5:16     ` Mahesh Jagannath Salgaonkar
2018-04-20  5:04 ` [PATCH v4 2/7] powerpc/fadump: Update documentation to reflect the metadata region movement Mahesh J Salgaonkar
2018-04-20  5:04 ` [PATCH v4 3/7] powerpc/fadump: un-register fadump on kexec path Mahesh J Salgaonkar
2018-04-22  1:58   ` Nicholas Piggin
2018-04-23  5:17     ` Mahesh Jagannath Salgaonkar
2018-04-20  5:04 ` [PATCH v4 4/7] powerpc/fadump: Reservationless firmware assisted dump Mahesh J Salgaonkar
2018-04-23 12:53   ` Hari Bathini
2018-04-20  5:04 ` [PATCH v4 5/7] powerpc/fadump: Update documentation to reflect CMA reservation Mahesh J Salgaonkar
2018-04-20  5:05 ` [PATCH v4 6/7] powerpc/fadump: throw proper error message on fadump registration failure Mahesh J Salgaonkar
2018-04-20  5:05 ` [PATCH v4 7/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area Mahesh J Salgaonkar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.