linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v5 00/31] Add FADump support on PowerNV platform
@ 2019-08-20 12:04 Hari Bathini
  2019-08-20 12:04 ` [PATCH v5 01/31] powerpc/fadump: move internal macros/definitions to a new header Hari Bathini
                   ` (30 more replies)
  0 siblings, 31 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:04 UTC
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Firmware-Assisted Dump (FADump) is currently supported only on the pSeries
platform. This patch series adds support for the PowerNV platform as well.

The first few patches refactor the FADump code so that common code can be
shared across platforms. Basic FADump support is then added for the
PowerNV platform, followed by patches that honour the reserved-ranges DT
node while reserving/releasing memory used by FADump. The next patch
processes CPU state data provided by firmware to create and append core
notes to the ELF core file, and the one after it adds support to preserve
crash data for subsequent boots (useful in cases like petitboot). The
remaining patches add support to export opalcore, which makes debugging
failures in OPAL code easier. The Firmware-Assisted Dump documentation is
also updated accordingly.

The patch series is tested with the latest firmware plus the below skiboot
changes, accepted upstream recently, for MPIPL support:

    https://patchwork.ozlabs.org/project/skiboot/list/?series=119169
    ("MPIPL support")


Changes in v5:
  * Split the patches further.
  * Rebased to latest upstream kernel version.
  * Updated the patches based on discussions with Mahesh on v4.

---

Hari Bathini (31):
      powerpc/fadump: move internal macros/definitions to a new header
      powerpc/fadump: move internal code to a new file
      powerpc/fadump: Improve fadump documentation
      pseries/fadump: move rtas specific definitions to platform code
      pseries/fadump: introduce callbacks for platform specific operations
      pseries/fadump: define register/un-register callback functions
      powerpc/fadump: release all the memory above boot memory size
      pseries/fadump: move out platform specific support from generic code
      powerpc/fadump: use FADump instead of fadump for how it is pronounced
      opal: add MPIPL interface definitions
      powernv/fadump: add fadump support on powernv
      powernv/fadump: register kernel metadata address with opal
      powernv/fadump: reset metadata address during clean up
      powernv/fadump: define register/un-register callback functions
      powernv/fadump: support copying multiple kernel boot memory regions
      powernv/fadump: process the crashdump by exporting it as /proc/vmcore
      powernv/fadump: Warn before processing partial crashdump
      powernv/fadump: handle invalidation of crashdump and re-registraion
      powerpc/fadump: Update documentation about OPAL platform support
      powerpc/fadump: use smaller offset while finding memory for reservation
      powernv/fadump: process architected register state data provided by firmware
      powerpc/fadump: make crash memory ranges array allocation generic
      powerpc/fadump: consider reserved ranges while releasing memory
      powerpc/fadump: improve how crashed kernel's memory is reserved
      powernv/fadump: add support to preserve crash data on FADUMP disabled kernel
      powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
      powernv/opalcore: export /sys/firmware/opal/core for analysing opal crashes
      powernv/opalcore: provide an option to invalidate /sys/firmware/opal/core file
      powerpc/fadump: consider f/w load area
      powernv/fadump: update documentation about option to release opalcore
      powernv/fadump: support holes in kernel boot memory area


 Documentation/powerpc/firmware-assisted-dump.rst |  204 ++--
 arch/powerpc/Kconfig                             |   23 
 arch/powerpc/include/asm/fadump.h                |  190 ---
 arch/powerpc/include/asm/opal-api.h              |   50 +
 arch/powerpc/include/asm/opal.h                  |    6 
 arch/powerpc/kernel/Makefile                     |    6 
 arch/powerpc/kernel/fadump-common.c              |  149 +++
 arch/powerpc/kernel/fadump-common.h              |  192 +++
 arch/powerpc/kernel/fadump.c                     | 1272 ++++++++--------------
 arch/powerpc/kernel/prom.c                       |    4 
 arch/powerpc/platforms/powernv/Makefile          |    3 
 arch/powerpc/platforms/powernv/opal-call.c       |    3 
 arch/powerpc/platforms/powernv/opal-core.c       |  633 +++++++++++
 arch/powerpc/platforms/powernv/opal-fadump.c     |  715 ++++++++++++
 arch/powerpc/platforms/powernv/opal-fadump.h     |  148 +++
 arch/powerpc/platforms/pseries/Makefile          |    1 
 arch/powerpc/platforms/pseries/rtas-fadump.c     |  593 ++++++++++
 arch/powerpc/platforms/pseries/rtas-fadump.h     |  117 ++
 18 files changed, 3267 insertions(+), 1042 deletions(-)
 create mode 100644 arch/powerpc/kernel/fadump-common.c
 create mode 100644 arch/powerpc/kernel/fadump-common.h
 create mode 100644 arch/powerpc/platforms/powernv/opal-core.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-fadump.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-fadump.h
 create mode 100644 arch/powerpc/platforms/pseries/rtas-fadump.c
 create mode 100644 arch/powerpc/platforms/pseries/rtas-fadump.h



* [PATCH v5 01/31] powerpc/fadump: move internal macros/definitions to a new header
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
@ 2019-08-20 12:04 ` Hari Bathini
  2019-09-03 11:09   ` Michael Ellerman
  2019-08-20 12:04 ` [PATCH v5 02/31] powerpc/fadump: move internal code to a new file Hari Bathini
                   ` (29 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:04 UTC
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Though asm/fadump.h is meant to be used by other components dealing
with FADump, it also has macros/definitions internal to FADump code.
Move them to a new header file used within FADump code. This also
makes way for refactoring platform specific FADump code.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/include/asm/fadump.h   |   71 ----------------------------
 arch/powerpc/kernel/fadump-common.h |   89 +++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/fadump.c        |    2 +
 3 files changed, 91 insertions(+), 71 deletions(-)
 create mode 100644 arch/powerpc/kernel/fadump-common.h

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 17d9b6a..75179497 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -11,34 +11,6 @@
 
 #ifdef CONFIG_FA_DUMP
 
-/*
- * The RMA region will be saved for later dumping when kernel crashes.
- * RMA is Real Mode Area, the first block of logical memory address owned
- * by logical partition, containing the storage that may be accessed with
- * translate off.
- */
-#define RMA_START	0x0
-#define RMA_END		(ppc64_rma_size)
-
-/*
- * On some Power systems where RMO is 128MB, it still requires minimum of
- * 256MB for kernel to boot successfully. When kdump infrastructure is
- * configured to save vmcore over network, we run into OOM issue while
- * loading modules related to network setup. Hence we need aditional 64M
- * of memory to avoid OOM issue.
- */
-#define MIN_BOOT_MEM	(((RMA_END < (0x1UL << 28)) ? (0x1UL << 28) : RMA_END) \
-			+ (0x1UL << 26))
-
-/* The upper limit percentage for user specified boot memory size (25%) */
-#define MAX_BOOT_MEM_RATIO			4
-
-#define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
-
-/* Alignement per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE <<				\
-			max_t(unsigned long, MAX_ORDER - 1, pageblock_order))
-
 /* Firmware provided dump sections */
 #define FADUMP_CPU_STATE_DATA	0x0001
 #define FADUMP_HPTE_REGION	0x0002
@@ -47,11 +19,6 @@
 /* Dump request flag */
 #define FADUMP_REQUEST_FLAG	0x00000001
 
-/* FAD commands */
-#define FADUMP_REGISTER		1
-#define FADUMP_UNREGISTER	2
-#define FADUMP_INVALIDATE	3
-
 /* Dump status flag */
 #define FADUMP_ERROR_FLAG	0x2000
 
@@ -112,29 +79,6 @@ struct fadump_mem_struct {
 	struct fadump_section		rmr_region;
 };
 
-/* Firmware-assisted dump configuration details. */
-struct fw_dump {
-	unsigned long	cpu_state_data_size;
-	unsigned long	hpte_region_size;
-	unsigned long	boot_memory_size;
-	unsigned long	reserve_dump_area_start;
-	unsigned long	reserve_dump_area_size;
-	/* cmd line option during boot */
-	unsigned long	reserve_bootvar;
-
-	unsigned long	fadumphdr_addr;
-	unsigned long	cpu_notes_buf;
-	unsigned long	cpu_notes_buf_size;
-
-	int		ibm_configure_kernel_dump;
-
-	unsigned long	fadump_enabled:1;
-	unsigned long	fadump_supported:1;
-	unsigned long	dump_active:1;
-	unsigned long	dump_registered:1;
-	unsigned long	nocma:1;
-};
-
 /*
  * Copy the ascii values for first 8 characters from a string into u64
  * variable at their respective indexes.
@@ -153,7 +97,6 @@ static inline u64 str_to_u64(const char *str)
 #define STR_TO_HEX(x)	str_to_u64(x)
 #define REG_ID(x)	str_to_u64(x)
 
-#define FADUMP_CRASH_INFO_MAGIC		STR_TO_HEX("FADMPINF")
 #define REGSAVE_AREA_MAGIC		STR_TO_HEX("REGSAVE")
 
 /* The firmware-assisted dump format.
@@ -178,20 +121,6 @@ struct fadump_reg_entry {
 	__be64		reg_value;
 };
 
-/* fadump crash info structure */
-struct fadump_crash_info_header {
-	u64		magic_number;
-	u64		elfcorehdr_addr;
-	u32		crashing_cpu;
-	struct pt_regs	regs;
-	struct cpumask	online_mask;
-};
-
-struct fad_crash_memory_ranges {
-	unsigned long long	base;
-	unsigned long long	size;
-};
-
 extern int is_fadump_memory_area(u64 addr, ulong size);
 extern int early_init_dt_scan_fw_dump(unsigned long node,
 		const char *uname, int depth, void *data);
diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
new file mode 100644
index 0000000..e0673b2
--- /dev/null
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Firmware-Assisted Dump internal code.
+ *
+ * Copyright 2011, IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.ibm.com>
+ *
+ * Copyright 2019, IBM Corp.
+ * Author: Hari Bathini <hbathini@linux.ibm.com>
+ */
+
+#ifndef __PPC64_FA_DUMP_INTERNAL_H__
+#define __PPC64_FA_DUMP_INTERNAL_H__
+
+/*
+ * The RMA region will be saved for later dumping when kernel crashes.
+ * RMA is Real Mode Area, the first block of logical memory address owned
+ * by logical partition, containing the storage that may be accessed with
+ * translate off.
+ */
+#define RMA_START	0x0
+#define RMA_END		(ppc64_rma_size)
+
+/*
+ * On some Power systems where RMO is 128MB, it still requires minimum of
+ * 256MB for kernel to boot successfully. When kdump infrastructure is
+ * configured to save vmcore over network, we run into OOM issue while
+ * loading modules related to network setup. Hence we need additional 64M
+ * of memory to avoid OOM issue.
+ */
+#define MIN_BOOT_MEM	(((RMA_END < (0x1UL << 28)) ? (0x1UL << 28) : RMA_END) \
+			+ (0x1UL << 26))
+
+/* The upper limit percentage for user specified boot memory size (25%) */
+#define MAX_BOOT_MEM_RATIO			4
+
+#define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
+
+/* Alignment per CMA requirement. */
+#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE <<				\
+				 max_t(unsigned long, MAX_ORDER - 1,	\
+				 pageblock_order))
+
+/* FAD commands */
+#define FADUMP_REGISTER			1
+#define FADUMP_UNREGISTER		2
+#define FADUMP_INVALIDATE		3
+
+#define FADUMP_CRASH_INFO_MAGIC		str_to_u64("FADMPINF")
+
+/* fadump crash info structure */
+struct fadump_crash_info_header {
+	u64		magic_number;
+	u64		elfcorehdr_addr;
+	u32		crashing_cpu;
+	struct pt_regs	regs;
+	struct cpumask	online_mask;
+};
+
+struct fad_crash_memory_ranges {
+	unsigned long long	base;
+	unsigned long long	size;
+};
+
+/* Firmware-assisted dump configuration details. */
+struct fw_dump {
+	unsigned long	reserve_dump_area_start;
+	unsigned long	reserve_dump_area_size;
+	/* cmd line option during boot */
+	unsigned long	reserve_bootvar;
+
+	unsigned long	cpu_state_data_size;
+	unsigned long	hpte_region_size;
+	unsigned long	boot_memory_size;
+
+	unsigned long	fadumphdr_addr;
+	unsigned long	cpu_notes_buf;
+	unsigned long	cpu_notes_buf_size;
+
+	int		ibm_configure_kernel_dump;
+
+	unsigned long	fadump_enabled:1;
+	unsigned long	fadump_supported:1;
+	unsigned long	dump_active:1;
+	unsigned long	dump_registered:1;
+	unsigned long	nocma:1;
+};
+
+#endif /* __PPC64_FA_DUMP_INTERNAL_H__ */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 4eab972..e8630bb 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -32,6 +32,8 @@
 #include <asm/fadump.h>
 #include <asm/setup.h>
 
+#include "fadump-common.h"
+
 static struct fw_dump fw_dump;
 static struct fadump_mem_struct fdm;
 static const struct fadump_mem_struct *fdm_active;


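As a worked example of the MIN_BOOT_MEM macro moved by the patch above, the
sketch below evaluates the same expression in user space. The function name
min_boot_mem() and its rma_end parameter are illustrative stand-ins for the
macro and for ppc64_rma_size; they are assumptions made for this example and
are not part of the patch.

```c
/*
 * User-space sketch of the MIN_BOOT_MEM expression from fadump-common.h.
 * rma_end stands in for RMA_END (ppc64_rma_size in the kernel).
 */
static inline unsigned long min_boot_mem(unsigned long rma_end)
{
	const unsigned long floor = 0x1UL << 28;	/* 256MB minimum to boot */
	const unsigned long extra = 0x1UL << 26;	/* 64MB headroom against OOM */

	/* Use the larger of the RMA size and the 256MB floor, then add 64MB. */
	return ((rma_end < floor) ? floor : rma_end) + extra;
}
```

With a 128MB RMA the result is 256MB + 64MB = 320MB. With a 512MB RMA the
floor does not apply and the result is 512MB + 64MB = 576MB.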

* [PATCH v5 02/31] powerpc/fadump: move internal code to a new file
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
  2019-08-20 12:04 ` [PATCH v5 01/31] powerpc/fadump: move internal macros/definitions to a new header Hari Bathini
@ 2019-08-20 12:04 ` Hari Bathini
  2019-09-03 11:09   ` Michael Ellerman
  2019-08-20 12:04 ` [PATCH v5 03/31] powerpc/fadump: Improve fadump documentation Hari Bathini
                   ` (28 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:04 UTC
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Make way for refactoring platform-specific FADump code by moving code
that could be referenced from multiple places to a new file,
fadump-common.c.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/Makefile        |    2 
 arch/powerpc/kernel/fadump-common.c |  140 ++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/fadump-common.h |    8 ++
 arch/powerpc/kernel/fadump.c        |  146 ++---------------------------------
 4 files changed, 158 insertions(+), 138 deletions(-)
 create mode 100644 arch/powerpc/kernel/fadump-common.c

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 56dfa7a..439d548 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -78,7 +78,7 @@ obj-$(CONFIG_EEH)              += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
 				  eeh_driver.o eeh_event.o eeh_sysfs.o
 obj-$(CONFIG_GENERIC_TBSYNC)	+= smp-tbsync.o
 obj-$(CONFIG_CRASH_DUMP)	+= crash_dump.o
-obj-$(CONFIG_FA_DUMP)		+= fadump.o
+obj-$(CONFIG_FA_DUMP)		+= fadump.o fadump-common.o
 ifdef CONFIG_PPC32
 obj-$(CONFIG_E500)		+= idle_e500.o
 endif
diff --git a/arch/powerpc/kernel/fadump-common.c b/arch/powerpc/kernel/fadump-common.c
new file mode 100644
index 0000000..7f39e4f
--- /dev/null
+++ b/arch/powerpc/kernel/fadump-common.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Firmware-Assisted Dump internal code.
+ *
+ * Copyright 2011, IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.ibm.com>
+ *
+ * Copyright 2019, IBM Corp.
+ * Author: Hari Bathini <hbathini@linux.ibm.com>
+ */
+
+#undef DEBUG
+#define pr_fmt(fmt) "fadump: " fmt
+
+#include <linux/memblock.h>
+#include <linux/elf.h>
+#include <linux/mm.h>
+#include <linux/crash_core.h>
+
+#include "fadump-common.h"
+
+void *fadump_cpu_notes_buf_alloc(unsigned long size)
+{
+	void *vaddr;
+	struct page *page;
+	unsigned long order, count, i;
+
+	order = get_order(size);
+	vaddr = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order);
+	if (!vaddr)
+		return NULL;
+
+	count = 1 << order;
+	page = virt_to_page(vaddr);
+	for (i = 0; i < count; i++)
+		SetPageReserved(page + i);
+	return vaddr;
+}
+
+void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size)
+{
+	struct page *page;
+	unsigned long order, count, i;
+
+	order = get_order(size);
+	count = 1 << order;
+	page = virt_to_page(vaddr);
+	for (i = 0; i < count; i++)
+		ClearPageReserved(page + i);
+	__free_pages(page, order);
+}
+
+u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs)
+{
+	struct elf_prstatus prstatus;
+
+	memset(&prstatus, 0, sizeof(prstatus));
+	/*
+	 * FIXME: How do i get PID? Do I really need it?
+	 * prstatus.pr_pid = ????
+	 */
+	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
+	buf = append_elf_note(buf, CRASH_CORE_NOTE_NAME, NT_PRSTATUS,
+			      &prstatus, sizeof(prstatus));
+	return buf;
+}
+
+void fadump_update_elfcore_header(struct fw_dump *fadump_conf, char *bufp)
+{
+	struct elfhdr *elf;
+	struct elf_phdr *phdr;
+
+	elf = (struct elfhdr *)bufp;
+	bufp += sizeof(struct elfhdr);
+
+	/* First note is a place holder for cpu notes info. */
+	phdr = (struct elf_phdr *)bufp;
+
+	if (phdr->p_type == PT_NOTE) {
+		phdr->p_paddr  = fadump_conf->cpu_notes_buf;
+		phdr->p_offset = phdr->p_paddr;
+		phdr->p_memsz  = fadump_conf->cpu_notes_buf_size;
+		phdr->p_filesz = phdr->p_memsz;
+	}
+}
+
+/*
+ * Returns 1, if there are no holes in memory area between d_start to d_end,
+ * 0 otherwise.
+ */
+static int is_fadump_memory_area_contiguous(unsigned long d_start,
+					    unsigned long d_end)
+{
+	struct memblock_region *reg;
+	unsigned long start, end;
+	int ret = 0;
+
+	for_each_memblock(memory, reg) {
+		start = max_t(unsigned long, d_start, reg->base);
+		end = min_t(unsigned long, d_end, (reg->base + reg->size));
+		if (d_start < end) {
+			/* Memory hole from d_start to start */
+			if (start > d_start)
+				break;
+
+			if (end == d_end) {
+				ret = 1;
+				break;
+			}
+
+			d_start = end + 1;
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * Returns 1, if there are no holes in boot memory area,
+ * 0 otherwise.
+ */
+int is_fadump_boot_mem_contiguous(struct fw_dump *fadump_conf)
+{
+	unsigned long d_start = RMA_START;
+	unsigned long d_end   = RMA_START + fadump_conf->boot_memory_size;
+
+	return is_fadump_memory_area_contiguous(d_start, d_end);
+}
+
+/*
+ * Returns 1, if there are no holes in reserved memory area,
+ * 0 otherwise.
+ */
+int is_fadump_reserved_mem_contiguous(struct fw_dump *fadump_conf)
+{
+	unsigned long d_start = fadump_conf->reserve_dump_area_start;
+	unsigned long d_end   = d_start + fadump_conf->reserve_dump_area_size;
+
+	return is_fadump_memory_area_contiguous(d_start, d_end);
+}
diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index e0673b2..54328ef 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -86,4 +86,12 @@ struct fw_dump {
 	unsigned long	nocma:1;
 };
 
+/* Helper functions */
+void *fadump_cpu_notes_buf_alloc(unsigned long size);
+void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size);
+u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs);
+void fadump_update_elfcore_header(struct fw_dump *fadump_config, char *bufp);
+int is_fadump_boot_mem_contiguous(struct fw_dump *fadump_conf);
+int is_fadump_reserved_mem_contiguous(struct fw_dump *fadump_conf);
+
 #endif /* __PPC64_FA_DUMP_INTERNAL_H__ */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index e8630bb..40e5e96 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -37,9 +37,6 @@
 static struct fw_dump fw_dump;
 static struct fadump_mem_struct fdm;
 static const struct fadump_mem_struct *fdm_active;
-#ifdef CONFIG_CMA
-static struct cma *fadump_cma;
-#endif
 
 static DEFINE_MUTEX(fadump_mutex);
 struct fad_crash_memory_ranges *crash_memory_ranges;
@@ -48,6 +45,8 @@ int crash_mem_ranges;
 int max_crash_mem_ranges;
 
 #ifdef CONFIG_CMA
+static struct cma *fadump_cma;
+
 /*
  * fadump_cma_init() - Initialize CMA area from a fadump reserved memory
  *
@@ -109,8 +108,8 @@ static int __init fadump_cma_init(void) { return 1; }
 #endif /* CONFIG_CMA */
 
 /* Scan the Firmware Assisted dump configuration details. */
-int __init early_init_dt_scan_fw_dump(unsigned long node,
-			const char *uname, int depth, void *data)
+int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
+				      int depth, void *data)
 {
 	const __be32 *sections;
 	int i, num_sections;
@@ -201,67 +200,6 @@ int is_fadump_active(void)
 	return fw_dump.dump_active;
 }
 
-/*
- * Returns 1, if there are no holes in boot memory area,
- * 0 otherwise.
- */
-static int is_boot_memory_area_contiguous(void)
-{
-	struct memblock_region *reg;
-	unsigned long tstart, tend;
-	unsigned long start_pfn = PHYS_PFN(RMA_START);
-	unsigned long end_pfn = PHYS_PFN(RMA_START + fw_dump.boot_memory_size);
-	unsigned int ret = 0;
-
-	for_each_memblock(memory, reg) {
-		tstart = max(start_pfn, memblock_region_memory_base_pfn(reg));
-		tend = min(end_pfn, memblock_region_memory_end_pfn(reg));
-		if (tstart < tend) {
-			/* Memory hole from start_pfn to tstart */
-			if (tstart > start_pfn)
-				break;
-
-			if (tend == end_pfn) {
-				ret = 1;
-				break;
-			}
-
-			start_pfn = tend + 1;
-		}
-	}
-
-	return ret;
-}
-
-/*
- * Returns true, if there are no holes in reserved memory area,
- * false otherwise.
- */
-static bool is_reserved_memory_area_contiguous(void)
-{
-	struct memblock_region *reg;
-	unsigned long start, end;
-	unsigned long d_start = fw_dump.reserve_dump_area_start;
-	unsigned long d_end = d_start + fw_dump.reserve_dump_area_size;
-
-	for_each_memblock(memory, reg) {
-		start = max(d_start, (unsigned long)reg->base);
-		end = min(d_end, (unsigned long)(reg->base + reg->size));
-		if (d_start < end) {
-			/* Memory hole from d_start to start */
-			if (start > d_start)
-				break;
-
-			if (end == d_end)
-				return true;
-
-			d_start = end + 1;
-		}
-	}
-
-	return false;
-}
-
 /* Print firmware assisted dump configurations for debugging purpose. */
 static void fadump_show_config(void)
 {
@@ -627,9 +565,9 @@ static int register_fw_dump(struct fadump_mem_struct *fdm)
 			" dump. Hardware Error(%d).\n", rc);
 		break;
 	case -3:
-		if (!is_boot_memory_area_contiguous())
+		if (!is_fadump_boot_mem_contiguous(&fw_dump))
 			pr_err("Can't have holes in boot memory area while registering fadump\n");
-		else if (!is_reserved_memory_area_contiguous())
+		else if (!is_fadump_reserved_mem_contiguous(&fw_dump))
 			pr_err("Can't have holes in reserved memory area while"
 			       " registering fadump\n");
 
@@ -759,72 +697,6 @@ fadump_read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs)
 	return reg_entry;
 }
 
-static u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs)
-{
-	struct elf_prstatus prstatus;
-
-	memset(&prstatus, 0, sizeof(prstatus));
-	/*
-	 * FIXME: How do i get PID? Do I really need it?
-	 * prstatus.pr_pid = ????
-	 */
-	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
-	buf = append_elf_note(buf, CRASH_CORE_NOTE_NAME, NT_PRSTATUS,
-			      &prstatus, sizeof(prstatus));
-	return buf;
-}
-
-static void fadump_update_elfcore_header(char *bufp)
-{
-	struct elfhdr *elf;
-	struct elf_phdr *phdr;
-
-	elf = (struct elfhdr *)bufp;
-	bufp += sizeof(struct elfhdr);
-
-	/* First note is a place holder for cpu notes info. */
-	phdr = (struct elf_phdr *)bufp;
-
-	if (phdr->p_type == PT_NOTE) {
-		phdr->p_paddr = fw_dump.cpu_notes_buf;
-		phdr->p_offset	= phdr->p_paddr;
-		phdr->p_filesz	= fw_dump.cpu_notes_buf_size;
-		phdr->p_memsz = fw_dump.cpu_notes_buf_size;
-	}
-	return;
-}
-
-static void *fadump_cpu_notes_buf_alloc(unsigned long size)
-{
-	void *vaddr;
-	struct page *page;
-	unsigned long order, count, i;
-
-	order = get_order(size);
-	vaddr = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order);
-	if (!vaddr)
-		return NULL;
-
-	count = 1 << order;
-	page = virt_to_page(vaddr);
-	for (i = 0; i < count; i++)
-		SetPageReserved(page + i);
-	return vaddr;
-}
-
-static void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size)
-{
-	struct page *page;
-	unsigned long order, count, i;
-
-	order = get_order(size);
-	count = 1 << order;
-	page = virt_to_page(vaddr);
-	for (i = 0; i < count; i++)
-		ClearPageReserved(page + i);
-	__free_pages(page, order);
-}
-
 /*
  * Read CPU state dump data and convert it into ELF notes.
  * The CPU dump starts with magic number "REGSAVE". NumCpusOffset should be
@@ -914,9 +786,9 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
 	final_note(note_buf);
 
 	if (fdh) {
-		pr_debug("Updating elfcore header (%llx) with cpu notes\n",
-							fdh->elfcorehdr_addr);
-		fadump_update_elfcore_header((char *)__va(fdh->elfcorehdr_addr));
+		addr = fdh->elfcorehdr_addr;
+		pr_debug("Updating elfcore header(%lx) with cpu notes\n", addr);
+		fadump_update_elfcore_header(&fw_dump, (char *)__va(addr));
 	}
 	return 0;
 


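The hole check that the patch above consolidates into
is_fadump_memory_area_contiguous() can be tried outside the kernel with a
small sketch. It assumes the regions are sorted by base address, as memblock
keeps them, and replaces struct memblock_region and for_each_memblock() with
a plain array. The names struct mem_region and area_is_contiguous() are
hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct memblock_region. */
struct mem_region {
	unsigned long base;
	unsigned long size;
};

/*
 * Mirrors the loop in is_fadump_memory_area_contiguous(): returns 1 if
 * the area from d_start to d_end is covered by the regions without a
 * hole, 0 otherwise. Regions are assumed to be sorted by base address.
 */
static inline int area_is_contiguous(const struct mem_region *regs, size_t n,
				     unsigned long d_start, unsigned long d_end)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long reg_end = regs[i].base + regs[i].size;
		/* Clamp the region to the area being checked. */
		unsigned long start = d_start > regs[i].base ? d_start : regs[i].base;
		unsigned long end = d_end < reg_end ? d_end : reg_end;

		if (d_start < end) {
			/* Memory hole from d_start to start */
			if (start > d_start)
				return 0;

			/* The area is covered all the way to its end. */
			if (end == d_end)
				return 1;

			/* Continue from the end of this region, as the patch does. */
			d_start = end + 1;
		}
	}

	/* Ran out of regions before reaching d_end. */
	return 0;
}
```

Two adjacent regions covering 0x0 to 0x2000 make that area contiguous, while
a gap between 0x1000 and 0x3000, or an area that extends past the last
region, makes the check return 0.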

* [PATCH v5 03/31] powerpc/fadump: Improve fadump documentation
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
  2019-08-20 12:04 ` [PATCH v5 01/31] powerpc/fadump: move internal macros/definitions to a new header Hari Bathini
  2019-08-20 12:04 ` [PATCH v5 02/31] powerpc/fadump: move internal code to a new file Hari Bathini
@ 2019-08-20 12:04 ` Hari Bathini
  2019-08-20 12:04 ` [PATCH v5 04/31] pseries/fadump: move rtas specific definitions to platform code Hari Bathini
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:04 UTC
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

The figures depicting FADump's (Firmware-Assisted Dump) memory layout
are missing some finer details like different memory regions and what
they represent. Improve the documentation by updating those details.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.rst |   61 ++++++++++++----------
 1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
index 9ca1283..563c021 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -76,8 +76,9 @@ as follows:
    there is crash data available from a previous boot. During
    the early boot OS will reserve rest of the memory above
    boot memory size effectively booting with restricted memory
-   size. This will make sure that the second kernel will not
-   touch any of the dump memory area.
+   size. This will make sure that this kernel (also, referred
+   to as second kernel or capture kernel) will not touch any
+   of the dump memory area.
 
 -  User-space tools will read /proc/vmcore to obtain the contents
    of memory, which holds the previous crashed kernel dump in ELF
@@ -128,42 +129,46 @@ space memory except the user pages that were present in CMA region::
 
   o Memory Reservation during first kernel
 
-  Low memory                                         Top of memory
-  0      boot memory size                                       |
-  |           |                |<--Reserved dump area -->|      |
-  V           V                |   Permanent Reservation |      V
-  +-----------+----------/ /---+---+----+-----------+----+------+
-  |           |                |CPU|HPTE|  DUMP     |ELF |      |
-  +-----------+----------/ /---+---+----+-----------+----+------+
-        |                                           ^
-        |                                           |
-        \                                           /
-         -------------------------------------------
-          Boot memory content gets transferred to
-          reserved area by firmware at the time of
-          crash
+  Low memory                                                Top of memory
+  0      boot memory size      |<--Reserved dump area --->|      |
+  |           |                | (Permanent Reservation)  |      |
+  V           V                |                          |      V
+  +-----------+----------/ /---+---+----+--------+---+----+------+
+  |           |                |CPU|HPTE|  DUMP  |HDR|ELF |      |
+  +-----------+----------/ /---+---+----+--------+---+----+------+
+        |                                   ^      ^
+        |                                   |      |
+        \                                   /      |
+         -----------------------------------     FADump Header
+          Boot memory content gets transferred   (meta area)
+          to reserved area by firmware at the
+          time of crash
+
                    Fig. 1
 
+
   o Memory Reservation during second kernel after crash
 
-  Low memory                                        Top of memory
-  0      boot memory size                                       |
-  |           |<------------- Reserved dump area ----------- -->|
-  V           V                                                 V
-  +-----------+----------/ /---+---+----+-----------+----+------+
-  |           |                |CPU|HPTE|  DUMP     |ELF |      |
-  +-----------+----------/ /---+---+----+-----------+----+------+
+  Low memory                                                Top of memory
+  0      boot memory size                                        |
+  |           |<----------- Crash preserved area --------------->|
+  V           V                |<-- Reserved dump area -->|      V
+  +-----------+----------/ /---+---+----+--------+---+----+------+
+  |           |                |CPU|HPTE|  DUMP  |HDR|ELF |      |
+  +-----------+----------/ /---+---+----+--------+---+----+------+
         |                                              |
         V                                              V
    Used by second                                /proc/vmcore
    kernel to boot
                    Fig. 2
 
-Currently the dump will be copied from /proc/vmcore to a
-a new file upon user intervention. The dump data available through
-/proc/vmcore will be in ELF format. Hence the existing kdump
-infrastructure (kdump scripts) to save the dump works fine with
-minor modifications.
+Currently the dump will be copied from /proc/vmcore to a new file upon
+user intervention. The dump data available through /proc/vmcore will be
+in ELF format. Hence the existing kdump infrastructure (kdump scripts)
+to save the dump works fine with minor modifications. KDump scripts on
+major Distro releases have already been modified to work seamlessly (no
+user intervention in saving the dump) when FADump is used, instead of
+KDump, as dump mechanism.
 
 The tools to examine the dump will be same as the ones
 used for kdump.



* [PATCH v5 04/31] pseries/fadump: move rtas specific definitions to platform code
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (2 preceding siblings ...)
  2019-08-20 12:04 ` [PATCH v5 03/31] powerpc/fadump: Improve fadump documentation Hari Bathini
@ 2019-08-20 12:04 ` Hari Bathini
  2019-08-20 12:04 ` [PATCH v5 05/31] pseries/fadump: introduce callbacks for platform specific operations Hari Bathini
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:04 UTC
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Currently, FADump is supported only on pSeries, but that is going to
change soon with FADump support being added for the PowerNV platform.
So, move RTAS-specific definitions to platform code to allow FADump
to support multiple platforms.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/include/asm/fadump.h            |  112 --------------------------
 arch/powerpc/kernel/fadump-common.h          |   20 ++++-
 arch/powerpc/kernel/fadump.c                 |   90 +++++++++++----------
 arch/powerpc/platforms/pseries/rtas-fadump.h |  103 ++++++++++++++++++++++++
 4 files changed, 170 insertions(+), 155 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/rtas-fadump.h

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 75179497..e608d34 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -11,116 +11,8 @@
 
 #ifdef CONFIG_FA_DUMP
 
-/* Firmware provided dump sections */
-#define FADUMP_CPU_STATE_DATA	0x0001
-#define FADUMP_HPTE_REGION	0x0002
-#define FADUMP_REAL_MODE_REGION	0x0011
-
-/* Dump request flag */
-#define FADUMP_REQUEST_FLAG	0x00000001
-
-/* Dump status flag */
-#define FADUMP_ERROR_FLAG	0x2000
-
-#define FADUMP_CPU_ID_MASK	((1UL << 32) - 1)
-
-#define CPU_UNKNOWN		(~((u32)0))
-
-/* Utility macros */
-#define SKIP_TO_NEXT_CPU(reg_entry)					\
-({									\
-	while (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUEND"))	\
-		reg_entry++;						\
-	reg_entry++;							\
-})
-
 extern int crashing_cpu;
 
-/* Kernel Dump section info */
-struct fadump_section {
-	__be32	request_flag;
-	__be16	source_data_type;
-	__be16	error_flags;
-	__be64	source_address;
-	__be64	source_len;
-	__be64	bytes_dumped;
-	__be64	destination_address;
-};
-
-/* ibm,configure-kernel-dump header. */
-struct fadump_section_header {
-	__be32	dump_format_version;
-	__be16	dump_num_sections;
-	__be16	dump_status_flag;
-	__be32	offset_first_dump_section;
-
-	/* Fields for disk dump option. */
-	__be32	dd_block_size;
-	__be64	dd_block_offset;
-	__be64	dd_num_blocks;
-	__be32	dd_offset_disk_path;
-
-	/* Maximum time allowed to prevent an automatic dump-reboot. */
-	__be32	max_time_auto;
-};
-
-/*
- * Firmware Assisted dump memory structure. This structure is required for
- * registering future kernel dump with power firmware through rtas call.
- *
- * No disk dump option. Hence disk dump path string section is not included.
- */
-struct fadump_mem_struct {
-	struct fadump_section_header	header;
-
-	/* Kernel dump sections */
-	struct fadump_section		cpu_state_data;
-	struct fadump_section		hpte_region;
-	struct fadump_section		rmr_region;
-};
-
-/*
- * Copy the ascii values for first 8 characters from a string into u64
- * variable at their respective indexes.
- * e.g.
- *  The string "FADMPINF" will be converted into 0x4641444d50494e46
- */
-static inline u64 str_to_u64(const char *str)
-{
-	u64 val = 0;
-	int i;
-
-	for (i = 0; i < sizeof(val); i++)
-		val = (*str) ? (val << 8) | *str++ : val << 8;
-	return val;
-}
-#define STR_TO_HEX(x)	str_to_u64(x)
-#define REG_ID(x)	str_to_u64(x)
-
-#define REGSAVE_AREA_MAGIC		STR_TO_HEX("REGSAVE")
-
-/* The firmware-assisted dump format.
- *
- * The register save area is an area in the partition's memory used to preserve
- * the register contents (CPU state data) for the active CPUs during a firmware
- * assisted dump. The dump format contains register save area header followed
- * by register entries. Each list of registers for a CPU starts with
- * "CPUSTRT" and ends with "CPUEND".
- */
-
-/* Register save area header. */
-struct fadump_reg_save_area_header {
-	__be64		magic_number;
-	__be32		version;
-	__be32		num_cpu_offset;
-};
-
-/* Register entry. */
-struct fadump_reg_entry {
-	__be64		reg_id;
-	__be64		reg_value;
-};
-
 extern int is_fadump_memory_area(u64 addr, ulong size);
 extern int early_init_dt_scan_fw_dump(unsigned long node,
 		const char *uname, int depth, void *data);
@@ -136,5 +28,5 @@ static inline int is_fadump_active(void) { return 0; }
 static inline int should_fadump_crash(void) { return 0; }
 static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
 static inline void fadump_cleanup(void) { }
-#endif
-#endif
+#endif /* !CONFIG_FA_DUMP */
+#endif /* __PPC64_FA_DUMP_H__ */
diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index 54328ef..93bc471 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -46,7 +46,25 @@
 #define FADUMP_UNREGISTER		2
 #define FADUMP_INVALIDATE		3
 
-#define FADUMP_CRASH_INFO_MAGIC		str_to_u64("FADMPINF")
+/*
+ * Copy the ascii values for first 8 characters from a string into u64
+ * variable at their respective indexes.
+ * e.g.
+ *  The string "FADMPINF" will be converted into 0x4641444d50494e46
+ */
+static inline u64 fadump_str_to_u64(const char *str)
+{
+	u64 val = 0;
+	int i;
+
+	for (i = 0; i < sizeof(val); i++)
+		val = (*str) ? (val << 8) | *str++ : val << 8;
+	return val;
+}
+
+#define FADUMP_CPU_UNKNOWN		(~((u32)0))
+
+#define FADUMP_CRASH_INFO_MAGIC		fadump_str_to_u64("FADMPINF")
 
 /* fadump crash info structure */
 struct fadump_crash_info_header {
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 40e5e96..23c0ca0 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -33,10 +33,11 @@
 #include <asm/setup.h>
 
 #include "fadump-common.h"
+#include "../platforms/pseries/rtas-fadump.h"
 
 static struct fw_dump fw_dump;
-static struct fadump_mem_struct fdm;
-static const struct fadump_mem_struct *fdm_active;
+static struct rtas_fadump_mem_struct fdm;
+static const struct rtas_fadump_mem_struct *fdm_active;
 
 static DEFINE_MUTEX(fadump_mutex);
 struct fad_crash_memory_ranges *crash_memory_ranges;
@@ -156,11 +157,11 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
 		u32 type = (u32)of_read_number(sections, 1);
 
 		switch (type) {
-		case FADUMP_CPU_STATE_DATA:
+		case RTAS_FADUMP_CPU_STATE_DATA:
 			fw_dump.cpu_state_data_size =
 					of_read_ulong(&sections[1], 2);
 			break;
-		case FADUMP_HPTE_REGION:
+		case RTAS_FADUMP_HPTE_REGION:
 			fw_dump.hpte_region_size =
 					of_read_ulong(&sections[1], 2);
 			break;
@@ -219,20 +220,20 @@ static void fadump_show_config(void)
 	pr_debug("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
 }
 
-static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
+static unsigned long init_fadump_mem_struct(struct rtas_fadump_mem_struct *fdm,
 				unsigned long addr)
 {
 	if (!fdm)
 		return 0;
 
-	memset(fdm, 0, sizeof(struct fadump_mem_struct));
+	memset(fdm, 0, sizeof(struct rtas_fadump_mem_struct));
 	addr = addr & PAGE_MASK;
 
 	fdm->header.dump_format_version = cpu_to_be32(0x00000001);
 	fdm->header.dump_num_sections = cpu_to_be16(3);
 	fdm->header.dump_status_flag = 0;
 	fdm->header.offset_first_dump_section =
-		cpu_to_be32((u32)offsetof(struct fadump_mem_struct, cpu_state_data));
+		cpu_to_be32((u32)offsetof(struct rtas_fadump_mem_struct, cpu_state_data));
 
 	/*
 	 * Fields for disk dump option.
@@ -248,24 +249,24 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
 
 	/* Kernel dump sections */
 	/* cpu state data section. */
-	fdm->cpu_state_data.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG);
-	fdm->cpu_state_data.source_data_type = cpu_to_be16(FADUMP_CPU_STATE_DATA);
+	fdm->cpu_state_data.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG);
+	fdm->cpu_state_data.source_data_type = cpu_to_be16(RTAS_FADUMP_CPU_STATE_DATA);
 	fdm->cpu_state_data.source_address = 0;
 	fdm->cpu_state_data.source_len = cpu_to_be64(fw_dump.cpu_state_data_size);
 	fdm->cpu_state_data.destination_address = cpu_to_be64(addr);
 	addr += fw_dump.cpu_state_data_size;
 
 	/* hpte region section */
-	fdm->hpte_region.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG);
-	fdm->hpte_region.source_data_type = cpu_to_be16(FADUMP_HPTE_REGION);
+	fdm->hpte_region.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG);
+	fdm->hpte_region.source_data_type = cpu_to_be16(RTAS_FADUMP_HPTE_REGION);
 	fdm->hpte_region.source_address = 0;
 	fdm->hpte_region.source_len = cpu_to_be64(fw_dump.hpte_region_size);
 	fdm->hpte_region.destination_address = cpu_to_be64(addr);
 	addr += fw_dump.hpte_region_size;
 
 	/* RMA region section */
-	fdm->rmr_region.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG);
-	fdm->rmr_region.source_data_type = cpu_to_be16(FADUMP_REAL_MODE_REGION);
+	fdm->rmr_region.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG);
+	fdm->rmr_region.source_data_type = cpu_to_be16(RTAS_FADUMP_REAL_MODE_REGION);
 	fdm->rmr_region.source_address = cpu_to_be64(RMA_START);
 	fdm->rmr_region.source_len = cpu_to_be64(fw_dump.boot_memory_size);
 	fdm->rmr_region.destination_address = cpu_to_be64(addr);
@@ -536,7 +537,7 @@ static int __init early_fadump_reserve_mem(char *p)
 }
 early_param("fadump_reserve_mem", early_fadump_reserve_mem);
 
-static int register_fw_dump(struct fadump_mem_struct *fdm)
+static int register_fw_dump(struct rtas_fadump_mem_struct *fdm)
 {
 	int rc, err;
 	unsigned int wait_time;
@@ -547,7 +548,7 @@ static int register_fw_dump(struct fadump_mem_struct *fdm)
 	do {
 		rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL,
 			FADUMP_REGISTER, fdm,
-			sizeof(struct fadump_mem_struct));
+			sizeof(struct rtas_fadump_mem_struct));
 
 		wait_time = rtas_busy_delay_time(rc);
 		if (wait_time)
@@ -643,7 +644,7 @@ static inline int fadump_gpr_index(u64 id)
 	int i = -1;
 	char str[3];
 
-	if ((id & GPR_MASK) == REG_ID("GPR")) {
+	if ((id & GPR_MASK) == fadump_str_to_u64("GPR")) {
 		/* get the digits at the end */
 		id &= ~GPR_MASK;
 		id >>= 24;
@@ -665,30 +666,30 @@ static inline void fadump_set_regval(struct pt_regs *regs, u64 reg_id,
 	i = fadump_gpr_index(reg_id);
 	if (i >= 0)
 		regs->gpr[i] = (unsigned long)reg_val;
-	else if (reg_id == REG_ID("NIA"))
+	else if (reg_id == fadump_str_to_u64("NIA"))
 		regs->nip = (unsigned long)reg_val;
-	else if (reg_id == REG_ID("MSR"))
+	else if (reg_id == fadump_str_to_u64("MSR"))
 		regs->msr = (unsigned long)reg_val;
-	else if (reg_id == REG_ID("CTR"))
+	else if (reg_id == fadump_str_to_u64("CTR"))
 		regs->ctr = (unsigned long)reg_val;
-	else if (reg_id == REG_ID("LR"))
+	else if (reg_id == fadump_str_to_u64("LR"))
 		regs->link = (unsigned long)reg_val;
-	else if (reg_id == REG_ID("XER"))
+	else if (reg_id == fadump_str_to_u64("XER"))
 		regs->xer = (unsigned long)reg_val;
-	else if (reg_id == REG_ID("CR"))
+	else if (reg_id == fadump_str_to_u64("CR"))
 		regs->ccr = (unsigned long)reg_val;
-	else if (reg_id == REG_ID("DAR"))
+	else if (reg_id == fadump_str_to_u64("DAR"))
 		regs->dar = (unsigned long)reg_val;
-	else if (reg_id == REG_ID("DSISR"))
+	else if (reg_id == fadump_str_to_u64("DSISR"))
 		regs->dsisr = (unsigned long)reg_val;
 }
 
-static struct fadump_reg_entry*
-fadump_read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs)
+static struct rtas_fadump_reg_entry*
+fadump_read_registers(struct rtas_fadump_reg_entry *reg_entry, struct pt_regs *regs)
 {
 	memset(regs, 0, sizeof(struct pt_regs));
 
-	while (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUEND")) {
+	while (be64_to_cpu(reg_entry->reg_id) != fadump_str_to_u64("CPUEND")) {
 		fadump_set_regval(regs, be64_to_cpu(reg_entry->reg_id),
 					be64_to_cpu(reg_entry->reg_value));
 		reg_entry++;
@@ -711,10 +712,10 @@ fadump_read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs)
  * state from fadump crash info structure populated by first kernel at the
  * time of crash.
  */
-static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
+static int __init fadump_build_cpu_notes(const struct rtas_fadump_mem_struct *fdm)
 {
-	struct fadump_reg_save_area_header *reg_header;
-	struct fadump_reg_entry *reg_entry;
+	struct rtas_fadump_reg_save_area_header *reg_header;
+	struct rtas_fadump_reg_entry *reg_entry;
 	struct fadump_crash_info_header *fdh = NULL;
 	void *vaddr;
 	unsigned long addr;
@@ -729,7 +730,8 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
 	vaddr = __va(addr);
 
 	reg_header = vaddr;
-	if (be64_to_cpu(reg_header->magic_number) != REGSAVE_AREA_MAGIC) {
+	if (be64_to_cpu(reg_header->magic_number) !=
+	    fadump_str_to_u64("REGSAVE")) {
 		printk(KERN_ERR "Unable to read register save area.\n");
 		return -ENOENT;
 	}
@@ -741,7 +743,7 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
 	num_cpus = be32_to_cpu(*((__be32 *)(vaddr)));
 	pr_debug("NumCpus     : %u\n", num_cpus);
 	vaddr += sizeof(u32);
-	reg_entry = (struct fadump_reg_entry *)vaddr;
+	reg_entry = (struct rtas_fadump_reg_entry *)vaddr;
 
 	/* Allocate buffer to hold cpu crash notes. */
 	fw_dump.cpu_notes_buf_size = num_cpus * sizeof(note_buf_t);
@@ -761,22 +763,22 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
 		fdh = __va(fw_dump.fadumphdr_addr);
 
 	for (i = 0; i < num_cpus; i++) {
-		if (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUSTRT")) {
+		if (be64_to_cpu(reg_entry->reg_id) != fadump_str_to_u64("CPUSTRT")) {
 			printk(KERN_ERR "Unable to read CPU state data\n");
 			rc = -ENOENT;
 			goto error_out;
 		}
 		/* Lower 4 bytes of reg_value contains logical cpu id */
-		cpu = be64_to_cpu(reg_entry->reg_value) & FADUMP_CPU_ID_MASK;
+		cpu = be64_to_cpu(reg_entry->reg_value) & RTAS_FADUMP_CPU_ID_MASK;
 		if (fdh && !cpumask_test_cpu(cpu, &fdh->online_mask)) {
-			SKIP_TO_NEXT_CPU(reg_entry);
+			RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry);
 			continue;
 		}
 		pr_debug("Reading register data for cpu %d...\n", cpu);
 		if (fdh && fdh->crashing_cpu == cpu) {
 			regs = fdh->regs;
 			note_buf = fadump_regs_to_elf_notes(note_buf, &regs);
-			SKIP_TO_NEXT_CPU(reg_entry);
+			RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry);
 		} else {
 			reg_entry++;
 			reg_entry = fadump_read_registers(reg_entry, &regs);
@@ -805,7 +807,7 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
  * Validate and process the dump data stored by firmware before exporting
  * it through '/proc/vmcore'.
  */
-static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
+static int __init process_fadump(const struct rtas_fadump_mem_struct *fdm_active)
 {
 	struct fadump_crash_info_header *fdh;
 	int rc = 0;
@@ -814,7 +816,7 @@ static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
 		return -EINVAL;
 
 	/* Check if the dump data is valid. */
-	if ((be16_to_cpu(fdm_active->header.dump_status_flag) == FADUMP_ERROR_FLAG) ||
+	if ((be16_to_cpu(fdm_active->header.dump_status_flag) == RTAS_FADUMP_ERROR_FLAG) ||
 			(fdm_active->cpu_state_data.error_flags != 0) ||
 			(fdm_active->rmr_region.error_flags != 0)) {
 		printk(KERN_ERR "Dump taken by platform is not valid\n");
@@ -1145,7 +1147,7 @@ static unsigned long init_fadump_header(unsigned long addr)
 	fdh->magic_number = FADUMP_CRASH_INFO_MAGIC;
 	fdh->elfcorehdr_addr = addr;
 	/* We will set the crashing cpu id in crash_fadump() during crash. */
-	fdh->crashing_cpu = CPU_UNKNOWN;
+	fdh->crashing_cpu = FADUMP_CPU_UNKNOWN;
 
 	return addr;
 }
@@ -1179,7 +1181,7 @@ static int register_fadump(void)
 	return register_fw_dump(&fdm);
 }
 
-static int fadump_unregister_dump(struct fadump_mem_struct *fdm)
+static int fadump_unregister_dump(struct rtas_fadump_mem_struct *fdm)
 {
 	int rc = 0;
 	unsigned int wait_time;
@@ -1190,7 +1192,7 @@ static int fadump_unregister_dump(struct fadump_mem_struct *fdm)
 	do {
 		rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL,
 			FADUMP_UNREGISTER, fdm,
-			sizeof(struct fadump_mem_struct));
+			sizeof(struct rtas_fadump_mem_struct));
 
 		wait_time = rtas_busy_delay_time(rc);
 		if (wait_time)
@@ -1206,7 +1208,7 @@ static int fadump_unregister_dump(struct fadump_mem_struct *fdm)
 	return 0;
 }
 
-static int fadump_invalidate_dump(const struct fadump_mem_struct *fdm)
+static int fadump_invalidate_dump(const struct rtas_fadump_mem_struct *fdm)
 {
 	int rc = 0;
 	unsigned int wait_time;
@@ -1217,7 +1219,7 @@ static int fadump_invalidate_dump(const struct fadump_mem_struct *fdm)
 	do {
 		rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL,
 			FADUMP_INVALIDATE, fdm,
-			sizeof(struct fadump_mem_struct));
+			sizeof(struct rtas_fadump_mem_struct));
 
 		wait_time = rtas_busy_delay_time(rc);
 		if (wait_time)
@@ -1438,7 +1440,7 @@ static ssize_t fadump_register_store(struct kobject *kobj,
 
 static int fadump_region_show(struct seq_file *m, void *private)
 {
-	const struct fadump_mem_struct *fdm_ptr;
+	const struct rtas_fadump_mem_struct *fdm_ptr;
 
 	if (!fw_dump.fadump_enabled)
 		return 0;
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.h b/arch/powerpc/platforms/pseries/rtas-fadump.h
new file mode 100644
index 0000000..a3a2918
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Firmware-Assisted Dump support on POWERVM platform.
+ *
+ * Copyright 2011, IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.ibm.com>
+ *
+ * Copyright 2019, IBM Corp.
+ * Author: Hari Bathini <hbathini@linux.ibm.com>
+ */
+
+#ifndef __PPC64_RTAS_FA_DUMP_H__
+#define __PPC64_RTAS_FA_DUMP_H__
+
+/* Firmware provided dump sections */
+#define RTAS_FADUMP_CPU_STATE_DATA	0x0001
+#define RTAS_FADUMP_HPTE_REGION		0x0002
+#define RTAS_FADUMP_REAL_MODE_REGION	0x0011
+
+/* Dump request flag */
+#define RTAS_FADUMP_REQUEST_FLAG	0x00000001
+
+/* Dump status flag */
+#define RTAS_FADUMP_ERROR_FLAG		0x2000
+
+/* Kernel Dump section info */
+struct rtas_fadump_section {
+	__be32	request_flag;
+	__be16	source_data_type;
+	__be16	error_flags;
+	__be64	source_address;
+	__be64	source_len;
+	__be64	bytes_dumped;
+	__be64	destination_address;
+};
+
+/* ibm,configure-kernel-dump header. */
+struct rtas_fadump_section_header {
+	__be32	dump_format_version;
+	__be16	dump_num_sections;
+	__be16	dump_status_flag;
+	__be32	offset_first_dump_section;
+
+	/* Fields for disk dump option. */
+	__be32	dd_block_size;
+	__be64	dd_block_offset;
+	__be64	dd_num_blocks;
+	__be32	dd_offset_disk_path;
+
+	/* Maximum time allowed to prevent an automatic dump-reboot. */
+	__be32	max_time_auto;
+};
+
+/*
+ * Firmware Assisted dump memory structure. This structure is required for
+ * registering future kernel dump with power firmware through rtas call.
+ *
+ * No disk dump option. Hence disk dump path string section is not included.
+ */
+struct rtas_fadump_mem_struct {
+	struct rtas_fadump_section_header	header;
+
+	/* Kernel dump sections */
+	struct rtas_fadump_section		cpu_state_data;
+	struct rtas_fadump_section		hpte_region;
+	struct rtas_fadump_section		rmr_region;
+};
+
+/*
+ * The firmware-assisted dump format.
+ *
+ * The register save area is an area in the partition's memory used to preserve
+ * the register contents (CPU state data) for the active CPUs during a firmware
+ * assisted dump. The dump format contains register save area header followed
+ * by register entries. Each list of registers for a CPU starts with "CPUSTRT"
+ * and ends with "CPUEND".
+ */
+
+/* Register save area header. */
+struct rtas_fadump_reg_save_area_header {
+	__be64		magic_number;
+	__be32		version;
+	__be32		num_cpu_offset;
+};
+
+/* Register entry. */
+struct rtas_fadump_reg_entry {
+	__be64		reg_id;
+	__be64		reg_value;
+};
+
+/* Utility macros */
+#define RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry)				\
+({									\
+	while (be64_to_cpu(reg_entry->reg_id) !=			\
+	       fadump_str_to_u64("CPUEND"))				\
+		reg_entry++;						\
+	reg_entry++;							\
+})
+
+#define RTAS_FADUMP_CPU_ID_MASK			((1UL << 32) - 1)
+
+#endif /* __PPC64_RTAS_FA_DUMP_H__ */
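
For readers following the register save area format described in the header above, here is a minimal userspace sketch of the two helpers this patch renames. It is not kernel code: the `sketch_` names are hypothetical, the entries are kept in host byte order (the real `struct rtas_fadump_reg_entry` fields are `__be64`), and the walk assumes a well-formed list that always contains a "CPUEND" entry.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for fadump_str_to_u64(): pack up to 8 ASCII bytes,
 * first character in the most significant byte, zero-padded on the right.
 */
static uint64_t sketch_str_to_u64(const char *str)
{
	uint64_t val = 0;
	size_t i;

	for (i = 0; i < sizeof(val); i++)
		val = *str ? (val << 8) | (unsigned char)*str++ : val << 8;
	return val;
}

/* Host-endian stand-in for struct rtas_fadump_reg_entry (assumption). */
struct sketch_reg_entry {
	uint64_t reg_id;
	uint64_t reg_value;
};

/*
 * Same walk as RTAS_FADUMP_SKIP_TO_NEXT_CPU: advance to the "CPUEND"
 * entry of the current CPU, then step past it. Assumes "CPUEND" exists.
 */
static const struct sketch_reg_entry *
sketch_skip_to_next_cpu(const struct sketch_reg_entry *entry)
{
	while (entry->reg_id != sketch_str_to_u64("CPUEND"))
		entry++;
	return entry + 1;
}
```

With this packing, "FADMPINF" becomes 0x4641444d50494e46, matching the comment in fadump-common.h, and a shorter tag such as "CPUEND" is left-aligned with trailing zero bytes.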


* [PATCH v5 05/31] pseries/fadump: introduce callbacks for platform specific operations
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (3 preceding siblings ...)
  2019-08-20 12:04 ` [PATCH v5 04/31] pseries/fadump: move rtas specific definitions to platform code Hari Bathini
@ 2019-08-20 12:04 ` Hari Bathini
  2019-09-03 11:10   ` Michael Ellerman
  2019-08-20 12:04 ` [PATCH v5 06/31] pseries/fadump: define register/un-register callback functions Hari Bathini
                   ` (25 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Introduce callback functions for platform-specific operations such as
register, unregister and invalidate. Also, define placeholders for
these callbacks on the pSeries platform.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump-common.h          |   26 +++++
 arch/powerpc/kernel/fadump.c                 |   47 +--------
 arch/powerpc/platforms/pseries/Makefile      |    1 
 arch/powerpc/platforms/pseries/rtas-fadump.c |  129 ++++++++++++++++++++++++++
 4 files changed, 159 insertions(+), 44 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/rtas-fadump.c

diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index 93bc471..898d0e8 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -80,6 +80,9 @@ struct fad_crash_memory_ranges {
 	unsigned long long	size;
 };
 
+/* Platform specific callback functions */
+struct fadump_ops;
+
 /* Firmware-assisted dump configuration details. */
 struct fw_dump {
 	unsigned long	reserve_dump_area_start;
@@ -102,6 +105,20 @@ struct fw_dump {
 	unsigned long	dump_active:1;
 	unsigned long	dump_registered:1;
 	unsigned long	nocma:1;
+
+	struct fadump_ops		*ops;
+};
+
+struct fadump_ops {
+	ulong	(*fadump_init_mem_struct)(struct fw_dump *fadump_config);
+	int	(*fadump_register)(struct fw_dump *fadump_config);
+	int	(*fadump_unregister)(struct fw_dump *fadump_config);
+	int	(*fadump_invalidate)(struct fw_dump *fadump_config);
+	int	(*fadump_process)(struct fw_dump *fadump_config);
+	void	(*fadump_region_show)(struct fw_dump *fadump_config,
+				      struct seq_file *m);
+	void	(*fadump_trigger)(struct fadump_crash_info_header *fdh,
+				  const char *msg);
 };
 
 /* Helper functions */
@@ -112,4 +129,13 @@ void fadump_update_elfcore_header(struct fw_dump *fadump_config, char *bufp);
 int is_fadump_boot_mem_contiguous(struct fw_dump *fadump_conf);
 int is_fadump_reserved_mem_contiguous(struct fw_dump *fadump_conf);
 
+#ifdef CONFIG_PPC_PSERIES
+extern int rtas_fadump_dt_scan(struct fw_dump *fadump_config, ulong node);
+#else
+static inline int rtas_fadump_dt_scan(struct fw_dump *fadump_config, ulong node)
+{
+	return 1;
+}
+#endif
+
 #endif /* __PPC64_FA_DUMP_INTERNAL_H__ */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 23c0ca0..99d5def 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -112,24 +112,12 @@ static int __init fadump_cma_init(void) { return 1; }
 int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
 				      int depth, void *data)
 {
-	const __be32 *sections;
-	int i, num_sections;
-	int size;
-	const __be32 *token;
+	int ret;
 
 	if (depth != 1 || strcmp(uname, "rtas") != 0)
 		return 0;
 
-	/*
-	 * Check if Firmware Assisted dump is supported. if yes, check
-	 * if dump has been initiated on last reboot.
-	 */
-	token = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL);
-	if (!token)
-		return 1;
-
-	fw_dump.fadump_supported = 1;
-	fw_dump.ibm_configure_kernel_dump = be32_to_cpu(*token);
+	ret = rtas_fadump_dt_scan(&fw_dump, node);
 
 	/*
 	 * The 'ibm,kernel-dump' rtas node is present only if there is
@@ -139,36 +127,7 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
 	if (fdm_active)
 		fw_dump.dump_active = 1;
 
-	/* Get the sizes required to store dump data for the firmware provided
-	 * dump sections.
-	 * For each dump section type supported, a 32bit cell which defines
-	 * the ID of a supported section followed by two 32 bit cells which
-	 * gives teh size of the section in bytes.
-	 */
-	sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
-					&size);
-
-	if (!sections)
-		return 1;
-
-	num_sections = size / (3 * sizeof(u32));
-
-	for (i = 0; i < num_sections; i++, sections += 3) {
-		u32 type = (u32)of_read_number(sections, 1);
-
-		switch (type) {
-		case RTAS_FADUMP_CPU_STATE_DATA:
-			fw_dump.cpu_state_data_size =
-					of_read_ulong(&sections[1], 2);
-			break;
-		case RTAS_FADUMP_HPTE_REGION:
-			fw_dump.hpte_region_size =
-					of_read_ulong(&sections[1], 2);
-			break;
-		}
-	}
-
-	return 1;
+	return ret;
 }
 
 /*
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index ab3d59a..e248724 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_IBMVIO)		+= vio.o
 obj-$(CONFIG_IBMEBUS)		+= ibmebus.o
 obj-$(CONFIG_PAPR_SCM)		+= papr_scm.o
 obj-$(CONFIG_PPC_SPLPAR)	+= vphn.o
+obj-$(CONFIG_FA_DUMP)		+= rtas-fadump.o
 
 ifdef CONFIG_PPC_PSERIES
 obj-$(CONFIG_SUSPEND)		+= suspend.o
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c
new file mode 100644
index 0000000..b77d738
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Firmware-Assisted Dump support on POWERVM platform.
+ *
+ * Copyright 2011, IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.ibm.com>
+ *
+ * Copyright 2019, IBM Corp.
+ * Author: Hari Bathini <hbathini@linux.ibm.com>
+ */
+
+#undef DEBUG
+#define pr_fmt(fmt) "rtas fadump: " fmt
+
+#include <linux/string.h>
+#include <linux/memblock.h>
+#include <linux/delay.h>
+#include <linux/seq_file.h>
+#include <linux/crash_dump.h>
+
+#include <asm/page.h>
+#include <asm/prom.h>
+#include <asm/rtas.h>
+#include <asm/fadump.h>
+
+#include "../../kernel/fadump-common.h"
+#include "rtas-fadump.h"
+
+static ulong rtas_fadump_init_mem_struct(struct fw_dump *fadump_conf)
+{
+	return fadump_conf->reserve_dump_area_start;
+}
+
+static int rtas_fadump_register(struct fw_dump *fadump_conf)
+{
+	return -EIO;
+}
+
+static int rtas_fadump_unregister(struct fw_dump *fadump_conf)
+{
+	return -EIO;
+}
+
+static int rtas_fadump_invalidate(struct fw_dump *fadump_conf)
+{
+	return -EIO;
+}
+
+/*
+ * Validate and process the dump data stored by firmware before exporting
+ * it through '/proc/vmcore'.
+ */
+static int __init rtas_fadump_process(struct fw_dump *fadump_conf)
+{
+	return -EINVAL;
+}
+
+static void rtas_fadump_region_show(struct fw_dump *fadump_conf,
+				    struct seq_file *m)
+{
+}
+
+static void rtas_fadump_trigger(struct fadump_crash_info_header *fdh,
+				const char *msg)
+{
+	/* Call ibm,os-term rtas call to trigger firmware assisted dump */
+	rtas_os_term((char *)msg);
+}
+
+static struct fadump_ops rtas_fadump_ops = {
+	.fadump_init_mem_struct		= rtas_fadump_init_mem_struct,
+	.fadump_register		= rtas_fadump_register,
+	.fadump_unregister		= rtas_fadump_unregister,
+	.fadump_invalidate		= rtas_fadump_invalidate,
+	.fadump_process			= rtas_fadump_process,
+	.fadump_region_show		= rtas_fadump_region_show,
+	.fadump_trigger			= rtas_fadump_trigger,
+};
+
+int __init rtas_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
+{
+	const __be32 *sections;
+	int i, num_sections;
+	int size;
+	const __be32 *token;
+
+	/*
+	 * Check if Firmware Assisted dump is supported. if yes, check
+	 * if dump has been initiated on last reboot.
+	 */
+	token = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL);
+	if (!token)
+		return 1;
+
+	fadump_conf->ibm_configure_kernel_dump = be32_to_cpu(*token);
+	fadump_conf->ops		= &rtas_fadump_ops;
+	fadump_conf->fadump_supported	= 1;
+
+	/* Get the sizes required to store dump data for the firmware provided
+	 * dump sections.
+	 * For each dump section type supported, a 32bit cell which defines
+	 * the ID of a supported section followed by two 32 bit cells which
+	 * gives the size of the section in bytes.
+	 */
+	sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
+					&size);
+
+	if (!sections)
+		return 1;
+
+	num_sections = size / (3 * sizeof(u32));
+
+	for (i = 0; i < num_sections; i++, sections += 3) {
+		u32 type = (u32)of_read_number(sections, 1);
+
+		switch (type) {
+		case RTAS_FADUMP_CPU_STATE_DATA:
+			fadump_conf->cpu_state_data_size =
+					of_read_ulong(&sections[1], 2);
+			break;
+		case RTAS_FADUMP_HPTE_REGION:
+			fadump_conf->hpte_region_size =
+					of_read_ulong(&sections[1], 2);
+			break;
+		}
+	}
+
+	return 1;
+}
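
The shape of rtas_fadump_dt_scan() above can be illustrated with a small userspace sketch: the platform scan hook installs its ops table and then parses "ibm,configure-kernel-dump-sizes" as records of three 32-bit cells. Everything prefixed `sketch_` is hypothetical, the ops struct is trimmed to two of the seven callbacks, the property is passed in as raw big-endian bytes instead of being fetched with of_get_flat_dt_prop(), and the "ibm,configure-kernel-dump" token check is omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CPU_STATE_DATA	0x0001	/* same value as RTAS_FADUMP_CPU_STATE_DATA */
#define SKETCH_HPTE_REGION	0x0002	/* same value as RTAS_FADUMP_HPTE_REGION */

struct sketch_fw_dump;

/* Trimmed-down stand-in for struct fadump_ops. */
struct sketch_fadump_ops {
	int (*fadump_register)(struct sketch_fw_dump *conf);
	int (*fadump_unregister)(struct sketch_fw_dump *conf);
};

struct sketch_fw_dump {
	uint64_t cpu_state_data_size;
	uint64_t hpte_region_size;
	unsigned int fadump_supported;
	const struct sketch_fadump_ops *ops;
};

/* Placeholders that only report failure, like the pSeries ones in this patch. */
static int sketch_register(struct sketch_fw_dump *conf) { (void)conf; return -EIO; }
static int sketch_unregister(struct sketch_fw_dump *conf) { (void)conf; return -EIO; }

static const struct sketch_fadump_ops sketch_rtas_ops = {
	.fadump_register	= sketch_register,
	.fadump_unregister	= sketch_unregister,
};

/* Read one big-endian 32-bit device-tree cell from raw bytes. */
static uint32_t sketch_read_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Install the ops table, then parse the sizes property: each record is
 * three cells, a section ID followed by a 64-bit size split over two cells.
 */
static void sketch_dt_scan(struct sketch_fw_dump *conf,
			   const uint8_t *prop, size_t len)
{
	size_t i, num_sections = len / (3 * sizeof(uint32_t));

	conf->ops = &sketch_rtas_ops;
	conf->fadump_supported = 1;

	for (i = 0; i < num_sections; i++, prop += 3 * sizeof(uint32_t)) {
		uint32_t type = sketch_read_cell(prop);
		uint64_t size = ((uint64_t)sketch_read_cell(prop + 4) << 32) |
				sketch_read_cell(prop + 8);

		switch (type) {
		case SKETCH_CPU_STATE_DATA:
			conf->cpu_state_data_size = size;
			break;
		case SKETCH_HPTE_REGION:
			conf->hpte_region_size = size;
			break;
		}
	}
}
```

After the scan, generic code could reach the platform through a call such as conf->ops->fadump_register(conf) without naming RTAS; with the placeholders above that call returns -EIO until real callbacks are filled in.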


* [PATCH v5 06/31] pseries/fadump: define register/un-register callback functions
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (4 preceding siblings ...)
  2019-08-20 12:04 ` [PATCH v5 05/31] pseries/fadump: introduce callbacks for platform specific operations Hari Bathini
@ 2019-08-20 12:04 ` Hari Bathini
  2019-09-03 11:10   ` Michael Ellerman
  2019-08-20 12:04 ` [PATCH v5 07/31] powerpc/fadump: release all the memory above boot memory size Hari Bathini
                   ` (24 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Make RTAS calls to register and un-register for FADump. Also, update
how fadump_region contents are displayed to provide more information.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump-common.h          |    2 
 arch/powerpc/kernel/fadump.c                 |  165 ++------------------------
 arch/powerpc/platforms/pseries/rtas-fadump.c |  163 +++++++++++++++++++++++++-
 3 files changed, 177 insertions(+), 153 deletions(-)

diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index 898d0e8..d2c5b16 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -92,6 +92,8 @@ struct fw_dump {
 
 	unsigned long	cpu_state_data_size;
 	unsigned long	hpte_region_size;
+
+	unsigned long	boot_mem_dest_addr;
 	unsigned long	boot_memory_size;
 
 	unsigned long	fadumphdr_addr;
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 99d5def..5f5bc37 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -36,7 +36,6 @@
 #include "../platforms/pseries/rtas-fadump.h"
 
 static struct fw_dump fw_dump;
-static struct rtas_fadump_mem_struct fdm;
 static const struct rtas_fadump_mem_struct *fdm_active;
 
 static DEFINE_MUTEX(fadump_mutex);
@@ -179,61 +178,6 @@ static void fadump_show_config(void)
 	pr_debug("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
 }
 
-static unsigned long init_fadump_mem_struct(struct rtas_fadump_mem_struct *fdm,
-				unsigned long addr)
-{
-	if (!fdm)
-		return 0;
-
-	memset(fdm, 0, sizeof(struct rtas_fadump_mem_struct));
-	addr = addr & PAGE_MASK;
-
-	fdm->header.dump_format_version = cpu_to_be32(0x00000001);
-	fdm->header.dump_num_sections = cpu_to_be16(3);
-	fdm->header.dump_status_flag = 0;
-	fdm->header.offset_first_dump_section =
-		cpu_to_be32((u32)offsetof(struct rtas_fadump_mem_struct, cpu_state_data));
-
-	/*
-	 * Fields for disk dump option.
-	 * We are not using disk dump option, hence set these fields to 0.
-	 */
-	fdm->header.dd_block_size = 0;
-	fdm->header.dd_block_offset = 0;
-	fdm->header.dd_num_blocks = 0;
-	fdm->header.dd_offset_disk_path = 0;
-
-	/* set 0 to disable an automatic dump-reboot. */
-	fdm->header.max_time_auto = 0;
-
-	/* Kernel dump sections */
-	/* cpu state data section. */
-	fdm->cpu_state_data.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG);
-	fdm->cpu_state_data.source_data_type = cpu_to_be16(RTAS_FADUMP_CPU_STATE_DATA);
-	fdm->cpu_state_data.source_address = 0;
-	fdm->cpu_state_data.source_len = cpu_to_be64(fw_dump.cpu_state_data_size);
-	fdm->cpu_state_data.destination_address = cpu_to_be64(addr);
-	addr += fw_dump.cpu_state_data_size;
-
-	/* hpte region section */
-	fdm->hpte_region.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG);
-	fdm->hpte_region.source_data_type = cpu_to_be16(RTAS_FADUMP_HPTE_REGION);
-	fdm->hpte_region.source_address = 0;
-	fdm->hpte_region.source_len = cpu_to_be64(fw_dump.hpte_region_size);
-	fdm->hpte_region.destination_address = cpu_to_be64(addr);
-	addr += fw_dump.hpte_region_size;
-
-	/* RMA region section */
-	fdm->rmr_region.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG);
-	fdm->rmr_region.source_data_type = cpu_to_be16(RTAS_FADUMP_REAL_MODE_REGION);
-	fdm->rmr_region.source_address = cpu_to_be64(RMA_START);
-	fdm->rmr_region.source_len = cpu_to_be64(fw_dump.boot_memory_size);
-	fdm->rmr_region.destination_address = cpu_to_be64(addr);
-	addr += fw_dump.boot_memory_size;
-
-	return addr;
-}
-
 /**
  * fadump_calculate_reserve_size(): reserve variable boot area 5% of System RAM
  *
@@ -496,61 +440,6 @@ static int __init early_fadump_reserve_mem(char *p)
 }
 early_param("fadump_reserve_mem", early_fadump_reserve_mem);
 
-static int register_fw_dump(struct rtas_fadump_mem_struct *fdm)
-{
-	int rc, err;
-	unsigned int wait_time;
-
-	pr_debug("Registering for firmware-assisted kernel dump...\n");
-
-	/* TODO: Add upper time limit for the delay */
-	do {
-		rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL,
-			FADUMP_REGISTER, fdm,
-			sizeof(struct rtas_fadump_mem_struct));
-
-		wait_time = rtas_busy_delay_time(rc);
-		if (wait_time)
-			mdelay(wait_time);
-
-	} while (wait_time);
-
-	err = -EIO;
-	switch (rc) {
-	default:
-		pr_err("Failed to register. Unknown Error(%d).\n", rc);
-		break;
-	case -1:
-		printk(KERN_ERR "Failed to register firmware-assisted kernel"
-			" dump. Hardware Error(%d).\n", rc);
-		break;
-	case -3:
-		if (!is_fadump_boot_mem_contiguous(&fw_dump))
-			pr_err("Can't have holes in boot memory area while registering fadump\n");
-		else if (!is_fadump_reserved_mem_contiguous(&fw_dump))
-			pr_err("Can't have holes in reserved memory area while"
-			       " registering fadump\n");
-
-		printk(KERN_ERR "Failed to register firmware-assisted kernel"
-			" dump. Parameter Error(%d).\n", rc);
-		err = -EINVAL;
-		break;
-	case -9:
-		printk(KERN_ERR "firmware-assisted kernel dump is already "
-			" registered.");
-		fw_dump.dump_registered = 1;
-		err = -EEXIST;
-		break;
-	case 0:
-		printk(KERN_INFO "firmware-assisted kernel dump registration"
-			" is successful\n");
-		fw_dump.dump_registered = 1;
-		err = 0;
-		break;
-	}
-	return err;
-}
-
 void crash_fadump(struct pt_regs *regs, const char *str)
 {
 	struct fadump_crash_info_header *fdh = NULL;
@@ -593,8 +482,7 @@ void crash_fadump(struct pt_regs *regs, const char *str)
 
 	fdh->online_mask = *cpu_online_mask;
 
-	/* Call ibm,os-term rtas call to trigger firmware assisted dump */
-	rtas_os_term((char *)str);
+	fw_dump.ops->fadump_trigger(fdh, str);
 }
 
 #define GPR_MASK	0xffffff0000000000
@@ -1003,7 +891,7 @@ static int fadump_setup_crash_memory_ranges(void)
 static inline unsigned long fadump_relocate(unsigned long paddr)
 {
 	if (paddr > RMA_START && paddr < fw_dump.boot_memory_size)
-		return be64_to_cpu(fdm.rmr_region.destination_address) + paddr;
+		return fw_dump.boot_mem_dest_addr + paddr;
 	else
 		return paddr;
 }
@@ -1076,7 +964,7 @@ static int fadump_create_elfcore_headers(char *bufp)
 			 * to the specified destination_address. Hence set
 			 * the correct offset.
 			 */
-			phdr->p_offset = be64_to_cpu(fdm.rmr_region.destination_address);
+			phdr->p_offset = fw_dump.boot_mem_dest_addr;
 		}
 
 		phdr->p_paddr = mbase;
@@ -1128,7 +1016,8 @@ static int register_fadump(void)
 	if (ret)
 		return ret;
 
-	addr = be64_to_cpu(fdm.rmr_region.destination_address) + be64_to_cpu(fdm.rmr_region.source_len);
+	addr = fw_dump.fadumphdr_addr;
+
 	/* Initialize fadump crash info header. */
 	addr = init_fadump_header(addr);
 	vaddr = __va(addr);
@@ -1137,34 +1026,8 @@ static int register_fadump(void)
 	fadump_create_elfcore_headers(vaddr);
 
 	/* register the future kernel dump with firmware. */
-	return register_fw_dump(&fdm);
-}
-
-static int fadump_unregister_dump(struct rtas_fadump_mem_struct *fdm)
-{
-	int rc = 0;
-	unsigned int wait_time;
-
-	pr_debug("Un-register firmware-assisted dump\n");
-
-	/* TODO: Add upper time limit for the delay */
-	do {
-		rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL,
-			FADUMP_UNREGISTER, fdm,
-			sizeof(struct rtas_fadump_mem_struct));
-
-		wait_time = rtas_busy_delay_time(rc);
-		if (wait_time)
-			mdelay(wait_time);
-	} while (wait_time);
-
-	if (rc) {
-		printk(KERN_ERR "Failed to un-register firmware-assisted dump."
-			" unexpected error(%d).\n", rc);
-		return rc;
-	}
-	fw_dump.dump_registered = 0;
-	return 0;
+	pr_debug("Registering for firmware-assisted kernel dump...\n");
+	return fw_dump.ops->fadump_register(&fw_dump);
 }
 
 static int fadump_invalidate_dump(const struct rtas_fadump_mem_struct *fdm)
@@ -1202,7 +1065,7 @@ void fadump_cleanup(void)
 		fadump_invalidate_dump(fdm_active);
 	} else if (fw_dump.dump_registered) {
 		/* Un-register Firmware-assisted dump if it was registered. */
-		fadump_unregister_dump(&fdm);
+		fw_dump.ops->fadump_unregister(&fw_dump);
 		free_crash_memory_ranges();
 	}
 }
@@ -1312,7 +1175,7 @@ static void fadump_invalidate_release_mem(void)
 		fw_dump.cpu_notes_buf_size = 0;
 	}
 	/* Initialize the kernel dump memory structure for FAD registration. */
-	init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start);
+	fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 }
 
 static ssize_t fadump_release_memory_store(struct kobject *kobj,
@@ -1377,12 +1240,13 @@ static ssize_t fadump_register_store(struct kobject *kobj,
 			goto unlock_out;
 		}
 		/* Un-register Firmware-assisted dump */
-		fadump_unregister_dump(&fdm);
+		pr_debug("Un-register firmware-assisted dump\n");
+		fw_dump.ops->fadump_unregister(&fw_dump);
 		break;
 	case 1:
 		if (fw_dump.dump_registered == 1) {
 			/* Un-register Firmware-assisted dump */
-			fadump_unregister_dump(&fdm);
+			fw_dump.ops->fadump_unregister(&fw_dump);
 		}
 		/* Register Firmware-assisted dump */
 		ret = register_fadump();
@@ -1409,7 +1273,8 @@ static int fadump_region_show(struct seq_file *m, void *private)
 		fdm_ptr = fdm_active;
 	else {
 		mutex_unlock(&fadump_mutex);
-		fdm_ptr = &fdm;
+		fw_dump.ops->fadump_region_show(&fw_dump, m);
+		return 0;
 	}
 
 	seq_printf(m,
@@ -1530,7 +1395,7 @@ int __init setup_fadump(void)
 	}
 	/* Initialize the kernel dump memory structure for FAD registration. */
 	else if (fw_dump.reserve_dump_area_size)
-		init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start);
+		fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 	fadump_init_files();
 
 	return 1;
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c
index b77d738..4cfac04 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.c
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
@@ -26,19 +26,152 @@
 #include "../../kernel/fadump-common.h"
 #include "rtas-fadump.h"
 
+static struct rtas_fadump_mem_struct fdm;
+
+static void rtas_fadump_update_config(struct fw_dump *fadump_conf,
+				      const struct rtas_fadump_mem_struct *fdm)
+{
+	fadump_conf->boot_mem_dest_addr =
+		be64_to_cpu(fdm->rmr_region.destination_address);
+
+	fadump_conf->fadumphdr_addr = (fadump_conf->boot_mem_dest_addr +
+				       fadump_conf->boot_memory_size);
+}
+
 static ulong rtas_fadump_init_mem_struct(struct fw_dump *fadump_conf)
 {
-	return fadump_conf->reserve_dump_area_start;
+	ulong addr = fadump_conf->reserve_dump_area_start;
+
+	memset(&fdm, 0, sizeof(struct rtas_fadump_mem_struct));
+	addr = addr & PAGE_MASK;
+
+	fdm.header.dump_format_version = cpu_to_be32(0x00000001);
+	fdm.header.dump_num_sections = cpu_to_be16(3);
+	fdm.header.dump_status_flag = 0;
+	fdm.header.offset_first_dump_section =
+		cpu_to_be32((u32)offsetof(struct rtas_fadump_mem_struct,
+					  cpu_state_data));
+
+	/*
+	 * Fields for disk dump option.
+	 * We are not using disk dump option, hence set these fields to 0.
+	 */
+	fdm.header.dd_block_size = 0;
+	fdm.header.dd_block_offset = 0;
+	fdm.header.dd_num_blocks = 0;
+	fdm.header.dd_offset_disk_path = 0;
+
+	/* set 0 to disable an automatic dump-reboot. */
+	fdm.header.max_time_auto = 0;
+
+	/* Kernel dump sections */
+	/* cpu state data section. */
+	fdm.cpu_state_data.request_flag =
+		cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG);
+	fdm.cpu_state_data.source_data_type =
+		cpu_to_be16(RTAS_FADUMP_CPU_STATE_DATA);
+	fdm.cpu_state_data.source_address = 0;
+	fdm.cpu_state_data.source_len =
+		cpu_to_be64(fadump_conf->cpu_state_data_size);
+	fdm.cpu_state_data.destination_address = cpu_to_be64(addr);
+	addr += fadump_conf->cpu_state_data_size;
+
+	/* hpte region section */
+	fdm.hpte_region.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG);
+	fdm.hpte_region.source_data_type =
+		cpu_to_be16(RTAS_FADUMP_HPTE_REGION);
+	fdm.hpte_region.source_address = 0;
+	fdm.hpte_region.source_len =
+		cpu_to_be64(fadump_conf->hpte_region_size);
+	fdm.hpte_region.destination_address = cpu_to_be64(addr);
+	addr += fadump_conf->hpte_region_size;
+
+	/* RMA region section */
+	fdm.rmr_region.request_flag = cpu_to_be32(RTAS_FADUMP_REQUEST_FLAG);
+	fdm.rmr_region.source_data_type =
+		cpu_to_be16(RTAS_FADUMP_REAL_MODE_REGION);
+	fdm.rmr_region.source_address = cpu_to_be64(RMA_START);
+	fdm.rmr_region.source_len =
+		cpu_to_be64(fadump_conf->boot_memory_size);
+	fdm.rmr_region.destination_address = cpu_to_be64(addr);
+	addr += fadump_conf->boot_memory_size;
+
+	rtas_fadump_update_config(fadump_conf, &fdm);
+
+	return addr;
 }
 
 static int rtas_fadump_register(struct fw_dump *fadump_conf)
 {
-	return -EIO;
+	int rc, err = -EIO;
+	unsigned int wait_time;
+
+	/* TODO: Add upper time limit for the delay */
+	do {
+		rc =  rtas_call(fadump_conf->ibm_configure_kernel_dump, 3, 1,
+				NULL, FADUMP_REGISTER, &fdm,
+				sizeof(struct rtas_fadump_mem_struct));
+
+		wait_time = rtas_busy_delay_time(rc);
+		if (wait_time)
+			mdelay(wait_time);
+
+	} while (wait_time);
+
+	switch (rc) {
+	case 0:
+		pr_info("Registration is successful!\n");
+		fadump_conf->dump_registered = 1;
+		err = 0;
+		break;
+	case -1:
+		pr_err("Failed to register. Hardware Error(%d).\n", rc);
+		break;
+	case -3:
+		if (!is_fadump_boot_mem_contiguous(fadump_conf))
+			pr_err("Can't have holes in boot memory area.\n");
+		else if (!is_fadump_reserved_mem_contiguous(fadump_conf))
+			pr_err("Can't have holes in reserved memory area.\n");
+
+		pr_err("Failed to register. Parameter Error(%d).\n", rc);
+		err = -EINVAL;
+		break;
+	case -9:
+		pr_err("Already registered!\n");
+		fadump_conf->dump_registered = 1;
+		err = -EEXIST;
+		break;
+	default:
+		pr_err("Failed to register. Unknown Error(%d).\n", rc);
+		break;
+	}
+
+	return err;
 }
 
 static int rtas_fadump_unregister(struct fw_dump *fadump_conf)
 {
-	return -EIO;
+	int rc;
+	unsigned int wait_time;
+
+	/* TODO: Add upper time limit for the delay */
+	do {
+		rc =  rtas_call(fadump_conf->ibm_configure_kernel_dump, 3, 1,
+				NULL, FADUMP_UNREGISTER, &fdm,
+				sizeof(struct rtas_fadump_mem_struct));
+
+		wait_time = rtas_busy_delay_time(rc);
+		if (wait_time)
+			mdelay(wait_time);
+	} while (wait_time);
+
+	if (rc) {
+		pr_err("Failed to un-register - unexpected error(%d).\n", rc);
+		return -EIO;
+	}
+
+	fadump_conf->dump_registered = 0;
+	return 0;
 }
 
 static int rtas_fadump_invalidate(struct fw_dump *fadump_conf)
@@ -58,6 +191,30 @@ static int __init rtas_fadump_process(struct fw_dump *fadump_conf)
 static void rtas_fadump_region_show(struct fw_dump *fadump_conf,
 				    struct seq_file *m)
 {
+	const struct rtas_fadump_mem_struct *fdm_ptr = &fdm;
+	const struct rtas_fadump_section *cpu_data_section;
+
+	cpu_data_section = &(fdm_ptr->cpu_state_data);
+	seq_printf(m, "CPU :[%#016llx-%#016llx] %#llx bytes, Dumped: %#llx\n",
+		   be64_to_cpu(cpu_data_section->destination_address),
+		   be64_to_cpu(cpu_data_section->destination_address) +
+		   be64_to_cpu(cpu_data_section->source_len) - 1,
+		   be64_to_cpu(cpu_data_section->source_len),
+		   be64_to_cpu(cpu_data_section->bytes_dumped));
+
+	seq_printf(m, "HPTE:[%#016llx-%#016llx] %#llx bytes, Dumped: %#llx\n",
+		   be64_to_cpu(fdm_ptr->hpte_region.destination_address),
+		   be64_to_cpu(fdm_ptr->hpte_region.destination_address) +
+		   be64_to_cpu(fdm_ptr->hpte_region.source_len) - 1,
+		   be64_to_cpu(fdm_ptr->hpte_region.source_len),
+		   be64_to_cpu(fdm_ptr->hpte_region.bytes_dumped));
+
+	seq_printf(m, "DUMP: Src: %#016llx, Dest: %#016llx, ",
+		   be64_to_cpu(fdm_ptr->rmr_region.source_address),
+		   be64_to_cpu(fdm_ptr->rmr_region.destination_address));
+	seq_printf(m, "Size: %#llx, Dumped: %#llx bytes\n",
+		   be64_to_cpu(fdm_ptr->rmr_region.source_len),
+		   be64_to_cpu(fdm_ptr->rmr_region.bytes_dumped));
 }
 
 static void rtas_fadump_trigger(struct fadump_crash_info_header *fdh,

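For readers following the address arithmetic in rtas_fadump_init_mem_struct()
above, here is a minimal user-space sketch of how the three destination
regions are laid out back to back and where the crash info header ends up.
The names layout_model, layout_regions and MODEL_PAGE_SIZE are hypothetical,
and the 64K page size is an assumption made only for illustration; the kernel
code uses PAGE_MASK and the fields of struct fw_dump.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch only: 64K pages. */
#define MODEL_PAGE_SIZE	0x10000ULL
#define MODEL_PAGE_MASK	(~(MODEL_PAGE_SIZE - 1))

/* Hypothetical stand-in for the destination addresses kept in the fdm. */
struct layout_model {
	uint64_t cpu_state_dest;	/* CPU state data section */
	uint64_t hpte_dest;		/* HPTE region section */
	uint64_t boot_mem_dest;		/* RMA (boot memory) section */
	uint64_t fadumphdr_addr;	/* crash info header follows boot memory */
};

/*
 * Lay the three sections out contiguously, starting at the page-aligned
 * (rounded down) reserve area start. Returns the first address past the
 * last section.
 */
static uint64_t layout_regions(uint64_t reserve_start, uint64_t cpu_state_size,
			       uint64_t hpte_size, uint64_t boot_mem_size,
			       struct layout_model *out)
{
	uint64_t addr = reserve_start & MODEL_PAGE_MASK;

	assert(out);

	out->cpu_state_dest = addr;
	addr += cpu_state_size;

	out->hpte_dest = addr;
	addr += hpte_size;

	out->boot_mem_dest = addr;
	addr += boot_mem_size;

	/* Same relation as rtas_fadump_update_config() in the patch. */
	out->fadumphdr_addr = out->boot_mem_dest + boot_mem_size;

	return addr;
}
```

In this model the returned address and fadumphdr_addr are equal, matching
the patch, where the header is placed immediately after the boot memory
copy.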

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v5 07/31] powerpc/fadump: release all the memory above boot memory size
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (5 preceding siblings ...)
  2019-08-20 12:04 ` [PATCH v5 06/31] pseries/fadump: define register/un-register callback functions Hari Bathini
@ 2019-08-20 12:04 ` Hari Bathini
  2019-09-03 11:10   ` Michael Ellerman
  2019-08-20 12:05 ` [PATCH v5 08/31] pseries/fadump: move out platform specific support from generic code Hari Bathini
                   ` (23 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Except for the reserve dump area, which is permanently reserved, all memory
above boot memory size is released when the dump is invalidated. Make
this a bit more explicit in the code.
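
The range arithmetic described above can be sketched in user space as
follows. This is an illustrative model, not the kernel code: mem_range and
ranges_to_release are hypothetical names, and the sketch assumes the release
helper skips the reserve dump area when the requested range overlaps it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a physical memory range: [start, end). */
struct mem_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Split [begin, end) around the reserve dump area [ra_start, ra_end) and
 * store the pieces that may be released in out[]. Returns the number of
 * pieces written (0 to 2).
 */
static size_t ranges_to_release(unsigned long begin, unsigned long end,
				unsigned long ra_start, unsigned long ra_end,
				struct mem_range out[2])
{
	size_t n = 0;

	assert(begin <= end && ra_start <= ra_end);

	if (begin < ra_end && end > ra_start) {
		/* Overlap: keep the reserve area, release what surrounds it. */
		if (begin < ra_start)
			out[n++] = (struct mem_range){ begin, ra_start };
		if (end > ra_end)
			out[n++] = (struct mem_range){ ra_end, end };
	} else if (begin < end) {
		/* No overlap: the whole range can be released. */
		out[n++] = (struct mem_range){ begin, end };
	}

	return n;
}
```

With begin set to the boot memory size and end set to the end of DRAM, the
model yields the memory below and above the reserve dump area, which is the
behaviour the patch makes explicit.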

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump.c |   34 ++++++++++------------------------
 1 file changed, 10 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 5f5bc37..f26ab58 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -341,6 +341,8 @@ int __init fadump_reserve_mem(void)
 	else
 		memory_boundary = memblock_end_of_DRAM();
 
+	size = get_fadump_area_size();
+	fw_dump.reserve_dump_area_size = size;
 	if (fw_dump.dump_active) {
 		pr_info("Firmware-assisted dump is active.\n");
 
@@ -366,12 +368,15 @@ int __init fadump_reserve_mem(void)
 				be64_to_cpu(fdm_active->rmr_region.destination_address) +
 				be64_to_cpu(fdm_active->rmr_region.source_len);
 		pr_debug("fadumphdr_addr = %pa\n", &fw_dump.fadumphdr_addr);
-		fw_dump.reserve_dump_area_start = base;
-		fw_dump.reserve_dump_area_size = size;
-	} else {
-		size = get_fadump_area_size();
 
 		/*
+		 * Start address of reserve dump area (permanent reservation)
+		 * for re-registering FADump after dump capture.
+		 */
+		fw_dump.reserve_dump_area_start =
+			be64_to_cpu(fdm_active->cpu_state_data.destination_address);
+	} else {
+		/*
 		 * Reserve memory at an offset closer to bottom of the RAM to
 		 * minimize the impact of memory hot-remove operation. We can't
 		 * use memblock_find_in_range() here since it doesn't allocate
@@ -397,7 +402,6 @@ int __init fadump_reserve_mem(void)
 			(unsigned long)(memblock_phys_mem_size() >> 20));
 
 		fw_dump.reserve_dump_area_start = base;
-		fw_dump.reserve_dump_area_size = size;
 		return fadump_cma_init();
 	}
 	return 1;
@@ -1139,34 +1143,16 @@ static void fadump_release_memory(unsigned long begin, unsigned long end)
 
 static void fadump_invalidate_release_mem(void)
 {
-	unsigned long reserved_area_start, reserved_area_end;
-	unsigned long destination_address;
-
 	mutex_lock(&fadump_mutex);
 	if (!fw_dump.dump_active) {
 		mutex_unlock(&fadump_mutex);
 		return;
 	}
 
-	destination_address = be64_to_cpu(fdm_active->cpu_state_data.destination_address);
 	fadump_cleanup();
 	mutex_unlock(&fadump_mutex);
 
-	/*
-	 * Save the current reserved memory bounds we will require them
-	 * later for releasing the memory for general use.
-	 */
-	reserved_area_start = fw_dump.reserve_dump_area_start;
-	reserved_area_end = reserved_area_start +
-			fw_dump.reserve_dump_area_size;
-	/*
-	 * Setup reserve_dump_area_start and its size so that we can
-	 * reuse this reserved memory for Re-registration.
-	 */
-	fw_dump.reserve_dump_area_start = destination_address;
-	fw_dump.reserve_dump_area_size = get_fadump_area_size();
-
-	fadump_release_memory(reserved_area_start, reserved_area_end);
+	fadump_release_memory(fw_dump.boot_memory_size, memblock_end_of_DRAM());
 	if (fw_dump.cpu_notes_buf) {
 		fadump_cpu_notes_buf_free(
 				(unsigned long)__va(fw_dump.cpu_notes_buf),


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v5 08/31] pseries/fadump: move out platform specific support from generic code
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (6 preceding siblings ...)
  2019-08-20 12:04 ` [PATCH v5 07/31] powerpc/fadump: release all the memory above boot memory size Hari Bathini
@ 2019-08-20 12:05 ` Hari Bathini
  2019-08-20 12:05 ` [PATCH v5 09/31] powerpc/fadump: use FADump instead of fadump for how it is pronounced Hari Bathini
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Move the code that processes the crashed kernel's memory, as preserved by
firmware, into platform-specific callback functions.
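
The callback split can be pictured with the following self-contained
sketch. All names here (fadump_ops_model, fw_dump_model, demo_process,
demo_invalidate, generic_setup) are hypothetical stand-ins chosen for
illustration; the real structures live in fadump-common.h, and the real
setup path also releases memory on failure, which this model leaves out.

```c
#include <assert.h>

struct fw_dump_model;

/* Hypothetical ops table: one entry per platform-specific operation. */
struct fadump_ops_model {
	int (*process)(struct fw_dump_model *conf);
	int (*invalidate)(struct fw_dump_model *conf);
};

/* Hypothetical, reduced stand-in for struct fw_dump. */
struct fw_dump_model {
	int dump_active;	/* firmware preserved a dump for us */
	int dump_valid;		/* model-only flag: is that dump usable? */
	int processed;		/* model-only flag: set once processing succeeds */
	const struct fadump_ops_model *ops;
};

/* Platform side: validate and process the preserved dump. */
static int demo_process(struct fw_dump_model *conf)
{
	if (!conf->dump_active || !conf->dump_valid)
		return -1;

	conf->processed = 1;
	return 0;
}

/* Platform side: tell firmware the dump is no longer needed. */
static int demo_invalidate(struct fw_dump_model *conf)
{
	conf->dump_active = 0;
	return 0;
}

static const struct fadump_ops_model demo_ops = {
	.process	= demo_process,
	.invalidate	= demo_invalidate,
};

/*
 * Generic side: knows nothing about the firmware interface and only
 * dispatches through the ops table.
 */
static int generic_setup(struct fw_dump_model *conf)
{
	assert(conf && conf->ops);

	if (conf->dump_active && conf->ops->process(conf) < 0)
		conf->ops->invalidate(conf);

	return 1;
}
```

The point of the design is that the generic file no longer needs to include
any RTAS header, so a second platform only has to supply its own ops table.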

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump.c                 |  343 +-------------------------
 arch/powerpc/platforms/pseries/rtas-fadump.c |  280 +++++++++++++++++++++
 2 files changed, 294 insertions(+), 329 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index f26ab58..f7c8073 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -28,15 +28,12 @@
 #include <asm/debugfs.h>
 #include <asm/page.h>
 #include <asm/prom.h>
-#include <asm/rtas.h>
 #include <asm/fadump.h>
 #include <asm/setup.h>
 
 #include "fadump-common.h"
-#include "../platforms/pseries/rtas-fadump.h"
 
 static struct fw_dump fw_dump;
-static const struct rtas_fadump_mem_struct *fdm_active;
 
 static DEFINE_MUTEX(fadump_mutex);
 struct fad_crash_memory_ranges *crash_memory_ranges;
@@ -111,22 +108,13 @@ static int __init fadump_cma_init(void) { return 1; }
 int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
 				      int depth, void *data)
 {
-	int ret;
-
-	if (depth != 1 || strcmp(uname, "rtas") != 0)
+	if (depth != 1)
 		return 0;
 
-	ret = rtas_fadump_dt_scan(&fw_dump, node);
+	if (strcmp(uname, "rtas") == 0)
+		return rtas_fadump_dt_scan(&fw_dump, node);
 
-	/*
-	 * The 'ibm,kernel-dump' rtas node is present only if there is
-	 * dump data waiting for us.
-	 */
-	fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
-	if (fdm_active)
-		fw_dump.dump_active = 1;
-
-	return ret;
+	return 0;
 }
 
 /*
@@ -308,9 +296,7 @@ int __init fadump_reserve_mem(void)
 	 * If dump is active then we have already calculated the size during
 	 * first kernel.
 	 */
-	if (fdm_active)
-		fw_dump.boot_memory_size = be64_to_cpu(fdm_active->rmr_region.source_len);
-	else {
+	if (!fw_dump.dump_active) {
 		fw_dump.boot_memory_size = fadump_calculate_reserve_size();
 #ifdef CONFIG_CMA
 		if (!fw_dump.nocma)
@@ -364,17 +350,9 @@ int __init fadump_reserve_mem(void)
 		size = memory_boundary - base;
 		fadump_reserve_crash_area(base, size);
 
-		fw_dump.fadumphdr_addr =
-				be64_to_cpu(fdm_active->rmr_region.destination_address) +
-				be64_to_cpu(fdm_active->rmr_region.source_len);
-		pr_debug("fadumphdr_addr = %pa\n", &fw_dump.fadumphdr_addr);
-
-		/*
-		 * Start address of reserve dump area (permanent reservation)
-		 * for re-registering FADump after dump capture.
-		 */
-		fw_dump.reserve_dump_area_start =
-			be64_to_cpu(fdm_active->cpu_state_data.destination_address);
+		pr_debug("fadumphdr_addr = %#016lx\n", fw_dump.fadumphdr_addr);
+		pr_debug("Reserve dump area start address: 0x%lx\n",
+			 fw_dump.reserve_dump_area_start);
 	} else {
 		/*
 		 * Reserve memory at an offset closer to bottom of the RAM to
@@ -489,218 +467,6 @@ void crash_fadump(struct pt_regs *regs, const char *str)
 	fw_dump.ops->fadump_trigger(fdh, str);
 }
 
-#define GPR_MASK	0xffffff0000000000
-static inline int fadump_gpr_index(u64 id)
-{
-	int i = -1;
-	char str[3];
-
-	if ((id & GPR_MASK) == fadump_str_to_u64("GPR")) {
-		/* get the digits at the end */
-		id &= ~GPR_MASK;
-		id >>= 24;
-		str[2] = '\0';
-		str[1] = id & 0xff;
-		str[0] = (id >> 8) & 0xff;
-		sscanf(str, "%d", &i);
-		if (i > 31)
-			i = -1;
-	}
-	return i;
-}
-
-static inline void fadump_set_regval(struct pt_regs *regs, u64 reg_id,
-								u64 reg_val)
-{
-	int i;
-
-	i = fadump_gpr_index(reg_id);
-	if (i >= 0)
-		regs->gpr[i] = (unsigned long)reg_val;
-	else if (reg_id == fadump_str_to_u64("NIA"))
-		regs->nip = (unsigned long)reg_val;
-	else if (reg_id == fadump_str_to_u64("MSR"))
-		regs->msr = (unsigned long)reg_val;
-	else if (reg_id == fadump_str_to_u64("CTR"))
-		regs->ctr = (unsigned long)reg_val;
-	else if (reg_id == fadump_str_to_u64("LR"))
-		regs->link = (unsigned long)reg_val;
-	else if (reg_id == fadump_str_to_u64("XER"))
-		regs->xer = (unsigned long)reg_val;
-	else if (reg_id == fadump_str_to_u64("CR"))
-		regs->ccr = (unsigned long)reg_val;
-	else if (reg_id == fadump_str_to_u64("DAR"))
-		regs->dar = (unsigned long)reg_val;
-	else if (reg_id == fadump_str_to_u64("DSISR"))
-		regs->dsisr = (unsigned long)reg_val;
-}
-
-static struct rtas_fadump_reg_entry*
-fadump_read_registers(struct rtas_fadump_reg_entry *reg_entry, struct pt_regs *regs)
-{
-	memset(regs, 0, sizeof(struct pt_regs));
-
-	while (be64_to_cpu(reg_entry->reg_id) != fadump_str_to_u64("CPUEND")) {
-		fadump_set_regval(regs, be64_to_cpu(reg_entry->reg_id),
-					be64_to_cpu(reg_entry->reg_value));
-		reg_entry++;
-	}
-	reg_entry++;
-	return reg_entry;
-}
-
-/*
- * Read CPU state dump data and convert it into ELF notes.
- * The CPU dump starts with magic number "REGSAVE". NumCpusOffset should be
- * used to access the data to allow for additional fields to be added without
- * affecting compatibility. Each list of registers for a CPU starts with
- * "CPUSTRT" and ends with "CPUEND". Each register entry is of 16 bytes,
- * 8 Byte ASCII identifier and 8 Byte register value. The register entry
- * with identifier "CPUSTRT" and "CPUEND" contains 4 byte cpu id as part
- * of register value. For more details refer to PAPR document.
- *
- * Only for the crashing cpu we ignore the CPU dump data and get exact
- * state from fadump crash info structure populated by first kernel at the
- * time of crash.
- */
-static int __init fadump_build_cpu_notes(const struct rtas_fadump_mem_struct *fdm)
-{
-	struct rtas_fadump_reg_save_area_header *reg_header;
-	struct rtas_fadump_reg_entry *reg_entry;
-	struct fadump_crash_info_header *fdh = NULL;
-	void *vaddr;
-	unsigned long addr;
-	u32 num_cpus, *note_buf;
-	struct pt_regs regs;
-	int i, rc = 0, cpu = 0;
-
-	if (!fdm->cpu_state_data.bytes_dumped)
-		return -EINVAL;
-
-	addr = be64_to_cpu(fdm->cpu_state_data.destination_address);
-	vaddr = __va(addr);
-
-	reg_header = vaddr;
-	if (be64_to_cpu(reg_header->magic_number) !=
-	    fadump_str_to_u64("REGSAVE")) {
-		printk(KERN_ERR "Unable to read register save area.\n");
-		return -ENOENT;
-	}
-	pr_debug("--------CPU State Data------------\n");
-	pr_debug("Magic Number: %llx\n", be64_to_cpu(reg_header->magic_number));
-	pr_debug("NumCpuOffset: %x\n", be32_to_cpu(reg_header->num_cpu_offset));
-
-	vaddr += be32_to_cpu(reg_header->num_cpu_offset);
-	num_cpus = be32_to_cpu(*((__be32 *)(vaddr)));
-	pr_debug("NumCpus     : %u\n", num_cpus);
-	vaddr += sizeof(u32);
-	reg_entry = (struct rtas_fadump_reg_entry *)vaddr;
-
-	/* Allocate buffer to hold cpu crash notes. */
-	fw_dump.cpu_notes_buf_size = num_cpus * sizeof(note_buf_t);
-	fw_dump.cpu_notes_buf_size = PAGE_ALIGN(fw_dump.cpu_notes_buf_size);
-	note_buf = fadump_cpu_notes_buf_alloc(fw_dump.cpu_notes_buf_size);
-	if (!note_buf) {
-		printk(KERN_ERR "Failed to allocate 0x%lx bytes for "
-			"cpu notes buffer\n", fw_dump.cpu_notes_buf_size);
-		return -ENOMEM;
-	}
-	fw_dump.cpu_notes_buf = __pa(note_buf);
-
-	pr_debug("Allocated buffer for cpu notes of size %ld at %p\n",
-			(num_cpus * sizeof(note_buf_t)), note_buf);
-
-	if (fw_dump.fadumphdr_addr)
-		fdh = __va(fw_dump.fadumphdr_addr);
-
-	for (i = 0; i < num_cpus; i++) {
-		if (be64_to_cpu(reg_entry->reg_id) != fadump_str_to_u64("CPUSTRT")) {
-			printk(KERN_ERR "Unable to read CPU state data\n");
-			rc = -ENOENT;
-			goto error_out;
-		}
-		/* Lower 4 bytes of reg_value contains logical cpu id */
-		cpu = be64_to_cpu(reg_entry->reg_value) & RTAS_FADUMP_CPU_ID_MASK;
-		if (fdh && !cpumask_test_cpu(cpu, &fdh->online_mask)) {
-			RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry);
-			continue;
-		}
-		pr_debug("Reading register data for cpu %d...\n", cpu);
-		if (fdh && fdh->crashing_cpu == cpu) {
-			regs = fdh->regs;
-			note_buf = fadump_regs_to_elf_notes(note_buf, &regs);
-			RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry);
-		} else {
-			reg_entry++;
-			reg_entry = fadump_read_registers(reg_entry, &regs);
-			note_buf = fadump_regs_to_elf_notes(note_buf, &regs);
-		}
-	}
-	final_note(note_buf);
-
-	if (fdh) {
-		addr = fdh->elfcorehdr_addr;
-		pr_debug("Updating elfcore header(%lx) with cpu notes\n", addr);
-		fadump_update_elfcore_header(&fw_dump, (char *)__va(addr));
-	}
-	return 0;
-
-error_out:
-	fadump_cpu_notes_buf_free((unsigned long)__va(fw_dump.cpu_notes_buf),
-					fw_dump.cpu_notes_buf_size);
-	fw_dump.cpu_notes_buf = 0;
-	fw_dump.cpu_notes_buf_size = 0;
-	return rc;
-
-}
-
-/*
- * Validate and process the dump data stored by firmware before exporting
- * it through '/proc/vmcore'.
- */
-static int __init process_fadump(const struct rtas_fadump_mem_struct *fdm_active)
-{
-	struct fadump_crash_info_header *fdh;
-	int rc = 0;
-
-	if (!fdm_active || !fw_dump.fadumphdr_addr)
-		return -EINVAL;
-
-	/* Check if the dump data is valid. */
-	if ((be16_to_cpu(fdm_active->header.dump_status_flag) == RTAS_FADUMP_ERROR_FLAG) ||
-			(fdm_active->cpu_state_data.error_flags != 0) ||
-			(fdm_active->rmr_region.error_flags != 0)) {
-		printk(KERN_ERR "Dump taken by platform is not valid\n");
-		return -EINVAL;
-	}
-	if ((fdm_active->rmr_region.bytes_dumped !=
-			fdm_active->rmr_region.source_len) ||
-			!fdm_active->cpu_state_data.bytes_dumped) {
-		printk(KERN_ERR "Dump taken by platform is incomplete\n");
-		return -EINVAL;
-	}
-
-	/* Validate the fadump crash info header */
-	fdh = __va(fw_dump.fadumphdr_addr);
-	if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) {
-		printk(KERN_ERR "Crash info header is not valid.\n");
-		return -EINVAL;
-	}
-
-	rc = fadump_build_cpu_notes(fdm_active);
-	if (rc)
-		return rc;
-
-	/*
-	 * We are done validating dump info and elfcore header is now ready
-	 * to be exported. set elfcorehdr_addr so that vmcore module will
-	 * export the elfcore header through '/proc/vmcore'.
-	 */
-	elfcorehdr_addr = fdh->elfcorehdr_addr;
-
-	return 0;
-}
-
 static void free_crash_memory_ranges(void)
 {
 	kfree(crash_memory_ranges);
@@ -990,7 +756,6 @@ static unsigned long init_fadump_header(unsigned long addr)
 	if (!addr)
 		return 0;
 
-	fw_dump.fadumphdr_addr = addr;
 	fdh = __va(addr);
 	addr += sizeof(struct fadump_crash_info_header);
 
@@ -1034,39 +799,12 @@ static int register_fadump(void)
 	return fw_dump.ops->fadump_register(&fw_dump);
 }
 
-static int fadump_invalidate_dump(const struct rtas_fadump_mem_struct *fdm)
-{
-	int rc = 0;
-	unsigned int wait_time;
-
-	pr_debug("Invalidating firmware-assisted dump registration\n");
-
-	/* TODO: Add upper time limit for the delay */
-	do {
-		rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL,
-			FADUMP_INVALIDATE, fdm,
-			sizeof(struct rtas_fadump_mem_struct));
-
-		wait_time = rtas_busy_delay_time(rc);
-		if (wait_time)
-			mdelay(wait_time);
-	} while (wait_time);
-
-	if (rc) {
-		pr_err("Failed to invalidate firmware-assisted dump registration. Unexpected error (%d).\n", rc);
-		return rc;
-	}
-	fw_dump.dump_active = 0;
-	fdm_active = NULL;
-	return 0;
-}
-
 void fadump_cleanup(void)
 {
 	/* Invalidate the registration only if dump is active. */
 	if (fw_dump.dump_active) {
-		/* pass the same memory dump structure provided by platform */
-		fadump_invalidate_dump(fdm_active);
+		pr_debug("Invalidating firmware-assisted dump registration\n");
+		fw_dump.ops->fadump_invalidate(&fw_dump);
 	} else if (fw_dump.dump_registered) {
 		/* Un-register Firmware-assisted dump if it was registered. */
 		fw_dump.ops->fadump_unregister(&fw_dump);
@@ -1160,6 +898,7 @@ static void fadump_invalidate_release_mem(void)
 		fw_dump.cpu_notes_buf = 0;
 		fw_dump.cpu_notes_buf_size = 0;
 	}
+
 	/* Initialize the kernel dump memory structure for FAD registration. */
 	fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 }
@@ -1212,7 +951,7 @@ static ssize_t fadump_register_store(struct kobject *kobj,
 	int ret = 0;
 	int input = -1;
 
-	if (!fw_dump.fadump_enabled || fdm_active)
+	if (!fw_dump.fadump_enabled || fw_dump.dump_active)
 		return -EPERM;
 
 	if (kstrtoint(buf, 0, &input))
@@ -1225,6 +964,7 @@ static ssize_t fadump_register_store(struct kobject *kobj,
 		if (fw_dump.dump_registered == 0) {
 			goto unlock_out;
 		}
+
 		/* Un-register Firmware-assisted dump */
 		pr_debug("Un-register firmware-assisted dump\n");
 		fw_dump.ops->fadump_unregister(&fw_dump);
@@ -1249,63 +989,13 @@ static ssize_t fadump_register_store(struct kobject *kobj,
 
 static int fadump_region_show(struct seq_file *m, void *private)
 {
-	const struct rtas_fadump_mem_struct *fdm_ptr;
-
 	if (!fw_dump.fadump_enabled)
 		return 0;
 
 	mutex_lock(&fadump_mutex);
-	if (fdm_active)
-		fdm_ptr = fdm_active;
-	else {
-		mutex_unlock(&fadump_mutex);
-		fw_dump.ops->fadump_region_show(&fw_dump, m);
-		return 0;
-	}
+	fw_dump.ops->fadump_region_show(&fw_dump, m);
+	mutex_unlock(&fadump_mutex);
 
-	seq_printf(m,
-			"CPU : [%#016llx-%#016llx] %#llx bytes, "
-			"Dumped: %#llx\n",
-			be64_to_cpu(fdm_ptr->cpu_state_data.destination_address),
-			be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) +
-			be64_to_cpu(fdm_ptr->cpu_state_data.source_len) - 1,
-			be64_to_cpu(fdm_ptr->cpu_state_data.source_len),
-			be64_to_cpu(fdm_ptr->cpu_state_data.bytes_dumped));
-	seq_printf(m,
-			"HPTE: [%#016llx-%#016llx] %#llx bytes, "
-			"Dumped: %#llx\n",
-			be64_to_cpu(fdm_ptr->hpte_region.destination_address),
-			be64_to_cpu(fdm_ptr->hpte_region.destination_address) +
-			be64_to_cpu(fdm_ptr->hpte_region.source_len) - 1,
-			be64_to_cpu(fdm_ptr->hpte_region.source_len),
-			be64_to_cpu(fdm_ptr->hpte_region.bytes_dumped));
-	seq_printf(m,
-			"DUMP: [%#016llx-%#016llx] %#llx bytes, "
-			"Dumped: %#llx\n",
-			be64_to_cpu(fdm_ptr->rmr_region.destination_address),
-			be64_to_cpu(fdm_ptr->rmr_region.destination_address) +
-			be64_to_cpu(fdm_ptr->rmr_region.source_len) - 1,
-			be64_to_cpu(fdm_ptr->rmr_region.source_len),
-			be64_to_cpu(fdm_ptr->rmr_region.bytes_dumped));
-
-	if (!fdm_active ||
-		(fw_dump.reserve_dump_area_start ==
-		be64_to_cpu(fdm_ptr->cpu_state_data.destination_address)))
-		goto out;
-
-	/* Dump is active. Show reserved memory region. */
-	seq_printf(m,
-			"    : [%#016llx-%#016llx] %#llx bytes, "
-			"Dumped: %#llx\n",
-			(unsigned long long)fw_dump.reserve_dump_area_start,
-			be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) - 1,
-			be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) -
-			fw_dump.reserve_dump_area_start,
-			be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) -
-			fw_dump.reserve_dump_area_start);
-out:
-	if (fdm_active)
-		mutex_unlock(&fadump_mutex);
 	return 0;
 }
 
@@ -1376,12 +1066,13 @@ int __init setup_fadump(void)
 		 * if dump process fails then invalidate the registration
 		 * and release memory before proceeding for re-registration.
 		 */
-		if (process_fadump(fdm_active) < 0)
+		if (fw_dump.ops->fadump_process(&fw_dump) < 0)
 			fadump_invalidate_release_mem();
 	}
 	/* Initialize the kernel dump memory structure for FAD registration. */
 	else if (fw_dump.reserve_dump_area_size)
 		fw_dump.ops->fadump_init_mem_struct(&fw_dump);
+
 	fadump_init_files();
 
 	return 1;
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c
index 4cfac04..2b94392 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.c
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
@@ -27,6 +27,7 @@
 #include "rtas-fadump.h"
 
 static struct rtas_fadump_mem_struct fdm;
+static const struct rtas_fadump_mem_struct *fdm_active;
 
 static void rtas_fadump_update_config(struct fw_dump *fadump_conf,
 				      const struct rtas_fadump_mem_struct *fdm)
@@ -38,6 +39,25 @@ static void rtas_fadump_update_config(struct fw_dump *fadump_conf,
 				       fadump_conf->boot_memory_size);
 }
 
+/*
+ * This function is called in the capture kernel to get configuration details
+ * setup in the first kernel and passed to the f/w.
+ */
+static void rtas_fadump_get_config(struct fw_dump *fadump_conf,
+				   const struct rtas_fadump_mem_struct *fdm)
+{
+	fadump_conf->boot_memory_size = be64_to_cpu(fdm->rmr_region.source_len);
+
+	/*
+	 * Start address of reserve dump area (permanent reservation) for
+	 * re-registering FADump after dump capture.
+	 */
+	fadump_conf->reserve_dump_area_start =
+		be64_to_cpu(fdm->cpu_state_data.destination_address);
+
+	rtas_fadump_update_config(fadump_conf, fdm);
+}
+
 static ulong rtas_fadump_init_mem_struct(struct fw_dump *fadump_conf)
 {
 	ulong addr = fadump_conf->reserve_dump_area_start;
@@ -176,7 +196,196 @@ static int rtas_fadump_unregister(struct fw_dump *fadump_conf)
 
 static int rtas_fadump_invalidate(struct fw_dump *fadump_conf)
 {
-	return -EIO;
+	int rc;
+	unsigned int wait_time;
+
+	/* TODO: Add upper time limit for the delay */
+	do {
+		rc =  rtas_call(fadump_conf->ibm_configure_kernel_dump, 3, 1,
+				NULL, FADUMP_INVALIDATE, fdm_active,
+				sizeof(struct rtas_fadump_mem_struct));
+
+		wait_time = rtas_busy_delay_time(rc);
+		if (wait_time)
+			mdelay(wait_time);
+	} while (wait_time);
+
+	if (rc) {
+		pr_err("Failed to invalidate - unexpected error (%d).\n", rc);
+		return -EIO;
+	}
+
+	fadump_conf->dump_active = 0;
+	fdm_active = NULL;
+	return 0;
+}
+
+#define RTAS_FADUMP_GPR_MASK			0xffffff0000000000
+static inline int rtas_fadump_gpr_index(u64 id)
+{
+	int i = -1;
+	char str[3];
+
+	if ((id & RTAS_FADUMP_GPR_MASK) == fadump_str_to_u64("GPR")) {
+		/* get the digits at the end */
+		id &= ~RTAS_FADUMP_GPR_MASK;
+		id >>= 24;
+		str[2] = '\0';
+		str[1] = id & 0xff;
+		str[0] = (id >> 8) & 0xff;
+		if (kstrtoint(str, 10, &i))
+			i = -EINVAL;
+		if (i > 31)
+			i = -1;
+	}
+	return i;
+}
+
+void rtas_fadump_set_regval(struct pt_regs *regs, u64 reg_id, u64 reg_val)
+{
+	int i;
+
+	i = rtas_fadump_gpr_index(reg_id);
+	if (i >= 0)
+		regs->gpr[i] = (unsigned long)reg_val;
+	else if (reg_id == fadump_str_to_u64("NIA"))
+		regs->nip = (unsigned long)reg_val;
+	else if (reg_id == fadump_str_to_u64("MSR"))
+		regs->msr = (unsigned long)reg_val;
+	else if (reg_id == fadump_str_to_u64("CTR"))
+		regs->ctr = (unsigned long)reg_val;
+	else if (reg_id == fadump_str_to_u64("LR"))
+		regs->link = (unsigned long)reg_val;
+	else if (reg_id == fadump_str_to_u64("XER"))
+		regs->xer = (unsigned long)reg_val;
+	else if (reg_id == fadump_str_to_u64("CR"))
+		regs->ccr = (unsigned long)reg_val;
+	else if (reg_id == fadump_str_to_u64("DAR"))
+		regs->dar = (unsigned long)reg_val;
+	else if (reg_id == fadump_str_to_u64("DSISR"))
+		regs->dsisr = (unsigned long)reg_val;
+}
+
+static struct rtas_fadump_reg_entry*
+rtas_fadump_read_regs(struct rtas_fadump_reg_entry *reg_entry,
+		      struct pt_regs *regs)
+{
+	memset(regs, 0, sizeof(struct pt_regs));
+
+	while (be64_to_cpu(reg_entry->reg_id) != fadump_str_to_u64("CPUEND")) {
+		rtas_fadump_set_regval(regs, be64_to_cpu(reg_entry->reg_id),
+				       be64_to_cpu(reg_entry->reg_value));
+		reg_entry++;
+	}
+	reg_entry++;
+	return reg_entry;
+}
+
+/*
+ * Read CPU state dump data and convert it into ELF notes.
+ * The CPU dump starts with magic number "REGSAVE". NumCpusOffset should be
+ * used to access the data to allow for additional fields to be added without
+ * affecting compatibility. Each list of registers for a CPU starts with
+ * "CPUSTRT" and ends with "CPUEND". Each register entry is of 16 bytes,
+ * 8 Byte ASCII identifier and 8 Byte register value. The register entry
+ * with identifier "CPUSTRT" and "CPUEND" contains 4 byte cpu id as part
+ * of register value. For more details refer to PAPR document.
+ *
+ * Only for the crashing cpu we ignore the CPU dump data and get exact
+ * state from fadump crash info structure populated by first kernel at the
+ * time of crash.
+ */
+static int __init rtas_fadump_build_cpu_notes(struct fw_dump *fadump_conf)
+{
+	struct rtas_fadump_reg_save_area_header *reg_header;
+	struct rtas_fadump_reg_entry *reg_entry;
+	struct fadump_crash_info_header *fdh = NULL;
+	void *vaddr;
+	unsigned long addr;
+	u32 num_cpus, *note_buf;
+	struct pt_regs regs;
+	int i, rc = 0, cpu = 0;
+
+	addr = be64_to_cpu(fdm_active->cpu_state_data.destination_address);
+	vaddr = __va(addr);
+
+	reg_header = vaddr;
+	if (be64_to_cpu(reg_header->magic_number) !=
+	    fadump_str_to_u64("REGSAVE")) {
+		pr_err("Unable to read register save area.\n");
+		return -ENOENT;
+	}
+
+	pr_debug("--------CPU State Data------------\n");
+	pr_debug("Magic Number: %llx\n", be64_to_cpu(reg_header->magic_number));
+	pr_debug("NumCpuOffset: %x\n", be32_to_cpu(reg_header->num_cpu_offset));
+
+	vaddr += be32_to_cpu(reg_header->num_cpu_offset);
+	num_cpus = be32_to_cpu(*((__be32 *)(vaddr)));
+	pr_debug("NumCpus     : %u\n", num_cpus);
+	vaddr += sizeof(u32);
+	reg_entry = (struct rtas_fadump_reg_entry *)vaddr;
+
+	/* Allocate buffer to hold cpu crash notes. */
+	fadump_conf->cpu_notes_buf_size = num_cpus * sizeof(note_buf_t);
+	fadump_conf->cpu_notes_buf_size =
+		PAGE_ALIGN(fadump_conf->cpu_notes_buf_size);
+	note_buf = fadump_cpu_notes_buf_alloc(fadump_conf->cpu_notes_buf_size);
+	if (!note_buf) {
+		pr_err("Failed to allocate 0x%lx bytes for cpu notes buffer\n",
+		       fadump_conf->cpu_notes_buf_size);
+		return -ENOMEM;
+	}
+	fadump_conf->cpu_notes_buf = __pa(note_buf);
+
+	pr_debug("Allocated buffer for cpu notes of size %ld at %p\n",
+			(num_cpus * sizeof(note_buf_t)), note_buf);
+
+	if (fadump_conf->fadumphdr_addr)
+		fdh = __va(fadump_conf->fadumphdr_addr);
+
+	for (i = 0; i < num_cpus; i++) {
+		if (be64_to_cpu(reg_entry->reg_id) !=
+		    fadump_str_to_u64("CPUSTRT")) {
+			pr_err("Unable to read CPU state data\n");
+			rc = -ENOENT;
+			goto error_out;
+		}
+		/* Lower 4 bytes of reg_value contains logical cpu id */
+		cpu = (be64_to_cpu(reg_entry->reg_value) &
+		       RTAS_FADUMP_CPU_ID_MASK);
+		if (fdh && !cpumask_test_cpu(cpu, &fdh->online_mask)) {
+			RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry);
+			continue;
+		}
+		pr_debug("Reading register data for cpu %d...\n", cpu);
+		if (fdh && fdh->crashing_cpu == cpu) {
+			regs = fdh->regs;
+			note_buf = fadump_regs_to_elf_notes(note_buf, &regs);
+			RTAS_FADUMP_SKIP_TO_NEXT_CPU(reg_entry);
+		} else {
+			reg_entry++;
+			reg_entry = rtas_fadump_read_regs(reg_entry, &regs);
+			note_buf = fadump_regs_to_elf_notes(note_buf, &regs);
+		}
+	}
+	final_note(note_buf);
+
+	if (fdh) {
+		pr_debug("Updating elfcore header (%llx) with cpu notes\n",
+			 fdh->elfcorehdr_addr);
+		fadump_update_elfcore_header(fadump_conf,
+					     __va(fdh->elfcorehdr_addr));
+	}
+	return 0;
+
+error_out:
+	fadump_cpu_notes_buf_free((ulong)__va(fadump_conf->cpu_notes_buf),
+				  fadump_conf->cpu_notes_buf_size);
+	fadump_conf->cpu_notes_buf = 0;
+	fadump_conf->cpu_notes_buf_size = 0;
+	return rc;
+
 }
 
 /*
@@ -185,15 +394,62 @@ static int rtas_fadump_invalidate(struct fw_dump *fadump_conf)
  */
 static int __init rtas_fadump_process(struct fw_dump *fadump_conf)
 {
-	return -EINVAL;
+	struct fadump_crash_info_header *fdh;
+	int rc = 0;
+
+	if (!fdm_active || !fadump_conf->fadumphdr_addr)
+		return -EINVAL;
+
+	/* Check if the dump data is valid. */
+	if ((be16_to_cpu(fdm_active->header.dump_status_flag) ==
+			RTAS_FADUMP_ERROR_FLAG) ||
+			(fdm_active->cpu_state_data.error_flags != 0) ||
+			(fdm_active->rmr_region.error_flags != 0)) {
+		pr_err("Dump taken by platform is not valid\n");
+		return -EINVAL;
+	}
+	if ((fdm_active->rmr_region.bytes_dumped !=
+			fdm_active->rmr_region.source_len) ||
+			!fdm_active->cpu_state_data.bytes_dumped) {
+		pr_err("Dump taken by platform is incomplete\n");
+		return -EINVAL;
+	}
+
+	/* Validate the fadump crash info header */
+	fdh = __va(fadump_conf->fadumphdr_addr);
+	if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) {
+		pr_err("Crash info header is not valid.\n");
+		return -EINVAL;
+	}
+
+	if (!fdm_active->cpu_state_data.bytes_dumped)
+		return -EINVAL;
+
+	rc = rtas_fadump_build_cpu_notes(fadump_conf);
+	if (rc)
+		return rc;
+
+	/*
+	 * We are done validating dump info and elfcore header is now ready
+	 * to be exported. set elfcorehdr_addr so that vmcore module will
+	 * export the elfcore header through '/proc/vmcore'.
+	 */
+	elfcorehdr_addr = fdh->elfcorehdr_addr;
+
+	return 0;
 }
 
 static void rtas_fadump_region_show(struct fw_dump *fadump_conf,
 				    struct seq_file *m)
 {
-	const struct rtas_fadump_mem_struct *fdm_ptr = &fdm;
+	const struct rtas_fadump_mem_struct *fdm_ptr;
 	const struct rtas_fadump_section *cpu_data_section;
 
+	if (fdm_active)
+		fdm_ptr = fdm_active;
+	else
+		fdm_ptr = &fdm;
+
 	cpu_data_section = &(fdm_ptr->cpu_state_data);
 	seq_printf(m, "CPU :[%#016llx-%#016llx] %#llx bytes, Dumped: %#llx\n",
 		   be64_to_cpu(cpu_data_section->destination_address),
@@ -215,6 +471,12 @@ static void rtas_fadump_region_show(struct fw_dump *fadump_conf,
 	seq_printf(m, "Size: %#llx, Dumped: %#llx bytes\n",
 		   be64_to_cpu(fdm_ptr->rmr_region.source_len),
 		   be64_to_cpu(fdm_ptr->rmr_region.bytes_dumped));
+
+	/* Dump is active. Show reserved area start address. */
+	if (fdm_active) {
+		seq_printf(m, "\nMemory above %#016lx is reserved for saving crash dump\n",
+			   fadump_conf->reserve_dump_area_start);
+	}
 }
 
 static void rtas_fadump_trigger(struct fadump_crash_info_header *fdh,
@@ -224,6 +486,7 @@ static void rtas_fadump_trigger(struct fadump_crash_info_header *fdh,
 	rtas_os_term((char *)msg);
 }
 
+
 static struct fadump_ops rtas_fadump_ops = {
 	.fadump_init_mem_struct		= rtas_fadump_init_mem_struct,
 	.fadump_register		= rtas_fadump_register,
@@ -253,6 +516,17 @@ int __init rtas_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
 	fadump_conf->ops		= &rtas_fadump_ops;
 	fadump_conf->fadump_supported	= 1;
 
+	/*
+	 * The 'ibm,kernel-dump' rtas node is present only if there is
+	 * dump data waiting for us.
+	 */
+	fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
+	if (fdm_active) {
+		pr_info("Firmware-assisted dump is active.\n");
+		fadump_conf->dump_active = 1;
+		rtas_fadump_get_config(fadump_conf, (void *)__pa(fdm_active));
+	}
+
 	/* Get the sizes required to store dump data for the firmware provided
 	 * dump sections.
 	 * For each dump section type supported, a 32bit cell which defines
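
The register-id decoding done by rtas_fadump_gpr_index() in the patch
above can be tried out in userspace. The following is only a minimal
sketch under stated assumptions: str_to_u64() is a stand-in that assumes
fadump_str_to_u64() packs up to 8 ASCII characters into a u64, most
significant byte first and zero padded (that helper is not shown in this
patch), and gpr_index() and GPR_MASK are hypothetical names chosen for
illustration. Unlike the kernel version, it uses strtol() instead of
kstrtoint() and folds every failure into -1.

```c
#include <stdint.h>
#include <stdlib.h>

#define GPR_MASK 0xffffff0000000000ULL	/* top three bytes: "GPR" */

/*
 * Assumed behaviour of fadump_str_to_u64(): pack up to 8 ASCII
 * characters into a u64, first character in the most significant byte.
 */
static uint64_t str_to_u64(const char *str)
{
	uint64_t val = 0;
	size_t i;

	for (i = 0; i < sizeof(val); i++)
		val = *str ? (val << 8) | (unsigned char)*str++ : val << 8;
	return val;
}

/* Return 0..31 for a "GPRnn" identifier, -1 for anything else. */
static int gpr_index(uint64_t id)
{
	char str[3];
	char *end;
	long i;

	if ((id & GPR_MASK) != str_to_u64("GPR"))
		return -1;

	/* The two digits sit in bytes 3 and 4 of the identifier. */
	id = (id & ~GPR_MASK) >> 24;
	str[0] = (char)((id >> 8) & 0xff);
	str[1] = (char)(id & 0xff);
	str[2] = '\0';

	i = strtol(str, &end, 10);
	if (end == str || *end != '\0' || i < 0 || i > 31)
		return -1;
	return (int)i;
}
```

With these definitions, gpr_index(str_to_u64("GPR05")) selects index 5,
while an identifier such as "NIA" falls through to the named-register
comparisons, as in rtas_fadump_set_regval().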



* [PATCH v5 09/31] powerpc/fadump: use FADump instead of fadump for how it is pronounced
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (7 preceding siblings ...)
  2019-08-20 12:05 ` [PATCH v5 08/31] pseries/fadump: move out platform specific support from generic code Hari Bathini
@ 2019-08-20 12:05 ` Hari Bathini
  2019-08-20 12:05 ` [PATCH v5 10/31] opal: add MPIPL interface definitions Hari Bathini
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

fadump is pronounced f-a-dump, so spell it FADump in the documentation.
Also, update the description of the fadump_region contents to match
recent changes.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.rst |   55 ++++++++++++----------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
index 563c021..d912755 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -9,18 +9,18 @@ a crashed system, and to do so from a fully-reset system, and
 to minimize the total elapsed time until the system is back
 in production use.
 
-- Firmware assisted dump (fadump) infrastructure is intended to replace
+- Firmware-Assisted Dump (FADump) infrastructure is intended to replace
   the existing phyp assisted dump.
 - Fadump uses the same firmware interfaces and memory reservation model
   as phyp assisted dump.
-- Unlike phyp dump, fadump exports the memory dump through /proc/vmcore
+- Unlike phyp dump, FADump exports the memory dump through /proc/vmcore
   in the ELF format in the same way as kdump. This helps us reuse the
   kdump infrastructure for dump capture and filtering.
 - Unlike phyp dump, userspace tool does not need to refer any sysfs
   interface while reading /proc/vmcore.
-- Unlike phyp dump, fadump allows user to release all the memory reserved
+- Unlike phyp dump, FADump allows user to release all the memory reserved
   for dump, with a single operation of echo 1 > /sys/kernel/fadump_release_mem.
-- Once enabled through kernel boot parameter, fadump can be
+- Once enabled through kernel boot parameter, FADump can be
   started/stopped through /sys/kernel/fadump_registered interface (see
   sysfs files section below) and can be easily integrated with kdump
   service start/stop init scripts.
@@ -34,7 +34,7 @@ dump offers several strong, practical advantages:
    in a clean, consistent state.
 -  Once the dump is copied out, the memory that held the dump
    is immediately available to the running kernel. And therefore,
-   unlike kdump, fadump doesn't need a 2nd reboot to get back
+   unlike kdump, FADump doesn't need a 2nd reboot to get back
    the system to the production configuration.
 
 The above can only be accomplished by coordination with,
@@ -63,7 +63,7 @@ as follows:
          boot successfully. For syntax of crashkernel= parameter,
          refer to Documentation/admin-guide/kdump/kdump.rst. If any offset is
          provided in crashkernel= parameter, it will be ignored
-         as fadump uses a predefined offset to reserve memory
+         as FADump uses a predefined offset to reserve memory
          for boot memory dump preservation in case of a crash.
 
 -  After the low memory (boot memory) area has been saved, the
@@ -123,7 +123,7 @@ blocking this significant chunk of memory from production kernel.
 Hence, the implementation uses the Linux kernel's Contiguous Memory
 Allocator (CMA) for memory reservation if CMA is configured for kernel.
 With CMA reservation this memory will be available for applications to
-use it, while kernel is prevented from using it. With this fadump will
+use it, while kernel is prevented from using it. With this FADump will
 still be able to capture all of the kernel memory and most of the user
 space memory except the user pages that were present in CMA region::
 
@@ -173,14 +173,14 @@ KDump, as dump mechanism.
 The tools to examine the dump will be same as the ones
 used for kdump.
 
-How to enable firmware-assisted dump (fadump):
+How to enable firmware-assisted dump (FADump):
 ----------------------------------------------
 
 1. Set config option CONFIG_FA_DUMP=y and build kernel.
 2. Boot into linux kernel with 'fadump=on' kernel cmdline option.
-   By default, fadump reserved memory will be initialized as CMA area.
+   By default, FADump reserved memory will be initialized as CMA area.
    Alternatively, user can boot linux kernel with 'fadump=nocma' to
-   prevent fadump to use CMA.
+   prevent FADump from using CMA.
 3. Optionally, user can also set 'crashkernel=' kernel cmdline
    to specify size of the memory to reserve for boot memory dump
    preservation.
@@ -206,29 +206,29 @@ the control files and debugfs file to display memory reserved region.
 Here is the list of files under kernel sysfs:
 
  /sys/kernel/fadump_enabled
-    This is used to display the fadump status.
+    This is used to display the FADump status.
 
-    - 0 = fadump is disabled
-    - 1 = fadump is enabled
+    - 0 = FADump is disabled
+    - 1 = FADump is enabled
 
     This interface can be used by kdump init scripts to identify if
-    fadump is enabled in the kernel and act accordingly.
+    FADump is enabled in the kernel and act accordingly.
 
  /sys/kernel/fadump_registered
-    This is used to display the fadump registration status as well
-    as to control (start/stop) the fadump registration.
+    This is used to display the FADump registration status as well
+    as to control (start/stop) the FADump registration.
 
-    - 0 = fadump is not registered.
-    - 1 = fadump is registered and ready to handle system crash.
+    - 0 = FADump is not registered.
+    - 1 = FADump is registered and ready to handle system crash.
 
-    To register fadump echo 1 > /sys/kernel/fadump_registered and
+    To register FADump echo 1 > /sys/kernel/fadump_registered and
     echo 0 > /sys/kernel/fadump_registered for un-register and stop the
-    fadump. Once the fadump is un-registered, the system crash will not
+    FADump. Once the FADump is un-registered, the system crash will not
     be handled and vmcore will not be captured. This interface can be
     easily integrated with kdump service start/stop.
 
  /sys/kernel/fadump_release_mem
-    This file is available only when fadump is active during
+    This file is available only when FADump is active during
     second kernel. This is used to release the reserved memory
     region that are held for saving crash dump. To release the
     reserved memory echo 1 to it::
@@ -246,21 +246,25 @@ Here is the list of files under powerpc debugfs:
 (Assuming debugfs is mounted on /sys/kernel/debug directory.)
 
  /sys/kernel/debug/powerpc/fadump_region
-    This file shows the reserved memory regions if fadump is
+    This file shows the reserved memory regions if FADump is
     enabled otherwise this file is empty. The output format
     is::
 
       <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>
 
+    and for kernel DUMP region is:
+
+    DUMP: Src: <src-addr>, Dest: <dest-addr>, Size: <size>, Dumped: # bytes
+
     e.g.
-    Contents when fadump is registered during first kernel::
+    Contents when FADump is registered during first kernel::
 
       # cat /sys/kernel/debug/powerpc/fadump_region
       CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
       HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
       DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0
 
-    Contents when fadump is active during second kernel::
+    Contents when FADump is active during second kernel::
 
       # cat /sys/kernel/debug/powerpc/fadump_region
       CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020
@@ -268,6 +272,7 @@ Here is the list of files under powerpc debugfs:
       DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000
           : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000
 
+
 NOTE:
       Please refer to Documentation/filesystems/debugfs.txt on
       how to mount the debugfs filesystem.
@@ -278,7 +283,7 @@ TODO:
  - Need to come up with the better approach to find out more
    accurate boot memory size that is required for a kernel to
    boot successfully when booted with restricted memory.
- - The fadump implementation introduces a fadump crash info structure
+ - The FADump implementation introduces a FADump crash info structure
    in the scratch area before the ELF core header. The idea of introducing
    this structure is to pass some important crash info data to the second
    kernel which will help second kernel to populate ELF core header with
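
The bracketed fadump_region line format documented above can be consumed
with a single sscanf() call. This is a hedged userspace sketch, not part
of the patch: struct fadump_region_line and parse_region_line() are
hypothetical names, and the parser handles only the
"<region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>"
form, deliberately rejecting the "DUMP: Src: ..." form.

```c
#include <stdio.h>

/* One parsed line of /sys/kernel/debug/powerpc/fadump_region. */
struct fadump_region_line {
	unsigned long long start;
	unsigned long long end;
	unsigned long long size;
	unsigned long long dumped;
};

/*
 * Parse "<region>: [<start>-<end>] <size> bytes, Dumped: <dumped>".
 * The region name is skipped. Returns 0 on success and -1 when the
 * line does not match the bracketed format.
 */
static int parse_region_line(const char *line, struct fadump_region_line *r)
{
	int n;

	n = sscanf(line, "%*[^:]: [%llx-%llx] %llx bytes, Dumped: %llx",
		   &r->start, &r->end, &r->size, &r->dumped);
	return n == 4 ? 0 : -1;
}
```

A caller could use the parsed values to cross-check a line, for example
that end - start + 1 equals the reported size.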



* [PATCH v5 10/31] opal: add MPIPL interface definitions
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (8 preceding siblings ...)
  2019-08-20 12:05 ` [PATCH v5 09/31] powerpc/fadump: use FADump instead of fadump for how it is pronounced Hari Bathini
@ 2019-08-20 12:05 ` Hari Bathini
  2019-09-03 11:10   ` Michael Ellerman
  2019-09-04 11:05   ` Michael Ellerman
  2019-08-20 12:05 ` [PATCH v5 11/31] powernv/fadump: add fadump support on powernv Hari Bathini
                   ` (20 subsequent siblings)
  30 siblings, 2 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/include/asm/opal-api.h        |   50 +++++++++++++++++++++++++++-
 arch/powerpc/include/asm/opal.h            |    6 +++
 arch/powerpc/platforms/powernv/opal-call.c |    3 ++
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 383242e..c8a5665 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -208,7 +208,10 @@
 #define OPAL_HANDLE_HMI2			166
 #define	OPAL_NX_COPROC_INIT			167
 #define OPAL_XIVE_GET_VP_STATE			170
-#define OPAL_LAST				170
+#define OPAL_MPIPL_UPDATE			173
+#define OPAL_MPIPL_REGISTER_TAG			174
+#define OPAL_MPIPL_QUERY_TAG			175
+#define OPAL_LAST				175
 
 #define QUIESCE_HOLD			1 /* Spin all calls at entry */
 #define QUIESCE_REJECT			2 /* Fail all calls with OPAL_BUSY */
@@ -980,6 +983,50 @@ struct opal_sg_list {
 };
 
 /*
+ * Firmware-Assisted Dump (FADump) using MPIPL
+ */
+
+/* MPIPL update operations */
+enum opal_mpipl_ops {
+	OPAL_MPIPL_ADD_RANGE			= 0,
+	OPAL_MPIPL_REMOVE_RANGE			= 1,
+	OPAL_MPIPL_REMOVE_ALL			= 2,
+	OPAL_MPIPL_FREE_PRESERVED_MEMORY	= 3,
+};
+
+/*
+ * Each tag maps to a metadata type. Use these tags to register/query
+ * corresponding metadata address with/from OPAL.
+ */
+enum opal_mpipl_tags {
+	OPAL_MPIPL_TAG_CPU		= 0,
+	OPAL_MPIPL_TAG_OPAL		= 1,
+	OPAL_MPIPL_TAG_KERNEL		= 2,
+	OPAL_MPIPL_TAG_BOOT_MEM		= 3,
+};
+
+/* Preserved memory details */
+struct opal_mpipl_region {
+	__be64	src;
+	__be64	dest;
+	__be64	size;
+};
+
+/* FADump structure format version */
+#define MPIPL_FADUMP_VERSION			0x01
+
+/* Metadata provided by OPAL. */
+struct opal_mpipl_fadump {
+	u8				version;
+	u8				reserved[7];
+	__be32				crashing_pir;
+	__be32				cpu_data_version;
+	__be32				cpu_data_size;
+	__be32				region_cnt;
+	struct opal_mpipl_region	region[];
+} __attribute__((packed));
+
+/*
  * Dump region ID range usable by the OS
  */
 #define OPAL_DUMP_REGION_HOST_START		0x80
@@ -1059,6 +1106,7 @@ enum {
 	OPAL_REBOOT_NORMAL		= 0,
 	OPAL_REBOOT_PLATFORM_ERROR	= 1,
 	OPAL_REBOOT_FULL_IPL		= 2,
+	OPAL_REBOOT_MPIPL		= 3,
 };
 
 /* Argument to OPAL_PCI_TCE_KILL */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 57bd029..878110a 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -39,6 +39,12 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn,
 				uint64_t PE_handle);
 int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
 			uint64_t rate_phys, uint32_t size);
+
+int64_t opal_mpipl_update(enum opal_mpipl_ops op, u64 src,
+			  u64 dest, u64 size);
+int64_t opal_mpipl_register_tag(enum opal_mpipl_tags tag, uint64_t addr);
+int64_t opal_mpipl_query_tag(enum opal_mpipl_tags tag, uint64_t *addr);
+
 int64_t opal_console_write(int64_t term_number, __be64 *length,
 			   const uint8_t *buffer);
 int64_t opal_console_read(int64_t term_number, __be64 *length,
diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c
index 29ca523..fc8cc7c 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -287,3 +287,6 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar,		OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_sensor_read_u64,			OPAL_SENSOR_READ_U64);
 OPAL_CALL(opal_sensor_group_enable,		OPAL_SENSOR_GROUP_ENABLE);
 OPAL_CALL(opal_nx_coproc_init,			OPAL_NX_COPROC_INIT);
+OPAL_CALL(opal_mpipl_update,			OPAL_MPIPL_UPDATE);
+OPAL_CALL(opal_mpipl_register_tag,		OPAL_MPIPL_REGISTER_TAG);
+OPAL_CALL(opal_mpipl_query_tag,			OPAL_MPIPL_QUERY_TAG);
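
The layout of the metadata structure added above can be checked in
userspace. The sketch below mirrors struct opal_mpipl_fadump and struct
opal_mpipl_region with fixed-width types and decodes the big-endian
fields byte by byte, so it does not depend on host endianness. All names
here (mpipl_fadump, mpipl_region, load_be, mpipl_region_count,
mpipl_preserved_bytes) are hypothetical, and summing the region sizes is
only an illustration of walking the region[] array, not something this
patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct opal_mpipl_region (fields are big-endian). */
struct mpipl_region {
	uint64_t src;
	uint64_t dest;
	uint64_t size;
};

/* Userspace mirror of struct opal_mpipl_fadump. */
struct mpipl_fadump {
	uint8_t  version;
	uint8_t  reserved[7];
	uint32_t crashing_pir;
	uint32_t cpu_data_version;
	uint32_t cpu_data_size;
	uint32_t region_cnt;
	struct mpipl_region region[];
} __attribute__((packed));

/* Decode a big-endian integer of len bytes (len <= 8). */
static uint64_t load_be(const uint8_t *p, size_t len)
{
	uint64_t v = 0;

	while (len--)
		v = (v << 8) | *p++;
	return v;
}

/* Number of region entries following the fixed header. */
static uint32_t mpipl_region_count(const uint8_t *buf)
{
	return (uint32_t)load_be(buf + offsetof(struct mpipl_fadump,
						region_cnt), 4);
}

/* Sum of the size field over all region entries (illustrative only). */
static uint64_t mpipl_preserved_bytes(const uint8_t *buf)
{
	const uint8_t *r = buf + offsetof(struct mpipl_fadump, region);
	uint32_t i, cnt = mpipl_region_count(buf);
	uint64_t total = 0;

	for (i = 0; i < cnt; i++, r += sizeof(struct mpipl_region))
		total += load_be(r + offsetof(struct mpipl_region, size), 8);
	return total;
}
```

With the packed attribute the fixed header is 24 bytes, so the first
region entry starts at offset 24 and each entry occupies another 24
bytes.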



* [PATCH v5 11/31] powernv/fadump: add fadump support on powernv
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (9 preceding siblings ...)
  2019-08-20 12:05 ` [PATCH v5 10/31] opal: add MPIPL interface definitions Hari Bathini
@ 2019-08-20 12:05 ` Hari Bathini
  2019-09-03 11:10   ` Michael Ellerman
  2019-08-20 12:05 ` [PATCH v5 12/31] powernv/fadump: register kernel metadata address with opal Hari Bathini
                   ` (19 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Add basic callback functions for FADump on the PowerNV platform.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/Kconfig                         |    5 +
 arch/powerpc/kernel/fadump-common.h          |    9 ++
 arch/powerpc/kernel/fadump.c                 |    3 +
 arch/powerpc/platforms/powernv/Makefile      |    1 
 arch/powerpc/platforms/powernv/opal-fadump.c |   97 ++++++++++++++++++++++++++
 5 files changed, 113 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-fadump.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d8dcd88..fc4ecfe 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -566,7 +566,7 @@ config CRASH_DUMP
 
 config FA_DUMP
 	bool "Firmware-assisted dump"
-	depends on PPC64 && PPC_RTAS
+	depends on PPC64 && (PPC_RTAS || PPC_POWERNV)
 	select CRASH_CORE
 	select CRASH_DUMP
 	help
@@ -577,7 +577,8 @@ config FA_DUMP
 	  is meant to be a kdump replacement offering robustness and
 	  speed not possible without system firmware assistance.
 
-	  If unsure, say "N"
+	  If unsure, say "y". Only special kernels like petitboot may
+	  need to say "N" here.
 
 config IRQ_ALL_CPUS
 	bool "Distribute interrupts on all CPUs by default"
diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index d2c5b16..f6c52d3 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -140,4 +140,13 @@ static inline int rtas_fadump_dt_scan(struct fw_dump *fadump_config, ulong node)
 }
 #endif
 
+#ifdef CONFIG_PPC_POWERNV
+extern int opal_fadump_dt_scan(struct fw_dump *fadump_config, ulong node);
+#else
+static inline int opal_fadump_dt_scan(struct fw_dump *fadump_config, ulong node)
+{
+	return 1;
+}
+#endif
+
 #endif /* __PPC64_FA_DUMP_INTERNAL_H__ */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index f7c8073..b8061fb9 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -114,6 +114,9 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
 	if (strcmp(uname, "rtas") == 0)
 		return rtas_fadump_dt_scan(&fw_dump, node);
 
+	if (strcmp(uname, "ibm,opal") == 0)
+		return opal_fadump_dt_scan(&fw_dump, node);
+
 	return 0;
 }
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index da2e99e..43a6e1c 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -6,6 +6,7 @@ obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
 obj-y			+= opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
 
 obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
+obj-$(CONFIG_FA_DUMP)	+= opal-fadump.o
 obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
 obj-$(CONFIG_CXL_BASE)	+= pci-cxl.o
 obj-$(CONFIG_EEH)	+= eeh-powernv.o
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
new file mode 100644
index 0000000..e330877
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Firmware-Assisted Dump support on POWER platform (OPAL).
+ *
+ * Copyright 2019, IBM Corp.
+ * Author: Hari Bathini <hbathini@linux.ibm.com>
+ */
+
+#undef DEBUG
+#define pr_fmt(fmt) "opal fadump: " fmt
+
+#include <linux/string.h>
+#include <linux/seq_file.h>
+#include <linux/of_fdt.h>
+#include <linux/libfdt.h>
+
+#include <asm/opal.h>
+
+#include "../../kernel/fadump-common.h"
+
+static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
+{
+	return fadump_conf->reserve_dump_area_start;
+}
+
+static int opal_fadump_register(struct fw_dump *fadump_conf)
+{
+	return -EIO;
+}
+
+static int opal_fadump_unregister(struct fw_dump *fadump_conf)
+{
+	return -EIO;
+}
+
+static int opal_fadump_invalidate(struct fw_dump *fadump_conf)
+{
+	return -EIO;
+}
+
+static int __init opal_fadump_process(struct fw_dump *fadump_conf)
+{
+	return -EINVAL;
+}
+
+static void opal_fadump_region_show(struct fw_dump *fadump_conf,
+				    struct seq_file *m)
+{
+}
+
+static void opal_fadump_trigger(struct fadump_crash_info_header *fdh,
+				const char *msg)
+{
+	int rc;
+
+	rc = opal_cec_reboot2(OPAL_REBOOT_MPIPL, msg);
+	if (rc == OPAL_UNSUPPORTED) {
+		pr_emerg("Reboot type %d not supported.\n",
+			 OPAL_REBOOT_MPIPL);
+	} else if (rc == OPAL_HARDWARE)
+		pr_emerg("No backend support for MPIPL!\n");
+}
+
+static struct fadump_ops opal_fadump_ops = {
+	.fadump_init_mem_struct		= opal_fadump_init_mem_struct,
+	.fadump_register		= opal_fadump_register,
+	.fadump_unregister		= opal_fadump_unregister,
+	.fadump_invalidate		= opal_fadump_invalidate,
+	.fadump_process			= opal_fadump_process,
+	.fadump_region_show		= opal_fadump_region_show,
+	.fadump_trigger			= opal_fadump_trigger,
+};
+
+int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
+{
+	unsigned long dn;
+
+	/*
+	 * Check if Firmware-Assisted Dump is supported. If yes, check
+	 * if dump has been initiated on last reboot.
+	 */
+	dn = of_get_flat_dt_subnode_by_name(node, "dump");
+	if (dn == -FDT_ERR_NOTFOUND) {
+		pr_debug("FADump support is missing!\n");
+		return 1;
+	}
+
+	if (!of_flat_dt_is_compatible(dn, "ibm,opal-dump")) {
+		pr_err("Support missing for this f/w version!\n");
+		return 1;
+	}
+
+	fadump_conf->ops		= &opal_fadump_ops;
+	fadump_conf->fadump_supported	= 1;
+
+	return 1;
+}


* [PATCH v5 12/31] powernv/fadump: register kernel metadata address with opal
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (10 preceding siblings ...)
  2019-08-20 12:05 ` [PATCH v5 11/31] powernv/fadump: add fadump support on powernv Hari Bathini
@ 2019-08-20 12:05 ` Hari Bathini
  2019-09-04 11:25   ` Michael Ellerman
  2019-08-20 12:05 ` [PATCH v5 13/31] powernv/fadump: reset metadata address during clean up Hari Bathini
                   ` (18 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

OPAL allows an address to be registered with it in the first kernel and
retrieved after MPIPL. Set up kernel metadata and register its address
with OPAL so that it can be used while processing the crash dump.
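
As an illustration only (not part of this patch), the metadata placement
can be sketched as below. It assumes 64K pages, and every sketch_* name
is hypothetical: the metadata occupies the last page-aligned chunk of
the FADump reservation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64K pages; the real kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE	0x10000ULL

/* Round a size up to the next page boundary (like PAGE_ALIGN). */
static uint64_t sketch_page_align(uint64_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Place the metadata in the last page(s) of the reserved area:
 * start + size - PAGE_ALIGN(metadata size).
 */
static uint64_t sketch_metadata_addr(uint64_t reserve_start,
				     uint64_t reserve_size,
				     uint64_t metadata_bytes)
{
	uint64_t aligned = sketch_page_align(metadata_bytes);

	/* The reservation must be large enough to hold the metadata. */
	assert(aligned <= reserve_size);
	return reserve_start + reserve_size - aligned;
}
```

For example, with a 2MB reservation starting at 0x40000000 and a
metadata structure smaller than one page, the sketch yields 0x401f0000.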

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump-common.h          |    4 +
 arch/powerpc/kernel/fadump.c                 |   53 +++++++++++-----
 arch/powerpc/platforms/powernv/opal-fadump.c |   86 ++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-fadump.h |   33 ++++++++++
 arch/powerpc/platforms/pseries/rtas-fadump.c |   18 +++++
 5 files changed, 174 insertions(+), 20 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-fadump.h

diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index f6c52d3..0acf412 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -100,6 +100,8 @@ struct fw_dump {
 	unsigned long	cpu_notes_buf;
 	unsigned long	cpu_notes_buf_size;
 
+	u64		kernel_metadata;
+
 	int		ibm_configure_kernel_dump;
 
 	unsigned long	fadump_enabled:1;
@@ -113,6 +115,8 @@ struct fw_dump {
 
 struct fadump_ops {
 	ulong	(*fadump_init_mem_struct)(struct fw_dump *fadump_config);
+	ulong	(*fadump_get_metadata_size)(void);
+	int	(*fadump_setup_metadata)(struct fw_dump *fadump_config);
 	int	(*fadump_register)(struct fw_dump *fadump_config);
 	int	(*fadump_unregister)(struct fw_dump *fadump_config);
 	int	(*fadump_invalidate)(struct fw_dump *fadump_config);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b8061fb9..a086a09 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -258,6 +258,9 @@ static unsigned long get_fadump_area_size(void)
 	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
 
 	size = PAGE_ALIGN(size);
+
+	/* This is to hold kernel metadata on platforms that support it */
+	size += fw_dump.ops->fadump_get_metadata_size();
 	return size;
 }
 
@@ -283,17 +286,17 @@ static void __init fadump_reserve_crash_area(unsigned long base,
 
 int __init fadump_reserve_mem(void)
 {
+	int ret = 1;
 	unsigned long base, size, memory_boundary;
 
 	if (!fw_dump.fadump_enabled)
 		return 0;
 
 	if (!fw_dump.fadump_supported) {
-		printk(KERN_INFO "Firmware-assisted dump is not supported on"
-				" this hardware\n");
-		fw_dump.fadump_enabled = 0;
-		return 0;
+		pr_info("Firmware-Assisted Dump is not supported on this hardware\n");
+		goto error_out;
 	}
+
 	/*
 	 * Initialize boot memory size
 	 * If dump is active then we have already calculated the size during
@@ -330,6 +333,7 @@ int __init fadump_reserve_mem(void)
 	else
 		memory_boundary = memblock_end_of_DRAM();
 
+	base = fw_dump.boot_memory_size;
 	size = get_fadump_area_size();
 	fw_dump.reserve_dump_area_size = size;
 	if (fw_dump.dump_active) {
@@ -349,7 +353,6 @@ int __init fadump_reserve_mem(void)
 		 * dump is written to disk by userspace tool. This memory
 		 * will be released for general use once the dump is saved.
 		 */
-		base = fw_dump.boot_memory_size;
 		size = memory_boundary - base;
 		fadump_reserve_crash_area(base, size);
 
@@ -363,29 +366,43 @@ int __init fadump_reserve_mem(void)
 		 * use memblock_find_in_range() here since it doesn't allocate
 		 * from bottom to top.
 		 */
-		for (base = fw_dump.boot_memory_size;
-		     base <= (memory_boundary - size);
-		     base += size) {
+		while (base <= (memory_boundary - size)) {
 			if (memblock_is_region_memory(base, size) &&
 			    !memblock_is_region_reserved(base, size))
 				break;
+
+			base += size;
 		}
-		if ((base > (memory_boundary - size)) ||
-		    memblock_reserve(base, size)) {
+
+		if (base > (memory_boundary - size)) {
+			pr_err("Failed to find memory chunk for reservation\n");
+			goto error_out;
+		}
+		fw_dump.reserve_dump_area_start = base;
+
+		/*
+		 * Calculate the kernel metadata address and register it with
+		 * f/w if the platform supports it.
+		 */
+		if (fw_dump.ops->fadump_setup_metadata(&fw_dump) < 0)
+			goto error_out;
+
+		if (memblock_reserve(base, size)) {
 			pr_err("Failed to reserve memory\n");
-			return 0;
+			goto error_out;
 		}
 
-		pr_info("Reserved %ldMB of memory at %ldMB for firmware-"
-			"assisted dump (System RAM: %ldMB)\n",
-			(unsigned long)(size >> 20),
-			(unsigned long)(base >> 20),
+		pr_info("Reserved %ldMB of memory at %#016lx (System RAM: %ldMB)\n",
+			(unsigned long)(size >> 20), base,
 			(unsigned long)(memblock_phys_mem_size() >> 20));
 
-		fw_dump.reserve_dump_area_start = base;
-		return fadump_cma_init();
+		ret = fadump_cma_init();
 	}
-	return 1;
+
+	return ret;
+error_out:
+	fw_dump.fadump_enabled = 0;
+	return 0;
 }
 
 unsigned long __init arch_reserved_kernel_pages(void)
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index e330877..e5c4700 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -13,14 +13,86 @@
 #include <linux/seq_file.h>
 #include <linux/of_fdt.h>
 #include <linux/libfdt.h>
+#include <linux/mm.h>
 
+#include <asm/page.h>
 #include <asm/opal.h>
 
 #include "../../kernel/fadump-common.h"
+#include "opal-fadump.h"
+
+static struct opal_fadump_mem_struct *opal_fdm;
+
+/* Initialize kernel metadata */
+static void opal_fadump_init_metadata(struct opal_fadump_mem_struct *fdm)
+{
+	fdm->version = OPAL_FADUMP_VERSION;
+	fdm->region_cnt = 0;
+	fdm->registered_regions = 0;
+	fdm->fadumphdr_addr = 0;
+}
 
 static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
 {
-	return fadump_conf->reserve_dump_area_start;
+	ulong addr = fadump_conf->reserve_dump_area_start;
+
+	opal_fdm = __va(fadump_conf->kernel_metadata);
+	opal_fadump_init_metadata(opal_fdm);
+
+	opal_fdm->region_cnt = 1;
+	opal_fdm->rgn[0].src	= RMA_START;
+	opal_fdm->rgn[0].dest	= addr;
+	opal_fdm->rgn[0].size	= fadump_conf->boot_memory_size;
+	addr += fadump_conf->boot_memory_size;
+
+	/*
+	 * Kernel metadata is passed to f/w and retrieved in capture kernel.
+	 * So, use it to save fadump header address instead of calculating it.
+	 */
+	opal_fdm->fadumphdr_addr = (opal_fdm->rgn[0].dest +
+				    fadump_conf->boot_memory_size);
+
+	return addr;
+}
+
+static ulong opal_fadump_get_metadata_size(void)
+{
+	ulong size = sizeof(struct opal_fadump_mem_struct);
+
+	size = PAGE_ALIGN(size);
+	return size;
+}
+
+static int opal_fadump_setup_metadata(struct fw_dump *fadump_conf)
+{
+	int err = 0;
+	s64 ret;
+
+	/*
+	 * Use the last page(s) in FADump memory reservation for
+	 * kernel metadata.
+	 */
+	fadump_conf->kernel_metadata = (fadump_conf->reserve_dump_area_start +
+					fadump_conf->reserve_dump_area_size -
+					opal_fadump_get_metadata_size());
+	pr_info("Kernel metadata addr: %llx\n", fadump_conf->kernel_metadata);
+
+	/* Initialize kernel metadata before registering the address with f/w */
+	opal_fdm = __va(fadump_conf->kernel_metadata);
+	opal_fadump_init_metadata(opal_fdm);
+
+	/*
+	 * Register metadata address with f/w. Can be retrieved in
+	 * the capture kernel.
+	 */
+	ret = opal_mpipl_register_tag(OPAL_MPIPL_TAG_KERNEL,
+				      fadump_conf->kernel_metadata);
+	if (ret != OPAL_SUCCESS) {
+		pr_err("Failed to set kernel metadata tag!\n");
+		err = -EPERM;
+	}
+
+	return err;
 }
 
 static int opal_fadump_register(struct fw_dump *fadump_conf)
@@ -46,6 +118,16 @@ static int __init opal_fadump_process(struct fw_dump *fadump_conf)
 static void opal_fadump_region_show(struct fw_dump *fadump_conf,
 				    struct seq_file *m)
 {
+	int i;
+	const struct opal_fadump_mem_struct *fdm_ptr = opal_fdm;
+	u64 dumped_bytes = 0;
+
+	for (i = 0; i < fdm_ptr->region_cnt; i++) {
+		seq_printf(m, "DUMP: Src: %#016llx, Dest: %#016llx, ",
+			   fdm_ptr->rgn[i].src, fdm_ptr->rgn[i].dest);
+		seq_printf(m, "Size: %#llx, Dumped: %#llx bytes\n",
+			   fdm_ptr->rgn[i].size, dumped_bytes);
+	}
 }
 
 static void opal_fadump_trigger(struct fadump_crash_info_header *fdh,
@@ -63,6 +145,8 @@ static void opal_fadump_trigger(struct fadump_crash_info_header *fdh,
 
 static struct fadump_ops opal_fadump_ops = {
 	.fadump_init_mem_struct		= opal_fadump_init_mem_struct,
+	.fadump_get_metadata_size	= opal_fadump_get_metadata_size,
+	.fadump_setup_metadata		= opal_fadump_setup_metadata,
 	.fadump_register		= opal_fadump_register,
 	.fadump_unregister		= opal_fadump_unregister,
 	.fadump_invalidate		= opal_fadump_invalidate,
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h b/arch/powerpc/platforms/powernv/opal-fadump.h
new file mode 100644
index 0000000..19cac1f
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-fadump.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Firmware-Assisted Dump support on POWER platform (OPAL).
+ *
+ * Copyright 2019, IBM Corp.
+ * Author: Hari Bathini <hbathini@linux.ibm.com>
+ */
+
+#ifndef __PPC64_OPAL_FA_DUMP_H__
+#define __PPC64_OPAL_FA_DUMP_H__
+
+/* OPAL FADump structure format version */
+#define OPAL_FADUMP_VERSION			0x1
+
+/* Maximum number of memory regions kernel supports */
+#define OPAL_FADUMP_MAX_MEM_REGS		128
+
+/*
+ * FADump memory structure for storing kernel metadata needed to
+ * register-for/process crash dump. The address of this structure will
+ * be registered with f/w for retrieving during crash dump.
+ */
+struct opal_fadump_mem_struct {
+
+	u8	version;
+	u8	reserved[3];
+	u16	region_cnt;		/* number of regions */
+	u16	registered_regions;	/* Regions registered for MPIPL */
+	u64	fadumphdr_addr;
+	struct opal_mpipl_region	rgn[OPAL_FADUMP_MAX_MEM_REGS];
+} __attribute__((packed));
+
+#endif /* __PPC64_OPAL_FA_DUMP_H__ */
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c
index 2b94392..4111ee9 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.c
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
@@ -121,6 +121,21 @@ static ulong rtas_fadump_init_mem_struct(struct fw_dump *fadump_conf)
 	return addr;
 }
 
+/*
+ * On this platform, the metadata structure is passed while registering
+ * for FADump and the same is returned by f/w in capture kernel.
+ * No additional provision to set up kernel metadata separately.
+ */
+static ulong rtas_fadump_get_metadata_size(void)
+{
+	return 0;
+}
+
+static int rtas_fadump_setup_metadata(struct fw_dump *fadump_conf)
+{
+	return 0;
+}
+
 static int rtas_fadump_register(struct fw_dump *fadump_conf)
 {
 	int rc, err = -EIO;
@@ -486,9 +501,10 @@ static void rtas_fadump_trigger(struct fadump_crash_info_header *fdh,
 	rtas_os_term((char *)msg);
 }
 
-
 static struct fadump_ops rtas_fadump_ops = {
 	.fadump_init_mem_struct		= rtas_fadump_init_mem_struct,
+	.fadump_get_metadata_size	= rtas_fadump_get_metadata_size,
+	.fadump_setup_metadata		= rtas_fadump_setup_metadata,
 	.fadump_register		= rtas_fadump_register,
 	.fadump_unregister		= rtas_fadump_unregister,
 	.fadump_invalidate		= rtas_fadump_invalidate,


* [PATCH v5 13/31] powernv/fadump: reset metadata address during clean up
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (11 preceding siblings ...)
  2019-08-20 12:05 ` [PATCH v5 12/31] powernv/fadump: register kernel metadata address with opal Hari Bathini
@ 2019-08-20 12:05 ` Hari Bathini
  2019-08-27 12:00   ` Hari Bathini
  2019-08-20 12:05 ` [PATCH v5 14/31] powernv/fadump: define register/un-register callback functions Hari Bathini
                   ` (17 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

During kexec boot, the metadata address needs to be reset. Otherwise, if
the kexec'ed kernel crashes before it sets up the metadata address again,
the capture kernel would run into errors interpreting a stale address.
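
The reasoning can be sketched with a toy model (illustration only, not
part of this patch). A single variable stands in for the tag that
firmware keeps across kernels, and all sketch_* names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel tag that firmware preserves across boots. */
static uint64_t sketch_kernel_tag;

/* First kernel: register the metadata address with "firmware". */
static void sketch_register_tag(uint64_t addr)
{
	sketch_kernel_tag = addr;
}

/* Cleanup path (for example, before kexec): reset the tag to zero. */
static void sketch_cleanup(void)
{
	sketch_register_tag(0);
}

/*
 * Capture kernel: a zero tag means no valid metadata was registered,
 * so there is nothing stale to misinterpret.
 */
static int sketch_metadata_usable(void)
{
	return sketch_kernel_tag != 0;
}

/* Walk through the kexec scenario described in the commit message. */
static void sketch_kexec_scenario(void)
{
	sketch_register_tag(0x401f0000ULL);
	assert(sketch_metadata_usable());

	sketch_cleanup();			/* kexec boot */
	assert(!sketch_metadata_usable());	/* early crash: tag is 0 */
}
```

Without the reset, an early crash in the kexec'ed kernel would leave the
old address visible to the capture kernel.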

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump-common.h          |    1 +
 arch/powerpc/kernel/fadump.c                 |    2 ++
 arch/powerpc/platforms/powernv/opal-fadump.c |   10 ++++++++++
 arch/powerpc/platforms/pseries/rtas-fadump.c |    3 +++
 4 files changed, 16 insertions(+)

diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index 0acf412..d2dd117 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -120,6 +120,7 @@ struct fadump_ops {
 	int	(*fadump_register)(struct fw_dump *fadump_config);
 	int	(*fadump_unregister)(struct fw_dump *fadump_config);
 	int	(*fadump_invalidate)(struct fw_dump *fadump_config);
+	void	(*fadump_cleanup)(struct fw_dump *fadump_config);
 	int	(*fadump_process)(struct fw_dump *fadump_config);
 	void	(*fadump_region_show)(struct fw_dump *fadump_config,
 				      struct seq_file *m);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index a086a09..b2d5ca6 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -830,6 +830,8 @@ void fadump_cleanup(void)
 		fw_dump.ops->fadump_unregister(&fw_dump);
 		free_crash_memory_ranges();
 	}
+
+	fw_dump.ops->fadump_cleanup(&fw_dump);
 }
 
 static void fadump_free_reserved_memory(unsigned long start_pfn,
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index e5c4700..e466f7e 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -110,6 +110,15 @@ static int opal_fadump_invalidate(struct fw_dump *fadump_conf)
 	return -EIO;
 }
 
+static void opal_fadump_cleanup(struct fw_dump *fadump_conf)
+{
+	s64 ret;
+
+	ret = opal_mpipl_register_tag(OPAL_MPIPL_TAG_KERNEL, 0);
+	if (ret != OPAL_SUCCESS)
+		pr_warn("Could not reset (%lld) kernel metadata tag!\n", ret);
+}
+
 static int __init opal_fadump_process(struct fw_dump *fadump_conf)
 {
 	return -EINVAL;
@@ -150,6 +159,7 @@ static struct fadump_ops opal_fadump_ops = {
 	.fadump_register		= opal_fadump_register,
 	.fadump_unregister		= opal_fadump_unregister,
 	.fadump_invalidate		= opal_fadump_invalidate,
+	.fadump_cleanup			= opal_fadump_cleanup,
 	.fadump_process			= opal_fadump_process,
 	.fadump_region_show		= opal_fadump_region_show,
 	.fadump_trigger			= opal_fadump_trigger,
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c
index 4111ee9..6164c5a 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.c
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
@@ -235,6 +235,8 @@ static int rtas_fadump_invalidate(struct fw_dump *fadump_conf)
 	return 0;
 }
 
+static void rtas_fadump_cleanup(struct fw_dump *fadump_conf) { }
+
 #define RTAS_FADUMP_GPR_MASK			0xffffff0000000000
 static inline int rtas_fadump_gpr_index(u64 id)
 {
@@ -508,6 +510,7 @@ static struct fadump_ops rtas_fadump_ops = {
 	.fadump_register		= rtas_fadump_register,
 	.fadump_unregister		= rtas_fadump_unregister,
 	.fadump_invalidate		= rtas_fadump_invalidate,
+	.fadump_cleanup			= rtas_fadump_cleanup,
 	.fadump_process			= rtas_fadump_process,
 	.fadump_region_show		= rtas_fadump_region_show,
 	.fadump_trigger			= rtas_fadump_trigger,


* [PATCH v5 14/31] powernv/fadump: define register/un-register callback functions
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (12 preceding siblings ...)
  2019-08-20 12:05 ` [PATCH v5 13/31] powernv/fadump: reset metadata address during clean up Hari Bathini
@ 2019-08-20 12:05 ` Hari Bathini
  2019-09-05  4:15   ` Michael Ellerman
  2019-09-05  7:23   ` Michael Ellerman
  2019-08-20 12:05 ` [PATCH v5 15/31] powernv/fadump: support copying multiple kernel boot memory regions Hari Bathini
                   ` (16 subsequent siblings)
  30 siblings, 2 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Make OPAL calls to register and un-register with firmware for MPIPL.
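
The control flow of the register callback can be sketched against a
mock firmware (illustration only, not part of this patch). The SK_*
codes and sketch_* names are hypothetical and are not the real OPAL
return values; the sketch only mirrors the policy of proceeding on a
resource limit and rolling back on any other failure.

```c
#include <assert.h>

/* Hypothetical return codes, not the real OPAL values. */
enum sketch_rc { SK_SUCCESS, SK_PARAMETER, SK_RESOURCE };

struct sketch_fw {
	int max_ranges;	/* capacity of the mock firmware table */
	int bad_index;	/* region index rejected as invalid, or -1 */
	int nr_ranges;	/* ranges currently registered */
};

static enum sketch_rc sketch_fw_add(struct sketch_fw *fw, int idx)
{
	if (idx == fw->bad_index)
		return SK_PARAMETER;
	if (fw->nr_ranges >= fw->max_ranges)
		return SK_RESOURCE;
	fw->nr_ranges++;
	return SK_SUCCESS;
}

static void sketch_fw_remove_all(struct sketch_fw *fw)
{
	fw->nr_ranges = 0;
}

/* Returns 0 on full or partial registration, -1 on failure. */
static int sketch_register(struct sketch_fw *fw, int region_cnt,
			   int *registered)
{
	enum sketch_rc rc = SK_SUCCESS;
	int i;

	assert(region_cnt >= 0);
	*registered = 0;
	for (i = 0; i < region_cnt; i++) {
		rc = sketch_fw_add(fw, i);
		if (rc != SK_SUCCESS)
			break;
		(*registered)++;
	}

	/* Hitting the firmware limit is tolerated: partial dump. */
	if (rc == SK_SUCCESS || rc == SK_RESOURCE)
		return 0;

	/* Any other failure: drop whatever was registered so far. */
	if (*registered > 0) {
		sketch_fw_remove_all(fw);
		*registered = 0;
	}
	return -1;
}
```

With a firmware limit of two ranges and three regions, the sketch
reports success with two regions registered; with an invalid second
region, it unregisters the first one and fails.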

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/platforms/powernv/opal-fadump.c |   79 +++++++++++++++++++++++++-
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index e466f7e..91fb909 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -23,6 +23,22 @@
 
 static struct opal_fadump_mem_struct *opal_fdm;
 
+static int opal_fadump_unregister(struct fw_dump *fadump_conf);
+
+static void opal_fadump_update_config(struct fw_dump *fadump_conf,
+				      const struct opal_fadump_mem_struct *fdm)
+{
+	/*
+	 * The destination address of the first boot memory region is the
+	 * destination address of boot memory regions.
+	 */
+	fadump_conf->boot_mem_dest_addr = fdm->rgn[0].dest;
+	pr_debug("Destination address of boot memory regions: %#016lx\n",
+		 fadump_conf->boot_mem_dest_addr);
+
+	fadump_conf->fadumphdr_addr = fdm->fadumphdr_addr;
+}
+
 /* Initialize kernel metadata */
 static void opal_fadump_init_metadata(struct opal_fadump_mem_struct *fdm)
 {
@@ -52,6 +68,8 @@ static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
 	opal_fdm->fadumphdr_addr = (opal_fdm->rgn[0].dest +
 				    fadump_conf->boot_memory_size);
 
+	opal_fadump_update_config(fadump_conf, opal_fdm);
+
 	return addr;
 }
 
@@ -97,12 +115,69 @@ static int opal_fadump_setup_metadata(struct fw_dump *fadump_conf)
 
 static int opal_fadump_register(struct fw_dump *fadump_conf)
 {
-	return -EIO;
+	int i, err = -EIO;
+	s64 rc;
+
+	for (i = 0; i < opal_fdm->region_cnt; i++) {
+		rc = opal_mpipl_update(OPAL_MPIPL_ADD_RANGE,
+				       opal_fdm->rgn[i].src,
+				       opal_fdm->rgn[i].dest,
+				       opal_fdm->rgn[i].size);
+		if (rc != OPAL_SUCCESS)
+			break;
+
+		opal_fdm->registered_regions++;
+	}
+
+	switch (rc) {
+	case OPAL_SUCCESS:
+		pr_info("Registration is successful!\n");
+		fadump_conf->dump_registered = 1;
+		err = 0;
+		break;
+	case OPAL_RESOURCE:
+		/* If MAX regions limit in f/w is hit, warn and proceed. */
+		pr_warn("%d regions could not be registered for MPIPL as MAX limit is reached!\n",
+			(opal_fdm->region_cnt - opal_fdm->registered_regions));
+		fadump_conf->dump_registered = 1;
+		err = 0;
+		break;
+	case OPAL_PARAMETER:
+		pr_err("Failed to register. Parameter Error(%lld).\n", rc);
+		break;
+	case OPAL_HARDWARE:
+		pr_err("Support not available.\n");
+		fadump_conf->fadump_supported = 0;
+		fadump_conf->fadump_enabled = 0;
+		break;
+	default:
+		pr_err("Failed to register. Unknown Error(%lld).\n", rc);
+		break;
+	}
+
+	/*
+	 * If some regions were registered before OPAL_MPIPL_ADD_RANGE
+	 * OPAL call failed, unregister all regions.
+	 */
+	if ((err < 0) && (opal_fdm->registered_regions > 0))
+		opal_fadump_unregister(fadump_conf);
+
+	return err;
 }
 
 static int opal_fadump_unregister(struct fw_dump *fadump_conf)
 {
-	return -EIO;
+	s64 rc;
+
+	rc = opal_mpipl_update(OPAL_MPIPL_REMOVE_ALL, 0, 0, 0);
+	if (rc) {
+		pr_err("Failed to un-register - unexpected Error(%lld).\n", rc);
+		return -EIO;
+	}
+
+	opal_fdm->registered_regions = 0;
+	fadump_conf->dump_registered = 0;
+	return 0;
 }
 
 static int opal_fadump_invalidate(struct fw_dump *fadump_conf)


* [PATCH v5 15/31] powernv/fadump: support copying multiple kernel boot memory regions
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (13 preceding siblings ...)
  2019-08-20 12:05 ` [PATCH v5 14/31] powernv/fadump: define register/un-register callback functions Hari Bathini
@ 2019-08-20 12:05 ` Hari Bathini
  2019-09-04 11:30   ` Michael Ellerman
  2019-08-20 12:06 ` [PATCH v5 16/31] powernv/fadump: process the crashdump by exporting it as /proc/vmcore Hari Bathini
                   ` (15 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Firmware uses a 32-bit field for the region size while copying/backing up
memory during MPIPL. So, the maximum copy size for a region is one page
less than 4GB (U32_MAX aligned down to the page size), but the FADump
capture kernel usually needs more memory than that to be preserved to
avoid running into out-of-memory errors.

So, request firmware to copy multiple kernel boot memory regions
instead of just one (a single region worked fine on pseries, as a 64-bit
field is used for the size there).
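
The splitting described above can be sketched as follows (illustration
only, not part of this patch). It assumes 64K pages and uses 64-bit
arithmetic throughout, since the per-region limit does not fit in a
signed 32-bit int; all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	0x10000ULL	/* assumption: 64K pages */
#define SKETCH_MAX_REGS		128

struct sketch_region {
	uint64_t src;
	uint64_t dest;
	uint64_t size;
};

/*
 * Split [src, src + size) into chunks no larger than U32_MAX aligned
 * down to the page size. Returns the number of regions filled in, or
 * -1 if more than max_regs regions would be needed.
 */
static int sketch_split(uint64_t src, uint64_t dest, uint64_t size,
			struct sketch_region *rgn, int max_regs)
{
	const uint64_t max_copy = (uint64_t)UINT32_MAX &
				  ~(SKETCH_PAGE_SIZE - 1);
	int cnt = 0;

	assert(rgn && max_regs > 0);
	while (size) {
		uint64_t cur = size > max_copy ? max_copy : size;

		if (cnt == max_regs)
			return -1;

		rgn[cnt].src = src;
		rgn[cnt].dest = dest;
		rgn[cnt].size = cur;
		cnt++;

		src += cur;
		dest += cur;
		size -= cur;
	}

	return cnt;
}
```

For instance, 6GB of boot memory is split into two regions with this
sketch: one of 0xffff0000 bytes and one of 0x80010000 bytes.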

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/platforms/powernv/opal-fadump.c |   35 +++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index 91fb909..a755705 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -28,6 +28,8 @@ static int opal_fadump_unregister(struct fw_dump *fadump_conf);
 static void opal_fadump_update_config(struct fw_dump *fadump_conf,
 				      const struct opal_fadump_mem_struct *fdm)
 {
+	pr_debug("Boot memory regions count: %d\n", fdm->region_cnt);
+
 	/*
 	 * The destination address of the first boot memory region is the
 	 * destination address of boot memory regions.
@@ -50,16 +52,35 @@ static void opal_fadump_init_metadata(struct opal_fadump_mem_struct *fdm)
 
 static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
 {
-	ulong addr = fadump_conf->reserve_dump_area_start;
+	ulong src_addr, dest_addr;
+	u64 max_copy_size, cur_size, size;
 
 	opal_fdm = __va(fadump_conf->kernel_metadata);
 	opal_fadump_init_metadata(opal_fdm);
 
-	opal_fdm->region_cnt = 1;
-	opal_fdm->rgn[0].src	= RMA_START;
-	opal_fdm->rgn[0].dest	= addr;
-	opal_fdm->rgn[0].size	= fadump_conf->boot_memory_size;
-	addr += fadump_conf->boot_memory_size;
+	/*
+	 * Firmware currently supports only 32-bit value for size,
+	 * align it to pagesize and request firmware to copy multiple
+	 * kernel boot memory regions.
+	 */
+	max_copy_size = _ALIGN_DOWN(U32_MAX, PAGE_SIZE);
+
+	/* Boot memory regions */
+	src_addr = RMA_START;
+	dest_addr = fadump_conf->reserve_dump_area_start;
+	size = fadump_conf->boot_memory_size;
+	while (size) {
+		cur_size = size > max_copy_size ? max_copy_size : size;
+
+		opal_fdm->rgn[opal_fdm->region_cnt].src  = src_addr;
+		opal_fdm->rgn[opal_fdm->region_cnt].dest = dest_addr;
+		opal_fdm->rgn[opal_fdm->region_cnt].size = cur_size;
+
+		opal_fdm->region_cnt++;
+		dest_addr	+= cur_size;
+		src_addr	+= cur_size;
+		size		-= cur_size;
+	}
 
 	/*
 	 * Kernel metadata is passed to f/w and retrieved in capture kernel.
@@ -70,7 +91,7 @@ static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
 
 	opal_fadump_update_config(fadump_conf, opal_fdm);
 
-	return addr;
+	return dest_addr;
 }
 
 static ulong opal_fadump_get_metadata_size(void)


* [PATCH v5 16/31] powernv/fadump: process the crashdump by exporting it as /proc/vmcore
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (14 preceding siblings ...)
  2019-08-20 12:05 ` [PATCH v5 15/31] powernv/fadump: support copying multiple kernel boot memory regions Hari Bathini
@ 2019-08-20 12:06 ` Hari Bathini
  2019-09-04 11:42   ` Michael Ellerman
  2019-08-20 12:06 ` [PATCH v5 17/31] powernv/fadump: Warn before processing partial crashdump Hari Bathini
                   ` (14 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:06 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Add support in the kernel to process the crashed kernel's memory
preserved during MPIPL and export it as the /proc/vmcore file, for
userland scripts to filter and analyze later.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/platforms/powernv/opal-fadump.c |  165 ++++++++++++++++++++++++++
 1 file changed, 163 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index a755705..10f6086 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -14,6 +14,7 @@
 #include <linux/of_fdt.h>
 #include <linux/libfdt.h>
 #include <linux/mm.h>
+#include <linux/crash_dump.h>
 
 #include <asm/page.h>
 #include <asm/opal.h>
@@ -21,6 +22,7 @@
 #include "../../kernel/fadump-common.h"
 #include "opal-fadump.h"
 
+static const struct opal_fadump_mem_struct *opal_fdm_active;
 static struct opal_fadump_mem_struct *opal_fdm;
 
 static int opal_fadump_unregister(struct fw_dump *fadump_conf);
@@ -41,6 +43,37 @@ static void opal_fadump_update_config(struct fw_dump *fadump_conf,
 	fadump_conf->fadumphdr_addr = fdm->fadumphdr_addr;
 }
 
+/*
+ * This function is called in the capture kernel to get configuration details
+ * from metadata setup by the first kernel.
+ */
+static void opal_fadump_get_config(struct fw_dump *fadump_conf,
+				   const struct opal_fadump_mem_struct *fdm)
+{
+	int i;
+
+	if (!fadump_conf->dump_active)
+		return;
+
+	fadump_conf->boot_memory_size = 0;
+
+	pr_debug("Boot memory regions:\n");
+	for (i = 0; i < fdm->region_cnt; i++) {
+		pr_debug("\t%d. base: 0x%llx, size: 0x%llx\n",
+			 (i + 1), fdm->rgn[i].src, fdm->rgn[i].size);
+
+		fadump_conf->boot_memory_size += fdm->rgn[i].size;
+	}
+
+	/*
+	 * Start address of reserve dump area (permanent reservation) for
+	 * re-registering FADump after dump capture.
+	 */
+	fadump_conf->reserve_dump_area_start = fdm->rgn[0].dest;
+
+	opal_fadump_update_config(fadump_conf, fdm);
+}
+
 /* Initialize kernel metadata */
 static void opal_fadump_init_metadata(struct opal_fadump_mem_struct *fdm)
 {
@@ -215,24 +248,114 @@ static void opal_fadump_cleanup(struct fw_dump *fadump_conf)
 		pr_warn("Could not reset (%lld) kernel metadata tag!\n", ret);
 }
 
+/*
+ * Convert CPU state data saved at the time of crash into ELF notes.
+ */
+static int __init opal_fadump_build_cpu_notes(struct fw_dump *fadump_conf)
+{
+	u32 num_cpus, *note_buf;
+	struct fadump_crash_info_header *fdh = NULL;
+
+	num_cpus = 1;
+	/* Allocate buffer to hold cpu crash notes. */
+	fadump_conf->cpu_notes_buf_size = num_cpus * sizeof(note_buf_t);
+	fadump_conf->cpu_notes_buf_size =
+		PAGE_ALIGN(fadump_conf->cpu_notes_buf_size);
+	note_buf = fadump_cpu_notes_buf_alloc(fadump_conf->cpu_notes_buf_size);
+	if (!note_buf) {
+		pr_err("Failed to allocate 0x%lx bytes for cpu notes buffer\n",
+		       fadump_conf->cpu_notes_buf_size);
+		return -ENOMEM;
+	}
+	fadump_conf->cpu_notes_buf = __pa(note_buf);
+
+	pr_debug("Allocated buffer for cpu notes of size %ld at %p\n",
+		 (num_cpus * sizeof(note_buf_t)), note_buf);
+
+	if (fadump_conf->fadumphdr_addr)
+		fdh = __va(fadump_conf->fadumphdr_addr);
+
+	if (fdh && (fdh->crashing_cpu != FADUMP_CPU_UNKNOWN)) {
+		note_buf = fadump_regs_to_elf_notes(note_buf, &(fdh->regs));
+		final_note(note_buf);
+
+		pr_debug("Updating elfcore header (%llx) with cpu notes\n",
+			 fdh->elfcorehdr_addr);
+		fadump_update_elfcore_header(fadump_conf,
+					     __va(fdh->elfcorehdr_addr));
+	}
+
+	return 0;
+}
+
 static int __init opal_fadump_process(struct fw_dump *fadump_conf)
 {
-	return -EINVAL;
+	struct fadump_crash_info_header *fdh;
+	int rc = 0;
+
+	if (!opal_fdm_active || !fadump_conf->fadumphdr_addr)
+		return -EINVAL;
+
+	/* Validate the fadump crash info header */
+	fdh = __va(fadump_conf->fadumphdr_addr);
+	if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) {
+		pr_err("Crash info header is not valid.\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * TODO: To build cpu notes, find a way to map PIR to logical id.
+	 *       Also, we may need different method for pseries and powernv.
+	 *       The currently booted kernel could have a different PIR to
+	 *       logical id mapping. So, try saving info of previous kernel's
+	 *       paca to get the right PIR to logical id mapping.
+	 */
+	rc = opal_fadump_build_cpu_notes(fadump_conf);
+	if (rc)
+		return rc;
+
+	/*
+	 * We are done validating dump info and elfcore header is now ready
+	 * to be exported. Set elfcorehdr_addr so that vmcore module will
+	 * export the elfcore header through '/proc/vmcore'.
+	 */
+	elfcorehdr_addr = fdh->elfcorehdr_addr;
+
+	return rc;
 }
 
 static void opal_fadump_region_show(struct fw_dump *fadump_conf,
 				    struct seq_file *m)
 {
 	int i;
-	const struct opal_fadump_mem_struct *fdm_ptr = opal_fdm;
+	const struct opal_fadump_mem_struct *fdm_ptr;
 	u64 dumped_bytes = 0;
 
+	if (fadump_conf->dump_active)
+		fdm_ptr = opal_fdm_active;
+	else
+		fdm_ptr = opal_fdm;
+
 	for (i = 0; i < fdm_ptr->region_cnt; i++) {
+		/*
+		 * Only regions that are registered for MPIPL
+		 * would have dump data.
+		 */
+		if ((fadump_conf->dump_active) &&
+		    (i < fdm_ptr->registered_regions))
+			dumped_bytes = fdm_ptr->rgn[i].size;
+
 		seq_printf(m, "DUMP: Src: %#016llx, Dest: %#016llx, ",
 			   fdm_ptr->rgn[i].src, fdm_ptr->rgn[i].dest);
 		seq_printf(m, "Size: %#llx, Dumped: %#llx bytes\n",
 			   fdm_ptr->rgn[i].size, dumped_bytes);
 	}
+
+	/* Dump is active. Show reserved area start address. */
+	if (fadump_conf->dump_active) {
+		seq_printf(m, "\nMemory above %#016lx is reserved for saving crash dump\n",
+			   fadump_conf->reserve_dump_area_start);
+	}
 }
 
 static void opal_fadump_trigger(struct fadump_crash_info_header *fdh,
@@ -264,6 +387,7 @@ static struct fadump_ops opal_fadump_ops = {
 int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
 {
 	unsigned long dn;
+	const __be32 *prop;
 
 	/*
 	 * Check if Firmware-Assisted Dump is supported. If yes, check
@@ -283,5 +407,42 @@ int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
 	fadump_conf->ops		= &opal_fadump_ops;
 	fadump_conf->fadump_supported	= 1;
 
+	/*
+	 * Check if dump has been initiated on last reboot.
+	 */
+	prop = of_get_flat_dt_prop(dn, "mpipl-boot", NULL);
+	if (prop) {
+		u64 addr = 0;
+		s64 ret;
+		const struct opal_fadump_mem_struct *r_opal_fdm_active;
+
+		ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, &addr);
+		if ((ret != OPAL_SUCCESS) || !addr) {
+			pr_err("Failed to get Kernel metadata (%lld)\n", ret);
+			return 1;
+		}
+
+		addr = be64_to_cpu(addr);
+		pr_debug("Kernel metadata addr: %llx\n", addr);
+
+		opal_fdm_active = __va(addr);
+		r_opal_fdm_active = (void *)addr;
+		if (r_opal_fdm_active->version != OPAL_FADUMP_VERSION) {
+			pr_err("FADump active but version (%u) unsupported!\n",
+			       r_opal_fdm_active->version);
+			return 1;
+		}
+
+		/* Kernel regions not registered with f/w for MPIPL */
+		if (r_opal_fdm_active->registered_regions == 0) {
+			opal_fdm_active = NULL;
+			return 1;
+		}
+
+		pr_info("Firmware-assisted dump is active.\n");
+		fadump_conf->dump_active = 1;
+		opal_fadump_get_config(fadump_conf, r_opal_fdm_active);
+	}
+
 	return 1;
 }


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v5 17/31] powernv/fadump: Warn before processing partial crashdump
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (15 preceding siblings ...)
  2019-08-20 12:06 ` [PATCH v5 16/31] powernv/fadump: process the crashdump by exporting it as /proc/vmcore Hari Bathini
@ 2019-08-20 12:06 ` Hari Bathini
  2019-09-04 11:48   ` Michael Ellerman
  2019-08-20 12:06 ` [PATCH v5 18/31] powernv/fadump: handle invalidation of crashdump and re-registration Hari Bathini
                   ` (13 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:06 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

If not all kernel boot memory regions were registered for MPIPL before
the system crashed, try processing the partial crashdump but warn the
user before proceeding.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/platforms/powernv/opal-fadump.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index 10f6086..6a05d51 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -71,6 +71,30 @@ static void opal_fadump_get_config(struct fw_dump *fadump_conf,
 	 */
 	fadump_conf->reserve_dump_area_start = fdm->rgn[0].dest;
 
+	/*
+	 * Rarely, the system may crash before all boot memory regions
+	 * are registered for MPIPL. In such cases, warn that the
+	 * vmcore may not be accurate and proceed anyway, as that is
+	 * the best bet considering free pages, cache pages, user
+	 * pages, etc. are usually filtered out.
+	 *
+	 * Hope the memory that could not be preserved only has pages
+	 * that are usually filtered out while saving the vmcore.
+	 */
+	if (fdm->region_cnt > fdm->registered_regions) {
+		pr_warn("Not all memory regions are saved as system seems to have crashed before all the memory regions could be registered for MPIPL!\n");
+		pr_warn("  The below boot memory regions could not be saved:\n");
+		i = fdm->registered_regions;
+		while (i < fdm->region_cnt) {
+			pr_warn("\t%d. base: 0x%llx, size: 0x%llx\n", (i + 1),
+				fdm->rgn[i].src, fdm->rgn[i].size);
+			i++;
+		}
+
+		pr_warn("  Wishing for the above regions to have only pages that are usually filtered out (user pages, free pages, etc..) and proceeding anyway..\n");
+		pr_warn("  But the sanity of the '/proc/vmcore' file depends on whether the above region(s) have any kernel pages or not.\n");
+	}
+
 	opal_fadump_update_config(fadump_conf, fdm);
 }
 


^ permalink raw reply related	[flat|nested] 74+ messages in thread
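
The partial-dump check in the patch above can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the kernel code: `struct dump_meta`, `MAX_REGIONS` and `report_unsaved_regions()` are hypothetical stand-ins for the metadata structure and the warning loop, and `fprintf()` replaces `pr_warn()`.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_REGIONS 8	/* hypothetical limit used only by this sketch */

/* Simplified stand-ins for the kernel metadata structures. */
struct region {
	uint64_t src;
	uint64_t size;
};

struct dump_meta {
	uint16_t region_cnt;		/* regions the kernel meant to register */
	uint16_t registered_regions;	/* regions actually registered with f/w */
	struct region rgn[MAX_REGIONS];
};

/*
 * Print every region that was not registered before the crash.
 * Returns the number of unsaved regions; 0 means the dump is complete.
 */
static int report_unsaved_regions(const struct dump_meta *m)
{
	int i, missing = 0;

	for (i = m->registered_regions; i < m->region_cnt; i++) {
		fprintf(stderr, "\t%d. base: 0x%llx, size: 0x%llx\n", i + 1,
			(unsigned long long)m->rgn[i].src,
			(unsigned long long)m->rgn[i].size);
		missing++;
	}

	return missing;
}
```

A non-zero return value corresponds to the case where the patch warns and still proceeds with the partial dump.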

* [PATCH v5 18/31] powernv/fadump: handle invalidation of crashdump and re-registration
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (16 preceding siblings ...)
  2019-08-20 12:06 ` [PATCH v5 17/31] powernv/fadump: Warn before processing partial crashdump Hari Bathini
@ 2019-08-20 12:06 ` Hari Bathini
  2019-08-20 12:06 ` [PATCH v5 19/31] powerpc/fadump: Update documentation about OPAL platform support Hari Bathini
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:06 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Make an OPAL call to indicate that the dump has been processed and that
the metadata area in OPAL can be cleared/released. Also, set up and
initialize FADump for re-registration.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump.c                 |    7 ++++++-
 arch/powerpc/platforms/powernv/opal-fadump.c |   12 +++++++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b2d5ca6..971c50d 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -921,7 +921,12 @@ static void fadump_invalidate_release_mem(void)
 		fw_dump.cpu_notes_buf_size = 0;
 	}
 
-	/* Initialize the kernel dump memory structure for FAD registration. */
+	/*
+	 * Set up kernel metadata and initialize the kernel dump
+	 * memory structure for FADump re-registration.
+	 */
+	if (fw_dump.ops->fadump_setup_metadata(&fw_dump) < 0)
+		pr_warn("Failed to setup kernel metadata!\n");
 	fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 }
 
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index 6a05d51..f75b861 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -260,7 +260,17 @@ static int opal_fadump_unregister(struct fw_dump *fadump_conf)
 
 static int opal_fadump_invalidate(struct fw_dump *fadump_conf)
 {
-	return -EIO;
+	s64 rc;
+
+	rc = opal_mpipl_update(OPAL_MPIPL_FREE_PRESERVED_MEMORY, 0, 0, 0);
+	if (rc) {
+		pr_err("Failed to invalidate - unexpected Error(%lld).\n", rc);
+		return -EIO;
+	}
+
+	fadump_conf->dump_active = 0;
+	opal_fdm_active = NULL;
+	return 0;
 }
 
 static void opal_fadump_cleanup(struct fw_dump *fadump_conf)


^ permalink raw reply related	[flat|nested] 74+ messages in thread
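
The ordering in the invalidate path of the patch above (ask firmware to release the preserved memory first, clear the local state only on success) can be sketched as follows. This is a user-space illustration under stated assumptions: `struct dump_state`, `invalidate_dump()` and the two stub callbacks are hypothetical and merely stand in for the real OPAL call.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the firmware call; 0 means success. */
typedef long (*fw_free_fn)(void);

struct dump_state {
	bool dump_active;
	const void *active_meta;	/* metadata of the dump being processed */
};

/* Stub firmware callbacks used to exercise both outcomes. */
static long fw_free_ok(void)   { return 0; }
static long fw_free_fail(void) { return -1; }

/*
 * Release the preserved dump. The local state is cleared only after
 * firmware reports success, so a failed call leaves the dump visible.
 */
static int invalidate_dump(struct dump_state *st, fw_free_fn fw_free)
{
	long rc = fw_free();

	if (rc)
		return -EIO;

	st->dump_active = false;
	st->active_meta = NULL;
	return 0;
}
```

Keeping the state untouched on failure means a later retry still sees an active dump.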

* [PATCH v5 19/31] powerpc/fadump: Update documentation about OPAL platform support
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (17 preceding siblings ...)
  2019-08-20 12:06 ` [PATCH v5 18/31] powernv/fadump: handle invalidation of crashdump and re-registration Hari Bathini
@ 2019-08-20 12:06 ` Hari Bathini
  2019-09-04 11:51   ` Michael Ellerman
  2019-08-20 12:06 ` [PATCH v5 20/31] powerpc/fadump: use smaller offset while finding memory for reservation Hari Bathini
                   ` (11 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:06 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

With FADump support now available on both pseries and OPAL platforms,
update FADump documentation with these details.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.rst |  104 +++++++++++++---------
 1 file changed, 63 insertions(+), 41 deletions(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
index d912755..2c3342c 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -72,7 +72,8 @@ as follows:
    normal.
 
 -  The freshly booted kernel will notice that there is a new
-   node (ibm,dump-kernel) in the device tree, indicating that
+   node (ibm,dump-kernel on PSeries or ibm,opal/dump/mpipl-boot
+   on OPAL platform) in the device tree, indicating that
    there is crash data available from a previous boot. During
    the early boot OS will reserve rest of the memory above
    boot memory size effectively booting with restricted memory
@@ -96,7 +97,9 @@ as follows:
 
 Please note that the firmware-assisted dump feature
 is only available on Power6 and above systems with recent
-firmware versions.
+firmware versions on PSeries (PowerVM) platform and Power9
+and above systems with recent firmware versions on PowerNV
+(OPAL) platform.
 
 Implementation details:
 -----------------------
@@ -111,57 +114,76 @@ that are run. If there is dump data, then the
 /sys/kernel/fadump_release_mem file is created, and the reserved
 memory is held.
 
-If there is no waiting dump data, then only the memory required
-to hold CPU state, HPTE region, boot memory dump and elfcore
-header, is usually reserved at an offset greater than boot memory
-size (see Fig. 1). This area is *not* released: this region will
-be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state
-and HPTE region, in the case a crash does occur. Since this reserved
-memory area is used only after the system crash, there is no point in
-blocking this significant chunk of memory from production kernel.
-Hence, the implementation uses the Linux kernel's Contiguous Memory
-Allocator (CMA) for memory reservation if CMA is configured for kernel.
-With CMA reservation this memory will be available for applications to
-use it, while kernel is prevented from using it. With this FADump will
-still be able to capture all of the kernel memory and most of the user
-space memory except the user pages that were present in CMA region::
+If there is no waiting dump data, then only the memory required to
+hold CPU state, HPTE region, boot memory dump, FADump header and
+elfcore header, is usually reserved at an offset greater than boot
+memory size (see Fig. 1). This area is *not* released: this region
+will be kept permanently reserved, so that it can act as a receptacle
+for a copy of the boot memory content in addition to CPU state and
+HPTE region, in the case a crash does occur.
+
+Since this reserved memory area is used only after the system crash,
+there is no point in blocking this significant chunk of memory from
+production kernel. Hence, the implementation uses the Linux kernel's
+Contiguous Memory Allocator (CMA) for memory reservation if CMA is
+configured for kernel. With CMA reservation this memory will be
+available for applications to use it, while kernel is prevented from
+using it. With this FADump will still be able to capture all of the
+kernel memory and most of the user space memory except the user pages
+that were present in CMA region::
 
   o Memory Reservation during first kernel
 
-  Low memory                                                Top of memory
-  0      boot memory size      |<--Reserved dump area --->|      |
-  |           |                | (Permanent Reservation)  |      |
-  V           V                |                          |      V
-  +-----------+----------/ /---+---+----+--------+---+----+------+
-  |           |                |CPU|HPTE|  DUMP  |HDR|ELF |      |
-  +-----------+----------/ /---+---+----+--------+---+----+------+
-        |                                   ^      ^
-        |                                   |      |
-        \                                   /      |
-         -----------------------------------     FADump Header
-          Boot memory content gets transferred   (meta area)
-          to reserved area by firmware at the
-          time of crash
+  Low memory                                                 Top of memory
+  0    boot memory size   |<--- Reserved dump area --->|       |
+  |           |           |    Permanent Reservation   |       |
+  V           V           |                            |       V
+  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
+  |           |           |///|////|  DUMP | HDR | ELF |////|  |
+  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
+        |                   ^    ^     ^      ^           ^
+        |                   |    |     |      |           |
+        \                  CPU  HPTE   /      |           |
+         ------------------------------       |           |
+      Boot memory content gets transferred    |           |
+      to reserved area by firmware at the     |           |
+      time of crash.                          |           |
+                                          FADump Header   |
+                                           (meta area)    |
+                                                          |
+                                                          |
+                      Metadata: This area holds a metadata structure whose
+                      address is registered with f/w and retrieved in the
+                      second kernel after crash, on platforms that support
+                      tags (OPAL). Having such a structure with the info
+                      needed to process the crashdump eases dump capture.
 
                    Fig. 1
 
 
   o Memory Reservation during second kernel after crash
 
-  Low memory                                                Top of memory
-  0      boot memory size                                        |
-  |           |<----------- Crash preserved area --------------->|
-  V           V                |<-- Reserved dump area -->|      V
-  +-----------+----------/ /---+---+----+--------+---+----+------+
-  |           |                |CPU|HPTE|  DUMP  |HDR|ELF |      |
-  +-----------+----------/ /---+---+----+--------+---+----+------+
-        |                                              |
-        V                                              V
-   Used by second                                /proc/vmcore
+  Low memory                                              Top of memory
+  0      boot memory size                                      |
+  |           |<------------ Crash preserved area ------------>|
+  V           V           |<--- Reserved dump area --->|       |
+  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
+  |           |           |///|////|  DUMP | HDR | ELF |////|  |
+  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
+        |                                           |
+        V                                           V
+   Used by second                             /proc/vmcore
    kernel to boot
+
+        +---+
+        |///| -> Regions (CPU, HPTE & Metadata) marked like this in the above
+        +---+    figures are not always present. For example, OPAL platform
+                 does not have CPU & HPTE regions while Metadata region is
+                 not supported on pSeries currently.
+
                    Fig. 2
 
+
 Currently the dump will be copied from /proc/vmcore to a new file upon
 user intervention. The dump data available through /proc/vmcore will be
 in ELF format. Hence the existing kdump infrastructure (kdump scripts)


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v5 20/31] powerpc/fadump: use smaller offset while finding memory for reservation
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (18 preceding siblings ...)
  2019-08-20 12:06 ` [PATCH v5 19/31] powerpc/fadump: Update documentation about OPAL platform support Hari Bathini
@ 2019-08-20 12:06 ` Hari Bathini
  2019-09-04 11:54   ` Michael Ellerman
  2019-08-20 12:06 ` [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware Hari Bathini
                   ` (10 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:06 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

When an attempt to reserve memory for FADump fails due to memory holes
and/or reserved ranges, skip ahead by a smaller fixed offset, instead of
by the size of the memory to be reserved, before making another attempt.
This reduces the likelihood of memory reservation failure.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump-common.h |    8 ++++++++
 arch/powerpc/kernel/fadump.c        |    2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index d2dd117..7107cf2 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -66,6 +66,14 @@ static inline u64 fadump_str_to_u64(const char *str)
 
 #define FADUMP_CRASH_INFO_MAGIC		fadump_str_to_u64("FADMPINF")
 
+/*
+ * Amount of memory (1024MB) to skip before making another attempt at
+ * reserving memory (after the previous attempt to reserve memory for
+ * FADump failed due to memory holes and/or reserved ranges) to reduce
+ * the likelihood of memory reservation failure.
+ */
+#define FADUMP_OFFSET_SIZE			0x40000000U
+
 /* fadump crash info structure */
 struct fadump_crash_info_header {
 	u64		magic_number;
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 971c50d..8dd2dcc 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -371,7 +371,7 @@ int __init fadump_reserve_mem(void)
 			    !memblock_is_region_reserved(base, size))
 				break;
 
-			base += size;
+			base += FADUMP_OFFSET_SIZE;
 		}
 
 		if (base > (memory_boundary - size)) {


^ permalink raw reply related	[flat|nested] 74+ messages in thread
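
To see why the smaller step in the patch above helps, here is a minimal user-space sketch of the search loop. It is not the kernel code: `find_base()`, `struct range` and the single `busy` list are hypothetical and model holes and reserved ranges together, whereas the kernel queries memblock.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB		(1ULL << 30)
#define SKIP_OFFSET	0x40000000ULL	/* 1024MB step, as in the patch */

struct range {
	uint64_t base;
	uint64_t size;
};

static bool overlaps(uint64_t base, uint64_t size, const struct range *r)
{
	return base < r->base + r->size && r->base < base + size;
}

/*
 * Find the lowest base >= start, advancing by 'step', at which
 * [base, base + size) ends at or below 'limit' and avoids every busy
 * range. Returns UINT64_MAX when no such base exists.
 */
static uint64_t find_base(uint64_t start, uint64_t size, uint64_t limit,
			  uint64_t step, const struct range *busy, int nbusy)
{
	uint64_t base;
	int i;

	if (size == 0 || step == 0 || size > limit)
		return UINT64_MAX;

	for (base = start; base <= limit - size; base += step) {
		for (i = 0; i < nbusy; i++)
			if (overlaps(base, size, &busy[i]))
				break;
		if (i == nbusy)
			return base;
	}

	return UINT64_MAX;
}
```

With 12GB of memory, a 6GB reservation starting at 1GB and a busy range at [1GB, 2GB), stepping by the reservation size overshoots the only usable window, while stepping by `SKIP_OFFSET` finds a base at 2GB.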

* [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (19 preceding siblings ...)
  2019-08-20 12:06 ` [PATCH v5 20/31] powerpc/fadump: use smaller offset while finding memory for reservation Hari Bathini
@ 2019-08-20 12:06 ` Hari Bathini
  2019-09-04 12:20   ` Michael Ellerman
  2019-08-20 12:06 ` [PATCH v5 22/31] powerpc/fadump: make crash memory ranges array allocation generic Hari Bathini
                   ` (9 subsequent siblings)
  30 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:06 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

From: Hari Bathini <hbathini@linux.vnet.ibm.com>

Firmware provides architected register state data at the time of crash.
Process this data and build CPU notes to append to ELF core.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump-common.h          |    4 +
 arch/powerpc/platforms/powernv/opal-fadump.c |  198 ++++++++++++++++++++++++--
 arch/powerpc/platforms/powernv/opal-fadump.h |   39 +++++
 3 files changed, 229 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index 7107cf2..fc408b0 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -98,7 +98,11 @@ struct fw_dump {
 	/* cmd line option during boot */
 	unsigned long	reserve_bootvar;
 
+	unsigned long	cpu_state_destination_addr;
+	unsigned long	cpu_state_data_version;
+	unsigned long	cpu_state_entry_size;
 	unsigned long	cpu_state_data_size;
+
 	unsigned long	hpte_region_size;
 
 	unsigned long	boot_mem_dest_addr;
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index f75b861..9a32a7f 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -23,6 +23,7 @@
 #include "opal-fadump.h"
 
 static const struct opal_fadump_mem_struct *opal_fdm_active;
+static const struct opal_mpipl_fadump *opal_cpu_metadata;
 static struct opal_fadump_mem_struct *opal_fdm;
 
 static int opal_fadump_unregister(struct fw_dump *fadump_conf);
@@ -282,15 +283,122 @@ static void opal_fadump_cleanup(struct fw_dump *fadump_conf)
 		pr_warn("Could not reset (%llu) kernel metadata tag!\n", ret);
 }
 
+static inline void opal_fadump_set_regval_regnum(struct pt_regs *regs,
+						 u32 reg_type, u32 reg_num,
+						 u64 reg_val)
+{
+	if (reg_type == HDAT_FADUMP_REG_TYPE_GPR) {
+		if (reg_num < 32)
+			regs->gpr[reg_num] = reg_val;
+		return;
+	}
+
+	switch (reg_num) {
+	case SPRN_CTR:
+		regs->ctr = reg_val;
+		break;
+	case SPRN_LR:
+		regs->link = reg_val;
+		break;
+	case SPRN_XER:
+		regs->xer = reg_val;
+		break;
+	case SPRN_DAR:
+		regs->dar = reg_val;
+		break;
+	case SPRN_DSISR:
+		regs->dsisr = reg_val;
+		break;
+	case HDAT_FADUMP_REG_ID_NIP:
+		regs->nip = reg_val;
+		break;
+	case HDAT_FADUMP_REG_ID_MSR:
+		regs->msr = reg_val;
+		break;
+	case HDAT_FADUMP_REG_ID_CCR:
+		regs->ccr = reg_val;
+		break;
+	}
+}
+
+static inline void opal_fadump_read_regs(char *bufp, unsigned int regs_cnt,
+					 unsigned int reg_entry_size,
+					 struct pt_regs *regs)
+{
+	int i;
+	struct hdat_fadump_reg_entry *reg_entry;
+
+	memset(regs, 0, sizeof(struct pt_regs));
+
+	for (i = 0; i < regs_cnt; i++, bufp += reg_entry_size) {
+		reg_entry = (struct hdat_fadump_reg_entry *)bufp;
+		opal_fadump_set_regval_regnum(regs,
+					      be32_to_cpu(reg_entry->reg_type),
+					      be32_to_cpu(reg_entry->reg_num),
+					      be64_to_cpu(reg_entry->reg_val));
+	}
+}
+
+static inline bool __init is_thread_core_inactive(u8 core_state)
+{
+	bool is_inactive = false;
+
+	if (core_state == HDAT_FADUMP_CORE_INACTIVE)
+		is_inactive = true;
+
+	return is_inactive;
+}
+
 /*
  * Convert CPU state data saved at the time of crash into ELF notes.
+ *
+ * Each register entry is 16 bytes: a numerical identifier along with
+ * a GPR/SPR flag in the first 8 bytes and the register value in the next
+ * 8 bytes. For more details refer to F/W documentation.
  */
 static int __init opal_fadump_build_cpu_notes(struct fw_dump *fadump_conf)
 {
 	u32 num_cpus, *note_buf;
 	struct fadump_crash_info_header *fdh = NULL;
+	struct hdat_fadump_thread_hdr *thdr;
+	unsigned long addr;
+	u32 thread_pir;
+	char *bufp;
+	struct pt_regs regs;
+	unsigned int size_of_each_thread;
+	unsigned int regs_offset, regs_cnt, reg_esize;
+	int i;
+
+	fadump_conf->cpu_state_entry_size =
+			be32_to_cpu(opal_cpu_metadata->cpu_data_size);
+	fadump_conf->cpu_state_destination_addr =
+			be64_to_cpu(opal_cpu_metadata->region[0].dest);
+	fadump_conf->cpu_state_data_size =
+			be64_to_cpu(opal_cpu_metadata->region[0].size);
+
+	if ((fadump_conf->cpu_state_destination_addr == 0) ||
+	    (fadump_conf->cpu_state_entry_size == 0)) {
+		pr_err("CPU state data not available for processing!\n");
+		return -ENODEV;
+	}
+
+	size_of_each_thread = fadump_conf->cpu_state_entry_size;
+	num_cpus = (fadump_conf->cpu_state_data_size / size_of_each_thread);
+
+	addr = fadump_conf->cpu_state_destination_addr;
+	bufp = __va(addr);
+
+	/*
+	 * Offset for register entries, entry size and registers count are
+	 * duplicated in every thread header in keeping with HDAT format.
+	 * Use these values from the first thread header.
+	 */
+	thdr = (struct hdat_fadump_thread_hdr *)bufp;
+	regs_offset = (offsetof(struct hdat_fadump_thread_hdr, offset) +
+		       be32_to_cpu(thdr->offset));
+	reg_esize = be32_to_cpu(thdr->esize);
+	regs_cnt  = be32_to_cpu(thdr->ecnt);
 
-	num_cpus = 1;
 	/* Allocate buffer to hold cpu crash notes. */
 	fadump_conf->cpu_notes_buf_size = num_cpus * sizeof(note_buf_t);
 	fadump_conf->cpu_notes_buf_size =
@@ -309,10 +417,53 @@ static int __init opal_fadump_build_cpu_notes(struct fw_dump *fadump_conf)
 	if (fadump_conf->fadumphdr_addr)
 		fdh = __va(fadump_conf->fadumphdr_addr);
 
-	if (fdh && (fdh->crashing_cpu != FADUMP_CPU_UNKNOWN)) {
-		note_buf = fadump_regs_to_elf_notes(note_buf, &(fdh->regs));
-		final_note(note_buf);
+	pr_debug("--------CPU State Data------------\n");
+	pr_debug("NumCpus     : %u\n", num_cpus);
+	pr_debug("\tOffset: %u, Entry size: %u, Cnt: %u\n",
+		 regs_offset, reg_esize, regs_cnt);
+
+	for (i = 0; i < num_cpus; i++, bufp += size_of_each_thread) {
+		thdr = (struct hdat_fadump_thread_hdr *)bufp;
+
+		thread_pir = be32_to_cpu(thdr->pir);
+		pr_debug("%04d) PIR: 0x%x, core state: 0x%02x\n",
+			 (i + 1), thread_pir, thdr->core_state);
+
+		/*
+		 * Register state data of MAX cores is provided by firmware,
+		 * but some of these cores may not be active. So, while
+		 * processing register state data, check core state and
+		 * skip threads that belong to inactive cores.
+		 */
+		if (is_thread_core_inactive(thdr->core_state))
+			continue;
+
+		/*
+		 * If this is kernel initiated crash, crashing_cpu would be set
+		 * appropriately and register data of the crashing CPU saved by
+		 * crashing kernel. Add this saved register data of crashing CPU
+		 * to elf notes and populate the pt_regs for the remaining CPUs
+		 * from register state data provided by firmware.
+		 */
+		if (fdh && (fdh->crashing_cpu == thread_pir)) {
+			note_buf = fadump_regs_to_elf_notes(note_buf,
+							    &fdh->regs);
+			pr_debug("Crashing CPU PIR: 0x%x - R1 : 0x%lx, NIP : 0x%lx\n",
+				 fdh->crashing_cpu, fdh->regs.gpr[1],
+				 fdh->regs.nip);
+			continue;
+		}
+
+		opal_fadump_read_regs((bufp + regs_offset), regs_cnt,
+				      reg_esize, &regs);
 
+		note_buf = fadump_regs_to_elf_notes(note_buf, &regs);
+		pr_debug("CPU PIR: 0x%x - R1 : 0x%lx, NIP : 0x%lx\n",
+			 thread_pir, regs.gpr[1], regs.nip);
+	}
+	final_note(note_buf);
+
+	if (fdh) {
 		pr_debug("Updating elfcore header (%llx) with cpu notes\n",
 			 fdh->elfcorehdr_addr);
 		fadump_update_elfcore_header(fadump_conf,
@@ -327,7 +478,8 @@ static int __init opal_fadump_process(struct fw_dump *fadump_conf)
 	struct fadump_crash_info_header *fdh;
 	int rc = 0;
 
-	if (!opal_fdm_active || !fadump_conf->fadumphdr_addr)
+	if (!opal_fdm_active || !opal_cpu_metadata ||
+	    !fadump_conf->fadumphdr_addr)
 		return -EINVAL;
 
 	/* Validate the fadump crash info header */
@@ -337,13 +489,6 @@ static int __init opal_fadump_process(struct fw_dump *fadump_conf)
 		return -EINVAL;
 	}
 
-	/*
-	 * TODO: To build cpu notes, find a way to map PIR to logical id.
-	 *       Also, we may need different method for pseries and powernv.
-	 *       The currently booted kernel could have a different PIR to
-	 *       logical id mapping. So, try saving info of previous kernel's
-	 *       paca to get the right PIR to logical id mapping.
-	 */
 	rc = opal_fadump_build_cpu_notes(fadump_conf);
 	if (rc)
 		return rc;
@@ -397,6 +542,14 @@ static void opal_fadump_trigger(struct fadump_crash_info_header *fdh,
 {
 	int rc;
 
+	/*
+	 * Unlike on pSeries platform, logical CPU number is not provided
+	 * with architected register state data. So, store the crashing
+	 * CPU's PIR instead to plug the appropriate register data for
+	 * crashing CPU in the vmcore file.
+	 */
+	fdh->crashing_cpu = (u32)mfspr(SPRN_PIR);
+
 	rc = opal_cec_reboot2(OPAL_REBOOT_MPIPL, msg);
 	if (rc == OPAL_UNSUPPORTED) {
 		pr_emerg("Reboot type %d not supported.\n",
@@ -449,6 +602,7 @@ int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
 		u64 addr = 0;
 		s64 ret;
 		const struct opal_fadump_mem_struct *r_opal_fdm_active;
+		const struct opal_mpipl_fadump *r_opal_cpu_metadata;
 
 		ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, &addr);
 		if ((ret != OPAL_SUCCESS) || !addr) {
@@ -473,6 +627,26 @@ int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
 			return 1;
 		}
 
+		ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU, &addr);
+		if ((ret != OPAL_SUCCESS) || !addr) {
+			pr_err("Failed to get CPU metadata (%lld)\n", ret);
+			return 1;
+		}
+
+		addr = be64_to_cpu(addr);
+		pr_debug("CPU metadata addr: %llx\n", addr);
+
+		opal_cpu_metadata = __va(addr);
+		r_opal_cpu_metadata = (void *)addr;
+		fadump_conf->cpu_state_data_version =
+			be32_to_cpu(r_opal_cpu_metadata->cpu_data_version);
+		if (fadump_conf->cpu_state_data_version !=
+		    HDAT_FADUMP_CPU_DATA_VERSION) {
+			pr_err("CPU data format version (%lu) mismatch!\n",
+			       fadump_conf->cpu_state_data_version);
+			return 1;
+		}
+
 		pr_info("Firmware-assisted dump is active.\n");
 		fadump_conf->dump_active = 1;
 		opal_fadump_get_config(fadump_conf, r_opal_fdm_active);
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h b/arch/powerpc/platforms/powernv/opal-fadump.h
index 19cac1f..ce4c522 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.h
+++ b/arch/powerpc/platforms/powernv/opal-fadump.h
@@ -30,4 +30,43 @@ struct opal_fadump_mem_struct {
 	struct opal_mpipl_region	rgn[OPAL_FADUMP_MAX_MEM_REGS];
 } __attribute__((packed));
 
+/*
+ * CPU state data is provided by f/w. Below are the definitions
+ * provided in HDAT spec. Refer to latest HDAT specification for
+ * any update to this format.
+ */
+
+#define HDAT_FADUMP_CPU_DATA_VERSION		1
+
+#define HDAT_FADUMP_CORE_INACTIVE		(0x0F)
+
+/* HDAT thread header for register entries */
+struct hdat_fadump_thread_hdr {
+	__be32  pir;
+	/* 0x00 - 0x0F - The corresponding stop state of the core */
+	u8      core_state;
+	u8      reserved[3];
+
+	__be32	offset;	/* Offset to Register Entries array */
+	__be32	ecnt;	/* Number of entries */
+	__be32	esize;	/* Alloc size of each array entry in bytes */
+	__be32	eactsz;	/* Actual size of each array entry in bytes */
+} __attribute__((packed));
+
+/* Register types populated by f/w */
+#define HDAT_FADUMP_REG_TYPE_GPR		0x01
+#define HDAT_FADUMP_REG_TYPE_SPR		0x02
+
+/* ID numbers used by f/w while populating certain registers */
+#define HDAT_FADUMP_REG_ID_NIP			0x7D0
+#define HDAT_FADUMP_REG_ID_MSR			0x7D1
+#define HDAT_FADUMP_REG_ID_CCR			0x7D2
+
+/* HDAT register entry. */
+struct hdat_fadump_reg_entry {
+	__be32		reg_type;
+	__be32		reg_num;
+	__be64		reg_val;
+} __attribute__((packed));
+
 #endif /* __PPC64_OPAL_FA_DUMP_H__ */


^ permalink raw reply related	[flat|nested] 74+ messages in thread
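
The register-entry walk in the patch above (a big-endian type, number and value per entry, with entries spaced by the entry size from the thread header) can be illustrated with a small user-space sketch. It assumes the 16-byte layout of `struct hdat_fadump_reg_entry` shown in the patch and handles GPRs only; `load_be32()`, `load_be64()`, `store_entry()` and `read_gprs()` are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>
#include <string.h>

#define REG_TYPE_GPR	0x01	/* mirrors HDAT_FADUMP_REG_TYPE_GPR */

struct regs {
	uint64_t gpr[32];
};

static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t load_be64(const unsigned char *p)
{
	return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/* Encode one big-endian entry; used only to build sample input. */
static void store_entry(unsigned char *p, uint32_t type, uint32_t num,
			uint64_t val)
{
	int i;

	for (i = 0; i < 4; i++) {
		p[i] = (unsigned char)(type >> (24 - 8 * i));
		p[4 + i] = (unsigned char)(num >> (24 - 8 * i));
	}
	for (i = 0; i < 8; i++)
		p[8 + i] = (unsigned char)(val >> (56 - 8 * i));
}

/*
 * Walk 'cnt' entries spaced 'esize' bytes apart and copy GPR values
 * into 'out'. Entries of other types, or with an out-of-range GPR
 * number, are skipped. Returns the number of GPR values stored.
 */
static int read_gprs(const unsigned char *buf, unsigned int cnt,
		     unsigned int esize, struct regs *out)
{
	unsigned int i;
	int stored = 0;

	memset(out, 0, sizeof(*out));

	for (i = 0; i < cnt; i++, buf += esize) {
		uint32_t type = load_be32(buf);
		uint32_t num = load_be32(buf + 4);

		if (type == REG_TYPE_GPR && num < 32) {
			out->gpr[num] = load_be64(buf + 8);
			stored++;
		}
	}

	return stored;
}
```

Decoding byte by byte keeps the sketch independent of host endianness; the kernel code uses `be32_to_cpu()` and `be64_to_cpu()` for the same purpose.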

* [PATCH v5 22/31] powerpc/fadump: make crash memory ranges array allocation generic
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (20 preceding siblings ...)
  2019-08-20 12:06 ` [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware Hari Bathini
@ 2019-08-20 12:06 ` Hari Bathini
  2019-08-20 12:06 ` [PATCH v5 23/31] powerpc/fadump: consider reserved ranges while releasing memory Hari Bathini
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:06 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Make allocate_crash_memory_ranges() and free_crash_memory_ranges()
generic so that they can be reused to manage any dynamically sized
memory range array. This change helps in managing the reserved ranges
array to be added later.
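
The idea of one descriptor per range array, grown in fixed-size chunks, can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel implementation: `realloc()` stands in for `krealloc()`, `CHUNK` for `PAGE_SIZE`, and `struct range_info` with its `range_info_*()` helpers are hypothetical names. Leaving the old array intact when growth fails, and resetting the entry count on free, are choices of this sketch.

```c
#include <stdlib.h>

#define CHUNK	4096	/* stands in for PAGE_SIZE */

struct mem_range {
	unsigned long long base;
	unsigned long long size;
};

struct range_info {
	char name[16];
	struct mem_range *ranges;
	size_t bytes;	/* allocated size in bytes */
	int count;	/* entries in use */
	int max;	/* entries that fit in 'bytes' */
};

/* Grow the array by one chunk; the old array stays valid on failure. */
static int range_info_grow(struct range_info *ri)
{
	size_t new_bytes = ri->bytes + CHUNK;
	struct mem_range *p = realloc(ri->ranges, new_bytes);

	if (!p)
		return -1;

	ri->ranges = p;
	ri->bytes = new_bytes;
	ri->max = (int)(new_bytes / sizeof(struct mem_range));
	return 0;
}

/* Append one range, growing the array first when it is full. */
static int range_info_add(struct range_info *ri, unsigned long long base,
			  unsigned long long size)
{
	if (ri->count == ri->max && range_info_grow(ri) < 0)
		return -1;

	ri->ranges[ri->count].base = base;
	ri->ranges[ri->count].size = size;
	ri->count++;
	return 0;
}

static void range_info_free(struct range_info *ri)
{
	free(ri->ranges);
	ri->ranges = NULL;
	ri->bytes = 0;
	ri->count = 0;
	ri->max = 0;
}
```

Because all bookkeeping lives in the descriptor, a second array (for example one for reserved ranges) needs only another `struct range_info` instance.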

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump-common.h |   11 ++++
 arch/powerpc/kernel/fadump.c        |   99 +++++++++++++++++++----------------
 2 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index fc408b0..d1ff5b2 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -83,11 +83,20 @@ struct fadump_crash_info_header {
 	struct cpumask	online_mask;
 };
 
-struct fad_crash_memory_ranges {
+struct fadump_memory_range {
 	unsigned long long	base;
 	unsigned long long	size;
 };
 
+/* fadump memory ranges info */
+struct fadump_mrange_info {
+	char				name[16];
+	struct fadump_memory_range	*mem_ranges;
+	int				mem_ranges_sz;
+	int				mem_range_cnt;
+	int				max_mem_ranges;
+};
+
 /* Platform specific callback functions */
 struct fadump_ops;
 
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 8dd2dcc..2dd4621 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -36,10 +36,7 @@
 static struct fw_dump fw_dump;
 
 static DEFINE_MUTEX(fadump_mutex);
-struct fad_crash_memory_ranges *crash_memory_ranges;
-int crash_memory_ranges_size;
-int crash_mem_ranges;
-int max_crash_mem_ranges;
+struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0 };
 
 #ifdef CONFIG_CMA
 static struct cma *fadump_cma;
@@ -487,46 +484,49 @@ void crash_fadump(struct pt_regs *regs, const char *str)
 	fw_dump.ops->fadump_trigger(fdh, str);
 }
 
-static void free_crash_memory_ranges(void)
+static void fadump_free_mem_ranges(struct fadump_mrange_info *mrange_info)
 {
-	kfree(crash_memory_ranges);
-	crash_memory_ranges = NULL;
-	crash_memory_ranges_size = 0;
-	max_crash_mem_ranges = 0;
+	kfree(mrange_info->mem_ranges);
+	mrange_info->mem_ranges = NULL;
+	mrange_info->mem_ranges_sz = 0;
+	mrange_info->max_mem_ranges = 0;
 }
 
 /*
- * Allocate or reallocate crash memory ranges array in incremental units
+ * Allocate or reallocate mem_ranges array in incremental units
  * of PAGE_SIZE.
  */
-static int allocate_crash_memory_ranges(void)
+static int fadump_alloc_mem_ranges(struct fadump_mrange_info *mrange_info)
 {
-	struct fad_crash_memory_ranges *new_array;
+	struct fadump_memory_range *new_array;
 	u64 new_size;
 
-	new_size = crash_memory_ranges_size + PAGE_SIZE;
-	pr_debug("Allocating %llu bytes of memory for crash memory ranges\n",
-		 new_size);
+	new_size = mrange_info->mem_ranges_sz + PAGE_SIZE;
+	pr_debug("Allocating %llu bytes of memory for %s memory ranges\n",
+		 new_size, mrange_info->name);
 
-	new_array = krealloc(crash_memory_ranges, new_size, GFP_KERNEL);
+	new_array = krealloc(mrange_info->mem_ranges, new_size, GFP_KERNEL);
 	if (new_array == NULL) {
-		pr_err("Insufficient memory for setting up crash memory ranges\n");
-		free_crash_memory_ranges();
+		pr_err("Insufficient memory for setting up %s memory ranges\n",
+		       mrange_info->name);
+		fadump_free_mem_ranges(mrange_info);
 		return -ENOMEM;
 	}
 
-	crash_memory_ranges = new_array;
-	crash_memory_ranges_size = new_size;
-	max_crash_mem_ranges = (new_size /
-				sizeof(struct fad_crash_memory_ranges));
+	mrange_info->mem_ranges = new_array;
+	mrange_info->mem_ranges_sz = new_size;
+	mrange_info->max_mem_ranges = (new_size /
+				       sizeof(struct fadump_memory_range));
 	return 0;
 }
 
-static inline int fadump_add_crash_memory(unsigned long long base,
-					  unsigned long long end)
+static inline int fadump_add_mem_range(struct fadump_mrange_info *mrange_info,
+				       unsigned long long base,
+				       unsigned long long end)
 {
 	u64  start, size;
 	bool is_adjacent = false;
+	struct fadump_memory_range *mem_ranges = mrange_info->mem_ranges;
 
 	if (base == end)
 		return 0;
@@ -535,31 +535,35 @@ static inline int fadump_add_crash_memory(unsigned long long base,
 	 * Fold adjacent memory ranges to bring down the memory ranges/
 	 * PT_LOAD segments count.
 	 */
-	if (crash_mem_ranges) {
-		start = crash_memory_ranges[crash_mem_ranges - 1].base;
-		size = crash_memory_ranges[crash_mem_ranges - 1].size;
+	if (mrange_info->mem_range_cnt) {
+		start = mem_ranges[mrange_info->mem_range_cnt - 1].base;
+		size  = mem_ranges[mrange_info->mem_range_cnt - 1].size;
 
 		if ((start + size) == base)
 			is_adjacent = true;
 	}
 	if (!is_adjacent) {
 		/* resize the array on reaching the limit */
-		if (crash_mem_ranges == max_crash_mem_ranges) {
+		if (mrange_info->mem_range_cnt == mrange_info->max_mem_ranges) {
 			int ret;
 
-			ret = allocate_crash_memory_ranges();
+			ret = fadump_alloc_mem_ranges(mrange_info);
 			if (ret)
 				return ret;
+
+			/* Update to the new resized array */
+			mem_ranges = mrange_info->mem_ranges;
 		}
 
 		start = base;
-		crash_memory_ranges[crash_mem_ranges].base = start;
-		crash_mem_ranges++;
+		mem_ranges[mrange_info->mem_range_cnt].base = start;
+		mrange_info->mem_range_cnt++;
 	}
 
-	crash_memory_ranges[crash_mem_ranges - 1].size = (end - start);
-	pr_debug("crash_memory_range[%d] [%#016llx-%#016llx], %#llx bytes\n",
-		(crash_mem_ranges - 1), start, end - 1, (end - start));
+	mem_ranges[mrange_info->mem_range_cnt - 1].size = (end - start);
+	pr_debug("%s_memory_range[%d] [%#016llx-%#016llx], %#llx bytes\n",
+		 mrange_info->name, (mrange_info->mem_range_cnt - 1),
+		 start, end - 1, (end - start));
 	return 0;
 }
 
@@ -574,18 +578,22 @@ static int fadump_exclude_reserved_area(unsigned long long start,
 
 	if ((ra_start < end) && (ra_end > start)) {
 		if ((start < ra_start) && (end > ra_end)) {
-			ret = fadump_add_crash_memory(start, ra_start);
+			ret = fadump_add_mem_range(&crash_mrange_info,
+						   start, ra_start);
 			if (ret)
 				return ret;
 
-			ret = fadump_add_crash_memory(ra_end, end);
+			ret = fadump_add_mem_range(&crash_mrange_info,
+						   ra_end, end);
 		} else if (start < ra_start) {
-			ret = fadump_add_crash_memory(start, ra_start);
+			ret = fadump_add_mem_range(&crash_mrange_info,
+						   start, ra_start);
 		} else if (ra_end < end) {
-			ret = fadump_add_crash_memory(ra_end, end);
+			ret = fadump_add_mem_range(&crash_mrange_info,
+						   ra_end, end);
 		}
 	} else
-		ret = fadump_add_crash_memory(start, end);
+		ret = fadump_add_mem_range(&crash_mrange_info, start, end);
 
 	return ret;
 }
@@ -634,7 +642,7 @@ static int fadump_setup_crash_memory_ranges(void)
 	int ret;
 
 	pr_debug("Setup crash memory ranges.\n");
-	crash_mem_ranges = 0;
+	crash_mrange_info.mem_range_cnt = 0;
 
 	/*
 	 * add the first memory chunk (RMA_START through boot_memory_size) as
@@ -643,7 +651,8 @@ static int fadump_setup_crash_memory_ranges(void)
 	 * specified during fadump registration. We need to create a separate
 	 * program header for this chunk with the correct offset.
 	 */
-	ret = fadump_add_crash_memory(RMA_START, fw_dump.boot_memory_size);
+	ret = fadump_add_mem_range(&crash_mrange_info,
+				   RMA_START, fw_dump.boot_memory_size);
 	if (ret)
 		return ret;
 
@@ -734,10 +743,10 @@ static int fadump_create_elfcore_headers(char *bufp)
 
 	/* setup PT_LOAD sections. */
 
-	for (i = 0; i < crash_mem_ranges; i++) {
+	for (i = 0; i < crash_mrange_info.mem_range_cnt; i++) {
 		unsigned long long mbase, msize;
-		mbase = crash_memory_ranges[i].base;
-		msize = crash_memory_ranges[i].size;
+		mbase = crash_mrange_info.mem_ranges[i].base;
+		msize = crash_mrange_info.mem_ranges[i].size;
 
 		if (!msize)
 			continue;
@@ -828,7 +837,7 @@ void fadump_cleanup(void)
 	} else if (fw_dump.dump_registered) {
 		/* Un-register Firmware-assisted dump if it was registered. */
 		fw_dump.ops->fadump_unregister(&fw_dump);
-		free_crash_memory_ranges();
+		fadump_free_mem_ranges(&crash_mrange_info);
 	}
 
 	fw_dump.ops->fadump_cleanup(&fw_dump);



* [PATCH v5 23/31] powerpc/fadump: consider reserved ranges while releasing memory
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (21 preceding siblings ...)
  2019-08-20 12:06 ` [PATCH v5 22/31] powerpc/fadump: make crash memory ranges array allocation generic Hari Bathini
@ 2019-08-20 12:06 ` Hari Bathini
  2019-08-20 12:07 ` [PATCH v5 24/31] powerpc/fadump: improve how crashed kernel's memory is reserved Hari Bathini
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:06 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Commit 0962e8004e97 ("powerpc/prom: Scan reserved-ranges node for
memory reservations") added support for parsing the 'reserved-ranges'
DT node and reserving the kernel memory that falls in these ranges for
firmware use. When the capture kernel releases memory after saving
vmcore, make sure it skips these reserved ranges along with the
preserved dump area. Also, fix the off-by-one error in
fadump_release_reserved_area() while releasing memory.
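
The release logic amounts to sorting the reserved ranges, merging the
ones that touch, and then releasing only the gaps between them. The
userspace sketch below shows that idea under a few assumptions that are
not in the patch: qsort() replaces the in-place selection sort,
overlapping ranges are merged as well as adjacent ones, each gap is
clamped to the end of the region being released, and gaps are recorded
in an output array instead of being freed. The function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct mem_range {
	unsigned long long base;
	unsigned long long size;
};

/* qsort comparator: order ranges by base address. */
static int cmp_base(const void *a, const void *b)
{
	const struct mem_range *x = a, *y = b;

	return (x->base > y->base) - (x->base < y->base);
}

/* Sort by base, merge touching/overlapping ranges, return new count. */
static size_t sort_and_merge(struct mem_range *r, size_t n)
{
	size_t i, idx = 0;

	if (!n)
		return 0;

	qsort(r, n, sizeof(*r), cmp_base);
	for (i = 1; i < n; i++) {
		unsigned long long cur_end = r[idx].base + r[idx].size;
		unsigned long long end = r[i].base + r[i].size;

		if (r[i].base <= cur_end) {
			/* Extend the current range if r[i] reaches further. */
			if (end > cur_end)
				r[idx].size = end - r[idx].base;
		} else {
			r[++idx] = r[i];
		}
	}
	return idx + 1;
}

/*
 * Walk [begin, end) and record every gap between the sorted, merged
 * reserved ranges in out[]. Returns the number of gaps written.
 */
static size_t release_gaps(const struct mem_range *resv, size_t n,
			   unsigned long long begin, unsigned long long end,
			   struct mem_range *out, size_t out_max)
{
	unsigned long long tstart = begin;
	size_t i, k = 0;

	for (i = 0; i < n && tstart < end; i++) {
		unsigned long long ra_start = resv[i].base;
		unsigned long long ra_end = ra_start + resv[i].size;

		if (tstart >= ra_end)
			continue;
		if (tstart < ra_start) {
			unsigned long long stop = ra_start < end ? ra_start : end;

			assert(k < out_max);
			out[k].base = tstart;
			out[k].size = stop - tstart;
			k++;
		}
		tstart = ra_end;
	}
	if (tstart < end) {
		assert(k < out_max);
		out[k].base = tstart;
		out[k].size = end - tstart;
		k++;
	}
	return k;
}
```

With reserved ranges at 0x2000-0x4000 and 0x5000-0x6000, releasing
0x1000-0x8000 leaves three gaps: 0x1000-0x2000, 0x4000-0x5000 and
0x6000-0x8000.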

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump.c |  150 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 138 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 2dd4621..3981599 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -37,6 +37,7 @@ static struct fw_dump fw_dump;
 
 static DEFINE_MUTEX(fadump_mutex);
 struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0 };
+struct fadump_mrange_info reserved_mrange_info = { "reserved", NULL, 0, 0, 0 };
 
 #ifdef CONFIG_CMA
 static struct cma *fadump_cma;
@@ -881,33 +882,158 @@ static void fadump_release_reserved_area(unsigned long start, unsigned long end)
 			if (tend == end_pfn)
 				break;
 
-			start_pfn = tend + 1;
+			start_pfn = tend;
 		}
 	}
 }
 
 /*
- * Release the memory that was reserved in early boot to preserve the memory
- * contents. The released memory will be available for general use.
+ * Sort the mem ranges in-place and merge adjacent ranges
+ * to minimize the memory ranges count.
+ */
+static void sort_and_merge_mem_ranges(struct fadump_mrange_info *mrange_info)
+{
+	unsigned long long base, size;
+	struct fadump_memory_range tmp_range;
+	struct fadump_memory_range *mem_ranges;
+	int i, j, idx;
+
+	if (!mrange_info->mem_range_cnt)
+		return;
+
+	/* Sort the memory ranges */
+	mem_ranges = mrange_info->mem_ranges;
+	for (i = 0; i < mrange_info->mem_range_cnt; i++) {
+		idx = i;
+		for (j = i + 1; j < mrange_info->mem_range_cnt; j++) {
+			if (mem_ranges[idx].base > mem_ranges[j].base)
+				idx = j;
+		}
+		if (idx != i) {
+			tmp_range = mem_ranges[idx];
+			mem_ranges[idx] = mem_ranges[i];
+			mem_ranges[i] = tmp_range;
+		}
+	}
+
+	/* Merge adjacent reserved ranges */
+	idx = 0;
+	for (i = 1; i < mrange_info->mem_range_cnt; i++) {
+		base = mem_ranges[i-1].base;
+		size = mem_ranges[i-1].size;
+		if (mem_ranges[i].base == (base + size))
+			mem_ranges[idx].size += mem_ranges[i].size;
+		else {
+			idx++;
+			if (i == idx)
+				continue;
+
+			mem_ranges[idx] = mem_ranges[i];
+		}
+	}
+	mrange_info->mem_range_cnt = idx + 1;
+}
+
+/*
+ * Scan reserved-ranges to consider them while reserving/releasing
+ * memory for FADump.
+ */
+static inline int fadump_scan_reserved_mem_ranges(void)
+{
+	int len, ret = -1;
+	unsigned long i;
+	const __be32 *prop;
+	struct device_node *root;
+
+	root = of_find_node_by_path("/");
+	if (!root)
+		return ret;
+
+	prop = of_get_property(root, "reserved-ranges", &len);
+	if (!prop)
+		return ret;
+
+	/*
+	 * Each reserved range is an (address,size) pair, 2 cells each,
+	 * totalling 4 cells per range.
+	 */
+	for (i = 0; i < len / (sizeof(*prop) * 4); i++) {
+		u64 base, size;
+
+		base = of_read_number(prop + (i * 4) + 0, 2);
+		size = of_read_number(prop + (i * 4) + 2, 2);
+
+		if (size) {
+			ret = fadump_add_mem_range(&reserved_mrange_info,
+						   base, base + size);
+			if (ret < 0) {
+				pr_warn("some reserved ranges are ignored!\n");
+				break;
+			}
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * Release the memory that was reserved during early boot to preserve the
+ * crash'ed kernel's memory contents except reserved dump area (permanent
+ * reservation) and reserved ranges used by F/W. The released memory will
+ * be available for general use.
  */
 static void fadump_release_memory(unsigned long begin, unsigned long end)
 {
+	int i, ret;
 	unsigned long ra_start, ra_end;
+	unsigned long tstart;
+
+	fadump_scan_reserved_mem_ranges();
 
 	ra_start = fw_dump.reserve_dump_area_start;
 	ra_end = ra_start + fw_dump.reserve_dump_area_size;
 
 	/*
-	 * exclude the dump reserve area. Will reuse it for next
-	 * fadump registration.
+	 * Add reserved dump area to reserved ranges list
+	 * and exclude all these ranges while releasing memory.
 	 */
-	if (begin < ra_end && end > ra_start) {
-		if (begin < ra_start)
-			fadump_release_reserved_area(begin, ra_start);
-		if (end > ra_end)
-			fadump_release_reserved_area(ra_end, end);
-	} else
-		fadump_release_reserved_area(begin, end);
+	ret = fadump_add_mem_range(&reserved_mrange_info, ra_start, ra_end);
+	if (ret != 0) {
+		/*
+		 * Not enough memory to set up reserved ranges as the system is
+		 * running short of memory. So, release all the memory except
+		 * reserved dump area (reused for next fadump registration).
+		 */
+		if (begin < ra_end && end > ra_start) {
+			if (begin < ra_start)
+				fadump_release_reserved_area(begin, ra_start);
+			if (end > ra_end)
+				fadump_release_reserved_area(ra_end, end);
+		} else
+			fadump_release_reserved_area(begin, end);
+
+		return;
+	}
+
+	/* Get the reserved ranges list in order first. */
+	sort_and_merge_mem_ranges(&reserved_mrange_info);
+
+	/* Exclude reserved ranges and release remaining memory */
+	tstart = begin;
+	for (i = 0; i < reserved_mrange_info.mem_range_cnt; i++) {
+		ra_start = reserved_mrange_info.mem_ranges[i].base;
+		ra_end = ra_start + reserved_mrange_info.mem_ranges[i].size;
+
+		if (tstart >= ra_end)
+			continue;
+
+		if (tstart < ra_start)
+			fadump_release_reserved_area(tstart, ra_start);
+		tstart = ra_end;
+	}
+
+	if (tstart < end)
+		fadump_release_reserved_area(tstart, end);
 }
 
 static void fadump_invalidate_release_mem(void)



* [PATCH v5 24/31] powerpc/fadump: improve how crashed kernel's memory is reserved
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (22 preceding siblings ...)
  2019-08-20 12:06 ` [PATCH v5 23/31] powerpc/fadump: consider reserved ranges while releasing memory Hari Bathini
@ 2019-08-20 12:07 ` Hari Bathini
  2019-08-20 12:07 ` [PATCH v5 25/31] powernv/fadump: add support to preserve crash data on FADUMP disabled kernel Hari Bathini
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:07 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

The size parameter to fadump_reserve_crash_area() function is not needed
as all the memory above boot memory size must be preserved anyway. Update
the function by dropping this redundant parameter.
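
As a rough illustration of why the size is redundant, the sketch below
computes, for a list of memory regions, the part of each region that
lies at or above a base address. It is a userspace approximation with
hypothetical names (struct region, ranges_above); the kernel code walks
memblock regions and calls memblock_reserve() instead of filling an
output array. The sketch also skips a region that ends exactly at base,
which is a small deviation chosen to avoid emitting zero-sized ranges.

```c
#include <assert.h>
#include <stddef.h>

struct region {
	unsigned long base;
	unsigned long size;
};

/*
 * For each memory region, compute the part at or above base and store
 * it in out[]. Returns the number of sub-ranges written.
 */
static size_t ranges_above(const struct region *mem, size_t n,
			   unsigned long base,
			   struct region *out, size_t out_max)
{
	size_t i, k = 0;

	for (i = 0; i < n; i++) {
		unsigned long mstart = mem[i].base;
		unsigned long msize = mem[i].size;

		/* Region lies entirely below base: nothing to preserve. */
		if (mstart + msize <= base)
			continue;

		/* Region straddles base: keep only the upper part. */
		if (mstart < base) {
			msize -= base - mstart;
			mstart = base;
		}

		assert(k < out_max);
		out[k].base = mstart;
		out[k].size = msize;
		k++;
	}
	return k;
}
```

Only the base is needed as input: the upper bound of each reserved
chunk falls out of the region list itself.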

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump.c |   53 +++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 3981599..665104a 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -35,6 +35,8 @@
 
 static struct fw_dump fw_dump;
 
+static void __init fadump_reserve_crash_area(unsigned long base);
+
 static DEFINE_MUTEX(fadump_mutex);
 struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0 };
 struct fadump_mrange_info reserved_mrange_info = { "reserved", NULL, 0, 0, 0 };
@@ -262,26 +264,6 @@ static unsigned long get_fadump_area_size(void)
 	return size;
 }
 
-static void __init fadump_reserve_crash_area(unsigned long base,
-					     unsigned long size)
-{
-	struct memblock_region *reg;
-	unsigned long mstart, mend, msize;
-
-	for_each_memblock(memory, reg) {
-		mstart = max_t(unsigned long, base, reg->base);
-		mend = reg->base + reg->size;
-		mend = min(base + size, mend);
-
-		if (mstart < mend) {
-			msize = mend - mstart;
-			memblock_reserve(mstart, msize);
-			pr_info("Reserved %ldMB of memory at %#016lx for saving crash dump\n",
-				(msize >> 20), mstart);
-		}
-	}
-}
-
 int __init fadump_reserve_mem(void)
 {
 	int ret = 1;
@@ -347,12 +329,11 @@ int __init fadump_reserve_mem(void)
 #endif
 		/*
 		 * If last boot has crashed then reserve all the memory
-		 * above boot_memory_size so that we don't touch it until
+		 * above boot memory size so that we don't touch it until
 		 * dump is written to disk by userspace tool. This memory
-		 * will be released for general use once the dump is saved.
+		 * can be released for general use by invalidating fadump.
 		 */
-		size = memory_boundary - base;
-		fadump_reserve_crash_area(base, size);
+		fadump_reserve_crash_area(base);
 
 		pr_debug("fadumphdr_addr = %#016lx\n", fw_dump.fadumphdr_addr);
 		pr_debug("Reserve dump area start address: 0x%lx\n",
@@ -1240,3 +1221,27 @@ int __init setup_fadump(void)
 	return 1;
 }
 subsys_initcall(setup_fadump);
+
+/* Preserve everything above the base address */
+static void __init fadump_reserve_crash_area(unsigned long base)
+{
+	struct memblock_region *reg;
+	unsigned long mstart, msize;
+
+	for_each_memblock(memory, reg) {
+		mstart = reg->base;
+		msize  = reg->size;
+
+		if ((mstart + msize) < base)
+			continue;
+
+		if (mstart < base) {
+			msize -= (base - mstart);
+			mstart = base;
+		}
+
+		pr_info("Reserving %luMB of memory at %#016lx for preserving crash data\n",
+			(msize >> 20), mstart);
+		memblock_reserve(mstart, msize);
+	}
+}



* [PATCH v5 25/31] powernv/fadump: add support to preserve crash data on FADUMP disabled kernel
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (23 preceding siblings ...)
  2019-08-20 12:07 ` [PATCH v5 24/31] powerpc/fadump: improve how crashed kernel's memory is reserved Hari Bathini
@ 2019-08-20 12:07 ` Hari Bathini
  2019-08-20 12:07 ` [PATCH v5 26/31] powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP Hari Bathini
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:07 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP, that ensures
crash data from a previously crash'ed kernel is preserved. This
helps in cases where FADump is not enabled but the subsequent memory
preserving kernel boot is likely to process this crash data. One
typical use case for this config option is the petitboot kernel.

As OPAL allows registering an address with it in the first kernel and
retrieving it after MPIPL, use it to store the top of boot memory.
A kernel that intends to preserve crash data retrieves it and avoids
using memory beyond this address.
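
The handshake between the two kernels can be modelled in userspace
roughly as below. This is a toy model and not the OPAL interface: the
sketch_* functions, the tag enum and the 64K page size are all
assumptions made for illustration, a static array plays the role of
firmware state that survives the memory preserving reboot, and
endianness conversion is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x10000ULL	/* 64K pages assumed for illustration */

enum sketch_tag { SKETCH_TAG_KERNEL, SKETCH_TAG_BOOT_MEM, SKETCH_TAG_MAX };

/* Stand-in for firmware state that survives a memory preserving reboot. */
static uint64_t sketch_tags[SKETCH_TAG_MAX];

/* First kernel: hand a value to "firmware" under the given tag. */
static int sketch_register_tag(enum sketch_tag tag, uint64_t value)
{
	if (tag >= SKETCH_TAG_MAX)
		return -1;
	sketch_tags[tag] = value;
	return 0;
}

/* Later kernel: read back whatever was registered under the tag. */
static int sketch_query_tag(enum sketch_tag tag, uint64_t *value)
{
	if (tag >= SKETCH_TAG_MAX || !value)
		return -1;
	*value = sketch_tags[tag];
	return 0;
}

/* Round addr up to the next page boundary, like PAGE_ALIGN(). */
static uint64_t sketch_page_align(uint64_t addr)
{
	/* The mask trick below only works for power-of-two page sizes. */
	assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Later kernel: return the lowest address that must be left untouched,
 * or 0 when no boot memory top was registered.
 */
static uint64_t sketch_preserve_from(void)
{
	uint64_t top = 0;

	if (sketch_query_tag(SKETCH_TAG_BOOT_MEM, &top) || !top)
		return 0;
	return sketch_page_align(top);
}
```

The first kernel registers the boot memory top once at setup time; a
kernel booted after the crash only queries it and reserves everything
from the page-aligned address upwards.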

Move arch_reserved_kernel_pages() function as it is needed for both
FA_DUMP and PRESERVE_FA_DUMP configurations.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/Kconfig                         |    9 +++
 arch/powerpc/include/asm/fadump.h            |    9 ++-
 arch/powerpc/kernel/Makefile                 |    6 ++
 arch/powerpc/kernel/fadump-common.h          |   11 ++++
 arch/powerpc/kernel/fadump.c                 |   43 +++++++++++++--
 arch/powerpc/kernel/prom.c                   |    4 +
 arch/powerpc/platforms/powernv/Makefile      |    1 
 arch/powerpc/platforms/powernv/opal-fadump.c |   76 ++++++++++++++++++++++++++
 8 files changed, 148 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fc4ecfe..2be9b96 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -580,6 +580,15 @@ config FA_DUMP
 	  If unsure, say "y". Only special kernels like petitboot may
 	  need to say "N" here.
 
+config PRESERVE_FA_DUMP
+	bool "Preserve Firmware-assisted dump"
+	depends on PPC64 && PPC_POWERNV && !FA_DUMP
+	help
+	  On a kernel with FA_DUMP disabled, this option helps to preserve
+	  crash data from a previously crash'ed kernel. Useful when the next
+	  memory preserving kernel boot would process this crash data.
+	  Petitboot kernel is the typical usecase for this option.
+
 config IRQ_ALL_CPUS
 	bool "Distribute interrupts on all CPUs by default"
 	depends on SMP
diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index e608d34..fd990d8 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -14,9 +14,6 @@
 extern int crashing_cpu;
 
 extern int is_fadump_memory_area(u64 addr, ulong size);
-extern int early_init_dt_scan_fw_dump(unsigned long node,
-		const char *uname, int depth, void *data);
-extern int fadump_reserve_mem(void);
 extern int setup_fadump(void);
 extern int is_fadump_active(void);
 extern int should_fadump_crash(void);
@@ -29,4 +26,10 @@ static inline int should_fadump_crash(void) { return 0; }
 static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
 static inline void fadump_cleanup(void) { }
 #endif /* !CONFIG_FA_DUMP */
+
+#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
+extern int early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
+				      int depth, void *data);
+extern int fadump_reserve_mem(void);
+#endif
 #endif /* __PPC64_FA_DUMP_H__ */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 439d548..6abaead 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -78,7 +78,11 @@ obj-$(CONFIG_EEH)              += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
 				  eeh_driver.o eeh_event.o eeh_sysfs.o
 obj-$(CONFIG_GENERIC_TBSYNC)	+= smp-tbsync.o
 obj-$(CONFIG_CRASH_DUMP)	+= crash_dump.o
-obj-$(CONFIG_FA_DUMP)		+= fadump.o fadump-common.o
+ifeq ($(CONFIG_FA_DUMP),y)
+obj-y				+= fadump.o fadump-common.o
+else
+obj-$(CONFIG_PRESERVE_FA_DUMP)	+= fadump.o
+endif
 ifdef CONFIG_PPC32
 obj-$(CONFIG_E500)		+= idle_e500.o
 endif
diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index d1ff5b2..ba8481b 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -12,6 +12,7 @@
 #ifndef __PPC64_FA_DUMP_INTERNAL_H__
 #define __PPC64_FA_DUMP_INTERNAL_H__
 
+#ifndef CONFIG_PRESERVE_FA_DUMP
 /*
  * The RMA region will be saved for later dumping when kernel crashes.
  * RMA is Real Mode Area, the first block of logical memory address owned
@@ -157,6 +158,16 @@ void fadump_update_elfcore_header(struct fw_dump *fadump_config, char *bufp);
 int is_fadump_boot_mem_contiguous(struct fw_dump *fadump_conf);
 int is_fadump_reserved_mem_contiguous(struct fw_dump *fadump_conf);
 
+#else /* !CONFIG_PRESERVE_FA_DUMP */
+
+/* Firmware-assisted dump configuration details. */
+struct fw_dump {
+	unsigned long	boot_mem_top;
+	unsigned long	dump_active;
+};
+
+#endif /* CONFIG_PRESERVE_FA_DUMP */
+
 #ifdef CONFIG_PPC_PSERIES
 extern int rtas_fadump_dt_scan(struct fw_dump *fadump_config, ulong node);
 #else
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 665104a..43038f5 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -37,6 +37,7 @@ static struct fw_dump fw_dump;
 
 static void __init fadump_reserve_crash_area(unsigned long base);
 
+#ifndef CONFIG_PRESERVE_FA_DUMP
 static DEFINE_MUTEX(fadump_mutex);
 struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0 };
 struct fadump_mrange_info reserved_mrange_info = { "reserved", NULL, 0, 0, 0 };
@@ -384,11 +385,6 @@ int __init fadump_reserve_mem(void)
 	return 0;
 }
 
-unsigned long __init arch_reserved_kernel_pages(void)
-{
-	return memblock_reserved_size() / PAGE_SIZE;
-}
-
 /* Look for fadump= cmdline option. */
 static int __init early_fadump_param(char *p)
 {
@@ -1221,6 +1217,38 @@ int __init setup_fadump(void)
 	return 1;
 }
 subsys_initcall(setup_fadump);
+#else /* !CONFIG_PRESERVE_FA_DUMP */
+
+/* Scan the Firmware Assisted dump configuration details. */
+int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
+				      int depth, void *data)
+{
+	if ((depth != 1) || (strcmp(uname, "ibm,opal") != 0))
+		return 0;
+
+	return opal_fadump_dt_scan(&fw_dump, node);
+}
+
+/*
+ * When dump is active but PRESERVE_FA_DUMP is enabled on the kernel,
+ * preserve crash data. The subsequent memory preserving kernel boot
+ * is likely to process this crash data.
+ */
+int __init fadump_reserve_mem(void)
+{
+	if (fw_dump.dump_active) {
+		/*
+		 * If last boot has crashed then reserve all the memory
+		 * above boot memory to preserve crash data.
+		 */
+		pr_info("Preserving crash data for processing in next boot.\n");
+		fadump_reserve_crash_area(PAGE_ALIGN(fw_dump.boot_mem_top));
+	} else
+		pr_debug("FADump-aware kernel..\n");
+
+	return 1;
+}
+#endif /* CONFIG_PRESERVE_FA_DUMP */
 
 /* Preserve everything above the base address */
 static void __init fadump_reserve_crash_area(unsigned long base)
@@ -1245,3 +1273,8 @@ static void __init fadump_reserve_crash_area(unsigned long base)
 		memblock_reserve(mstart, msize);
 	}
 }
+
+unsigned long __init arch_reserved_kernel_pages(void)
+{
+	return memblock_reserved_size() / PAGE_SIZE;
+}
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 7159e79..9c3861bd 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -704,7 +704,7 @@ void __init early_init_devtree(void *params)
 	of_scan_flat_dt(early_init_dt_scan_opal, NULL);
 #endif
 
-#ifdef CONFIG_FA_DUMP
+#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
 	/* scan tree to see if dump is active during last boot */
 	of_scan_flat_dt(early_init_dt_scan_fw_dump, NULL);
 #endif
@@ -731,7 +731,7 @@ void __init early_init_devtree(void *params)
 	if (PHYSICAL_START > MEMORY_START)
 		memblock_reserve(MEMORY_START, 0x8000);
 	reserve_kdump_trampoline();
-#ifdef CONFIG_FA_DUMP
+#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
 	/*
 	 * If we fail to reserve memory for firmware-assisted dump then
 	 * fallback to kexec based kdump.
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 43a6e1c..b4a8022 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -7,6 +7,7 @@ obj-y			+= opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
 
 obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_FA_DUMP)	+= opal-fadump.o
+obj-$(CONFIG_PRESERVE_FA_DUMP)	+= opal-fadump.o
 obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
 obj-$(CONFIG_CXL_BASE)	+= pci-cxl.o
 obj-$(CONFIG_EEH)	+= eeh-powernv.o
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index 9a32a7f..85a89c0 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -22,6 +22,70 @@
 #include "../../kernel/fadump-common.h"
 #include "opal-fadump.h"
 
+
+#ifdef CONFIG_PRESERVE_FA_DUMP
+/*
+ * When dump is active but PRESERVE_FA_DUMP is enabled on the kernel,
+ * ensure crash data is preserved in hope that the subsequent memory
+ * preserving kernel boot is going to process this crash data.
+ */
+int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
+{
+	unsigned long dn;
+	const __be32 *prop;
+
+	dn = of_get_flat_dt_subnode_by_name(node, "dump");
+	if (dn == -FDT_ERR_NOTFOUND)
+		return 1;
+
+	/*
+	 * Check if dump has been initiated on last reboot.
+	 */
+	prop = of_get_flat_dt_prop(dn, "mpipl-boot", NULL);
+	if (prop) {
+		u64 addr = 0;
+		s64 ret;
+		const struct opal_fadump_mem_struct *r_opal_fdm_active;
+
+		ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, &addr);
+		if ((ret != OPAL_SUCCESS) || !addr) {
+			pr_debug("Could not get Kernel metadata (%lld)\n", ret);
+			return 1;
+		}
+
+		/*
+		 * Preserve memory only if kernel memory regions are registered
+		 * with f/w for MPIPL.
+		 */
+		addr = be64_to_cpu(addr);
+		pr_debug("Kernel metadata addr: %llx\n", addr);
+		r_opal_fdm_active = (void *)addr;
+		if (r_opal_fdm_active->registered_regions == 0)
+			return 1;
+
+		ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_BOOT_MEM, &addr);
+		if ((ret != OPAL_SUCCESS) || !addr) {
+			pr_err("Failed to get boot memory tag (%lld)\n", ret);
+			return 1;
+		}
+
+		/*
+		 * Anything below this address can be used for booting a
+		 * capture kernel or petitboot kernel. Preserve everything
+		 * above this address for processing crashdump.
+		 */
+		fadump_conf->boot_mem_top = be64_to_cpu(addr);
+		pr_debug("Preserve everything above %lx\n",
+			 fadump_conf->boot_mem_top);
+
+		pr_info("Firmware-assisted dump is active.\n");
+		fadump_conf->dump_active = 1;
+	}
+
+	return 1;
+}
+
+#else /* CONFIG_PRESERVE_FA_DUMP */
 static const struct opal_fadump_mem_struct *opal_fdm_active;
 static const struct opal_mpipl_fadump *opal_cpu_metadata;
 static struct opal_fadump_mem_struct *opal_fdm;
@@ -189,6 +253,17 @@ static int opal_fadump_setup_metadata(struct fw_dump *fadump_conf)
 		err = -EPERM;
 	}
 
+	/*
+	 * Register boot memory top address with f/w. Should be retrieved
+	 * by a kernel that intends to preserve crash'ed kernel's memory.
+	 */
+	ret = opal_mpipl_register_tag(OPAL_MPIPL_TAG_BOOT_MEM,
+				      fadump_conf->boot_memory_size);
+	if (ret != OPAL_SUCCESS) {
+		pr_err("Failed to set boot memory tag!\n");
+		err = -EPERM;
+	}
+
 	return err;
 }
 
@@ -654,3 +729,4 @@ int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
 
 	return 1;
 }
+#endif /* !CONFIG_PRESERVE_FA_DUMP */



* [PATCH v5 26/31] powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (24 preceding siblings ...)
  2019-08-20 12:07 ` [PATCH v5 25/31] powernv/fadump: add support to preserve crash data on FADUMP disabled kernel Hari Bathini
@ 2019-08-20 12:07 ` Hari Bathini
  2019-08-20 12:07 ` [PATCH v5 27/31] powernv/opalcore: export /sys/firmware/opal/core for analysing opal crashes Hari Bathini
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:07 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Kernel config option CONFIG_PRESERVE_FA_DUMP is introduced to ensure
crash data from a previously crash'ed kernel is preserved. Update the
documentation with these details.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.rst |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
index 2c3342c..23b04a6 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -101,6 +101,15 @@ firmware versions on PSeries (PowerVM) platform and Power9
 and above systems with recent firmware versions on PowerNV
 (OPAL) platform.
 
+On OPAL based machines, the system first boots into an intermediate
+kernel (referred to as the petitboot kernel) before booting into the
+capture kernel. This kernel would have minimal kernel and/or
+userspace support to process crash data. Such a kernel needs to
+preserve the previously crash'ed kernel's memory for the subsequent
+capture kernel boot to process this crash data. The kernel config
+option CONFIG_PRESERVE_FA_DUMP has to be enabled on such a kernel
+to ensure that crash data is preserved for later processing.
+
 Implementation details:
 -----------------------
 



* [PATCH v5 27/31] powernv/opalcore: export /sys/firmware/opal/core for analysing opal crashes
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (25 preceding siblings ...)
  2019-08-20 12:07 ` [PATCH v5 26/31] powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP Hari Bathini
@ 2019-08-20 12:07 ` Hari Bathini
  2019-08-20 12:07 ` [PATCH v5 28/31] powernv/opalcore: provide an option to invalidate /sys/firmware/opal/core file Hari Bathini
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:07 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

From: Hari Bathini <hbathini@linux.vnet.ibm.com>

Export /sys/firmware/opal/core file to analyze OPAL crashes. Since the
OPAL core can be generated independent of CONFIG_FA_DUMP support in the
kernel, add this support under a new kernel config option
CONFIG_OPAL_CORE. Also, avoid code duplication by moving the code common
to exporting /proc/vmcore and /sys/firmware/opal/core into opal-fadump.h.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
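Note for readers: the note layout that append_elf64_note() in this patch
writes (header, then name, then descriptor, each padded to a 4-byte word
and stored big-endian) can be tried out in userspace. The sketch below is
not the kernel code. struct note_hdr, to_be32() and append_note() are
hypothetical stand-ins for Elf64_Nhdr, cpu_to_be32() and the patch's
helper, and the "CORE" name used in the usage note is assumed to match
CRASH_CORE_NOTE_NAME.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Elf64_Nhdr: three 32-bit words. */
struct note_hdr {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* Number of 32-bit words needed to hold 'bytes' bytes. */
#define WORDS(bytes)	(((bytes) + sizeof(uint32_t) - 1) / sizeof(uint32_t))

/* Return v laid out in big-endian byte order, whatever the host is. */
static uint32_t to_be32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)v
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/*
 * Append one note at buf and return the position for the next note.
 * The caller must provide a zeroed buffer that is large enough.
 */
static uint32_t *append_note(uint32_t *buf, const char *name, uint32_t type,
			     const void *desc, size_t desc_len)
{
	struct note_hdr hdr;
	size_t namesz = strlen(name) + 1;	/* include the NUL */

	hdr.n_namesz = to_be32((uint32_t)namesz);
	hdr.n_descsz = to_be32((uint32_t)desc_len);
	hdr.n_type = to_be32(type);

	memcpy(buf, &hdr, sizeof(hdr));
	buf += WORDS(sizeof(hdr));
	memcpy(buf, name, namesz);
	buf += WORDS(namesz);
	memcpy(buf, desc, desc_len);
	buf += WORDS(desc_len);

	return buf;
}
```

With the name "CORE" (5 bytes including the NUL, padded to 8) and a
24-byte descriptor, which is what AUXV_DESC_SZ works out to for
AUXV_CNT = 1, one note takes 12 + 8 + 24 = 44 bytes.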
 arch/powerpc/Kconfig                         |    9 
 arch/powerpc/platforms/powernv/Makefile      |    1 
 arch/powerpc/platforms/powernv/opal-core.c   |  595 ++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-fadump.c |   84 +---
 arch/powerpc/platforms/powernv/opal-fadump.h |   72 +++
 5 files changed, 694 insertions(+), 67 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-core.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2be9b96..d91460a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -589,6 +589,15 @@ config PRESERVE_FA_DUMP
 	  memory preserving kernel boot would process this crash data.
 	  Petitboot kernel is the typical usecase for this option.
 
+config OPAL_CORE
+	bool "Export OPAL memory as /sys/firmware/opal/core"
+	depends on PPC64 && PPC_POWERNV
+	help
+	  This option uses the MPIPL support in firmware to provide an
+	  ELF core of OPAL memory after a crash. The ELF core is exported
+	  as /sys/firmware/opal/core file which is helpful in debugging
+	  OPAL crashes using GDB.
+
 config IRQ_ALL_CPUS
 	bool "Distribute interrupts on all CPUs by default"
 	depends on SMP
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b4a8022..e659afd 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -8,6 +8,7 @@ obj-y			+= opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
 obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_FA_DUMP)	+= opal-fadump.o
 obj-$(CONFIG_PRESERVE_FA_DUMP)	+= opal-fadump.o
+obj-$(CONFIG_OPAL_CORE)	+= opal-core.o
 obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
 obj-$(CONFIG_CXL_BASE)	+= pci-cxl.o
 obj-$(CONFIG_EEH)	+= eeh-powernv.o
diff --git a/arch/powerpc/platforms/powernv/opal-core.c b/arch/powerpc/platforms/powernv/opal-core.c
new file mode 100644
index 0000000..2f1c8c1
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-core.c
@@ -0,0 +1,595 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Interface for exporting the OPAL ELF core.
+ * Heavily inspired from fs/proc/vmcore.c
+ *
+ * Copyright 2019, IBM Corp.
+ * Author: Hari Bathini <hbathini@linux.ibm.com>
+ */
+
+#undef DEBUG
+#define pr_fmt(fmt) "opalcore: " fmt
+
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <linux/proc_fs.h>
+#include <linux/elf.h>
+#include <linux/elfcore.h>
+#include <linux/slab.h>
+#include <linux/crash_core.h>
+#include <linux/of.h>
+
+#include <asm/page.h>
+#include <asm/opal.h>
+
+#include "../../kernel/fadump-common.h"
+#include "opal-fadump.h"
+
+#define MAX_PT_LOAD_CNT		8
+
+/* NT_AUXV note related info */
+#define AUXV_CNT		1
+#define AUXV_DESC_SZ		(((2 * AUXV_CNT) + 1) * sizeof(Elf64_Off))
+
+struct opalcore_config {
+	unsigned int		num_cpus;
+	/* PIR value of crashing CPU */
+	unsigned int		crashing_cpu;
+
+	/* CPU state data info from F/W */
+	unsigned long		cpu_state_destination_addr;
+	unsigned long		cpu_state_data_size;
+	unsigned long		cpu_state_entry_size;
+
+	/* OPAL memory to be exported as PT_LOAD segments */
+	unsigned long		ptload_addr[MAX_PT_LOAD_CNT];
+	unsigned long		ptload_size[MAX_PT_LOAD_CNT];
+	unsigned long		ptload_cnt;
+
+	/* Pointer to the first PT_LOAD in the ELF core file */
+	Elf64_Phdr		*ptload_phdr;
+
+	/* Total size of opalcore file. */
+	size_t			opalcore_size;
+
+	/* Buffer for all the ELF core headers and the PT_NOTE */
+	size_t			opalcorebuf_sz;
+	char			*opalcorebuf;
+
+	/* NT_AUXV buffer */
+	char			auxv_buf[AUXV_DESC_SZ];
+};
+
+struct opalcore {
+	struct list_head list;
+	unsigned long long paddr;
+	unsigned long long size;
+	loff_t offset;
+};
+
+static LIST_HEAD(opalcore_list);
+static struct opalcore_config *oc_conf;
+static const struct opal_mpipl_fadump *opalc_metadata;
+static const struct opal_mpipl_fadump *opalc_cpu_metadata;
+
+/*
+ * Set crashing CPU's signal to SIGUSR1 if the crash is triggered
+ * by the kernel, SIGTERM otherwise.
+ */
+bool kernel_initiated;
+
+static struct opalcore * __init get_new_element(void)
+{
+	return kzalloc(sizeof(struct opalcore), GFP_KERNEL);
+}
+
+static inline int is_opalcore_usable(void)
+{
+	return (oc_conf && oc_conf->opalcorebuf != NULL) ? 1 : 0;
+}
+
+static Elf64_Word *append_elf64_note(Elf64_Word *buf, char *name,
+				     unsigned int type, void *data,
+				     size_t data_len)
+{
+	Elf64_Nhdr *note = (Elf64_Nhdr *)buf;
+	Elf64_Word namesz = strlen(name) + 1;
+
+	note->n_namesz = cpu_to_be32(namesz);
+	note->n_descsz = cpu_to_be32(data_len);
+	note->n_type   = cpu_to_be32(type);
+	buf += DIV_ROUND_UP(sizeof(*note), sizeof(Elf64_Word));
+	memcpy(buf, name, namesz);
+	buf += DIV_ROUND_UP(namesz, sizeof(Elf64_Word));
+	memcpy(buf, data, data_len);
+	buf += DIV_ROUND_UP(data_len, sizeof(Elf64_Word));
+
+	return buf;
+}
+
+static void fill_prstatus(struct elf_prstatus *prstatus, int pir,
+			  struct pt_regs *regs)
+{
+	memset(prstatus, 0, sizeof(struct elf_prstatus));
+	elf_core_copy_kernel_regs(&(prstatus->pr_reg), regs);
+
+	/*
+	 * Overload PID with PIR value.
+	 * As a PIR value could also be '0', add an offset of '100'
+	 * to every PIR to avoid misinterpretations in GDB.
+	 */
+	prstatus->pr_pid  = cpu_to_be32(100 + pir);
+	prstatus->pr_ppid = cpu_to_be32(1);
+
+	/*
+	 * Indicate SIGUSR1 for crash initiated from kernel.
+	 * SIGTERM otherwise.
+	 */
+	if (pir == oc_conf->crashing_cpu) {
+		short sig;
+
+		sig = kernel_initiated ? SIGUSR1 : SIGTERM;
+		prstatus->pr_cursig = cpu_to_be16(sig);
+	}
+}
+
+static Elf64_Word *auxv_to_elf64_notes(Elf64_Word *buf,
+				       uint64_t opal_boot_entry)
+{
+	int idx = 0;
+	Elf64_Off *bufp = (Elf64_Off *)oc_conf->auxv_buf;
+
+	memset(bufp, 0, AUXV_DESC_SZ);
+
+	/* Entry point of OPAL */
+	bufp[idx++] = cpu_to_be64(AT_ENTRY);
+	bufp[idx++] = cpu_to_be64(opal_boot_entry);
+
+	/* end of vector */
+	bufp[idx++] = cpu_to_be64(AT_NULL);
+
+	buf = append_elf64_note(buf, CRASH_CORE_NOTE_NAME, NT_AUXV,
+				oc_conf->auxv_buf, AUXV_DESC_SZ);
+	return buf;
+}
+
+/*
+ * Read from the ELF header and then the crash dump.
+ * Returns number of bytes read on success, -errno on failure.
+ */
+static ssize_t read_opalcore(struct file *file, struct kobject *kobj,
+			     struct bin_attribute *bin_attr, char *to,
+			     loff_t pos, size_t count)
+{
+	struct opalcore *m;
+	ssize_t tsz, avail;
+	loff_t tpos = pos;
+
+	if (pos >= oc_conf->opalcore_size)
+		return 0;
+
+	/* Adjust count if it goes beyond opalcore size */
+	avail = oc_conf->opalcore_size - pos;
+	if (count > avail)
+		count = avail;
+
+	if (count == 0)
+		return 0;
+
+	/* Read ELF core header and/or PT_NOTE segment */
+	if (tpos < oc_conf->opalcorebuf_sz) {
+		tsz = min_t(size_t, oc_conf->opalcorebuf_sz - tpos, count);
+		memcpy(to, oc_conf->opalcorebuf + tpos, tsz);
+		to += tsz;
+		tpos += tsz;
+		count -= tsz;
+	}
+
+	list_for_each_entry(m, &opalcore_list, list) {
+		/* nothing more to read here */
+		if (count == 0)
+			break;
+
+		if (tpos < m->offset + m->size) {
+			void *addr;
+
+			tsz = min_t(size_t, m->offset + m->size - tpos, count);
+			addr = (void *)(m->paddr + tpos - m->offset);
+			memcpy(to, __va(addr), tsz);
+			to += tsz;
+			tpos += tsz;
+			count -= tsz;
+		}
+	}
+
+	return (tpos - pos);
+}
+
+static struct bin_attribute opal_core_attr = {
+	.attr = {.name = "core", .mode = 0400},
+	.read = read_opalcore
+};
+
+/*
+ * Read CPU state dump data and convert it into ELF notes.
+ *
+ * Each register entry is 16 bytes: a numerical identifier along with
+ * a GPR/SPR flag in the first 8 bytes and the register value in the next
+ * 8 bytes. For more details refer to F/W documentation.
+ */
+static Elf64_Word * __init opalcore_append_cpu_notes(Elf64_Word *buf)
+{
+	struct hdat_fadump_thread_hdr *thdr;
+	unsigned long addr;
+	u32 thread_pir;
+	char *bufp;
+	Elf64_Word *first_cpu_note;
+	struct pt_regs regs;
+	struct elf_prstatus prstatus;
+	unsigned int size_of_each_thread;
+	unsigned int regs_offset, regs_cnt, reg_esize;
+	int i;
+
+	size_of_each_thread = oc_conf->cpu_state_entry_size;
+
+	addr = oc_conf->cpu_state_destination_addr;
+	bufp = __va(addr);
+
+	/*
+	 * Offset for register entries, entry size and registers count is
+	 * duplicated in every thread header in keeping with HDAT format.
+	 * Use these values from the first thread header.
+	 */
+	thdr = (struct hdat_fadump_thread_hdr *)bufp;
+	regs_offset = (offsetof(struct hdat_fadump_thread_hdr, offset) +
+		       be32_to_cpu(thdr->offset));
+	reg_esize = be32_to_cpu(thdr->esize);
+	regs_cnt  = be32_to_cpu(thdr->ecnt);
+
+	pr_debug("--------CPU State Data------------\n");
+	pr_debug("NumCpus     : %u\n", oc_conf->num_cpus);
+	pr_debug("\tOffset: %u, Entry size: %u, Cnt: %u\n",
+		 regs_offset, reg_esize, regs_cnt);
+
+	/*
+	 * Skip past the first CPU note. Fill this note with the
+	 * crashing CPU's prstatus.
+	 */
+	first_cpu_note = buf;
+	buf = append_elf64_note(buf, CRASH_CORE_NOTE_NAME, NT_PRSTATUS,
+				&prstatus, sizeof(prstatus));
+
+	for (i = 0; i < oc_conf->num_cpus; i++, bufp += size_of_each_thread) {
+		thdr = (struct hdat_fadump_thread_hdr *)bufp;
+		thread_pir = be32_to_cpu(thdr->pir);
+
+		pr_debug("%04d) PIR: 0x%x, core state: 0x%02x\n",
+			 (i + 1), thread_pir, thdr->core_state);
+
+		/*
+		 * Register state data of MAX cores is provided by firmware,
+		 * but some of these cores may not be active. So, while
+		 * processing register state data, check core state and
+		 * skip threads that belong to inactive cores.
+		 */
+		if (is_thread_core_inactive(thdr->core_state))
+			continue;
+
+		opal_fadump_read_regs((bufp + regs_offset), regs_cnt,
+				      reg_esize, false, &regs);
+
+		pr_debug("PIR 0x%x - R1 : 0x%llx, NIP : 0x%llx\n", thread_pir,
+			 be64_to_cpu(regs.gpr[1]), be64_to_cpu(regs.nip));
+		fill_prstatus(&prstatus, thread_pir, &regs);
+
+		if (thread_pir != oc_conf->crashing_cpu) {
+			buf = append_elf64_note(buf, CRASH_CORE_NOTE_NAME,
+						NT_PRSTATUS, &prstatus,
+						sizeof(prstatus));
+		} else {
+			/*
+			 * Add crashing CPU as the first NT_PRSTATUS note for
+			 * GDB to process the core file appropriately.
+			 */
+			append_elf64_note(first_cpu_note, CRASH_CORE_NOTE_NAME,
+					  NT_PRSTATUS, &prstatus,
+					  sizeof(prstatus));
+		}
+	}
+
+	return buf;
+}
+
+static int __init create_opalcore(void)
+{
+	int hdr_size, cpu_notes_size, order, count;
+	int i, ret;
+	unsigned int numcpus;
+	unsigned long paddr;
+	Elf64_Ehdr *elf;
+	Elf64_Phdr *phdr;
+	loff_t opalcore_off;
+	struct opalcore *new;
+	struct page *page;
+	char *bufp;
+	struct device_node *dn;
+	uint64_t opal_base_addr;
+	uint64_t opal_boot_entry;
+
+
+	if ((oc_conf->ptload_cnt == 0) ||
+	    (oc_conf->ptload_cnt > MAX_PT_LOAD_CNT)) {
+		pr_err("Invalid PT_LOAD count: %lu\n", oc_conf->ptload_cnt);
+		return -EINVAL;
+	}
+
+	numcpus = oc_conf->num_cpus;
+	hdr_size = (sizeof(Elf64_Ehdr) +
+		    ((oc_conf->ptload_cnt + 1) * sizeof(Elf64_Phdr)));
+	cpu_notes_size = ((numcpus * (CRASH_CORE_NOTE_HEAD_BYTES +
+			  CRASH_CORE_NOTE_NAME_BYTES +
+			  CRASH_CORE_NOTE_DESC_BYTES)) +
+			  (CRASH_CORE_NOTE_HEAD_BYTES +
+			  CRASH_CORE_NOTE_NAME_BYTES + AUXV_DESC_SZ));
+	oc_conf->opalcorebuf_sz = (hdr_size + cpu_notes_size);
+	order = get_order(oc_conf->opalcorebuf_sz);
+	oc_conf->opalcorebuf =
+		(char *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order);
+	if (!oc_conf->opalcorebuf) {
+		pr_err("Not enough memory to setup opalcore (size: %lu)\n",
+		       oc_conf->opalcorebuf_sz);
+		oc_conf->opalcorebuf_sz = 0;
+		return -ENOMEM;
+	}
+
+	pr_debug("opalcorebuf = 0x%lx\n", (unsigned long)oc_conf->opalcorebuf);
+
+	count = 1 << order;
+	page = virt_to_page(oc_conf->opalcorebuf);
+	for (i = 0; i < count; i++)
+		SetPageReserved(page + i);
+
+	/* Read OPAL related device-tree entries */
+	dn = of_find_node_by_name(NULL, "ibm,opal");
+	if (dn) {
+		ret = of_property_read_u64(dn, "opal-base-address",
+					   &opal_base_addr);
+		pr_debug("opal-base-address: %llx\n", opal_base_addr);
+		ret |= of_property_read_u64(dn, "opal-boot-address",
+					    &opal_boot_entry);
+		pr_debug("opal-boot-address: %llx\n", opal_boot_entry);
+	}
+	if (!dn || ret)
+		pr_warn("WARNING: Failed to read OPAL base & entry values\n");
+
+	/* Use count to keep track of the program headers */
+	count = 0;
+
+	bufp = oc_conf->opalcorebuf;
+	elf = (Elf64_Ehdr *)bufp;
+	bufp += sizeof(Elf64_Ehdr);
+	memcpy(elf->e_ident, ELFMAG, SELFMAG);
+	elf->e_ident[EI_CLASS] = ELF_CLASS;
+	elf->e_ident[EI_DATA] = ELFDATA2MSB;
+	elf->e_ident[EI_VERSION] = EV_CURRENT;
+	elf->e_ident[EI_OSABI] = ELF_OSABI;
+	memset(elf->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
+	elf->e_type = cpu_to_be16(ET_CORE);
+	elf->e_machine = cpu_to_be16(ELF_ARCH);
+	elf->e_version = cpu_to_be32(EV_CURRENT);
+	elf->e_entry = 0;
+	elf->e_phoff = cpu_to_be64(sizeof(Elf64_Ehdr));
+	elf->e_shoff = 0;
+	elf->e_flags = 0;
+
+	elf->e_ehsize = cpu_to_be16(sizeof(Elf64_Ehdr));
+	elf->e_phentsize = cpu_to_be16(sizeof(Elf64_Phdr));
+	elf->e_phnum = 0;
+	elf->e_shentsize = 0;
+	elf->e_shnum = 0;
+	elf->e_shstrndx = 0;
+
+	phdr = (Elf64_Phdr *)bufp;
+	bufp += sizeof(Elf64_Phdr);
+	phdr->p_type	= cpu_to_be32(PT_NOTE);
+	phdr->p_flags	= 0;
+	phdr->p_align	= 0;
+	phdr->p_paddr	= phdr->p_vaddr = 0;
+	phdr->p_offset	= cpu_to_be64(hdr_size);
+	phdr->p_filesz	= phdr->p_memsz = cpu_to_be64(cpu_notes_size);
+	count++;
+
+	opalcore_off = oc_conf->opalcorebuf_sz;
+	oc_conf->ptload_phdr  = (Elf64_Phdr *)bufp;
+	paddr = 0;
+	for (i = 0; i < oc_conf->ptload_cnt; i++) {
+		phdr = (Elf64_Phdr *)bufp;
+		bufp += sizeof(Elf64_Phdr);
+		phdr->p_type	= cpu_to_be32(PT_LOAD);
+		phdr->p_flags	= cpu_to_be32(PF_R|PF_W|PF_X);
+		phdr->p_align	= 0;
+
+		new = get_new_element();
+		if (!new)
+			return -ENOMEM;
+		new->paddr  = oc_conf->ptload_addr[i];
+		new->size   = oc_conf->ptload_size[i];
+		new->offset = opalcore_off;
+		list_add_tail(&new->list, &opalcore_list);
+
+		phdr->p_paddr	= cpu_to_be64(paddr);
+		phdr->p_vaddr	= cpu_to_be64(opal_base_addr + paddr);
+		phdr->p_filesz	= phdr->p_memsz  =
+			cpu_to_be64(oc_conf->ptload_size[i]);
+		phdr->p_offset	= cpu_to_be64(opalcore_off);
+
+		count++;
+		opalcore_off += oc_conf->ptload_size[i];
+		paddr += oc_conf->ptload_size[i];
+	}
+
+	elf->e_phnum = cpu_to_be16(count);
+
+	bufp = (char *)opalcore_append_cpu_notes((Elf64_Word *)bufp);
+	bufp = (char *)auxv_to_elf64_notes((Elf64_Word *)bufp, opal_boot_entry);
+
+	oc_conf->opalcore_size = opalcore_off;
+	return 0;
+}
+
+static void __init opalcore_config_init(void)
+{
+	struct device_node *np;
+	const __be32 *prop;
+	uint64_t addr = 0;
+	uint32_t idx, cpu_data_version;
+	int i, ret;
+
+
+	np = of_find_node_by_path("/ibm,opal/dump");
+	if (np == NULL)
+		return;
+
+	if (!of_device_is_compatible(np, "ibm,opal-dump")) {
+		pr_err("Support missing for this f/w version!\n");
+		return;
+	}
+
+	/*
+	 * Check if dump has been initiated on last reboot.
+	 */
+	prop = of_get_property(np, "mpipl-boot", NULL);
+	if (!prop)
+		goto out;
+
+	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_OPAL, &addr);
+	if ((ret != OPAL_SUCCESS) || !addr) {
+		pr_err("Failed to get OPAL metadata (%d)\n", ret);
+		goto out;
+	}
+
+	addr = be64_to_cpu(addr);
+	pr_debug("OPAL metadata addr: %llx\n", addr);
+	opalc_metadata = __va(addr);
+	if (opalc_metadata->version != MPIPL_FADUMP_VERSION) {
+		pr_err("OPAL metadata version (%u) not supported by kernel!\n",
+		       opalc_metadata->version);
+		goto out;
+	}
+
+	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU, &addr);
+	if ((ret != OPAL_SUCCESS) || !addr) {
+		pr_err("Failed to get OPAL CPU metadata (%d)\n", ret);
+		goto out;
+	}
+
+	addr = be64_to_cpu(addr);
+	pr_debug("CPU metadata addr: %llx\n", addr);
+	opalc_cpu_metadata = __va(addr);
+	cpu_data_version = be32_to_cpu(opalc_cpu_metadata->cpu_data_version);
+	if (cpu_data_version != HDAT_FADUMP_CPU_DATA_VERSION) {
+		pr_err("CPU data version (%u) not supported by kernel!\n",
+		       cpu_data_version);
+		goto out;
+	}
+
+	oc_conf = kzalloc(sizeof(struct opalcore_config), GFP_KERNEL);
+	if (oc_conf == NULL)
+		goto out;
+
+	oc_conf->ptload_cnt = 0;
+	idx = be32_to_cpu(opalc_metadata->region_cnt);
+	if (idx > MAX_PT_LOAD_CNT) {
+		pr_warn("OPAL regions count (%u) adjusted to limit (%d)\n",
+			idx, MAX_PT_LOAD_CNT);
+		idx = MAX_PT_LOAD_CNT;
+	}
+	for (i = 0; i < idx; i++) {
+		oc_conf->ptload_addr[oc_conf->ptload_cnt] =
+				be64_to_cpu(opalc_metadata->region[i].dest);
+		oc_conf->ptload_size[oc_conf->ptload_cnt++] =
+				be64_to_cpu(opalc_metadata->region[i].size);
+	}
+	oc_conf->ptload_cnt = i;
+	oc_conf->crashing_cpu = be32_to_cpu(opalc_metadata->crashing_pir);
+
+	oc_conf->cpu_state_destination_addr =
+			be64_to_cpu(opalc_cpu_metadata->region[0].dest);
+	oc_conf->cpu_state_data_size =
+			be64_to_cpu(opalc_cpu_metadata->region[0].size);
+	oc_conf->cpu_state_entry_size =
+			be32_to_cpu(opalc_cpu_metadata->cpu_data_size);
+
+	oc_conf->num_cpus = (oc_conf->cpu_state_data_size /
+			     oc_conf->cpu_state_entry_size);
+
+out:
+	of_node_put(np);
+}
+
+/* Cleanup function for opalcore module. */
+static void opalcore_cleanup(void)
+{
+	unsigned long order, count, i;
+	struct page *page;
+
+	if (oc_conf == NULL)
+		return;
+
+	sysfs_remove_bin_file(opal_kobj, &opal_core_attr);
+	oc_conf->ptload_phdr = NULL;
+	oc_conf->ptload_cnt = 0;
+
+	/* free core buffer */
+	if ((oc_conf->opalcorebuf != NULL) && (oc_conf->opalcorebuf_sz != 0)) {
+		order = get_order(oc_conf->opalcorebuf_sz);
+		count = 1 << order;
+		page = virt_to_page(oc_conf->opalcorebuf);
+		for (i = 0; i < count; i++)
+			ClearPageReserved(page + i);
+		__free_pages(page, order);
+
+		oc_conf->opalcorebuf = NULL;
+		oc_conf->opalcorebuf_sz = 0;
+	}
+
+	kfree(oc_conf);
+	oc_conf = NULL;
+}
+__exitcall(opalcore_cleanup);
+
+/* Init function for opalcore module. */
+static int __init opalcore_init(void)
+{
+	int rc = -1;
+
+	opalcore_config_init();
+
+	if (oc_conf == NULL)
+		return rc;
+
+	create_opalcore();
+
+	/*
+	 * If oc_conf->opalcorebuf is set in the 2nd kernel,
+	 * then capture the dump.
+	 */
+	if (!(is_opalcore_usable())) {
+		pr_err("Failed to export /sys/firmware/opal/core\n");
+		opalcore_cleanup();
+		return rc;
+	}
+
+	/* Set opal core size */
+	opal_core_attr.size = oc_conf->opalcore_size;
+
+	rc = sysfs_create_bin_file(opal_kobj, &opal_core_attr);
+	if (rc != 0) {
+		pr_err("Failed to export /sys/firmware/opal/core\n");
+		opalcore_cleanup();
+		return rc;
+	}
+
+	return 0;
+}
+fs_initcall(opalcore_init);
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index 85a89c0..58faa38 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -90,6 +90,10 @@ static const struct opal_fadump_mem_struct *opal_fdm_active;
 static const struct opal_mpipl_fadump *opal_cpu_metadata;
 static struct opal_fadump_mem_struct *opal_fdm;
 
+#ifdef CONFIG_OPAL_CORE
+extern bool kernel_initiated;
+#endif
+
 static int opal_fadump_unregister(struct fw_dump *fadump_conf);
 
 static void opal_fadump_update_config(struct fw_dump *fadump_conf,
@@ -358,72 +362,6 @@ static void opal_fadump_cleanup(struct fw_dump *fadump_conf)
 		pr_warn("Could not reset (%llu) kernel metadata tag!\n", ret);
 }
 
-static inline void opal_fadump_set_regval_regnum(struct pt_regs *regs,
-						 u32 reg_type, u32 reg_num,
-						 u64 reg_val)
-{
-	if (reg_type == HDAT_FADUMP_REG_TYPE_GPR) {
-		if (reg_num < 32)
-			regs->gpr[reg_num] = reg_val;
-		return;
-	}
-
-	switch (reg_num) {
-	case SPRN_CTR:
-		regs->ctr = reg_val;
-		break;
-	case SPRN_LR:
-		regs->link = reg_val;
-		break;
-	case SPRN_XER:
-		regs->xer = reg_val;
-		break;
-	case SPRN_DAR:
-		regs->dar = reg_val;
-		break;
-	case SPRN_DSISR:
-		regs->dsisr = reg_val;
-		break;
-	case HDAT_FADUMP_REG_ID_NIP:
-		regs->nip = reg_val;
-		break;
-	case HDAT_FADUMP_REG_ID_MSR:
-		regs->msr = reg_val;
-		break;
-	case HDAT_FADUMP_REG_ID_CCR:
-		regs->ccr = reg_val;
-		break;
-	}
-}
-
-static inline void opal_fadump_read_regs(char *bufp, unsigned int regs_cnt,
-					 unsigned int reg_entry_size,
-					 struct pt_regs *regs)
-{
-	int i;
-	struct hdat_fadump_reg_entry *reg_entry;
-
-	memset(regs, 0, sizeof(struct pt_regs));
-
-	for (i = 0; i < regs_cnt; i++, bufp += reg_entry_size) {
-		reg_entry = (struct hdat_fadump_reg_entry *)bufp;
-		opal_fadump_set_regval_regnum(regs,
-					      be32_to_cpu(reg_entry->reg_type),
-					      be32_to_cpu(reg_entry->reg_num),
-					      be64_to_cpu(reg_entry->reg_val));
-	}
-}
-
-static inline bool __init is_thread_core_inactive(u8 core_state)
-{
-	bool is_inactive = false;
-
-	if (core_state == HDAT_FADUMP_CORE_INACTIVE)
-		is_inactive = true;
-
-	return is_inactive;
-}
-
 /*
  * Convert CPU state data saved at the time of crash into ELF notes.
  *
@@ -530,7 +468,7 @@ static int __init opal_fadump_build_cpu_notes(struct fw_dump *fadump_conf)
 		}
 
 		opal_fadump_read_regs((bufp + regs_offset), regs_cnt,
-				      reg_esize, &regs);
+				      reg_esize, true, &regs);
 
 		note_buf = fadump_regs_to_elf_notes(note_buf, &regs);
 		pr_debug("CPU PIR: 0x%x - R1 : 0x%lx, NIP : 0x%lx\n",
@@ -564,6 +502,18 @@ static int __init opal_fadump_process(struct fw_dump *fadump_conf)
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_OPAL_CORE
+	/*
+	 * If this is a kernel initiated crash, crashing_cpu would be set
+	 * appropriately and register data of the crashing CPU saved by
+	 * crashing kernel. Add this saved register data of crashing CPU
+	 * to elf notes and populate the pt_regs for the remaining CPUs
+	 * from register state data provided by firmware.
+	 */
+	if (fdh->crashing_cpu != FADUMP_CPU_UNKNOWN)
+		kernel_initiated = true;
+#endif
+
 	rc = opal_fadump_build_cpu_notes(fadump_conf);
 	if (rc)
 		return rc;
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h b/arch/powerpc/platforms/powernv/opal-fadump.h
index ce4c522..b1a340b 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.h
+++ b/arch/powerpc/platforms/powernv/opal-fadump.h
@@ -9,6 +9,8 @@
 #ifndef __PPC64_OPAL_FA_DUMP_H__
 #define __PPC64_OPAL_FA_DUMP_H__
 
+#include <asm/reg.h>
+
 /* OPAL FADump structure format version */
 #define OPAL_FADUMP_VERSION			0x1
 
@@ -69,4 +71,74 @@ struct hdat_fadump_reg_entry {
 	__be64		reg_val;
 } __attribute__((packed));
 
+static inline bool __init is_thread_core_inactive(u8 core_state)
+{
+	bool is_inactive = false;
+
+	if (core_state == HDAT_FADUMP_CORE_INACTIVE)
+		is_inactive = true;
+
+	return is_inactive;
+}
+
+static inline void opal_fadump_set_regval_regnum(struct pt_regs *regs,
+						 u32 reg_type, u32 reg_num,
+						 u64 reg_val)
+{
+	if (reg_type == HDAT_FADUMP_REG_TYPE_GPR) {
+		if (reg_num < 32)
+			regs->gpr[reg_num] = reg_val;
+		return;
+	}
+
+	switch (reg_num) {
+	case SPRN_CTR:
+		regs->ctr = reg_val;
+		break;
+	case SPRN_LR:
+		regs->link = reg_val;
+		break;
+	case SPRN_XER:
+		regs->xer = reg_val;
+		break;
+	case SPRN_DAR:
+		regs->dar = reg_val;
+		break;
+	case SPRN_DSISR:
+		regs->dsisr = reg_val;
+		break;
+	case HDAT_FADUMP_REG_ID_NIP:
+		regs->nip = reg_val;
+		break;
+	case HDAT_FADUMP_REG_ID_MSR:
+		regs->msr = reg_val;
+		break;
+	case HDAT_FADUMP_REG_ID_CCR:
+		regs->ccr = reg_val;
+		break;
+	}
+}
+
+static inline void opal_fadump_read_regs(char *bufp, unsigned int regs_cnt,
+					 unsigned int reg_entry_size,
+					 bool cpu_endian,
+					 struct pt_regs *regs)
+{
+	int i;
+	u64 val;
+	struct hdat_fadump_reg_entry *reg_entry;
+
+	memset(regs, 0, sizeof(struct pt_regs));
+
+	for (i = 0; i < regs_cnt; i++, bufp += reg_entry_size) {
+		reg_entry = (struct hdat_fadump_reg_entry *)bufp;
+		val = (cpu_endian ? be64_to_cpu(reg_entry->reg_val) :
+		       reg_entry->reg_val);
+		opal_fadump_set_regval_regnum(regs,
+					      be32_to_cpu(reg_entry->reg_type),
+					      be32_to_cpu(reg_entry->reg_num),
+					      val);
+	}
+}
+
 #endif /* __PPC64_OPAL_FA_DUMP_H__ */



* [PATCH v5 28/31] powernv/opalcore: provide an option to invalidate /sys/firmware/opal/core file
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (26 preceding siblings ...)
  2019-08-20 12:07 ` [PATCH v5 27/31] powernv/opalcore: export /sys/firmware/opal/core for analysing opal crashes Hari Bathini
@ 2019-08-20 12:07 ` Hari Bathini
  2019-08-20 12:07 ` [PATCH v5 29/31] powerpc/fadump: consider f/w load area Hari Bathini
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:07 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Writing '1' to /sys/kernel/fadump_release_opalcore releases the memory
held by the kernel for exporting the /sys/firmware/opal/core file.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
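Note for readers: a minimal userspace sketch of how a tool might trigger
the release once the core has been saved. release_opalcore() is a
hypothetical helper name, not something this patch adds. The path is
passed in by the caller, so /sys/kernel/fadump_release_opalcore is only
the value one would expect to use on a kernel carrying this patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write "1" to the release file at 'path'.
 * Returns 0 on success or a negative errno value on failure.
 */
static int release_opalcore(const char *path)
{
	static const char one[] = "1";
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, one, strlen(one));
	if (n < 0)
		err = -errno;
	else if ((size_t)n != strlen(one))
		err = -EIO;	/* short write, treat as an I/O error */

	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}
```

Going by the store handler in this patch, any value other than 1 is
rejected with -EINVAL, and a write when no opalcore is exported fails
with -EPERM, so a caller should check the return value instead of
assuming the memory was released.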
 arch/powerpc/platforms/powernv/opal-core.c |   38 ++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-core.c b/arch/powerpc/platforms/powernv/opal-core.c
index 2f1c8c1..a534f3d 100644
--- a/arch/powerpc/platforms/powernv/opal-core.c
+++ b/arch/powerpc/platforms/powernv/opal-core.c
@@ -15,6 +15,8 @@
 #include <linux/proc_fs.h>
 #include <linux/elf.h>
 #include <linux/elfcore.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
 #include <linux/slab.h>
 #include <linux/crash_core.h>
 #include <linux/of.h>
@@ -558,6 +560,36 @@ static void opalcore_cleanup(void)
 }
 __exitcall(opalcore_cleanup);
 
+static ssize_t fadump_release_opalcore_store(struct kobject *kobj,
+					     struct kobj_attribute *attr,
+					     const char *buf, size_t count)
+{
+	int input = -1;
+
+	if (kstrtoint(buf, 0, &input))
+		return -EINVAL;
+
+	if (input == 1) {
+		if (oc_conf == NULL) {
+			pr_err("'/sys/firmware/opal/core' file not accessible!\n");
+			return -EPERM;
+		}
+
+		/*
+		 * Take away '/sys/firmware/opal/core' and release all memory
+		 * used for exporting this file.
+		 */
+		opalcore_cleanup();
+	} else
+		return -EINVAL;
+
+	return count;
+}
+
+static struct kobj_attribute opalcore_rel_attr = __ATTR(fadump_release_opalcore,
+						0200, NULL,
+						fadump_release_opalcore_store);
+
 /* Init function for opalcore module. */
 static int __init opalcore_init(void)
 {
@@ -590,6 +622,12 @@ static int __init opalcore_init(void)
 		return rc;
 	}
 
+	rc = sysfs_create_file(kernel_kobj, &opalcore_rel_attr.attr);
+	if (rc) {
+		pr_warn("unable to create sysfs file fadump_release_opalcore (%d)\n",
+			rc);
+	}
+
 	return 0;
 }
 fs_initcall(opalcore_init);



* [PATCH v5 29/31] powerpc/fadump: consider f/w load area
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (27 preceding siblings ...)
  2019-08-20 12:07 ` [PATCH v5 28/31] powernv/opalcore: provide an option to invalidate /sys/firmware/opal/core file Hari Bathini
@ 2019-08-20 12:07 ` Hari Bathini
  2019-08-20 12:07 ` [PATCH v5 30/31] powernv/fadump: update documentation about option to release opalcore Hari Bathini
  2019-08-20 12:07 ` [PATCH v5 31/31] powernv/fadump: support holes in kernel boot memory area Hari Bathini
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:07 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

OPAL loads kernel & initrd at 512MB offset (256MB size), also exported
as ibm,opal/dump/fw-load-area. So, if boot memory size of FADump is
less than 768MB, kernel memory to be exported as '/proc/vmcore' would
be overwritten by f/w while loading kernel & initrd. To avoid such a
scenario, enforce a minimum boot memory size of 768MB on OPAL platform
and skip using FADump if a newer F/W version loads kernel & initrd
above 768MB.

Also, irrespective of RMA size, set the minimum boot memory size
expected on pseries platform at 320MB. This is to avoid inflating the
minimum memory requirements on systems with 512M/1024M RMA size.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
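Note for readers: the arithmetic behind the new limits, as a standalone
sketch. The two constants are taken from the commit message (512MB load
offset plus 256MB load size on OPAL, a fixed 320MB on pseries) and are
assumptions about what the platform headers define. struct load_area,
old_min_boot_mem(), load_areas_supported() and reserve_size() are
hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MB(x)	((uint64_t)(x) << 20)

/* Assumed from the commit message: 512MB load offset + 256MB load size. */
#define OPAL_MIN_BOOT_MEM	(MB(512) + MB(256))
/* Assumed from the commit message: fixed minimum on pseries. */
#define PSERIES_MIN_BOOT_MEM	MB(320)

/* One (address, size) pair, as found in a fw-load-area style property. */
struct load_area {
	uint64_t base;
	uint64_t size;
};

/* Rule removed by this patch: max(RMA size, 256MB) + 64MB. */
static uint64_t old_min_boot_mem(uint64_t rma_size)
{
	return (rma_size < MB(256) ? MB(256) : rma_size) + MB(64);
}

/* True if every f/w load area ends at or below the OPAL minimum. */
static bool load_areas_supported(const struct load_area *areas, size_t cnt)
{
	size_t i;

	for (i = 0; i < cnt; i++) {
		if (areas[i].base + areas[i].size > OPAL_MIN_BOOT_MEM)
			return false;
	}
	return true;
}

/* Raise a requested boot memory size to the platform minimum if needed. */
static uint64_t reserve_size(uint64_t requested, uint64_t platform_min)
{
	return requested > platform_min ? requested : platform_min;
}
```

Under the old rule a 512MB RMA gave a 576MB minimum and a 1024MB RMA
gave 1088MB, which is the inflation the second paragraph of the commit
message refers to. A load area of 256MB at a 512MB offset ends exactly
at 768MB and therefore passes the check.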
 arch/powerpc/kernel/fadump-common.h          |   11 +---------
 arch/powerpc/kernel/fadump.c                 |   11 +++++++++-
 arch/powerpc/platforms/powernv/opal-fadump.c |   29 ++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-fadump.h |    7 ++++++
 arch/powerpc/platforms/pseries/rtas-fadump.c |    6 +++++
 arch/powerpc/platforms/pseries/rtas-fadump.h |    9 ++++++++
 6 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index ba8481b..55e4f25 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -22,16 +22,6 @@
 #define RMA_START	0x0
 #define RMA_END		(ppc64_rma_size)
 
-/*
- * On some Power systems where RMO is 128MB, it still requires minimum of
- * 256MB for kernel to boot successfully. When kdump infrastructure is
- * configured to save vmcore over network, we run into OOM issue while
- * loading modules related to network setup. Hence we need additional 64M
- * of memory to avoid OOM issue.
- */
-#define MIN_BOOT_MEM	(((RMA_END < (0x1UL << 28)) ? (0x1UL << 28) : RMA_END) \
-			+ (0x1UL << 26))
-
 /* The upper limit percentage for user specified boot memory size (25%) */
 #define MAX_BOOT_MEM_RATIO			4
 
@@ -139,6 +129,7 @@ struct fadump_ops {
 	ulong	(*fadump_init_mem_struct)(struct fw_dump *fadump_config);
 	ulong	(*fadump_get_metadata_size)(void);
 	int	(*fadump_setup_metadata)(struct fw_dump *fadump_config);
+	ulong	(*fadump_get_bootmem_min)(void);
 	int	(*fadump_register)(struct fw_dump *fadump_config);
 	int	(*fadump_unregister)(struct fw_dump *fadump_config);
 	int	(*fadump_invalidate)(struct fw_dump *fadump_config);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 43038f5..4b168c4 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -238,7 +238,8 @@ static inline unsigned long fadump_calculate_reserve_size(void)
 	if (memory_limit && size > memory_limit)
 		size = memory_limit;
 
-	return (size > MIN_BOOT_MEM ? size : MIN_BOOT_MEM);
+	return (size > fw_dump.ops->fadump_get_bootmem_min() ? size :
+		fw_dump.ops->fadump_get_bootmem_min());
 }
 
 /*
@@ -291,6 +292,14 @@ int __init fadump_reserve_mem(void)
 				ALIGN(fw_dump.boot_memory_size,
 							FADUMP_CMA_ALIGNMENT);
 #endif
+
+		if (fw_dump.boot_memory_size <
+		    fw_dump.ops->fadump_get_bootmem_min()) {
+			pr_err("Can't enable fadump with boot memory size (0x%lx) less than 0x%lx\n",
+			       fw_dump.boot_memory_size,
+			       fw_dump.ops->fadump_get_bootmem_min());
+			goto error_out;
+		}
 	}
 
 	/*
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index 58faa38..50cf9e6 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -11,6 +11,7 @@
 
 #include <linux/string.h>
 #include <linux/seq_file.h>
+#include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/libfdt.h>
 #include <linux/mm.h>
@@ -271,6 +272,11 @@ static int opal_fadump_setup_metadata(struct fw_dump *fadump_conf)
 	return err;
 }
 
+static ulong opal_fadump_get_bootmem_min(void)
+{
+	return OPAL_FADUMP_MIN_BOOT_MEM;
+}
+
 static int opal_fadump_register(struct fw_dump *fadump_conf)
 {
 	int i, err = -EIO;
@@ -587,6 +593,7 @@ static struct fadump_ops opal_fadump_ops = {
 	.fadump_init_mem_struct		= opal_fadump_init_mem_struct,
 	.fadump_get_metadata_size	= opal_fadump_get_metadata_size,
 	.fadump_setup_metadata		= opal_fadump_setup_metadata,
+	.fadump_get_bootmem_min		= opal_fadump_get_bootmem_min,
 	.fadump_register		= opal_fadump_register,
 	.fadump_unregister		= opal_fadump_unregister,
 	.fadump_invalidate		= opal_fadump_invalidate,
@@ -600,6 +607,7 @@ int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
 {
 	unsigned long dn;
 	const __be32 *prop;
+	int i, len;
 
 	/*
 	 * Check if Firmware-Assisted Dump is supported. if yes, check
@@ -616,6 +624,27 @@ int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
 		return 1;
 	}
 
+	prop = of_get_flat_dt_prop(dn, "fw-load-area", &len);
+	if (prop) {
+		/*
+		 * Each f/w load area is an (address,size) pair,
+		 * 2 cells each, totalling 4 cells per range.
+		 */
+		for (i = 0; i < len / (sizeof(*prop) * 4); i++) {
+			u64 base, end;
+
+			base = of_read_number(prop + (i * 4) + 0, 2);
+			end = base;
+			end += of_read_number(prop + (i * 4) + 2, 2);
+			if (end > OPAL_FADUMP_MIN_BOOT_MEM) {
+				pr_err("F/W load area: 0x%llx-0x%llx\n",
+				       base, end);
+				pr_err("F/W version not supported!\n");
+				return 1;
+			}
+		}
+	}
+
 	fadump_conf->ops		= &opal_fadump_ops;
 	fadump_conf->fadump_supported	= 1;
 
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h b/arch/powerpc/platforms/powernv/opal-fadump.h
index b1a340b..ae71a16 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.h
+++ b/arch/powerpc/platforms/powernv/opal-fadump.h
@@ -11,6 +11,13 @@
 
 #include <asm/reg.h>
 
+/*
+ * With kernel & initrd loaded at 512MB (with 256MB size), enforce a minimum
+ * boot memory size of 768MB to ensure f/w loading kernel and initrd doesn't
+ * mess with crash'ed kernel's memory during MPIPL.
+ */
+#define OPAL_FADUMP_MIN_BOOT_MEM		(0x30000000UL)
+
 /* OPAL FADump structure format version */
 #define OPAL_FADUMP_VERSION			0x1
 
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c
index 6164c5a..7dee6d0 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.c
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
@@ -136,6 +136,11 @@ static int rtas_fadump_setup_metadata(struct fw_dump *fadump_conf)
 	return 0;
 }
 
+static ulong rtas_fadump_get_bootmem_min(void)
+{
+	return RTAS_FADUMP_MIN_BOOT_MEM;
+}
+
 static int rtas_fadump_register(struct fw_dump *fadump_conf)
 {
 	int rc, err = -EIO;
@@ -507,6 +512,7 @@ static struct fadump_ops rtas_fadump_ops = {
 	.fadump_init_mem_struct		= rtas_fadump_init_mem_struct,
 	.fadump_get_metadata_size	= rtas_fadump_get_metadata_size,
 	.fadump_setup_metadata		= rtas_fadump_setup_metadata,
+	.fadump_get_bootmem_min		= rtas_fadump_get_bootmem_min,
 	.fadump_register		= rtas_fadump_register,
 	.fadump_unregister		= rtas_fadump_unregister,
 	.fadump_invalidate		= rtas_fadump_invalidate,
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.h b/arch/powerpc/platforms/pseries/rtas-fadump.h
index a3a2918..feaeb75 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.h
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.h
@@ -12,6 +12,15 @@
 #ifndef __PPC64_RTAS_FA_DUMP_H__
 #define __PPC64_RTAS_FA_DUMP_H__
 
+/*
+ * On some Power systems where RMO is 128MB, it still requires minimum of
+ * 256MB for kernel to boot successfully. When kdump infrastructure is
+ * configured to save vmcore over network, we run into OOM issue while
+ * loading modules related to network setup. Hence we need additional 64M
+ * of memory to avoid OOM issue.
+ */
+#define RTAS_FADUMP_MIN_BOOT_MEM	((0x1UL << 28) + (0x1UL << 26))
+
 /* Firmware provided dump sections */
 #define RTAS_FADUMP_CPU_STATE_DATA	0x0001
 #define RTAS_FADUMP_HPTE_REGION		0x0002
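
As a rough user-space illustration of the "fw-load-area" check in
opal_fadump_dt_scan() above: each range is an (address, size) pair of two
big-endian 32-bit cells each, and any range ending above the minimum boot
memory size is rejected. The names read_cells(), fw_load_areas_ok() and
MIN_BOOT_MEM are stand-ins invented for this sketch; it does not use the
kernel's flat device tree helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as OPAL_FADUMP_MIN_BOOT_MEM in the patch: 768MB. */
#define MIN_BOOT_MEM 0x30000000ULL

/*
 * Combine 'cells' big-endian 32-bit cells into one number.
 * Stand-in for of_read_number(); assumes cells <= 2.
 */
static uint64_t read_cells(const uint8_t *p, int cells)
{
	uint64_t val = 0;
	int i;

	for (i = 0; i < cells * 4; i++)
		val = (val << 8) | p[i];
	return val;
}

/*
 * Return 1 if every (address, size) range in the property ends at or
 * below MIN_BOOT_MEM, 0 otherwise. 'len' is the property length in bytes.
 */
static int fw_load_areas_ok(const uint8_t *prop, size_t len)
{
	size_t i, nranges = len / 16;	/* 4 cells of 4 bytes per range */

	for (i = 0; i < nranges; i++) {
		uint64_t base = read_cells(prop + i * 16, 2);
		uint64_t end = base + read_cells(prop + i * 16 + 8, 2);

		if (end > MIN_BOOT_MEM)
			return 0;
	}
	return 1;
}
```

A load area at 512MB with a size of 256MB ends exactly at 768MB and passes;
one byte more and the check fails.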



* [PATCH v5 30/31] powernv/fadump: update documentation about option to release opalcore
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (28 preceding siblings ...)
  2019-08-20 12:07 ` [PATCH v5 29/31] powerpc/fadump: consider f/w load area Hari Bathini
@ 2019-08-20 12:07 ` Hari Bathini
  2019-08-20 12:07 ` [PATCH v5 31/31] powernv/fadump: support holes in kernel boot memory area Hari Bathini
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:07 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

With /sys/firmware/opal/core support available on OPAL based machines
and an option to release the memory used by the kernel in exporting this
core file, update the FADump documentation with these details.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
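
The documentation below only shows the shell form (echo 1 > node). For a
capture-kernel tool written in C, the same release step could look roughly
like the following. write_one() is a hypothetical helper made up for this
sketch, not an existing interface.

```c
#include <stdio.h>

/* Write "1" to a sysfs-style node; returns 0 on success, -1 on failure. */
static int write_one(const char *path)
{
	FILE *fp = fopen(path, "w");

	if (!fp)
		return -1;
	if (fputs("1", fp) == EOF) {
		fclose(fp);
		return -1;
	}
	/* fclose() flushes the buffer, so a failed write shows up here too. */
	return fclose(fp) == 0 ? 0 : -1;
}
```

After saving the opalcore, a tool would call
write_one("/sys/kernel/fadump_release_opalcore") and treat -1 as "node
absent or write rejected".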
 Documentation/powerpc/firmware-assisted-dump.rst |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
index 23b04a6..ec9fa12 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -110,6 +110,16 @@ capture kernel boot to process this crash data. Kernel config
 option CONFIG_PRESERVE_FA_DUMP has to be enabled on such kernel
 to ensure that crash data is preserved to process later.
 
+-- On OPAL based machines (PowerNV), if the kernel is built with
+   CONFIG_OPAL_CORE=y, OPAL memory at the time of crash is also
+   exported as /sys/firmware/opal/core file. This sysfs file is
+   helpful in debugging OPAL crashes with GDB. The kernel memory
+   used for exporting this sysfs file can be released by echo'ing
+   '1' to /sys/kernel/fadump_release_opalcore node.
+
+   e.g.
+     # echo 1 > /sys/kernel/fadump_release_opalcore
+
 Implementation details:
 -----------------------
 
@@ -273,6 +283,15 @@ Here is the list of files under kernel sysfs:
     enhanced to use this interface to release the memory reserved for
     dump and continue without 2nd reboot.
 
+ /sys/kernel/fadump_release_opalcore
+
+    This file is available only on OPAL based machines when FADump is
+    active in the capture kernel. It is used to release the memory
+    the kernel uses to export the /sys/firmware/opal/core file. To
+    release this memory, echo '1' to it:
+
+    echo 1 > /sys/kernel/fadump_release_opalcore
+
 Here is the list of files under powerpc debugfs:
 (Assuming debugfs is mounted on /sys/kernel/debug directory.)
 



* [PATCH v5 31/31] powernv/fadump: support holes in kernel boot memory area
  2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
                   ` (29 preceding siblings ...)
  2019-08-20 12:07 ` [PATCH v5 30/31] powernv/fadump: update documentation about option to release opalcore Hari Bathini
@ 2019-08-20 12:07 ` Hari Bathini
  30 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-20 12:07 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Now that multiple kernel boot memory regions can be registered to work
around the firmware's copy size limitation, also handle holes in the
memory area to be preserved. Support as many as 128 kernel boot memory
regions. This allows having an adequate FADump capture kernel size for
different scenarios.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
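
To make the region splitting concrete, here is a minimal user-space sketch
of the idea behind add_boot_mem_regions() in this patch: one memory range is
cut into pieces no larger than the firmware's per-entry copy limit, up to a
fixed table size. struct boot_regions, add_range() and MAX_REGS are names
invented for the sketch; the kernel code keeps this state in struct fw_dump.

```c
#include <stddef.h>

#define MAX_REGS 128	/* mirrors FADUMP_MAX_MEM_REGS */

struct boot_regions {
	unsigned long long addr[MAX_REGS];
	unsigned long long size[MAX_REGS];
	size_t cnt;
};

/*
 * Append [start, start + size) to 'r', cut into pieces of at most
 * 'max_copy' bytes (0 means no limit). Returns 1 on success, 0 if the
 * table fills up before the whole range is recorded.
 */
static int add_range(struct boot_regions *r, unsigned long long start,
		     unsigned long long size, unsigned long long max_copy)
{
	while (size) {
		unsigned long long chunk = size;

		if (max_copy && chunk > max_copy)
			chunk = max_copy;
		if (r->cnt >= MAX_REGS)
			return 0;

		r->addr[r->cnt] = start;
		r->size[r->cnt] = chunk;
		r->cnt++;

		start += chunk;
		size -= chunk;
	}
	return 1;
}
```

Assuming 64K pages, U32_MAX aligned down to the page size is 0xFFFF0000, so
a 6GB range starting at 0 becomes two regions: 0xFFFF0000 bytes at 0 and
0x80010000 bytes at 0xFFFF0000.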
 arch/powerpc/kernel/fadump-common.c          |   15 ++
 arch/powerpc/kernel/fadump-common.h          |   12 ++
 arch/powerpc/kernel/fadump.c                 |  178 ++++++++++++++++++++++----
 arch/powerpc/platforms/powernv/opal-fadump.c |   56 ++++----
 arch/powerpc/platforms/powernv/opal-fadump.h |    5 -
 arch/powerpc/platforms/pseries/rtas-fadump.c |   14 ++
 arch/powerpc/platforms/pseries/rtas-fadump.h |    5 +
 7 files changed, 222 insertions(+), 63 deletions(-)
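
The hole accounting used for boot_mem_top and in fadump_relocate() can be
sketched in user space as follows. Firmware copies the boot memory regions
back to back into the destination area, so an address inside region i moves
by the destination base minus the total size of the holes seen up to that
region. struct region, boot_mem_top() and relocate() are illustrative names
for this sketch only, not the kernel functions.

```c
struct region {
	unsigned long long start;
	unsigned long long size;
};

/* Top of boot memory: sum of region sizes plus all holes before them. */
static unsigned long long boot_mem_top(const struct region *rgn, int cnt)
{
	unsigned long long total = 0, hole = 0, last_end = 0;
	int i;

	for (i = 0; i < cnt; i++) {
		hole += rgn[i].start - last_end;
		total += rgn[i].size;
		last_end = rgn[i].start + rgn[i].size;
	}
	return total + hole;
}

/*
 * Map a physical address in the crashed kernel to its location in the
 * preserved copy at 'dest'. Addresses outside every region are returned
 * unchanged.
 */
static unsigned long long relocate(const struct region *rgn, int cnt,
				   unsigned long long dest,
				   unsigned long long paddr)
{
	unsigned long long hole = 0, last_end = 0;
	int i;

	for (i = 0; i < cnt; i++) {
		hole += rgn[i].start - last_end;
		if (paddr >= rgn[i].start &&
		    paddr < rgn[i].start + rgn[i].size)
			return paddr + dest - hole;
		last_end = rgn[i].start + rgn[i].size;
	}
	return paddr;
}
```

With two 256MB regions at 0 and at 512MB and a destination of 0x40000000,
address 0x20000100 lands at 0x50000100, that is, right after the copy of
the first region.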

diff --git a/arch/powerpc/kernel/fadump-common.c b/arch/powerpc/kernel/fadump-common.c
index 7f39e4f..82862de 100644
--- a/arch/powerpc/kernel/fadump-common.c
+++ b/arch/powerpc/kernel/fadump-common.c
@@ -121,10 +121,19 @@ static int is_fadump_memory_area_contiguous(unsigned long d_start,
  */
 int is_fadump_boot_mem_contiguous(struct fw_dump *fadump_conf)
 {
-	unsigned long d_start = RMA_START;
-	unsigned long d_end   = RMA_START + fadump_conf->boot_memory_size;
+	int i, ret = 0;
+	unsigned long d_start, d_end;
 
-	return is_fadump_memory_area_contiguous(d_start, d_end);
+	for (i = 0; i < fadump_conf->boot_mem_regs_cnt; i++) {
+		d_start = fadump_conf->boot_mem_addr[i];
+		d_end   = d_start + fadump_conf->boot_mem_sz[i];
+
+		ret = is_fadump_memory_area_contiguous(d_start, d_end);
+		if (!ret)
+			break;
+	}
+
+	return ret;
 }
 
 /*
diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
index 55e4f25..8da8900 100644
--- a/arch/powerpc/kernel/fadump-common.h
+++ b/arch/powerpc/kernel/fadump-common.h
@@ -12,6 +12,9 @@
 #ifndef __PPC64_FA_DUMP_INTERNAL_H__
 #define __PPC64_FA_DUMP_INTERNAL_H__
 
+/* Maximum number of memory regions kernel supports */
+#define FADUMP_MAX_MEM_REGS			128
+
 #ifndef CONFIG_PRESERVE_FA_DUMP
 /*
  * The RMA region will be saved for later dumping when kernel crashes.
@@ -106,12 +109,21 @@ struct fw_dump {
 	unsigned long	hpte_region_size;
 
 	unsigned long	boot_mem_dest_addr;
+	unsigned long	boot_mem_addr[FADUMP_MAX_MEM_REGS];
+	unsigned long	boot_mem_sz[FADUMP_MAX_MEM_REGS];
+	unsigned long	boot_mem_regs_cnt;
+	unsigned long	boot_mem_top;
 	unsigned long	boot_memory_size;
 
 	unsigned long	fadumphdr_addr;
 	unsigned long	cpu_notes_buf;
 	unsigned long	cpu_notes_buf_size;
 
+	/*
+	 * Maximum size supported by firmware to copy from source to
+	 * destination address per entry.
+	 */
+	unsigned long	max_copy_size;
 	u64		kernel_metadata;
 
 	int		ibm_configure_kernel_dump;
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 4b168c4..4b1bb3c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -129,6 +129,7 @@ int is_fadump_memory_area(u64 addr, ulong size)
 {
 	u64 d_start = fw_dump.reserve_dump_area_start;
 	u64 d_end = d_start + fw_dump.reserve_dump_area_size;
+	u64 b_end = fw_dump.boot_mem_top;
 
 	if (!fw_dump.dump_registered)
 		return 0;
@@ -136,7 +137,7 @@ int is_fadump_memory_area(u64 addr, ulong size)
 	if (((addr + size) > d_start) && (addr <= d_end))
 		return 1;
 
-	return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
+	return (((addr + size) > RMA_START) && (addr <= b_end));
 }
 
 int should_fadump_crash(void)
@@ -154,6 +155,8 @@ int is_fadump_active(void)
 /* Print firmware assisted dump configurations for debugging purpose. */
 static void fadump_show_config(void)
 {
+	int i;
+
 	pr_debug("Support for firmware-assisted dump (fadump): %s\n",
 			(fw_dump.fadump_supported ? "present" : "no support"));
 
@@ -167,7 +170,13 @@ static void fadump_show_config(void)
 	pr_debug("Dump section sizes:\n");
 	pr_debug("    CPU state data size: %lx\n", fw_dump.cpu_state_data_size);
 	pr_debug("    HPTE region size   : %lx\n", fw_dump.hpte_region_size);
-	pr_debug("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
+	pr_debug("    Boot memory size   : %lx\n", fw_dump.boot_memory_size);
+	pr_debug("    Boot memory top    : %lx\n", fw_dump.boot_mem_top);
+	pr_debug("Boot memory regions count : %lx\n", fw_dump.boot_mem_regs_cnt);
+	for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) {
+		pr_debug("%d. base = %lx, size = %lx\n", (i+1),
+			 fw_dump.boot_mem_addr[i], fw_dump.boot_mem_sz[i]);
+	}
 }
 
 /**
@@ -266,6 +275,88 @@ static unsigned long get_fadump_area_size(void)
 	return size;
 }
 
+static int __init add_boot_mem_region(unsigned long rstart,
+				      unsigned long rsize)
+{
+	int i = fw_dump.boot_mem_regs_cnt++;
+
+	if (fw_dump.boot_mem_regs_cnt > FADUMP_MAX_MEM_REGS) {
+		fw_dump.boot_mem_regs_cnt = FADUMP_MAX_MEM_REGS;
+		return 0;
+	}
+
+	pr_debug("Added boot memory range[%d] [%#016lx-%#016lx)\n",
+		 i, rstart, (rstart + rsize));
+	fw_dump.boot_mem_addr[i] = rstart;
+	fw_dump.boot_mem_sz[i] = rsize;
+	return 1;
+}
+
+/*
+ * Firmware usually has a hard limit on the data it can copy per region.
+ * Honour that by splitting a memory range into multiple regions.
+ */
+static int __init add_boot_mem_regions(unsigned long mstart,
+				       unsigned long msize)
+{
+	unsigned long rstart, rsize, max_size;
+	int ret = 1;
+
+	rstart = mstart;
+	max_size = fw_dump.max_copy_size ? fw_dump.max_copy_size : msize;
+	while (msize) {
+		if (msize > max_size)
+			rsize = max_size;
+		else
+			rsize = msize;
+
+		ret = add_boot_mem_region(rstart, rsize);
+		if (!ret)
+			break;
+
+		msize -= rsize;
+		rstart += rsize;
+	}
+
+	return ret;
+}
+
+static int __init fadump_get_boot_mem_regions(void)
+{
+	int ret = 1;
+	struct memblock_region *reg;
+	unsigned long base, size, cur_size, hole_size, last_end;
+	unsigned long mem_size = fw_dump.boot_memory_size;
+
+	fw_dump.boot_mem_regs_cnt = 0;
+
+	last_end = 0;
+	hole_size = 0;
+	cur_size = 0;
+	for_each_memblock(memory, reg) {
+		base = reg->base;
+		size = reg->size;
+		hole_size += (base - last_end);
+
+		if ((cur_size + size) >= mem_size) {
+			size = (mem_size - cur_size);
+			ret = add_boot_mem_regions(base, size);
+			break;
+		}
+
+		mem_size -= size;
+		cur_size += size;
+		ret = add_boot_mem_regions(base, size);
+		if (!ret)
+			break;
+
+		last_end = base + size;
+	}
+	fw_dump.boot_mem_top = fw_dump.boot_memory_size + hole_size;
+
+	return ret;
+}
+
 int __init fadump_reserve_mem(void)
 {
 	int ret = 1;
@@ -300,6 +391,11 @@ int __init fadump_reserve_mem(void)
 			       fw_dump.ops->fadump_get_bootmem_min());
 			goto error_out;
 		}
+
+		if (!fadump_get_boot_mem_regions()) {
+			pr_err("Too many holes in boot memory area to enable fadump\n");
+			goto error_out;
+		}
 	}
 
 	/*
@@ -323,7 +419,8 @@ int __init fadump_reserve_mem(void)
 	else
 		memory_boundary = memblock_end_of_DRAM();
 
-	base = fw_dump.boot_memory_size;
+	base = fw_dump.boot_mem_top;
+	base = PAGE_ALIGN(base);
 	size = get_fadump_area_size();
 	fw_dump.reserve_dump_area_size = size;
 	if (fw_dump.dump_active) {
@@ -625,37 +722,39 @@ static int fadump_init_elfcore_header(char *bufp)
 static int fadump_setup_crash_memory_ranges(void)
 {
 	struct memblock_region *reg;
-	unsigned long long start, end;
-	int ret;
+	unsigned long long start, end, offset;
+	int i, ret;
 
 	pr_debug("Setup crash memory ranges.\n");
 	crash_mrange_info.mem_range_cnt = 0;
+	offset = fw_dump.boot_mem_top;
 
 	/*
-	 * add the first memory chunk (RMA_START through boot_memory_size) as
-	 * a separate memory chunk. The reason is, at the time crash firmware
-	 * will move the content of this memory chunk to different location
-	 * specified during fadump registration. We need to create a separate
-	 * program header for this chunk with the correct offset.
+	 * Boot memory region(s) registered with firmware are moved to
+	 * a different location at the time of crash. Create separate program
+	 * header(s) for this memory chunk(s) with the correct offset.
 	 */
-	ret = fadump_add_mem_range(&crash_mrange_info,
-				   RMA_START, fw_dump.boot_memory_size);
-	if (ret)
-		return ret;
+	for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) {
+		start = fw_dump.boot_mem_addr[i];
+		end = start + fw_dump.boot_mem_sz[i];
+		ret = fadump_add_mem_range(&crash_mrange_info, start, end);
+		if (ret)
+			return ret;
+	}
 
 	for_each_memblock(memory, reg) {
 		start = (unsigned long long)reg->base;
 		end = start + (unsigned long long)reg->size;
 
 		/*
-		 * skip the first memory chunk that is already added (RMA_START
+		 * Skip the first memory chunk that is already added (RMA_START
 		 * through boot_memory_size). This logic needs a relook if and
 		 * when RMA_START changes to a non-zero value.
 		 */
 		BUILD_BUG_ON(RMA_START != 0);
-		if (start < fw_dump.boot_memory_size) {
-			if (end > fw_dump.boot_memory_size)
-				start = fw_dump.boot_memory_size;
+		if (start < offset) {
+			if (end > offset)
+				start = offset;
 			else
 				continue;
 		}
@@ -676,17 +775,35 @@ static int fadump_setup_crash_memory_ranges(void)
  */
 static inline unsigned long fadump_relocate(unsigned long paddr)
 {
-	if (paddr > RMA_START && paddr < fw_dump.boot_memory_size)
-		return fw_dump.boot_mem_dest_addr + paddr;
-	else
-		return paddr;
+	unsigned long raddr, rstart, rend, rlast, hole_size;
+	int i;
+
+	hole_size = 0;
+	rlast = 0;
+	raddr = paddr;
+	for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) {
+		rstart = fw_dump.boot_mem_addr[i];
+		rend = rstart + fw_dump.boot_mem_sz[i];
+		hole_size += (rstart - rlast);
+
+		if (paddr >= rstart && paddr < rend) {
+			raddr += fw_dump.boot_mem_dest_addr - hole_size;
+			break;
+		}
+
+		rlast = rend;
+	}
+
+	pr_debug("vmcoreinfo: paddr = 0x%lx, raddr = 0x%lx\n", paddr, raddr);
+	return raddr;
 }
 
 static int fadump_create_elfcore_headers(char *bufp)
 {
 	struct elfhdr *elf;
 	struct elf_phdr *phdr;
-	int i;
+	unsigned long long raddr, offset;
+	int i, j;
 
 	fadump_init_elfcore_header(bufp);
 	elf = (struct elfhdr *)bufp;
@@ -729,7 +846,9 @@ static int fadump_create_elfcore_headers(char *bufp)
 	(elf->e_phnum)++;
 
 	/* setup PT_LOAD sections. */
-
+	j = 0;
+	offset = 0;
+	raddr = fw_dump.boot_mem_addr[0];
 	for (i = 0; i < crash_mrange_info.mem_range_cnt; i++) {
 		unsigned long long mbase, msize;
 		mbase = crash_mrange_info.mem_ranges[i].base;
@@ -744,13 +863,17 @@ static int fadump_create_elfcore_headers(char *bufp)
 		phdr->p_flags	= PF_R|PF_W|PF_X;
 		phdr->p_offset	= mbase;
 
-		if (mbase == RMA_START) {
+		if (mbase == raddr) {
 			/*
 			 * The entire RMA region will be moved by firmware
 			 * to the specified destination_address. Hence set
 			 * the correct offset.
 			 */
-			phdr->p_offset = fw_dump.boot_mem_dest_addr;
+			phdr->p_offset = fw_dump.boot_mem_dest_addr + offset;
+			if (j < (fw_dump.boot_mem_regs_cnt - 1)) {
+				offset += fw_dump.boot_mem_sz[j];
+				raddr = fw_dump.boot_mem_addr[++j];
+			}
 		}
 
 		phdr->p_paddr = mbase;
@@ -1033,7 +1156,8 @@ static void fadump_invalidate_release_mem(void)
 	fadump_cleanup();
 	mutex_unlock(&fadump_mutex);
 
-	fadump_release_memory(fw_dump.boot_memory_size, memblock_end_of_DRAM());
+	fadump_release_memory(PAGE_ALIGN(fw_dump.boot_mem_top),
+			      memblock_end_of_DRAM());
 	if (fw_dump.cpu_notes_buf) {
 		fadump_cpu_notes_buf_free(
 				(unsigned long)__va(fw_dump.cpu_notes_buf),
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index 50cf9e6..593da18 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -120,19 +120,29 @@ static void opal_fadump_update_config(struct fw_dump *fadump_conf,
 static void opal_fadump_get_config(struct fw_dump *fadump_conf,
 				   const struct opal_fadump_mem_struct *fdm)
 {
+	unsigned long base, size, last_end, hole_size;
 	int i;
 
 	if (!fadump_conf->dump_active)
 		return;
 
+	last_end = 0;
+	hole_size = 0;
 	fadump_conf->boot_memory_size = 0;
 
 	pr_debug("Boot memory regions:\n");
 	for (i = 0; i < fdm->region_cnt; i++) {
-		pr_debug("\t%d. base: 0x%llx, size: 0x%llx\n",
-			 (i + 1), fdm->rgn[i].src, fdm->rgn[i].size);
+		base = fdm->rgn[i].src;
+		size = fdm->rgn[i].size;
+		pr_debug("\t%d. base: 0x%lx, size: 0x%lx\n",
+			 (i + 1), base, size);
 
-		fadump_conf->boot_memory_size += fdm->rgn[i].size;
+		fadump_conf->boot_mem_addr[i] = base;
+		fadump_conf->boot_mem_sz[i] = size;
+		fadump_conf->boot_memory_size += size;
+		hole_size += (base - last_end);
+
+		last_end = base + size;
 	}
 
 	/*
@@ -165,6 +175,8 @@ static void opal_fadump_get_config(struct fw_dump *fadump_conf,
 		pr_warn("  But the sanity of the '/proc/vmcore' file depends on whether the above region(s) have any kernel pages or not.\n");
 	}
 
+	fadump_conf->boot_mem_top = (fadump_conf->boot_memory_size + hole_size);
+	fadump_conf->boot_mem_regs_cnt = fdm->region_cnt;
 	opal_fadump_update_config(fadump_conf, fdm);
 }
 
@@ -179,34 +191,20 @@ static void opal_fadump_init_metadata(struct opal_fadump_mem_struct *fdm)
 
 static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
 {
-	ulong src_addr, dest_addr;
-	int max_copy_size, cur_size, size;
+	ulong addr = fadump_conf->reserve_dump_area_start;
+	int i;
 
 	opal_fdm = __va(fadump_conf->kernel_metadata);
 	opal_fadump_init_metadata(opal_fdm);
 
-	/*
-	 * Firmware currently supports only 32-bit value for size,
-	 * align it to pagesize and request firmware to copy multiple
-	 * kernel boot memory regions.
-	 */
-	max_copy_size = _ALIGN_DOWN(U32_MAX, PAGE_SIZE);
-
 	/* Boot memory regions */
-	src_addr = RMA_START;
-	dest_addr = fadump_conf->reserve_dump_area_start;
-	size = fadump_conf->boot_memory_size;
-	while (size) {
-		cur_size = size > max_copy_size ? max_copy_size : size;
-
-		opal_fdm->rgn[opal_fdm->region_cnt].src  = src_addr;
-		opal_fdm->rgn[opal_fdm->region_cnt].dest = dest_addr;
-		opal_fdm->rgn[opal_fdm->region_cnt].size = cur_size;
+	for (i = 0; i < fadump_conf->boot_mem_regs_cnt; i++) {
+		opal_fdm->rgn[i].src	= fadump_conf->boot_mem_addr[i];
+		opal_fdm->rgn[i].dest	= addr;
+		opal_fdm->rgn[i].size	= fadump_conf->boot_mem_sz[i];
 
 		opal_fdm->region_cnt++;
-		dest_addr	+= cur_size;
-		src_addr	+= cur_size;
-		size		-= cur_size;
+		addr += fadump_conf->boot_mem_sz[i];
 	}
 
 	/*
@@ -218,7 +216,7 @@ static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
 
 	opal_fadump_update_config(fadump_conf, opal_fdm);
 
-	return dest_addr;
+	return addr;
 }
 
 static ulong opal_fadump_get_metadata_size(void)
@@ -263,7 +261,7 @@ static int opal_fadump_setup_metadata(struct fw_dump *fadump_conf)
 	 * by a kernel that intends to preserve crash'ed kernel's memory.
 	 */
 	ret = opal_mpipl_register_tag(OPAL_MPIPL_TAG_BOOT_MEM,
-				      fadump_conf->boot_memory_size);
+				      fadump_conf->boot_mem_top);
 	if (ret != OPAL_SUCCESS) {
 		pr_err("Failed to set boot memory tag!\n");
 		err = -EPERM;
@@ -649,6 +647,12 @@ int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
 	fadump_conf->fadump_supported	= 1;
 
 	/*
+	 * Firmware currently supports only 32-bit value for size,
+	 * align it to pagesize.
+	 */
+	fadump_conf->max_copy_size = _ALIGN_DOWN(U32_MAX, PAGE_SIZE);
+
+	/*
 	 * Check if dump has been initiated on last reboot.
 	 */
 	prop = of_get_flat_dt_prop(dn, "mpipl-boot", NULL);
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h b/arch/powerpc/platforms/powernv/opal-fadump.h
index ae71a16..3b0c4fb 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.h
+++ b/arch/powerpc/platforms/powernv/opal-fadump.h
@@ -21,9 +21,6 @@
 /* OPAL FADump structure format version */
 #define OPAL_FADUMP_VERSION			0x1
 
-/* Maximum number of memory regions kernel supports */
-#define OPAL_FADUMP_MAX_MEM_REGS		128
-
 /*
  * FADump memory structure for storing kernel metadata needed to
  * register-for/process crash dump. The address of this structure will
@@ -36,7 +33,7 @@ struct opal_fadump_mem_struct {
 	u16	region_cnt;		/* number of regions */
 	u16	registered_regions;	/* Regions registered for MPIPL */
 	u64	fadumphdr_addr;
-	struct opal_mpipl_region	rgn[OPAL_FADUMP_MAX_MEM_REGS];
+	struct opal_mpipl_region	rgn[FADUMP_MAX_MEM_REGS];
 } __attribute__((packed));
 
 /*
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c
index 7dee6d0..5ea8360 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.c
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
@@ -46,7 +46,13 @@ static void rtas_fadump_update_config(struct fw_dump *fadump_conf,
 static void rtas_fadump_get_config(struct fw_dump *fadump_conf,
 				   const struct rtas_fadump_mem_struct *fdm)
 {
-	fadump_conf->boot_memory_size = be64_to_cpu(fdm->rmr_region.source_len);
+	fadump_conf->boot_mem_addr[0] =
+		be64_to_cpu(fdm->rmr_region.source_address);
+	fadump_conf->boot_mem_sz[0] = be64_to_cpu(fdm->rmr_region.source_len);
+	fadump_conf->boot_memory_size = fadump_conf->boot_mem_sz[0];
+
+	fadump_conf->boot_mem_top = fadump_conf->boot_memory_size;
+	fadump_conf->boot_mem_regs_cnt = 1;
 
 	/*
 	 * Start address of reserve dump area (permanent reservation) for
@@ -111,8 +117,7 @@ static ulong rtas_fadump_init_mem_struct(struct fw_dump *fadump_conf)
 	fdm.rmr_region.source_data_type =
 		cpu_to_be16(RTAS_FADUMP_REAL_MODE_REGION);
 	fdm.rmr_region.source_address = cpu_to_be64(RMA_START);
-	fdm.rmr_region.source_len =
-		cpu_to_be64(fadump_conf->boot_memory_size);
+	fdm.rmr_region.source_len = cpu_to_be64(fadump_conf->boot_memory_size);
 	fdm.rmr_region.destination_address = cpu_to_be64(addr);
 	addr += fadump_conf->boot_memory_size;
 
@@ -541,6 +546,9 @@ int __init rtas_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
 	fadump_conf->ops		= &rtas_fadump_ops;
 	fadump_conf->fadump_supported	= 1;
 
+	/* Firmware supports 64-bit value for size, align it to pagesize. */
+	fadump_conf->max_copy_size = _ALIGN_DOWN(U64_MAX, PAGE_SIZE);
+
 	/*
 	 * The 'ibm,kernel-dump' rtas node is present only if there is
 	 * dump data waiting for us.
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.h b/arch/powerpc/platforms/pseries/rtas-fadump.h
index feaeb75..fad87b3 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.h
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.h
@@ -72,6 +72,11 @@ struct rtas_fadump_mem_struct {
 	/* Kernel dump sections */
 	struct rtas_fadump_section		cpu_state_data;
 	struct rtas_fadump_section		hpte_region;
+
+	/*
+	 * TODO: Extend multiple boot memory regions support in the kernel
+	 *       for this platform.
+	 */
 	struct rtas_fadump_section		rmr_region;
 };
 



* Re: [PATCH v5 13/31] powernv/fadump: reset metadata address during clean up
  2019-08-20 12:05 ` [PATCH v5 13/31] powernv/fadump: reset metadata address during clean up Hari Bathini
@ 2019-08-27 12:00   ` Hari Bathini
  0 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-08-27 12:00 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens



On 20/08/19 5:35 PM, Hari Bathini wrote:
> During kexec boot, metadata address needs to be reset to avoid running
> into errors interpreting stale metadata address, in case the kexec'ed
> kernel crashes before metadata address could be setup again.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>  arch/powerpc/kernel/fadump-common.h          |    1 +
>  arch/powerpc/kernel/fadump.c                 |    2 ++
>  arch/powerpc/platforms/powernv/opal-fadump.c |   10 ++++++++++
>  arch/powerpc/platforms/pseries/rtas-fadump.c |    3 +++
>  4 files changed, 16 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
> index 0acf412..d2dd117 100644
> --- a/arch/powerpc/kernel/fadump-common.h
> +++ b/arch/powerpc/kernel/fadump-common.h
> @@ -120,6 +120,7 @@ struct fadump_ops {
>  	int	(*fadump_register)(struct fw_dump *fadump_config);
>  	int	(*fadump_unregister)(struct fw_dump *fadump_config);
>  	int	(*fadump_invalidate)(struct fw_dump *fadump_config);
> +	void	(*fadump_cleanup)(struct fw_dump *fadump_config);
>  	int	(*fadump_process)(struct fw_dump *fadump_config);
>  	void	(*fadump_region_show)(struct fw_dump *fadump_config,
>  				      struct seq_file *m);
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index a086a09..b2d5ca6 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -830,6 +830,8 @@ void fadump_cleanup(void)
>  		fw_dump.ops->fadump_unregister(&fw_dump);
>  		free_crash_memory_ranges();
>  	}
> +
> +	fw_dump.ops->fadump_cleanup(&fw_dump);

Actually, the cleanup callbacks should run only if FADump is supported:
fadump_cleanup() can also be called from outside the FADump code, in the
shutdown and kexec paths, and would crash the system on machines that do
not support FADump. I have re-sent the patch with that check added to the
fadump_cleanup() function:

    https://patchwork.ozlabs.org/patch/1153806/
    ("[RESEND,v5,13/31] powernv/fadump: reset metadata address during clean up")
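
Purely as an illustration of the kind of guard meant here (the actual change
is in the resent patch linked above, not reproduced in this mail), a
user-space sketch could look like this. The structs are cut-down stand-ins,
and the 'cleaned' field and mark_cleaned() callback exist only to make the
sketch observable.

```c
#include <stddef.h>

struct fw_dump;

struct fadump_ops {
	void (*fadump_cleanup)(struct fw_dump *conf);
};

struct fw_dump {
	int fadump_supported;
	int cleaned;		/* hypothetical field, only for this sketch */
	const struct fadump_ops *ops;
};

/* Safe to call from shutdown/kexec paths even when FADump is absent. */
static void fadump_cleanup_sketch(struct fw_dump *conf)
{
	if (!conf->fadump_supported)
		return;		/* ops was never set; do not dereference it */

	conf->ops->fadump_cleanup(conf);
}

/* Example platform callback: just records that it ran. */
static void mark_cleaned(struct fw_dump *conf)
{
	conf->cleaned = 1;
}
```

On a machine without FADump support, ops stays NULL, so returning early
avoids the NULL dereference in the shutdown and kexec paths.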

- Hari



* Re: [PATCH v5 02/31] powerpc/fadump: move internal code to a new file
  2019-08-20 12:04 ` [PATCH v5 02/31] powerpc/fadump: move internal code to a new file Hari Bathini
@ 2019-09-03 11:09   ` Michael Ellerman
  2019-09-03 16:05     ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-03 11:09 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> Make way for refactoring platform specific FADump code by moving code
> that could be referenced from multiple places to fadump-common.c file.
>
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>  arch/powerpc/kernel/Makefile        |    2 
>  arch/powerpc/kernel/fadump-common.c |  140 ++++++++++++++++++++++++++++++++++
>  arch/powerpc/kernel/fadump-common.h |    8 ++
>  arch/powerpc/kernel/fadump.c        |  146 ++---------------------------------
>  4 files changed, 158 insertions(+), 138 deletions(-)
>  create mode 100644 arch/powerpc/kernel/fadump-common.c

I don't understand why we need fadump.c and fadump-common.c? They're
both common/shared across pseries & powernv aren't they?

By the end of the series we end up with 149 lines in fadump-common.c
which seems like a waste of time. Just put it all in fadump.c.

> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 56dfa7a..439d548 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -78,7 +78,7 @@ obj-$(CONFIG_EEH)              += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
>  				  eeh_driver.o eeh_event.o eeh_sysfs.o
>  obj-$(CONFIG_GENERIC_TBSYNC)	+= smp-tbsync.o
>  obj-$(CONFIG_CRASH_DUMP)	+= crash_dump.o
> -obj-$(CONFIG_FA_DUMP)		+= fadump.o
> +obj-$(CONFIG_FA_DUMP)		+= fadump.o fadump-common.o
>  ifdef CONFIG_PPC32
>  obj-$(CONFIG_E500)		+= idle_e500.o
>  endif
> diff --git a/arch/powerpc/kernel/fadump-common.c b/arch/powerpc/kernel/fadump-common.c
> new file mode 100644
> index 0000000..7f39e4f
> --- /dev/null
> +++ b/arch/powerpc/kernel/fadump-common.c
> @@ -0,0 +1,140 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Firmware-Assisted Dump internal code.
> + *
> + * Copyright 2011, IBM Corporation
> + * Author: Mahesh Salgaonkar <mahesh@linux.ibm.com>

Can we not put emails in C files anymore please, they just bitrot, just
the names is fine.

> + * Copyright 2019, IBM Corp.
> + * Author: Hari Bathini <hbathini@linux.ibm.com>

These can just be:

 * Copyright 2011, Mahesh Salgaonkar, IBM Corporation.
 * Copyright 2019, Hari Bathini, IBM Corporation.

> + */
> +
> +#undef DEBUG

Don't undef DEBUG please.

> +#define pr_fmt(fmt) "fadump: " fmt
> +
> +#include <linux/memblock.h>
> +#include <linux/elf.h>
> +#include <linux/mm.h>
> +#include <linux/crash_core.h>
> +
> +#include "fadump-common.h"
> +
> +void *fadump_cpu_notes_buf_alloc(unsigned long size)
> +{
> +	void *vaddr;
> +	struct page *page;
> +	unsigned long order, count, i;
> +
> +	order = get_order(size);
> +	vaddr = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order);
> +	if (!vaddr)
> +		return NULL;
> +
> +	count = 1 << order;
> +	page = virt_to_page(vaddr);
> +	for (i = 0; i < count; i++)
> +		SetPageReserved(page + i);
> +	return vaddr;
> +}

I realise you're just moving this code, but why do we need all this hand
rolled allocation stuff?

cheers

* Re: [PATCH v5 01/31] powerpc/fadump: move internal macros/definitions to a new header
  2019-08-20 12:04 ` [PATCH v5 01/31] powerpc/fadump: move internal macros/definitions to a new header Hari Bathini
@ 2019-09-03 11:09   ` Michael Ellerman
  2019-09-03 16:05     ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-03 11:09 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> Though asm/fadump.h is meant to be used by other components dealing
> with FADump, it also has macros/definitions internal to FADump code.
> Move them to a new header file used within FADump code. This also
> makes way for refactoring platform specific FADump code.
>
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/fadump.h   |   71 ----------------------------
>  arch/powerpc/kernel/fadump-common.h |   89 +++++++++++++++++++++++++++++++++++
>  arch/powerpc/kernel/fadump.c        |    2 +

I don't like having a header in kernel that's then used in platform
code. Because we end up having to do gross things like:

  arch/powerpc/platforms/powernv/opal-core.c:#include "../../kernel/fadump-common.h"
  arch/powerpc/platforms/powernv/opal-fadump.c:#include "../../kernel/fadump-common.h"
  arch/powerpc/platforms/pseries/rtas-fadump.c:#include "../../kernel/fadump-common.h"


I'd rather you put the internal bits in arch/powerpc/include/asm/fadump-internal.h

cheers

* Re: [PATCH v5 05/31] pseries/fadump: introduce callbacks for platform specific operations
  2019-08-20 12:04 ` [PATCH v5 05/31] pseries/fadump: introduce callbacks for platform specific operations Hari Bathini
@ 2019-09-03 11:10   ` Michael Ellerman
  2019-09-03 16:06     ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-03 11:10 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> Introduce callback functions for platform specific operations like
> register, unregister, invalidate & such. Also, define place-holders
> for the same on pSeries platform.

We already have an ops structure for machine specific calls, it's
ppc_md. Is there a good reason why these aren't just in machdep_calls
under #ifdef CONFIG_FA_DUMP ?
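
Roughly what that could look like, as a userspace sketch. The struct below is
a cut-down stand-in for struct machdep_calls, and the hook names and the
fadump_register_sketch() wrapper are hypothetical, chosen only to mirror the
callbacks this patch introduces:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CONFIG_FA_DUMP 1	/* normally set by Kconfig; defined here for the sketch */

struct fw_dump;			/* opaque here; the real layout lives in the fadump headers */

/* Cut-down stand-in for struct machdep_calls with hypothetical fadump hooks. */
struct machdep_calls_sketch {
	const char *name;
#ifdef CONFIG_FA_DUMP
	int (*fadump_register)(struct fw_dump *conf);
	int (*fadump_unregister)(struct fw_dump *conf);
	int (*fadump_invalidate)(struct fw_dump *conf);
#endif
};

/* Placeholder platform callback, like the place-holders in this patch. */
static int sketch_fadump_register(struct fw_dump *conf)
{
	(void)conf;
	return -EIO;
}

static struct machdep_calls_sketch ppc_md_sketch = {
	.name = "sketch",
#ifdef CONFIG_FA_DUMP
	.fadump_register = sketch_fadump_register,
#endif
};

/* Common code calls through the hook only if the platform provided one. */
static int fadump_register_sketch(struct fw_dump *conf)
{
#ifdef CONFIG_FA_DUMP
	if (ppc_md_sketch.fadump_register)
		return ppc_md_sketch.fadump_register(conf);
#endif
	return -ENODEV;
}
```

Hooks a platform does not fill in stay NULL, so common code has to check
before calling through them.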

cheers

* Re: [PATCH v5 06/31] pseries/fadump: define register/un-register callback functions
  2019-08-20 12:04 ` [PATCH v5 06/31] pseries/fadump: define register/un-register callback functions Hari Bathini
@ 2019-09-03 11:10   ` Michael Ellerman
  2019-09-03 17:15     ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-03 11:10 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> Make RTAS calls to register and un-register for FADump. Also, update
> how fadump_region contents are displayed to provide more information.

That sounds like two independent changes, so can this be split into two
patches?

cheers

* Re: [PATCH v5 07/31] powerpc/fadump: release all the memory above boot memory size
  2019-08-20 12:04 ` [PATCH v5 07/31] powerpc/fadump: release all the memory above boot memory size Hari Bathini
@ 2019-09-03 11:10   ` Michael Ellerman
  2019-09-03 16:27     ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-03 11:10 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:

> Except for reserve dump area which is permanent reserved, all memory
                                        permanently

> above boot memory size is released when the dump is invalidated. Make
> this a bit more explicit in the code.

I'm not clear on what you mean by "boot memory size"?

cheers

* Re: [PATCH v5 10/31] opal: add MPIPL interface definitions
  2019-08-20 12:05 ` [PATCH v5 10/31] opal: add MPIPL interface definitions Hari Bathini
@ 2019-09-03 11:10   ` Michael Ellerman
  2019-09-03 16:28     ` Hari Bathini
  2019-09-04 11:05   ` Michael Ellerman
  1 sibling, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-03 11:10 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index 383242e..c8a5665 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -980,6 +983,50 @@ struct opal_sg_list {
>  };
>  
>  /*
> + * Firmware-Assisted Dump (FADump) using MPIPL
> + */
> +
> +/* MPIPL update operations */
> +enum opal_mpipl_ops {
> +	OPAL_MPIPL_ADD_RANGE			= 0,
> +	OPAL_MPIPL_REMOVE_RANGE			= 1,
> +	OPAL_MPIPL_REMOVE_ALL			= 2,
> +	OPAL_MPIPL_FREE_PRESERVED_MEMORY	= 3,
> +};
> +
> +/*
> + * Each tag maps to a metadata type. Use these tags to register/query
> + * corresponding metadata address with/from OPAL.
> + */
> +enum opal_mpipl_tags {
> +	OPAL_MPIPL_TAG_CPU		= 0,
> +	OPAL_MPIPL_TAG_OPAL		= 1,
> +	OPAL_MPIPL_TAG_KERNEL		= 2,
> +	OPAL_MPIPL_TAG_BOOT_MEM		= 3,
> +};
> +
> +/* Preserved memory details */
> +struct opal_mpipl_region {
> +	__be64	src;
> +	__be64	dest;
> +	__be64	size;
> +};
> +
> +/* FADump structure format version */
> +#define MPIPL_FADUMP_VERSION			0x01
> +
> +/* Metadata provided by OPAL. */
> +struct opal_mpipl_fadump {
> +	u8				version;
> +	u8				reserved[7];
> +	__be32				crashing_pir;
> +	__be32				cpu_data_version;
> +	__be32				cpu_data_size;
> +	__be32				region_cnt;
> +	struct opal_mpipl_region	region[];
> +} __attribute__((packed));
> +

The above hunk is in the wrong place vs the skiboot header. Please put
things in exactly the same place in the skiboot and kernel versions of
the header.

After your kernel & skiboot patches are applied, the result of:

 $ git diff ~/src/skiboot/include/opal-api.h arch/powerpc/include/asm/opal-api.h

Should not include anything MPIPL/fadump related.


> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 57bd029..878110a 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -39,6 +39,12 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn,
>  				uint64_t PE_handle);
>  int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
>  			uint64_t rate_phys, uint32_t size);
> +
> +int64_t opal_mpipl_update(enum opal_mpipl_ops op, u64 src,
> +			  u64 dest, u64 size);
> +int64_t opal_mpipl_register_tag(enum opal_mpipl_tags tag, uint64_t addr);
> +int64_t opal_mpipl_query_tag(enum opal_mpipl_tags tag, uint64_t *addr);
> +

Please consistently use kernel types for new prototypes in here.
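
A sketch of the three prototypes with kernel types, assuming that means u64
and s64 in place of uint64_t and int64_t, with the enum parameters left as
they are. The typedefs are userspace stand-ins so the fragment compiles
outside the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types. */
typedef uint64_t u64;
typedef int64_t s64;

/* Enums copied from the opal-api.h hunk above. */
enum opal_mpipl_ops {
	OPAL_MPIPL_ADD_RANGE			= 0,
	OPAL_MPIPL_REMOVE_RANGE			= 1,
	OPAL_MPIPL_REMOVE_ALL			= 2,
	OPAL_MPIPL_FREE_PRESERVED_MEMORY	= 3,
};

enum opal_mpipl_tags {
	OPAL_MPIPL_TAG_CPU		= 0,
	OPAL_MPIPL_TAG_OPAL		= 1,
	OPAL_MPIPL_TAG_KERNEL		= 2,
	OPAL_MPIPL_TAG_BOOT_MEM		= 3,
};

/* Same prototypes as in the patch, but using kernel types throughout. */
s64 opal_mpipl_update(enum opal_mpipl_ops op, u64 src, u64 dest, u64 size);
s64 opal_mpipl_register_tag(enum opal_mpipl_tags tag, u64 addr);
s64 opal_mpipl_query_tag(enum opal_mpipl_tags tag, u64 *addr);
```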

cheers

* Re: [PATCH v5 11/31] powernv/fadump: add fadump support on powernv
  2019-08-20 12:05 ` [PATCH v5 11/31] powernv/fadump: add fadump support on powernv Hari Bathini
@ 2019-09-03 11:10   ` Michael Ellerman
  2019-09-03 16:31     ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-03 11:10 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> Add basic callback functions for FADump on PowerNV platform.

I assume this doesn't actually work yet?

Does something block it from appearing to work at runtime?

> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index d8dcd88..fc4ecfe 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -566,7 +566,7 @@ config CRASH_DUMP
>  
>  config FA_DUMP
>  	bool "Firmware-assisted dump"
> -	depends on PPC64 && PPC_RTAS
> +	depends on PPC64 && (PPC_RTAS || PPC_POWERNV)
>  	select CRASH_CORE
>  	select CRASH_DUMP
>  	help
> @@ -577,7 +577,8 @@ config FA_DUMP
>  	  is meant to be a kdump replacement offering robustness and
>  	  speed not possible without system firmware assistance.
>  
> -	  If unsure, say "N"
> +	  If unsure, say "y". Only special kernels like petitboot may
> +	  need to say "N" here.
>  
>  config IRQ_ALL_CPUS
>  	bool "Distribute interrupts on all CPUs by default"
> diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
> index d2c5b16..f6c52d3 100644
> --- a/arch/powerpc/kernel/fadump-common.h
> +++ b/arch/powerpc/kernel/fadump-common.h
> @@ -140,4 +140,13 @@ static inline int rtas_fadump_dt_scan(struct fw_dump *fadump_config, ulong node)
>  }
>  #endif
>  
> +#ifdef CONFIG_PPC_POWERNV
> +extern int opal_fadump_dt_scan(struct fw_dump *fadump_config, ulong node);
> +#else
> +static inline int opal_fadump_dt_scan(struct fw_dump *fadump_config, ulong node)
> +{
> +	return 1;
> +}

Extending the strange flat device tree calling convention to these
functions is not ideal.

It would be better I think if they just returned bool true/false for
"found it" / "not found", and then early_init_dt_scan_fw_dump() can
convert that into the appropriate return value.

> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index f7c8073..b8061fb9 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -114,6 +114,9 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
>  	if (strcmp(uname, "rtas") == 0)
>  		return rtas_fadump_dt_scan(&fw_dump, node);
>  
> +	if (strcmp(uname, "ibm,opal") == 0)
> +		return opal_fadump_dt_scan(&fw_dump, node);
> +

ie this would become:

	if (strcmp(uname, "ibm,opal") == 0 && opal_fadump_dt_scan(&fw_dump, node))
		return 1;
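
A fuller userspace sketch of that convention, with both platform scans
returning bool. The stub bodies, the one-field struct fw_dump and the
two-argument callback signature are simplifications for illustration (the
real flat-DT callback also takes depth and data arguments):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Cut-down stand-in for the real struct fw_dump. */
struct fw_dump {
	int fadump_supported;
};

/* Stub platform scans: true means "found FADump support under this node". */
static bool rtas_fadump_dt_scan(struct fw_dump *conf, unsigned long node)
{
	(void)conf;
	(void)node;
	return false;
}

static bool opal_fadump_dt_scan(struct fw_dump *conf, unsigned long node)
{
	(void)node;
	conf->fadump_supported = 1;
	return true;
}

static struct fw_dump fw_dump;

/* Only the flat-DT callback deals in 1 (stop scanning) / 0 (keep going). */
static int early_init_dt_scan_fw_dump(unsigned long node, const char *uname)
{
	if (strcmp(uname, "rtas") == 0 && rtas_fadump_dt_scan(&fw_dump, node))
		return 1;

	if (strcmp(uname, "ibm,opal") == 0 && opal_fadump_dt_scan(&fw_dump, node))
		return 1;

	return 0;
}
```

The platform code then never needs to know what the flat device tree walker
expects as a return value.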

>  	return 0;
>  }
>  
> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index da2e99e..43a6e1c 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -6,6 +6,7 @@ obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
>  obj-y			+= opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
>  
>  obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
> +obj-$(CONFIG_FA_DUMP)	+= opal-fadump.o
>  obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
>  obj-$(CONFIG_CXL_BASE)	+= pci-cxl.o
>  obj-$(CONFIG_EEH)	+= eeh-powernv.o
> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
> new file mode 100644
> index 0000000..e330877
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
> @@ -0,0 +1,97 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Firmware-Assisted Dump support on POWER platform (OPAL).
> + *
> + * Copyright 2019, IBM Corp.
> + * Author: Hari Bathini <hbathini@linux.ibm.com>
> + */
> +
> +#undef DEBUG

No undef again please.

> +#define pr_fmt(fmt) "opal fadump: " fmt
> +
> +#include <linux/string.h>
> +#include <linux/seq_file.h>
> +#include <linux/of_fdt.h>
> +#include <linux/libfdt.h>
> +
> +#include <asm/opal.h>
> +
> +#include "../../kernel/fadump-common.h"
> +
> +static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
> +{
> +	return fadump_conf->reserve_dump_area_start;
> +}
> +
> +static int opal_fadump_register(struct fw_dump *fadump_conf)
> +{
> +	return -EIO;
> +}
> +
> +static int opal_fadump_unregister(struct fw_dump *fadump_conf)
> +{
> +	return -EIO;
> +}
> +
> +static int opal_fadump_invalidate(struct fw_dump *fadump_conf)
> +{
> +	return -EIO;
> +}
> +
> +static int __init opal_fadump_process(struct fw_dump *fadump_conf)
> +{
> +	return -EINVAL;
> +}
> +
> +static void opal_fadump_region_show(struct fw_dump *fadump_conf,
> +				    struct seq_file *m)
> +{
> +}
> +
> +static void opal_fadump_trigger(struct fadump_crash_info_header *fdh,
> +				const char *msg)
> +{
> +	int rc;
> +
> +	rc = opal_cec_reboot2(OPAL_REBOOT_MPIPL, msg);
> +	if (rc == OPAL_UNSUPPORTED) {
> +		pr_emerg("Reboot type %d not supported.\n",
> +			 OPAL_REBOOT_MPIPL);
> +	} else if (rc == OPAL_HARDWARE)
> +		pr_emerg("No backend support for MPIPL!\n");
> +}
> +
> +static struct fadump_ops opal_fadump_ops = {
> +	.fadump_init_mem_struct		= opal_fadump_init_mem_struct,
> +	.fadump_register		= opal_fadump_register,
> +	.fadump_unregister		= opal_fadump_unregister,
> +	.fadump_invalidate		= opal_fadump_invalidate,
> +	.fadump_process			= opal_fadump_process,
> +	.fadump_region_show		= opal_fadump_region_show,
> +	.fadump_trigger			= opal_fadump_trigger,
> +};
> +
> +int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
> +{
> +	unsigned long dn;
> +
> +	/*
> +	 * Check if Firmware-Assisted Dump is supported. if yes, check
> +	 * if dump has been initiated on last reboot.
> +	 */
> +	dn = of_get_flat_dt_subnode_by_name(node, "dump");
> +	if (dn == -FDT_ERR_NOTFOUND) {
> +		pr_debug("FADump support is missing!\n");
> +		return 1;
> +	}
> +
> +	if (!of_flat_dt_is_compatible(dn, "ibm,opal-dump")) {
> +		pr_err("Support missing for this f/w version!\n");
> +		return 1;
> +	}
> +
> +	fadump_conf->ops		= &opal_fadump_ops;
> +	fadump_conf->fadump_supported	= 1;
> +
> +	return 1;
> +}

cheers

* Re: [PATCH v5 02/31] powerpc/fadump: move internal code to a new file
  2019-09-03 11:09   ` Michael Ellerman
@ 2019-09-03 16:05     ` Hari Bathini
  2019-09-04  9:02       ` Mahesh Jagannath Salgaonkar
  0 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-09-03 16:05 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens



On 03/09/19 4:39 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> Make way for refactoring platform specific FADump code by moving code
>> that could be referenced from multiple places to fadump-common.c file.
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>  arch/powerpc/kernel/Makefile        |    2 
>>  arch/powerpc/kernel/fadump-common.c |  140 ++++++++++++++++++++++++++++++++++
>>  arch/powerpc/kernel/fadump-common.h |    8 ++
>>  arch/powerpc/kernel/fadump.c        |  146 ++---------------------------------
>>  4 files changed, 158 insertions(+), 138 deletions(-)
>>  create mode 100644 arch/powerpc/kernel/fadump-common.c
> 
> I don't understand why we need fadump.c and fadump-common.c? They're
> both common/shared across pseries & powernv aren't they?

The convention I tried to follow was to have fadump-common.c shared between fadump.c
and the pseries & powernv code, while the pseries & powernv code takes callback requests
from fadump.c and uses fadump-common.c (shared by both platforms), if necessary, to fulfil
those requests...

> By the end of the series we end up with 149 lines in fadump-common.c
> which seems like a waste of time. Just put it all in fadump.c.

Yeah. Probably not worth a new C file. Will just have two separate headers. One for
internal code and one for interfacing with other modules...

[...]

>> + * Copyright 2019, IBM Corp.
>> + * Author: Hari Bathini <hbathini@linux.ibm.com>
> 
> These can just be:
> 
>  * Copyright 2011, Mahesh Salgaonkar, IBM Corporation.
>  * Copyright 2019, Hari Bathini, IBM Corporation.
> 

Sure.

>> + */
>> +
>> +#undef DEBUG
> 
> Don't undef DEBUG please.
> 

Sorry! Seeing such a thing in most files, I thought this was the convention. Will drop
this change in all the new files I added.

>> +#define pr_fmt(fmt) "fadump: " fmt
>> +
>> +#include <linux/memblock.h>
>> +#include <linux/elf.h>
>> +#include <linux/mm.h>
>> +#include <linux/crash_core.h>
>> +
>> +#include "fadump-common.h"
>> +
>> +void *fadump_cpu_notes_buf_alloc(unsigned long size)
>> +{
>> +	void *vaddr;
>> +	struct page *page;
>> +	unsigned long order, count, i;
>> +
>> +	order = get_order(size);
>> +	vaddr = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order);
>> +	if (!vaddr)
>> +		return NULL;
>> +
>> +	count = 1 << order;
>> +	page = virt_to_page(vaddr);
>> +	for (i = 0; i < count; i++)
>> +		SetPageReserved(page + i);
>> +	return vaddr;
>> +}
> 
> I realise you're just moving this code, but why do we need all this hand
> rolled allocation stuff?

Yeah, I think alloc_pages_exact() may be better here. Mahesh, am I missing something?
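
To make the difference concrete, a small userspace sketch of the page
arithmetic: __get_free_pages() hands back 2^order pages, while
alloc_pages_exact() keeps only the pages the size actually needs. The 64K
page size and the helper names are assumptions for illustration, and the
sketch says nothing about whether the pages still need to be marked reserved:

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT	16	/* assumes 64K pages */
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)

/* Pages needed to hold size bytes, rounded up. */
static unsigned long pages_needed(unsigned long size)
{
	return (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Userspace model of get_order(): smallest order with 2^order pages >= size. */
static unsigned long sketch_get_order(unsigned long size)
{
	unsigned long order = 0;

	while ((1UL << order) < pages_needed(size))
		order++;
	return order;
}

/* Pages held when allocating by order, as the quoted function does. */
static unsigned long pages_by_order(unsigned long size)
{
	return 1UL << sketch_get_order(size);
}
```

For a buffer that needs five pages, allocating by order holds eight, so three
pages are wasted unless the tail is given back.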

- Hari


* Re: [PATCH v5 01/31] powerpc/fadump: move internal macros/definitions to a new header
  2019-09-03 11:09   ` Michael Ellerman
@ 2019-09-03 16:05     ` Hari Bathini
  0 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-09-03 16:05 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens



On 03/09/19 4:39 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> Though asm/fadump.h is meant to be used by other components dealing
>> with FADump, it also has macros/definitions internal to FADump code.
>> Move them to a new header file used within FADump code. This also
>> makes way for refactoring platform specific FADump code.
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>  arch/powerpc/include/asm/fadump.h   |   71 ----------------------------
>>  arch/powerpc/kernel/fadump-common.h |   89 +++++++++++++++++++++++++++++++++++
>>  arch/powerpc/kernel/fadump.c        |    2 +
> 
> I don't like having a header in kernel that's then used in platform
> code. Because we end up having to do gross things like:
> 
>   arch/powerpc/platforms/powernv/opal-core.c:#include "../../kernel/fadump-common.h"
>   arch/powerpc/platforms/powernv/opal-fadump.c:#include "../../kernel/fadump-common.h"
>   arch/powerpc/platforms/pseries/rtas-fadump.c:#include "../../kernel/fadump-common.h"
> 
> 
> I'd rather you put the internal bits in arch/powerpc/include/asm/fadump-internal.h

True. Will put the internal bits in arch/powerpc/include/asm/fadump-internal.h

- Hari


* Re: [PATCH v5 05/31] pseries/fadump: introduce callbacks for platform specific operations
  2019-09-03 11:10   ` Michael Ellerman
@ 2019-09-03 16:06     ` Hari Bathini
  2019-09-06  6:39       ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-09-03 16:06 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens



On 03/09/19 4:40 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> Introduce callback functions for platform specific operations like
>> register, unregister, invalidate & such. Also, define place-holders
>> for the same on pSeries platform.
> 
> We already have an ops structure for machine specific calls, it's
> ppc_md. Is there a good reason why these aren't just in machdep_calls
> under #ifdef CONFIG_FA_DUMP ?

Not really. Will move these callbacks to 'struct machdep_calls'.

- Hari


* Re: [PATCH v5 07/31] powerpc/fadump: release all the memory above boot memory size
  2019-09-03 11:10   ` Michael Ellerman
@ 2019-09-03 16:27     ` Hari Bathini
  0 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-09-03 16:27 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens



On 03/09/19 4:40 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
> 
>> Except for reserve dump area which is permanent reserved, all memory
>                                         permanently
> 
>> above boot memory size is released when the dump is invalidated. Make
>> this a bit more explicit in the code.
> 
> I'm not clear on what you mean by "boot memory size"?

Boot memory size is the amount of memory used to boot the capture kernel. Basically,
it is the amount of memory the kernel needs to boot successfully when booted with
restricted memory.

- Hari


* Re: [PATCH v5 10/31] opal: add MPIPL interface definitions
  2019-09-03 11:10   ` Michael Ellerman
@ 2019-09-03 16:28     ` Hari Bathini
  2019-09-04 11:03       ` Michael Ellerman
  0 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-09-03 16:28 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens



On 03/09/19 4:40 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
>> index 383242e..c8a5665 100644
>> --- a/arch/powerpc/include/asm/opal-api.h
>> +++ b/arch/powerpc/include/asm/opal-api.h
>> @@ -980,6 +983,50 @@ struct opal_sg_list {
>>  };
>>  
>>  /*
>> + * Firmware-Assisted Dump (FADump) using MPIPL
>> + */
>> +
>> +/* MPIPL update operations */
>> +enum opal_mpipl_ops {
>> +	OPAL_MPIPL_ADD_RANGE			= 0,
>> +	OPAL_MPIPL_REMOVE_RANGE			= 1,
>> +	OPAL_MPIPL_REMOVE_ALL			= 2,
>> +	OPAL_MPIPL_FREE_PRESERVED_MEMORY	= 3,
>> +};
>> +
>> +/*
>> + * Each tag maps to a metadata type. Use these tags to register/query
>> + * corresponding metadata address with/from OPAL.
>> + */
>> +enum opal_mpipl_tags {
>> +	OPAL_MPIPL_TAG_CPU		= 0,
>> +	OPAL_MPIPL_TAG_OPAL		= 1,
>> +	OPAL_MPIPL_TAG_KERNEL		= 2,
>> +	OPAL_MPIPL_TAG_BOOT_MEM		= 3,
>> +};
>> +
>> +/* Preserved memory details */
>> +struct opal_mpipl_region {
>> +	__be64	src;
>> +	__be64	dest;
>> +	__be64	size;
>> +};
>> +
>> +/* FADump structure format version */
>> +#define MPIPL_FADUMP_VERSION			0x01
>> +
>> +/* Metadata provided by OPAL. */
>> +struct opal_mpipl_fadump {
>> +	u8				version;
>> +	u8				reserved[7];
>> +	__be32				crashing_pir;
>> +	__be32				cpu_data_version;
>> +	__be32				cpu_data_size;
>> +	__be32				region_cnt;
>> +	struct opal_mpipl_region	region[];
>> +} __attribute__((packed));
>> +
> 
> The above hunk is in the wrong place vs the skiboot header. Please put
> things in exactly the same place in the skiboot and kernel versions of
> the header.
> 
> After your kernel & skiboot patches are applied, the result of:
> 
>  $ git diff ~/src/skiboot/include/opal-api.h arch/powerpc/include/asm/opal-api.h
> 
> Should not include anything MPIPL/fadump related.

Sure.

> 
>> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>> index 57bd029..878110a 100644
>> --- a/arch/powerpc/include/asm/opal.h
>> +++ b/arch/powerpc/include/asm/opal.h
>> @@ -39,6 +39,12 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn,
>>  				uint64_t PE_handle);
>>  int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
>>  			uint64_t rate_phys, uint32_t size);
>> +
>> +int64_t opal_mpipl_update(enum opal_mpipl_ops op, u64 src,
>> +			  u64 dest, u64 size);
>> +int64_t opal_mpipl_register_tag(enum opal_mpipl_tags tag, uint64_t addr);
>> +int64_t opal_mpipl_query_tag(enum opal_mpipl_tags tag, uint64_t *addr);
>> +
> 
> Please consistently use kernel types for new prototypes in here.

uint64_t instead of 'enum's?

- Hari


* Re: [PATCH v5 11/31] powernv/fadump: add fadump support on powernv
  2019-09-03 11:10   ` Michael Ellerman
@ 2019-09-03 16:31     ` Hari Bathini
  2019-09-04 14:33       ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-09-03 16:31 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens



On 03/09/19 4:40 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> Add basic callback functions for FADump on PowerNV platform.
> 
> I assume this doesn't actually work yet?
> 
> Does something block it from appearing to work at runtime?

With this patch, "fadump=on" would reserve memory for FADump as support is enabled
but registration with f/w is not yet added. So, it would fail to register...

> 
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index d8dcd88..fc4ecfe 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -566,7 +566,7 @@ config CRASH_DUMP
>>  
>>  config FA_DUMP
>>  	bool "Firmware-assisted dump"
>> -	depends on PPC64 && PPC_RTAS
>> +	depends on PPC64 && (PPC_RTAS || PPC_POWERNV)
>>  	select CRASH_CORE
>>  	select CRASH_DUMP
>>  	help
>> @@ -577,7 +577,8 @@ config FA_DUMP
>>  	  is meant to be a kdump replacement offering robustness and
>>  	  speed not possible without system firmware assistance.
>>  
>> -	  If unsure, say "N"
>> +	  If unsure, say "y". Only special kernels like petitboot may
>> +	  need to say "N" here.
>>  
>>  config IRQ_ALL_CPUS
>>  	bool "Distribute interrupts on all CPUs by default"
>> diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
>> index d2c5b16..f6c52d3 100644
>> --- a/arch/powerpc/kernel/fadump-common.h
>> +++ b/arch/powerpc/kernel/fadump-common.h
>> @@ -140,4 +140,13 @@ static inline int rtas_fadump_dt_scan(struct fw_dump *fadump_config, ulong node)
>>  }
>>  #endif
>>  
>> +#ifdef CONFIG_PPC_POWERNV
>> +extern int opal_fadump_dt_scan(struct fw_dump *fadump_config, ulong node);
>> +#else
>> +static inline int opal_fadump_dt_scan(struct fw_dump *fadump_config, ulong node)
>> +{
>> +	return 1;
>> +}
> 
> Extending the strange flat device tree calling convention to these
> functions is not ideal.
> 
> It would be better I think if they just returned bool true/false for
> "found it" / "not found", and then early_init_dt_scan_fw_dump() can
> convert that into the appropriate return value.
> 
>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>> index f7c8073..b8061fb9 100644
>> --- a/arch/powerpc/kernel/fadump.c
>> +++ b/arch/powerpc/kernel/fadump.c
>> @@ -114,6 +114,9 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
>>  	if (strcmp(uname, "rtas") == 0)
>>  		return rtas_fadump_dt_scan(&fw_dump, node);
>>  
>> +	if (strcmp(uname, "ibm,opal") == 0)
>> +		return opal_fadump_dt_scan(&fw_dump, node);
>> +
> 
> ie this would become:
> 
> 	if (strcmp(uname, "ibm,opal") == 0 && opal_fadump_dt_scan(&fw_dump, node))
>             return 1;
> 

Yeah. Will update accordingly...

Thanks
Hari


* Re: [PATCH v5 06/31] pseries/fadump: define register/un-register callback functions
  2019-09-03 11:10   ` Michael Ellerman
@ 2019-09-03 17:15     ` Hari Bathini
  0 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-09-03 17:15 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens



On 03/09/19 4:40 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> Make RTAS calls to register and un-register for FADump. Also, update
>> how fadump_region contents are displayed to provide more information.
> 
> That sounds like two independent changes, so can this be split into two
> patches?

Yeah. On splitting, the below hunk would look a bit different in this patch
and the split patch would change it to how it looks now:

> +	seq_printf(m, "DUMP: Src: %#016llx, Dest: %#016llx, ",
> +		   be64_to_cpu(fdm_ptr->rmr_region.source_address),
> +		   be64_to_cpu(fdm_ptr->rmr_region.destination_address));
> +	seq_printf(m, "Size: %#llx, Dumped: %#llx bytes\n",
> +		   be64_to_cpu(fdm_ptr->rmr_region.source_len),
> +		   be64_to_cpu(fdm_ptr->rmr_region.bytes_dumped));


- Hari


* Re: [PATCH v5 02/31] powerpc/fadump: move internal code to a new file
  2019-09-03 16:05     ` Hari Bathini
@ 2019-09-04  9:02       ` Mahesh Jagannath Salgaonkar
  2019-09-04 18:26         ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2019-09-04  9:02 UTC (permalink / raw)
  To: Hari Bathini, Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens

On 9/3/19 9:35 PM, Hari Bathini wrote:
> 
> 
> On 03/09/19 4:39 PM, Michael Ellerman wrote:
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>> Make way for refactoring platform specific FADump code by moving code
>>> that could be referenced from multiple places to fadump-common.c file.
>>>
>>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>>> ---
>>>  arch/powerpc/kernel/Makefile        |    2 
>>>  arch/powerpc/kernel/fadump-common.c |  140 ++++++++++++++++++++++++++++++++++
>>>  arch/powerpc/kernel/fadump-common.h |    8 ++
>>>  arch/powerpc/kernel/fadump.c        |  146 ++---------------------------------
>>>  4 files changed, 158 insertions(+), 138 deletions(-)
>>>  create mode 100644 arch/powerpc/kernel/fadump-common.c
>>
>> I don't understand why we need fadump.c and fadump-common.c? They're
>> both common/shared across pseries & powernv aren't they?
> 
> The convention I tried to follow was to have fadump-common.c shared between fadump.c
> and the pseries & powernv code, while the pseries & powernv code takes callback requests
> from fadump.c and uses fadump-common.c (shared by both platforms), if necessary, to fulfil
> those requests...
> 
>> By the end of the series we end up with 149 lines in fadump-common.c
>> which seems like a waste of time. Just put it all in fadump.c.
> 
> Yeah. Probably not worth a new C file. Will just have two separate headers. One for
> internal code and one for interfacing with other modules...
> 
> [...]
> 
>>> + * Copyright 2019, IBM Corp.
>>> + * Author: Hari Bathini <hbathini@linux.ibm.com>
>>
>> These can just be:
>>
>>  * Copyright 2011, Mahesh Salgaonkar, IBM Corporation.
>>  * Copyright 2019, Hari Bathini, IBM Corporation.
>>
> 
> Sure.
> 
>>> + */
>>> +
>>> +#undef DEBUG
>>
>> Don't undef DEBUG please.
>>
> 
> Sorry! Seeing such thing in most files, I thought this was the convention. Will drop
> this change in all the new files I added.
> 
>>> +#define pr_fmt(fmt) "fadump: " fmt
>>> +
>>> +#include <linux/memblock.h>
>>> +#include <linux/elf.h>
>>> +#include <linux/mm.h>
>>> +#include <linux/crash_core.h>
>>> +
>>> +#include "fadump-common.h"
>>> +
>>> +void *fadump_cpu_notes_buf_alloc(unsigned long size)
>>> +{
>>> +	void *vaddr;
>>> +	struct page *page;
>>> +	unsigned long order, count, i;
>>> +
>>> +	order = get_order(size);
>>> +	vaddr = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order);
>>> +	if (!vaddr)
>>> +		return NULL;
>>> +
>>> +	count = 1 << order;
>>> +	page = virt_to_page(vaddr);
>>> +	for (i = 0; i < count; i++)
>>> +		SetPageReserved(page + i);
>>> +	return vaddr;
>>> +}
>>
>> I realise you're just moving this code, but why do we need all this hand
>> rolled allocation stuff?
> 
> Yeah, I think alloc_pages_exact() may be better here. Mahesh, am I missing something?

We hook up the physical address of this buffer to ELF core header as
PT_NOTE section. Hence we don't want these pages to be moved around or
reclaimed.

Thanks,
-Mahesh.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 10/31] opal: add MPIPL interface definitions
  2019-09-03 16:28     ` Hari Bathini
@ 2019-09-04 11:03       ` Michael Ellerman
  0 siblings, 0 replies; 74+ messages in thread
From: Michael Ellerman @ 2019-09-04 11:03 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> On 03/09/19 4:40 PM, Michael Ellerman wrote:
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>>> index 57bd029..878110a 100644
>>> --- a/arch/powerpc/include/asm/opal.h
>>> +++ b/arch/powerpc/include/asm/opal.h
>>> @@ -39,6 +39,12 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn,
>>>  				uint64_t PE_handle);
>>>  int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
>>>  			uint64_t rate_phys, uint32_t size);
>>> +
>>> +int64_t opal_mpipl_update(enum opal_mpipl_ops op, u64 src,
>>> +			  u64 dest, u64 size);
>>> +int64_t opal_mpipl_register_tag(enum opal_mpipl_tags tag, uint64_t addr);
>>> +int64_t opal_mpipl_query_tag(enum opal_mpipl_tags tag, uint64_t *addr);
>>> +
>> 
>> Please consistently use kernel types for new prototypes in here.
>
> uint64_t instead of 'enum's?

The enums are fine, I mean u64 instead of uint64_t, s64 instead of
int64_t etc.

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 10/31] opal: add MPIPL interface definitions
  2019-08-20 12:05 ` [PATCH v5 10/31] opal: add MPIPL interface definitions Hari Bathini
  2019-09-03 11:10   ` Michael Ellerman
@ 2019-09-04 11:05   ` Michael Ellerman
  1 sibling, 0 replies; 74+ messages in thread
From: Michael Ellerman @ 2019-09-04 11:05 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hi Hari,

One other comment.

Hari Bathini <hbathini@linux.ibm.com> writes:
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>

Change log is missing.

Please define what MPIPL means, and give people some explanation of what
it is, how it works and how you're using it for fadump.

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 12/31] powernv/fadump: register kernel metadata address with opal
  2019-08-20 12:05 ` [PATCH v5 12/31] powernv/fadump: register kernel metadata address with opal Hari Bathini
@ 2019-09-04 11:25   ` Michael Ellerman
  0 siblings, 0 replies; 74+ messages in thread
From: Michael Ellerman @ 2019-09-04 11:25 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index b8061fb9..a086a09 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -283,17 +286,17 @@ static void __init fadump_reserve_crash_area(unsigned long base,
>  
>  int __init fadump_reserve_mem(void)
>  {
> +	int ret = 1;
>  	unsigned long base, size, memory_boundary;

Please try to use reverse christmas tree style when possible.

> @@ -363,29 +366,43 @@ int __init fadump_reserve_mem(void)
>  		 * use memblock_find_in_range() here since it doesn't allocate
>  		 * from bottom to top.
>  		 */
> -		for (base = fw_dump.boot_memory_size;
> -		     base <= (memory_boundary - size);
> -		     base += size) {
> +		while (base <= (memory_boundary - size)) {
>  			if (memblock_is_region_memory(base, size) &&
>  			    !memblock_is_region_reserved(base, size))
>  				break;
> +
> +			base += size;
>  		}

Some of these changes look like they might not be necessary in this
patch, ie. could be split out into a lead-up patch. eg. the conversion
from for to while. But it's a bit hard to tell.

> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
> index e330877..e5c4700 100644
> --- a/arch/powerpc/platforms/powernv/opal-fadump.c
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
> @@ -13,14 +13,86 @@
>  #include <linux/seq_file.h>
>  #include <linux/of_fdt.h>
>  #include <linux/libfdt.h>
> +#include <linux/mm.h>
>  
> +#include <asm/page.h>
>  #include <asm/opal.h>
>  
>  #include "../../kernel/fadump-common.h"
> +#include "opal-fadump.h"
> +
> +static struct opal_fadump_mem_struct *opal_fdm;
> +
> +/* Initialize kernel metadata */
> +static void opal_fadump_init_metadata(struct opal_fadump_mem_struct *fdm)
> +{
> +	fdm->version = OPAL_FADUMP_VERSION;
> +	fdm->region_cnt = 0;
> +	fdm->registered_regions = 0;
> +	fdm->fadumphdr_addr = 0;
> +}
>  
>  static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
>  {
> -	return fadump_conf->reserve_dump_area_start;
> +	ulong addr = fadump_conf->reserve_dump_area_start;

I just noticed you're using ulong, which I haven't seen much before. KVM
uses it a lot but not much else.

Because this is all 64-bit only code I'd probably rather you just use
u64 explicitly to avoid anyone having to think about it.

> +
> +	opal_fdm = __va(fadump_conf->kernel_metadata);
> +	opal_fadump_init_metadata(opal_fdm);
> +
> +	opal_fdm->region_cnt = 1;
> +	opal_fdm->rgn[0].src	= RMA_START;
> +	opal_fdm->rgn[0].dest	= addr;
> +	opal_fdm->rgn[0].size	= fadump_conf->boot_memory_size;
> +	addr += fadump_conf->boot_memory_size;
> +
> +	/*
> +	 * Kernel metadata is passed to f/w and retrieved in capture kernel.
> +	 * So, use it to save fadump header address instead of calculating it.
> +	 */
> +	opal_fdm->fadumphdr_addr = (opal_fdm->rgn[0].dest +
> +				    fadump_conf->boot_memory_size);
> +
> +	return addr;
> +}
> +
> +static ulong opal_fadump_get_metadata_size(void)
> +{
> +	ulong size = sizeof(struct opal_fadump_mem_struct);
> +
> +	size = PAGE_ALIGN(size);
> +	return size;

	return PAGE_ALIGN(sizeof(struct opal_fadump_mem_struct));

???

> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h b/arch/powerpc/platforms/powernv/opal-fadump.h
> new file mode 100644
> index 0000000..19cac1f
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.h
> @@ -0,0 +1,33 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Firmware-Assisted Dump support on POWER platform (OPAL).
> + *
> + * Copyright 2019, IBM Corp.
> + * Author: Hari Bathini <hbathini@linux.ibm.com>
> + */
> +
> +#ifndef __PPC64_OPAL_FA_DUMP_H__
> +#define __PPC64_OPAL_FA_DUMP_H__

Usual style is _ASM_POWERPC_OPAL_FADUMP_H.


> +/* OPAL FADump structure format version */
> +#define OPAL_FADUMP_VERSION			0x1

What is the meaning of this version? How can we change it, if at all?
What does it mean if it's a different number? Please provide some
comments or doco describing how it's expected to be used.

We're defining some sort of ABI here and I want to understand/have
better documentation on what the implications of that are.

> diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c
> index 2b94392..4111ee9 100644
> --- a/arch/powerpc/platforms/pseries/rtas-fadump.c
> +++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
> @@ -121,6 +121,21 @@ static ulong rtas_fadump_init_mem_struct(struct fw_dump *fadump_conf)
>  	return addr;
>  }
>  
> +/*
> + * On this platform, the metadata structure is passed while registering
> + * for FADump and the same is returned by f/w in capture kernel.
> + * No additional provision to setup kernel metadata separately.
> + */
> +static ulong rtas_fadump_get_metadata_size(void)
> +{
> +	return 0;
> +}
> +
> +static int rtas_fadump_setup_metadata(struct fw_dump *fadump_conf)
> +{
> +	return 0;
> +}

Each of these uses about 16 bytes of text as well as space in the symbol
table. I think there's only one call site for each, so allowing the
callback to be NULL and skipping the call when it is would be slightly
more efficient.
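
The optional-callback idea can be modelled in user space roughly as
below. This is only a sketch: the struct and helper names
(fw_dump_sketch, fadump_ops_sketch, fadump_metadata_size,
fadump_do_setup_metadata) are hypothetical stand-ins and not the names
used in the series, and the 4096/0x1000 values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct fw_dump. */
struct fw_dump_sketch {
	unsigned long kernel_metadata;
};

/* Hypothetical, cut-down stand-in for struct fadump_ops. */
struct fadump_ops_sketch {
	unsigned long (*get_metadata_size)(void);		/* optional */
	int (*setup_metadata)(struct fw_dump_sketch *conf);	/* optional */
};

/* The single call site treats a NULL callback as "no metadata needed". */
static unsigned long fadump_metadata_size(const struct fadump_ops_sketch *ops)
{
	assert(ops != NULL);
	return ops->get_metadata_size ? ops->get_metadata_size() : 0;
}

static int fadump_do_setup_metadata(const struct fadump_ops_sketch *ops,
				    struct fw_dump_sketch *conf)
{
	assert(ops != NULL && conf != NULL);
	return ops->setup_metadata ? ops->setup_metadata(conf) : 0;
}

/* A platform that does need separate metadata (illustrative values). */
static unsigned long opal_like_get_metadata_size(void)
{
	return 4096;
}

static int opal_like_setup_metadata(struct fw_dump_sketch *conf)
{
	conf->kernel_metadata = 0x1000;
	return 0;
}

static const struct fadump_ops_sketch opal_like_ops = {
	.get_metadata_size	= opal_like_get_metadata_size,
	.setup_metadata		= opal_like_setup_metadata,
};

/* A platform with nothing to do simply leaves both callbacks NULL. */
static const struct fadump_ops_sketch rtas_like_ops = {
	.get_metadata_size	= NULL,
	.setup_metadata		= NULL,
};
```

With this shape the pseries side would need no stub functions at all;
the cost moves to one NULL check per call site.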


> @@ -486,9 +501,10 @@ static void rtas_fadump_trigger(struct fadump_crash_info_header *fdh,
>  	rtas_os_term((char *)msg);
>  }
>  
> -
>  static struct fadump_ops rtas_fadump_ops = {
>  	.fadump_init_mem_struct		= rtas_fadump_init_mem_struct,
> +	.fadump_get_metadata_size	= rtas_fadump_get_metadata_size,
> +	.fadump_setup_metadata		= rtas_fadump_setup_metadata,
>  	.fadump_register		= rtas_fadump_register,
>  	.fadump_unregister		= rtas_fadump_unregister,
>  	.fadump_invalidate		= rtas_fadump_invalidate,

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 15/31] powernv/fadump: support copying multiple kernel boot memory regions
  2019-08-20 12:05 ` [PATCH v5 15/31] powernv/fadump: support copying multiple kernel boot memory regions Hari Bathini
@ 2019-09-04 11:30   ` Michael Ellerman
  2019-09-04 20:20     ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-04 11:30 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> Firmware uses 32-bit field for region size while copying/backing-up

Which firmware exactly is imposing that limit?

> memory during MPIPL. So, the maximum copy size for a region would
> be a page less than 4GB (aligned to pagesize) but FADump capture
> kernel usually needs more memory than that to be preserved to avoid
> running into out of memory errors.
>
> So, request firmware to copy multiple kernel boot memory regions
> instead of just one (which worked fine for pseries as 64-bit field
> was used for size there).
>
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/opal-fadump.c |   35 +++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
> index 91fb909..a755705 100644
> --- a/arch/powerpc/platforms/powernv/opal-fadump.c
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
> @@ -28,6 +28,8 @@ static int opal_fadump_unregister(struct fw_dump *fadump_conf);
>  static void opal_fadump_update_config(struct fw_dump *fadump_conf,
>  				      const struct opal_fadump_mem_struct *fdm)
>  {
> +	pr_debug("Boot memory regions count: %d\n", fdm->region_cnt);
> +
>  	/*
>  	 * The destination address of the first boot memory region is the
>  	 * destination address of boot memory regions.
> @@ -50,16 +52,35 @@ static void opal_fadump_init_metadata(struct opal_fadump_mem_struct *fdm)
>  
>  static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
>  {
> -	ulong addr = fadump_conf->reserve_dump_area_start;
> +	ulong src_addr, dest_addr;
> +	int max_copy_size, cur_size, size;
>  
>  	opal_fdm = __va(fadump_conf->kernel_metadata);
>  	opal_fadump_init_metadata(opal_fdm);
>  
> -	opal_fdm->region_cnt = 1;
> -	opal_fdm->rgn[0].src	= RMA_START;
> -	opal_fdm->rgn[0].dest	= addr;
> -	opal_fdm->rgn[0].size	= fadump_conf->boot_memory_size;
> -	addr += fadump_conf->boot_memory_size;
> +	/*
> +	 * Firmware currently supports only 32-bit value for size,

"currently" implies it could change in future?

If it does we assume it will only increase, and we're happy that old
kernels will continue to use the 32-bit limit?

> +	 * align it to pagesize and request firmware to copy multiple
> +	 * kernel boot memory regions.
> +	 */
> +	max_copy_size = _ALIGN_DOWN(U32_MAX, PAGE_SIZE);
> +
> +	/* Boot memory regions */
> +	src_addr = RMA_START;

I'm not convinced using RMA_START actually makes things any clearer,
given that it's #defined as 0, and we even have a BUILD_BUG_ON() to make
sure it's never anything else.

eg:

	src_addr = 0;

> +	dest_addr = fadump_conf->reserve_dump_area_start;
> +	size = fadump_conf->boot_memory_size;
> +	while (size) {
> +		cur_size = size > max_copy_size ? max_copy_size : size;
> +
> +		opal_fdm->rgn[opal_fdm->region_cnt].src  = src_addr;
> +		opal_fdm->rgn[opal_fdm->region_cnt].dest = dest_addr;
> +		opal_fdm->rgn[opal_fdm->region_cnt].size = cur_size;
> +
> +		opal_fdm->region_cnt++;
> +		dest_addr	+= cur_size;
> +		src_addr	+= cur_size;
> +		size		-= cur_size;
> +	}
>  
>  	/*
>  	 * Kernel metadata is passed to f/w and retrieved in capture kernel.
> @@ -70,7 +91,7 @@ static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
>  
>  	opal_fadump_update_config(fadump_conf, opal_fdm);
>  
> -	return addr;
> +	return dest_addr;
>  }
>  
>  static ulong opal_fadump_get_metadata_size(void)
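
For reference, the region splitting in the hunk above can be modelled
in user space as below. This is a sketch under stated assumptions, not
the patch itself: region_sketch, SKETCH_PAGE_SIZE (64K assumed) and
SKETCH_MAX_REGIONS are hypothetical names. It keeps all sizes in
64-bit variables, because a page-aligned value just under 4GB
(0xffff0000 with 64K pages) does not fit in a signed int, which is the
type the quoted patch declares for max_copy_size, cur_size and size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	0x10000ULL	/* assumption: 64K pages */
#define SKETCH_MAX_REGIONS	8		/* hypothetical array bound */

/* Hypothetical stand-in for one entry of opal_fdm->rgn[]. */
struct region_sketch {
	uint64_t src;
	uint64_t dest;
	uint64_t size;
};

/* Largest page-aligned size that still fits in a 32-bit size field. */
static uint64_t max_copy_size(void)
{
	return (uint64_t)UINT32_MAX & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Split [0, boot_mem_size) into chunks of at most max_copy_size() bytes,
 * copied to consecutive destinations starting at 'dest'.
 * Returns the number of regions filled in, or 0 if boot_mem_size is 0
 * or the chunks do not fit in 'max_rgns' entries.
 */
static size_t split_boot_memory(uint64_t dest, uint64_t boot_mem_size,
				struct region_sketch *rgn, size_t max_rgns)
{
	uint64_t src = 0, left = boot_mem_size, max = max_copy_size();
	size_t cnt = 0;

	assert(rgn != NULL);

	while (left) {
		uint64_t cur = left > max ? max : left;

		if (cnt == max_rgns)
			return 0;

		rgn[cnt].src = src;
		rgn[cnt].dest = dest;
		rgn[cnt].size = cur;
		cnt++;

		src += cur;
		dest += cur;
		left -= cur;
	}

	return cnt;
}
```

The explicit bound check on the region array is an addition of the
sketch; the quoted loop relies on boot_memory_size being small enough
for the array.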

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 16/31] powernv/fadump: process the crashdump by exporting it as /proc/vmcore
  2019-08-20 12:06 ` [PATCH v5 16/31] powernv/fadump: process the crashdump by exporting it as /proc/vmcore Hari Bathini
@ 2019-09-04 11:42   ` Michael Ellerman
  2019-09-04 21:01     ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-04 11:42 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
> index a755705..10f6086 100644
> --- a/arch/powerpc/platforms/powernv/opal-fadump.c
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
> @@ -41,6 +43,37 @@ static void opal_fadump_update_config(struct fw_dump *fadump_conf,
>  	fadump_conf->fadumphdr_addr = fdm->fadumphdr_addr;
>  }
>  
> +/*
> + * This function is called in the capture kernel to get configuration details
> + * from metadata setup by the first kernel.
> + */
> +static void opal_fadump_get_config(struct fw_dump *fadump_conf,
> +				   const struct opal_fadump_mem_struct *fdm)
> +{
> +	int i;
> +
> +	if (!fadump_conf->dump_active)
> +		return;
> +
> +	fadump_conf->boot_memory_size = 0;
> +
> +	pr_debug("Boot memory regions:\n");
> +	for (i = 0; i < fdm->region_cnt; i++) {
> +		pr_debug("\t%d. base: 0x%llx, size: 0x%llx\n",
> +			 (i + 1), fdm->rgn[i].src, fdm->rgn[i].size);

Printing the zero-based array off by one (i + 1) seems confusing.

> +
> +		fadump_conf->boot_memory_size += fdm->rgn[i].size;
> +	}
> +
> +	/*
> +	 * Start address of reserve dump area (permanent reservation) for
> +	 * re-registering FADump after dump capture.
> +	 */
> +	fadump_conf->reserve_dump_area_start = fdm->rgn[0].dest;
> +
> +	opal_fadump_update_config(fadump_conf, fdm);
> +}
> +
>  /* Initialize kernel metadata */
>  static void opal_fadump_init_metadata(struct opal_fadump_mem_struct *fdm)
>  {
> @@ -215,24 +248,114 @@ static void opal_fadump_cleanup(struct fw_dump *fadump_conf)
>  		pr_warn("Could not reset (%llu) kernel metadata tag!\n", ret);
>  }
>  
> +/*
> + * Convert CPU state data saved at the time of crash into ELF notes.
> + */
> +static int __init opal_fadump_build_cpu_notes(struct fw_dump *fadump_conf)
> +{
> +	u32 num_cpus, *note_buf;
> +	struct fadump_crash_info_header *fdh = NULL;
> +
> +	num_cpus = 1;
> +	/* Allocate buffer to hold cpu crash notes. */
> +	fadump_conf->cpu_notes_buf_size = num_cpus * sizeof(note_buf_t);
> +	fadump_conf->cpu_notes_buf_size =
> +		PAGE_ALIGN(fadump_conf->cpu_notes_buf_size);
> +	note_buf = fadump_cpu_notes_buf_alloc(fadump_conf->cpu_notes_buf_size);
> +	if (!note_buf) {
> +		pr_err("Failed to allocate 0x%lx bytes for cpu notes buffer\n",
> +		       fadump_conf->cpu_notes_buf_size);
> +		return -ENOMEM;
> +	}
> +	fadump_conf->cpu_notes_buf = __pa(note_buf);
> +
> +	pr_debug("Allocated buffer for cpu notes of size %ld at %p\n",
> +		 (num_cpus * sizeof(note_buf_t)), note_buf);
> +
> +	if (fadump_conf->fadumphdr_addr)
> +		fdh = __va(fadump_conf->fadumphdr_addr);
> +
> +	if (fdh && (fdh->crashing_cpu != FADUMP_CPU_UNKNOWN)) {
> +		note_buf = fadump_regs_to_elf_notes(note_buf, &(fdh->regs));
> +		final_note(note_buf);
> +
> +		pr_debug("Updating elfcore header (%llx) with cpu notes\n",
> +			 fdh->elfcorehdr_addr);
> +		fadump_update_elfcore_header(fadump_conf,
> +					     __va(fdh->elfcorehdr_addr));
> +	}
> +
> +	return 0;
> +}
> +
>  static int __init opal_fadump_process(struct fw_dump *fadump_conf)
>  {
> -	return -EINVAL;
> +	struct fadump_crash_info_header *fdh;
> +	int rc = 0;

No need to initialise rc there.

> +	if (!opal_fdm_active || !fadump_conf->fadumphdr_addr)
> +		return -EINVAL;
> +
> +	/* Validate the fadump crash info header */
> +	fdh = __va(fadump_conf->fadumphdr_addr);
> +	if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) {
> +		pr_err("Crash info header is not valid.\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * TODO: To build cpu notes, find a way to map PIR to logical id.
> +	 *       Also, we may need different method for pseries and powernv.
> +	 *       The currently booted kernel could have a different PIR to
> +	 *       logical id mapping. So, try saving info of previous kernel's
> +	 *       paca to get the right PIR to logical id mapping.
> +	 */

That TODO is removed by the end of the series, so please just omit it entirely.

> +	rc = opal_fadump_build_cpu_notes(fadump_conf);
> +	if (rc)
> +		return rc;

I think this all runs early in boot, so we don't need to worry about
another CPU seeing the partially initialised core due to there being no
barrier here before we set elfcorehdr_addr?

> +	/*
> +	 * We are done validating dump info and elfcore header is now ready
> +	 * to be exported. set elfcorehdr_addr so that vmcore module will
> +	 * export the elfcore header through '/proc/vmcore'.
> +	 */
> +	elfcorehdr_addr = fdh->elfcorehdr_addr;

> @@ -283,5 +407,42 @@ int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
>  	fadump_conf->ops		= &opal_fadump_ops;
>  	fadump_conf->fadump_supported	= 1;
>  
> +	/*
> +	 * Check if dump has been initiated on last reboot.
> +	 */
> +	prop = of_get_flat_dt_prop(dn, "mpipl-boot", NULL);
> +	if (prop) {

        if (!prop)
                return 1;

And then everything below can be unindented.

> +		u64 addr = 0;
> +		s64 ret;
> +		const struct opal_fadump_mem_struct *r_opal_fdm_active;

  *
 / \
 /_\
  |

> +
> +		ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, &addr);
> +		if ((ret != OPAL_SUCCESS) || !addr) {
> +			pr_err("Failed to get Kernel metadata (%lld)\n", ret);
> +			return 1;
> +		}
> +
> +		addr = be64_to_cpu(addr);
> +		pr_debug("Kernel metadata addr: %llx\n", addr);
> +
> +		opal_fdm_active = __va(addr);
> +		r_opal_fdm_active = (void *)addr;

Why do we need the r_ version?

We're called early in boot, so we are still in real mode, but that's
fine, the CPU will ignore the top bits of the virtual address for us.

> +		if (r_opal_fdm_active->version != OPAL_FADUMP_VERSION) {
> +			pr_err("FADump active but version (%u) unsupported!\n",
> +			       r_opal_fdm_active->version);
> +			return 1;
> +		}
> +
> +		/* Kernel regions not registered with f/w for MPIPL */
> +		if (r_opal_fdm_active->registered_regions == 0) {
> +			opal_fdm_active = NULL;
> +			return 1;
> +		}
> +
> +		pr_info("Firmware-assisted dump is active.\n");
> +		fadump_conf->dump_active = 1;
> +		opal_fadump_get_config(fadump_conf, r_opal_fdm_active);
> +	}
> +
>  	return 1;
>  }


cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 17/31] powernv/fadump: Warn before processing partial crashdump
  2019-08-20 12:06 ` [PATCH v5 17/31] powernv/fadump: Warn before processing partial crashdump Hari Bathini
@ 2019-09-04 11:48   ` Michael Ellerman
  0 siblings, 0 replies; 74+ messages in thread
From: Michael Ellerman @ 2019-09-04 11:48 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
> index 10f6086..6a05d51 100644
> --- a/arch/powerpc/platforms/powernv/opal-fadump.c
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
> @@ -71,6 +71,30 @@ static void opal_fadump_get_config(struct fw_dump *fadump_conf,
>  	 */
>  	fadump_conf->reserve_dump_area_start = fdm->rgn[0].dest;
>  
> +	/*
> +	 * Rarely, but it can so happen that system crashes before all
> +	 * boot memory regions are registered for MPIPL. In such
> +	 * cases, warn that the vmcore may not be accurate and proceed
> +	 * anyway as that is the best bet considering free pages, cache
> +	 * pages, user pages, etc are usually filtered out.
> +	 *
> +	 * Hope the memory that could not be preserved only has pages
> +	 * that are usually filtered out while saving the vmcore.
> +	 */
> +	if (fdm->region_cnt > fdm->registered_regions) {
> +		pr_warn("Not all memory regions are saved as system seems to have crashed before all the memory regions could be registered for MPIPL!\n");

That line is rather long, I mean the actual printed line not the source line.

Also "seems to" is vague, I think better to just state what we know to
be true, ie: "Not all memory regions were saved".

> +		pr_warn("  The below boot memory regions could not be saved:\n");
> +		i = fdm->registered_regions;
> +		while (i < fdm->region_cnt) {
> +			pr_warn("\t%d. base: 0x%llx, size: 0x%llx\n", (i + 1),
> +				fdm->rgn[i].src, fdm->rgn[i].size);
> +			i++;
> +		}
> +
> +		pr_warn("  Wishing for the above regions to have only pages that are usually filtered out (user pages, free pages, etc..) and proceeding anyway..\n");
> +		pr_warn("  But the sanity of the '/proc/vmcore' file depends on whether the above region(s) have any kernel pages or not.\n");

Again those lines are too long for people on small consoles.

And "Wishing" is not really what people want to see when their system
has crashed :)

You should say something more definite, eg:
  "If the unsaved regions only contain pages that are filtered out (eg.
   free/user pages), the vmcore should still be usable. If the unsaved
   regions contain kernel pages the vmcore will be corrupted."

Or something like that.
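
As a rough illustration of keeping each printed line short, the
warning could be split along these lines. This is a user-space sketch
only: the wording is adapted from the suggestion above, fprintf()
stands in for pr_warn(), the "fadump: " prefix mirrors the pr_fmt
seen elsewhere in the series, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wording, one short line per message. */
static const char * const unsaved_region_warnings[] = {
	"Not all memory regions were saved!",
	"If the unsaved regions only contain pages that are",
	"filtered out (eg. free/user pages), the vmcore should",
	"still be usable.",
	"If the unsaved regions contain kernel pages, the vmcore",
	"will be corrupted.",
};

#define NR_WARNINGS \
	(sizeof(unsaved_region_warnings) / sizeof(unsaved_region_warnings[0]))

/* Length of the longest message, without the "fadump: " prefix. */
static size_t longest_warning_line(void)
{
	size_t i, len, max = 0;

	for (i = 0; i < NR_WARNINGS; i++) {
		len = strlen(unsaved_region_warnings[i]);
		if (len > max)
			max = len;
	}

	return max;
}

/* Print every message on its own line; returns the number of lines. */
static size_t print_unsaved_warnings(FILE *out)
{
	size_t i;

	assert(out != NULL);

	for (i = 0; i < NR_WARNINGS; i++)
		fprintf(out, "fadump: %s\n", unsaved_region_warnings[i]);

	return NR_WARNINGS;
}
```

Each line, prefix included, stays within 80 columns, so it should not
wrap on a small console.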

cheers


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 19/31] powerpc/fadump: Update documentation about OPAL platform support
  2019-08-20 12:06 ` [PATCH v5 19/31] powerpc/fadump: Update documentation about OPAL platform support Hari Bathini
@ 2019-09-04 11:51   ` Michael Ellerman
  2019-09-04 12:08     ` Oliver O'Halloran
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-04 11:51 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> With FADump support now available on both pseries and OPAL platforms,
> update FADump documentation with these details.
>
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>  Documentation/powerpc/firmware-assisted-dump.rst |  104 +++++++++++++---------
>  1 file changed, 63 insertions(+), 41 deletions(-)
>
> diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
> index d912755..2c3342c 100644
> --- a/Documentation/powerpc/firmware-assisted-dump.rst
> +++ b/Documentation/powerpc/firmware-assisted-dump.rst
> @@ -72,7 +72,8 @@ as follows:
>     normal.
>  
>  -  The freshly booted kernel will notice that there is a new
> -   node (ibm,dump-kernel) in the device tree, indicating that
> +   node (ibm,dump-kernel on PSeries or ibm,opal/dump/mpipl-boot
> +   on OPAL platform) in the device tree, indicating that
>     there is crash data available from a previous boot. During
>     the early boot OS will reserve rest of the memory above
>     boot memory size effectively booting with restricted memory
> @@ -96,7 +97,9 @@ as follows:
>  
>  Please note that the firmware-assisted dump feature
>  is only available on Power6 and above systems with recent
> -firmware versions.

Notice how "recent" has bit rotted.

> +firmware versions on PSeries (PowerVM) platform and Power9
> +and above systems with recent firmware versions on PowerNV
> +(OPAL) platform.

Can we say something more helpful here, ie. "recent" is not very useful.
AFAIK it's actually wrong, there isn't a released firmware with the
support yet at all, right?

Given all the relevant firmware is open source can't we at least point
to a commit or release tag or something?

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 20/31] powerpc/fadump: use smaller offset while finding memory for reservation
  2019-08-20 12:06 ` [PATCH v5 20/31] powerpc/fadump: use smaller offset while finding memory for reservation Hari Bathini
@ 2019-09-04 11:54   ` Michael Ellerman
  0 siblings, 0 replies; 74+ messages in thread
From: Michael Ellerman @ 2019-09-04 11:54 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
> index d2dd117..7107cf2 100644
> --- a/arch/powerpc/kernel/fadump-common.h
> +++ b/arch/powerpc/kernel/fadump-common.h
> @@ -66,6 +66,14 @@ static inline u64 fadump_str_to_u64(const char *str)
>  
>  #define FADUMP_CRASH_INFO_MAGIC		fadump_str_to_u64("FADMPINF")
>  
> +/*
> + * Amount of memory (1024MB) to skip before making another attempt at
> + * reserving memory (after the previous attempt to reserve memory for
> + * FADump failed due to memory holes and/or reserved ranges) to reduce
> + * the likelihood of memory reservation failure.
> + */
> +#define FADUMP_OFFSET_SIZE			0x40000000U

This seems like a bit of a hack.

> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 971c50d..8dd2dcc 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -371,7 +371,7 @@ int __init fadump_reserve_mem(void)
>  			    !memblock_is_region_reserved(base, size))
>  				break;
>  
> -			base += size;
> +			base += FADUMP_OFFSET_SIZE;
>  		}

The comment above the loop says:

		/*
		 * Reserve memory at an offset closer to bottom of the RAM to
		 * minimize the impact of memory hot-remove operation. We can't
		 * use memblock_find_in_range() here since it doesn't allocate
		 * from bottom to top.
		 */

Is that true? Can't we set memblock to bottom up mode and then call it?
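
To make the effect of the smaller step concrete, here is a user-space
model of the scan loop. It is a sketch with assumptions: range_sketch,
overlaps() and find_base() are hypothetical names, and a single list
of "bad" ranges stands in for both memory holes and reserved ranges,
which the kernel checks separately via memblock_is_region_memory() and
memblock_is_region_reserved().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GB(x)	((uint64_t)(x) << 30)

/* Half-open range [start, end) that must not be used. */
struct range_sketch {
	uint64_t start;
	uint64_t end;
};

/* True if [base, base + size) intersects any of the n bad ranges. */
static bool overlaps(const struct range_sketch *bad, size_t n,
		     uint64_t base, uint64_t size)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (base < bad[i].end && bad[i].start < base + size)
			return true;

	return false;
}

/*
 * Walk up from 'start' in 'step' increments and return the first base
 * for which [base, base + size) is usable and ends at or below 'limit'.
 * Returns UINT64_MAX if no such base is found.
 */
static uint64_t find_base(uint64_t start, uint64_t limit, uint64_t size,
			  uint64_t step, const struct range_sketch *bad,
			  size_t n)
{
	uint64_t base;

	assert(step != 0 && size <= limit);

	for (base = start; base <= limit - size; base += step)
		if (!overlaps(bad, n, base, size))
			return base;

	return UINT64_MAX;
}
```

For example, with 16GB of memory, a 6GB reservation starting the scan
at 2GB, and bad ranges at [3GB, 4GB) and [13GB, 14GB), stepping by the
reservation size tries only 2GB and 8GB and fails, while stepping by
1GB finds a usable base at 4GB.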

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 19/31] powerpc/fadump: Update documentation about OPAL platform support
  2019-09-04 11:51   ` Michael Ellerman
@ 2019-09-04 12:08     ` Oliver O'Halloran
  2019-09-05  3:15       ` Michael Ellerman
  0 siblings, 1 reply; 74+ messages in thread
From: Oliver O'Halloran @ 2019-09-04 12:08 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	linuxppc-dev, Nicholas Piggin, Hari Bathini, Daniel Axtens

On Wed, Sep 4, 2019 at 9:51 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Hari Bathini <hbathini@linux.ibm.com> writes:
> > With FADump support now available on both pseries and OPAL platforms,
> > update FADump documentation with these details.
> >
> > Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> > ---
> >  Documentation/powerpc/firmware-assisted-dump.rst |  104 +++++++++++++---------
> >  1 file changed, 63 insertions(+), 41 deletions(-)
> >
> > diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
> > index d912755..2c3342c 100644
> > --- a/Documentation/powerpc/firmware-assisted-dump.rst
> > +++ b/Documentation/powerpc/firmware-assisted-dump.rst
> > @@ -72,7 +72,8 @@ as follows:
> >     normal.
> >
> >  -  The freshly booted kernel will notice that there is a new
> > -   node (ibm,dump-kernel) in the device tree, indicating that
> > +   node (ibm,dump-kernel on PSeries or ibm,opal/dump/mpipl-boot
> > +   on OPAL platform) in the device tree, indicating that
> >     there is crash data available from a previous boot. During
> >     the early boot OS will reserve rest of the memory above
> >     boot memory size effectively booting with restricted memory
> > @@ -96,7 +97,9 @@ as follows:
> >
> >  Please note that the firmware-assisted dump feature
> >  is only available on Power6 and above systems with recent
> > -firmware versions.
>
> Notice how "recent" has bit rotted.
>
> > +firmware versions on PSeries (PowerVM) platform and Power9
> > +and above systems with recent firmware versions on PowerNV
> > +(OPAL) platform.
>
> Can we say something more helpful here, ie. "recent" is not very useful.
> AFAIK it's actually wrong, there isn't a released firmware with the
> support yet at all, right?
>
> Given all the relevant firmware is open source can't we at least point
> to a commit or release tag or something?
>
> cheers

Even if we can quote a git sha it's not terribly useful or user
friendly. We already gate the feature behind DT nodes / properties
existing, so why not just say "fadump requires XYZ firmware feature,
as indicated by <ABC> device-tree property."

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware
  2019-08-20 12:06 ` [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware Hari Bathini
@ 2019-09-04 12:20   ` Michael Ellerman
  2019-09-09 13:23     ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-04 12:20 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:

> diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
> index 7107cf2..fc408b0 100644
> --- a/arch/powerpc/kernel/fadump-common.h
> +++ b/arch/powerpc/kernel/fadump-common.h
> @@ -98,7 +98,11 @@ struct fw_dump {
>  	/* cmd line option during boot */
>  	unsigned long	reserve_bootvar;
>  
> +	unsigned long	cpu_state_destination_addr;

AFAICS that is only used in two places, and both of them have to call
__va() on it, so why don't we store the virtual address to start with?

> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
> index f75b861..9a32a7f 100644
> --- a/arch/powerpc/platforms/powernv/opal-fadump.c
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
> @@ -282,15 +283,122 @@ static void opal_fadump_cleanup(struct fw_dump *fadump_conf)
>  		pr_warn("Could not reset (%llu) kernel metadata tag!\n", ret);
>  }
>  
> +static inline void opal_fadump_set_regval_regnum(struct pt_regs *regs,
> +						 u32 reg_type, u32 reg_num,
> +						 u64 reg_val)
> +{
> +	if (reg_type == HDAT_FADUMP_REG_TYPE_GPR) {
> +		if (reg_num < 32)
> +			regs->gpr[reg_num] = reg_val;
> +		return;
> +	}
> +
> +	switch (reg_num) {
> +	case SPRN_CTR:
> +		regs->ctr = reg_val;
> +		break;
> +	case SPRN_LR:
> +		regs->link = reg_val;
> +		break;
> +	case SPRN_XER:
> +		regs->xer = reg_val;
> +		break;
> +	case SPRN_DAR:
> +		regs->dar = reg_val;
> +		break;
> +	case SPRN_DSISR:
> +		regs->dsisr = reg_val;
> +		break;
> +	case HDAT_FADUMP_REG_ID_NIP:
> +		regs->nip = reg_val;
> +		break;
> +	case HDAT_FADUMP_REG_ID_MSR:
> +		regs->msr = reg_val;
> +		break;
> +	case HDAT_FADUMP_REG_ID_CCR:
> +		regs->ccr = reg_val;
> +		break;
> +	}
> +}
> +
> +static inline void opal_fadump_read_regs(char *bufp, unsigned int regs_cnt,
> +					 unsigned int reg_entry_size,
> +					 struct pt_regs *regs)
> +{
> +	int i;
> +	struct hdat_fadump_reg_entry *reg_entry;

Where's my christmas tree :)

> +
> +	memset(regs, 0, sizeof(struct pt_regs));
> +
> +	for (i = 0; i < regs_cnt; i++, bufp += reg_entry_size) {
> +		reg_entry = (struct hdat_fadump_reg_entry *)bufp;
> +		opal_fadump_set_regval_regnum(regs,
> +					      be32_to_cpu(reg_entry->reg_type),
> +					      be32_to_cpu(reg_entry->reg_num),
> +					      be64_to_cpu(reg_entry->reg_val));
> +	}
> +}
> +
> +static inline bool __init is_thread_core_inactive(u8 core_state)
> +{
> +	bool is_inactive = false;
> +
> +	if (core_state == HDAT_FADUMP_CORE_INACTIVE)
> +		is_inactive = true;
> +
> +	return is_inactive;

	return core_state == HDAT_FADUMP_CORE_INACTIVE;

??

In fact there's only one caller, so just drop the inline entirely.

> +}
> +
>  /*
>   * Convert CPU state data saved at the time of crash into ELF notes.
> + *
> + * Each register entry is of 16 bytes, A numerical identifier along with
> + * a GPR/SPR flag in the first 8 bytes and the register value in the next
> + * 8 bytes. For more details refer to F/W documentation.
>   */
>  static int __init opal_fadump_build_cpu_notes(struct fw_dump *fadump_conf)
>  {
>  	u32 num_cpus, *note_buf;
>  	struct fadump_crash_info_header *fdh = NULL;
> +	struct hdat_fadump_thread_hdr *thdr;
> +	unsigned long addr;
> +	u32 thread_pir;
> +	char *bufp;
> +	struct pt_regs regs;
> +	unsigned int size_of_each_thread;
> +	unsigned int regs_offset, regs_cnt, reg_esize;
> +	int i;

	unsigned int size_of_each_thread, regs_offset, regs_cnt, reg_esize;
	struct fadump_crash_info_header *fdh = NULL;
	u32 num_cpus, thread_pir, *note_buf;
	struct hdat_fadump_thread_hdr *thdr;
	struct pt_regs regs;
	unsigned long addr;
	char *bufp;
	int i;

Ah much better :)

Though the number of variables might be an indication that this function
could be split into smaller parts.

> @@ -473,6 +627,26 @@ int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
>  			return 1;
>  		}
>  
> +		ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU, &addr);
> +		if ((ret != OPAL_SUCCESS) || !addr) {
> +			pr_err("Failed to get CPU metadata (%lld)\n", ret);
> +			return 1;
> +		}
> +
> +		addr = be64_to_cpu(addr);
> +		pr_debug("CPU metadata addr: %llx\n", addr);
> +
> +		opal_cpu_metadata = __va(addr);
> +		r_opal_cpu_metadata = (void *)addr;

Another r_ variable I don't understand.

> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h b/arch/powerpc/platforms/powernv/opal-fadump.h
> index 19cac1f..ce4c522 100644
> --- a/arch/powerpc/platforms/powernv/opal-fadump.h
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.h
> @@ -30,4 +30,43 @@ struct opal_fadump_mem_struct {
>  	struct opal_mpipl_region	rgn[OPAL_FADUMP_MAX_MEM_REGS];
>  } __attribute__((packed));
>  
> +/*
> + * CPU state data is provided by f/w. Below are the definitions
> + * provided in HDAT spec. Refer to latest HDAT specification for
> + * any update to this format.
> + */

How is this meant to work? If HDAT ever changes the format they will
break all existing kernels in the field.

> +#define HDAT_FADUMP_CPU_DATA_VERSION		1
> +
> +#define HDAT_FADUMP_CORE_INACTIVE		(0x0F)
> +
> +/* HDAT thread header for register entries */
> +struct hdat_fadump_thread_hdr {
> +	__be32  pir;
> +	/* 0x00 - 0x0F - The corresponding stop state of the core */
> +	u8      core_state;
> +	u8      reserved[3];
> +
> +	__be32	offset;	/* Offset to Register Entries array */
> +	__be32	ecnt;	/* Number of entries */
> +	__be32	esize;	/* Alloc size of each array entry in bytes */
> +	__be32	eactsz;	/* Actual size of each array entry in bytes */
> +} __attribute__((packed));
> +
> +/* Register types populated by f/w */
> +#define HDAT_FADUMP_REG_TYPE_GPR		0x01
> +#define HDAT_FADUMP_REG_TYPE_SPR		0x02
> +
> +/* ID numbers used by f/w while populating certain registers */
> +#define HDAT_FADUMP_REG_ID_NIP			0x7D0
> +#define HDAT_FADUMP_REG_ID_MSR			0x7D1
> +#define HDAT_FADUMP_REG_ID_CCR			0x7D2
> +
> +/* HDAT register entry. */
> +struct hdat_fadump_reg_entry {
> +	__be32		reg_type;
> +	__be32		reg_num;
> +	__be64		reg_val;
> +} __attribute__((packed));
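
For readers following the layout above, a minimal userspace sketch of
decoding one 16-byte register entry. It assumes only what the quoted
structure shows (two big-endian 32-bit fields followed by a big-endian
64-bit value). The names reg_entry, read_be32, read_be64 and
decode_reg_entry are made up for illustration and are not part of the
patch.

```c
#include <stdint.h>

/* Host-endian stand-in for struct hdat_fadump_reg_entry (hypothetical). */
struct reg_entry {
	uint32_t reg_type;
	uint32_t reg_num;
	uint64_t reg_val;
};

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit value as two 32-bit halves. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode one 16-byte entry: type, number, then the 64-bit value. */
static struct reg_entry decode_reg_entry(const unsigned char *buf)
{
	struct reg_entry e;

	e.reg_type = read_be32(buf);
	e.reg_num = read_be32(buf + 4);
	e.reg_val = read_be64(buf + 8);
	return e;
}
```

Reading byte by byte keeps the sketch independent of host endianness,
which is the same job be32_to_cpu()/be64_to_cpu() do in the patch.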

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 11/31] powernv/fadump: add fadump support on powernv
  2019-09-03 16:31     ` Hari Bathini
@ 2019-09-04 14:33       ` Hari Bathini
  2019-09-05  3:11         ` Michael Ellerman
  0 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-09-04 14:33 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens



On 03/09/19 10:01 PM, Hari Bathini wrote:
> 
[...]
>>> diff --git a/arch/powerpc/kernel/fadump-common.h b/arch/powerpc/kernel/fadump-common.h
>>> index d2c5b16..f6c52d3 100644
>>> --- a/arch/powerpc/kernel/fadump-common.h
>>> +++ b/arch/powerpc/kernel/fadump-common.h
>>> @@ -140,4 +140,13 @@ static inline int rtas_fadump_dt_scan(struct fw_dump *fadump_config, ulong node)
>>>  }
>>>  #endif
>>>  
>>> +#ifdef CONFIG_PPC_POWERNV
>>> +extern int opal_fadump_dt_scan(struct fw_dump *fadump_config, ulong node);
>>> +#else
>>> +static inline int opal_fadump_dt_scan(struct fw_dump *fadump_config, ulong node)
>>> +{
>>> +	return 1;
>>> +}
>>
>> Extending the strange flat device tree calling convention to these
>> functions is not ideal.
>>
>> It would be better I think if they just returned bool true/false for
>> "found it" / "not found", and then early_init_dt_scan_fw_dump() can
>> convert that into the appropriate return value.
>>
>>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>>> index f7c8073..b8061fb9 100644
>>> --- a/arch/powerpc/kernel/fadump.c
>>> +++ b/arch/powerpc/kernel/fadump.c
>>> @@ -114,6 +114,9 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
>>>  	if (strcmp(uname, "rtas") == 0)
>>>  		return rtas_fadump_dt_scan(&fw_dump, node);
>>>  
>>> +	if (strcmp(uname, "ibm,opal") == 0)
>>> +		return opal_fadump_dt_scan(&fw_dump, node);
>>> +
>>
>> ie this would become:
>>
>> 	if (strcmp(uname, "ibm,opal") == 0 && opal_fadump_dt_scan(&fw_dump, node))
>>             return 1;
>>
> 
> Yeah. Will update accordingly...

On second thoughts, we don't need a return value at all here. The fw_dump struct and callbacks are
populated based on what we found in the DT. And irrespective of what we found in the DT, we have
to return `1` once the particular depth and node are processed.

- Hari


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 02/31] powerpc/fadump: move internal code to a new file
  2019-09-04  9:02       ` Mahesh Jagannath Salgaonkar
@ 2019-09-04 18:26         ` Hari Bathini
  0 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-09-04 18:26 UTC (permalink / raw)
  To: Mahesh Jagannath Salgaonkar, Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens



On 04/09/19 2:32 PM, Mahesh Jagannath Salgaonkar wrote:
> On 9/3/19 9:35 PM, Hari Bathini wrote:
>>
>>
>> On 03/09/19 4:39 PM, Michael Ellerman wrote:
>>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>>> Make way for refactoring platform specific FADump code by moving code
>>>> that could be referenced from multiple places to fadump-common.c file.
>>>>
>>>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>>>> ---
>>>>  arch/powerpc/kernel/Makefile        |    2 
>>>>  arch/powerpc/kernel/fadump-common.c |  140 ++++++++++++++++++++++++++++++++++
>>>>  arch/powerpc/kernel/fadump-common.h |    8 ++
>>>>  arch/powerpc/kernel/fadump.c        |  146 ++---------------------------------
>>>>  4 files changed, 158 insertions(+), 138 deletions(-)
>>>>  create mode 100644 arch/powerpc/kernel/fadump-common.c
>>>
>>> I don't understand why we need fadump.c and fadump-common.c? They're
>>> both common/shared across pseries & powernv aren't they?
>>
>> The convention I tried to follow was to have fadump-common.c shared between fadump.c,
>> pseries & powernv code, while pseries & powernv code take callback requests from
>> fadump.c and use fadump-common.c (shared by both platforms), if necessary, to fulfil
>> those requests...
>>
>>> By the end of the series we end up with 149 lines in fadump-common.c
>>> which seems like a waste of time. Just put it all in fadump.c.
>>
>> Yeah. Probably not worth a new C file. Will just have two separate headers. One for
>> internal code and one for interfacing with other modules...
>>
>> [...]
>>
>>>> + * Copyright 2019, IBM Corp.
>>>> + * Author: Hari Bathini <hbathini@linux.ibm.com>
>>>
>>> These can just be:
>>>
>>>  * Copyright 2011, Mahesh Salgaonkar, IBM Corporation.
>>>  * Copyright 2019, Hari Bathini, IBM Corporation.
>>>
>>
>> Sure.
>>
>>>> + */
>>>> +
>>>> +#undef DEBUG
>>>
>>> Don't undef DEBUG please.
>>>
>>
>> Sorry! Seeing such a thing in most files, I thought this was the convention. Will drop
>> this change in all the new files I added.
>>
>>>> +#define pr_fmt(fmt) "fadump: " fmt
>>>> +
>>>> +#include <linux/memblock.h>
>>>> +#include <linux/elf.h>
>>>> +#include <linux/mm.h>
>>>> +#include <linux/crash_core.h>
>>>> +
>>>> +#include "fadump-common.h"
>>>> +
>>>> +void *fadump_cpu_notes_buf_alloc(unsigned long size)
>>>> +{
>>>> +	void *vaddr;
>>>> +	struct page *page;
>>>> +	unsigned long order, count, i;
>>>> +
>>>> +	order = get_order(size);
>>>> +	vaddr = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order);
>>>> +	if (!vaddr)
>>>> +		return NULL;
>>>> +
>>>> +	count = 1 << order;
>>>> +	page = virt_to_page(vaddr);
>>>> +	for (i = 0; i < count; i++)
>>>> +		SetPageReserved(page + i);
>>>> +	return vaddr;
>>>> +}
>>>
>>> I realise you're just moving this code, but why do we need all this hand
>>> rolled allocation stuff?
>>
>> Yeah, I think alloc_pages_exact() may be better here. Mahesh, am I missing something?
> 
> We hook up the physical address of this buffer to ELF core header as
> PT_NOTE section. Hence we don't want these pages to be moved around or
> reclaimed.

alloc_pages_exact() + mark_page_reserved() should take care of that, I guess..

- Hari


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 15/31] powernv/fadump: support copying multiple kernel boot memory regions
  2019-09-04 11:30   ` Michael Ellerman
@ 2019-09-04 20:20     ` Hari Bathini
  2019-09-05  3:13       ` Michael Ellerman
  0 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-09-04 20:20 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens



On 04/09/19 5:00 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> Firmware uses 32-bit field for region size while copying/backing-up
> 
> Which firmware exactly is imposing that limit?

I think the MDST/MDRT tables in the f/w. Vasant, which component is that?

>> +	/*
>> +	 * Firmware currently supports only 32-bit value for size,
> 
> "currently" implies it could change in future?
> 
> If it does we assume it will only increase, and we're happy that old
> kernels will continue to use the 32-bit limit?

I am not aware of any plans to make it 64-bit. Let me just say f/w supports
only 32-bit to get rid of that ambiguity..

- Hari


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 16/31] powernv/fadump: process the crashdump by exporting it as /proc/vmcore
  2019-09-04 11:42   ` Michael Ellerman
@ 2019-09-04 21:01     ` Hari Bathini
  0 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-09-04 21:01 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens



On 04/09/19 5:12 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
>> index a755705..10f6086 100644
>> --- a/arch/powerpc/platforms/powernv/opal-fadump.c
>> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
>> @@ -41,6 +43,37 @@ static void opal_fadump_update_config(struct fw_dump *fadump_conf,
>>  	fadump_conf->fadumphdr_addr = fdm->fadumphdr_addr;
>>  }
>>  
>> +/*
>> + * This function is called in the capture kernel to get configuration details
>> + * from metadata setup by the first kernel.
>> + */
>> +static void opal_fadump_get_config(struct fw_dump *fadump_conf,
>> +				   const struct opal_fadump_mem_struct *fdm)
>> +{
>> +	int i;
>> +
>> +	if (!fadump_conf->dump_active)
>> +		return;
>> +
>> +	fadump_conf->boot_memory_size = 0;
>> +
>> +	pr_debug("Boot memory regions:\n");
>> +	for (i = 0; i < fdm->region_cnt; i++) {
>> +		pr_debug("\t%d. base: 0x%llx, size: 0x%llx\n",
>> +			 (i + 1), fdm->rgn[i].src, fdm->rgn[i].size);
> 
> Printing the zero-based array off by one (i + 1) seems confusing.

Hmmm... Indexing the regions from `0` sounded inappropriate..

> 
>> +
>> +		fadump_conf->boot_memory_size += fdm->rgn[i].size;
>> +	}
>> +
>> +	/*
>> +	 * Start address of reserve dump area (permanent reservation) for
>> +	 * re-registering FADump after dump capture.
>> +	 */
>> +	fadump_conf->reserve_dump_area_start = fdm->rgn[0].dest;
>> +
>> +	opal_fadump_update_config(fadump_conf, fdm);
>> +}
>> +
>>  /* Initialize kernel metadata */
>>  static void opal_fadump_init_metadata(struct opal_fadump_mem_struct *fdm)
>>  {
>> @@ -215,24 +248,114 @@ static void opal_fadump_cleanup(struct fw_dump *fadump_conf)
>>  		pr_warn("Could not reset (%llu) kernel metadata tag!\n", ret);
>>  }
>>  
>> +/*
>> + * Convert CPU state data saved at the time of crash into ELF notes.
>> + */
>> +static int __init opal_fadump_build_cpu_notes(struct fw_dump *fadump_conf)
>> +{
>> +	u32 num_cpus, *note_buf;
>> +	struct fadump_crash_info_header *fdh = NULL;
>> +
>> +	num_cpus = 1;
>> +	/* Allocate buffer to hold cpu crash notes. */
>> +	fadump_conf->cpu_notes_buf_size = num_cpus * sizeof(note_buf_t);
>> +	fadump_conf->cpu_notes_buf_size =
>> +		PAGE_ALIGN(fadump_conf->cpu_notes_buf_size);
>> +	note_buf = fadump_cpu_notes_buf_alloc(fadump_conf->cpu_notes_buf_size);
>> +	if (!note_buf) {
>> +		pr_err("Failed to allocate 0x%lx bytes for cpu notes buffer\n",
>> +		       fadump_conf->cpu_notes_buf_size);
>> +		return -ENOMEM;
>> +	}
>> +	fadump_conf->cpu_notes_buf = __pa(note_buf);
>> +
>> +	pr_debug("Allocated buffer for cpu notes of size %ld at %p\n",
>> +		 (num_cpus * sizeof(note_buf_t)), note_buf);
>> +
>> +	if (fadump_conf->fadumphdr_addr)
>> +		fdh = __va(fadump_conf->fadumphdr_addr);
>> +
>> +	if (fdh && (fdh->crashing_cpu != FADUMP_CPU_UNKNOWN)) {
>> +		note_buf = fadump_regs_to_elf_notes(note_buf, &(fdh->regs));
>> +		final_note(note_buf);
>> +
>> +		pr_debug("Updating elfcore header (%llx) with cpu notes\n",
>> +			 fdh->elfcorehdr_addr);
>> +		fadump_update_elfcore_header(fadump_conf,
>> +					     __va(fdh->elfcorehdr_addr));
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static int __init opal_fadump_process(struct fw_dump *fadump_conf)
>>  {
>> -	return -EINVAL;
>> +	struct fadump_crash_info_header *fdh;
>> +	int rc = 0;
> 
> No need to initialise rc there.
> 

	rc = -EINVAL;

and


>> +	if (!opal_fdm_active || !fadump_conf->fadumphdr_addr)
>> +		return -EINVAL;

>> +
>> +	/* Validate the fadump crash info header */
>> +	fdh = __va(fadump_conf->fadumphdr_addr);
>> +	if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) {
>> +		pr_err("Crash info header is not valid.\n");
>> +		return -EINVAL;

	return rc; ??

>> +	}
>> +
>> +	/*
>> +	 * TODO: To build cpu notes, find a way to map PIR to logical id.
>> +	 *       Also, we may need different method for pseries and powernv.
>> +	 *       The currently booted kernel could have a different PIR to
>> +	 *       logical id mapping. So, try saving info of previous kernel's
>> +	 *       paca to get the right PIR to logical id mapping.
>> +	 */
> 
> That TODO is removed by the end of the series, so please just omit it entirely.
> 
>> +	rc = opal_fadump_build_cpu_notes(fadump_conf);
>> +	if (rc)
>> +		return rc;
> 
> I think this all runs early in boot, so we don't need to worry about
> another CPU seeing the partially initialised core due to there being no
> barrier here before we set elfcorehdr_addr?
> 

This is processed in fs/proc/vmcore.c during fs_initcall() and the data within the core
is processed much later (initrd). So, I think we are good here...

>> +	/*
>> +	 * We are done validating dump info and elfcore header is now ready
>> +	 * to be exported. set elfcorehdr_addr so that vmcore module will
>> +	 * export the elfcore header through '/proc/vmcore'.
>> +	 */
>> +	elfcorehdr_addr = fdh->elfcorehdr_addr;
> 
>> @@ -283,5 +407,42 @@ int __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, ulong node)
>>  	fadump_conf->ops		= &opal_fadump_ops;
>>  	fadump_conf->fadump_supported	= 1;
>>  
>> +	/*
>> +	 * Check if dump has been initiated on last reboot.
>> +	 */
>> +	prop = of_get_flat_dt_prop(dn, "mpipl-boot", NULL);
>> +	if (prop) {
> 
>         if (!prop)
>                 return 1;
> 
> And then everything below can be unindented.
> 
>> +		u64 addr = 0;
>> +		s64 ret;
>> +		const struct opal_fadump_mem_struct *r_opal_fdm_active;
> 
>   *
>  / \
>  /_\
>   |
> 

:) Will take care of such instances...
I think this should be added to checkpatch.pl

>> +
>> +		ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, &addr);
>> +		if ((ret != OPAL_SUCCESS) || !addr) {
>> +			pr_err("Failed to get Kernel metadata (%lld)\n", ret);
>> +			return 1;
>> +		}
>> +
>> +		addr = be64_to_cpu(addr);
>> +		pr_debug("Kernel metadata addr: %llx\n", addr);
>> +
>> +		opal_fdm_active = __va(addr);
>> +		r_opal_fdm_active = (void *)addr;
> 
> Why do we need the r_ version?
> 
> We're called early in boot, so we are still in real mode, but that's
> fine the CPU will ignore the top bits of the virtual address for us.

I don't know if I am missing a trick here or if there is a bug somewhere
but trying to access `opal_fdm_active->version` is not working for me..

- Hari


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 11/31] powernv/fadump: add fadump support on powernv
  2019-09-04 14:33       ` Hari Bathini
@ 2019-09-05  3:11         ` Michael Ellerman
  0 siblings, 0 replies; 74+ messages in thread
From: Michael Ellerman @ 2019-09-05  3:11 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> On 03/09/19 10:01 PM, Hari Bathini wrote:
>> 
> [...]
>>>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>>>> index f7c8073..b8061fb9 100644
>>>> --- a/arch/powerpc/kernel/fadump.c
>>>> +++ b/arch/powerpc/kernel/fadump.c
>>>> @@ -114,6 +114,9 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
>>>>  	if (strcmp(uname, "rtas") == 0)
>>>>  		return rtas_fadump_dt_scan(&fw_dump, node);
>>>>  
>>>> +	if (strcmp(uname, "ibm,opal") == 0)
>>>> +		return opal_fadump_dt_scan(&fw_dump, node);
>>>> +
>>>
>>> ie this would become:
>>>
>>> 	if (strcmp(uname, "ibm,opal") == 0 && opal_fadump_dt_scan(&fw_dump, node))
>>>             return 1;
>>>
>> 
>> Yeah. Will update accordingly...
>
> On second thoughts, we don't need a return value at all here. The fw_dump struct and callbacks are
> populated based on what we found in the DT. And irrespective of what we found in the DT, we have
> to return `1` once the particular depth and node are processed.

True. It's a little unclear because you're looking for "rtas" and
"ibm,opal" in the same function. But we know™ that no platform should
have both an "rtas" and an "ibm,opal" node, so once we find either we
are done scanning, regardless of whether the foo_fadump_dt_scan()
succeeds or fails.
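
A minimal sketch of that shape, written as standalone C rather than
kernel code. The struct, the stub scan helpers and the depth check are
assumptions made for illustration. The only point is that the platform
helpers return nothing and the caller stops the walk as soon as either
node name matches.

```c
#include <stdbool.h>
#include <string.h>

/* Cut-down stand-in for struct fw_dump (hypothetical). */
struct fw_dump {
	bool fadump_supported;
};

/* Stub platform helpers: they only fill in the config, no return value. */
static void rtas_scan(struct fw_dump *conf)
{
	conf->fadump_supported = true;
}

static void opal_scan(struct fw_dump *conf)
{
	conf->fadump_supported = true;
}

/*
 * Returns 1 to stop the flat DT walk once either firmware node is seen,
 * 0 to keep scanning. Whether the platform scan found FADump support is
 * visible only through *conf.
 */
static int scan_fw_dump(struct fw_dump *conf, const char *uname, int depth)
{
	if (depth != 1)
		return 0;

	if (strcmp(uname, "rtas") == 0) {
		rtas_scan(conf);
		return 1;
	}

	if (strcmp(uname, "ibm,opal") == 0) {
		opal_scan(conf);
		return 1;
	}

	return 0;
}
```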

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 15/31] powernv/fadump: support copying multiple kernel boot memory regions
  2019-09-04 20:20     ` Hari Bathini
@ 2019-09-05  3:13       ` Michael Ellerman
  0 siblings, 0 replies; 74+ messages in thread
From: Michael Ellerman @ 2019-09-05  3:13 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> On 04/09/19 5:00 PM, Michael Ellerman wrote:
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>> Firmware uses 32-bit field for region size while copying/backing-up
>> 
>> Which firmware exactly is imposing that limit?
>
> I think the MDST/MDRT tables in the f/w. Vasant, which component is that?
>
>>> +	/*
>>> +	 * Firmware currently supports only 32-bit value for size,
>> 
>> "currently" implies it could change in future?
>> 
>> If it does we assume it will only increase, and we're happy that old
>> kernels will continue to use the 32-bit limit?
>
> I am not aware of any plans to make it 64-bit. Let me just say f/w supports
> only 32-bit to get rid of that ambiguity..

OK. As long as everyone is aware that the kernel has no support for it
increasing without code changes.
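
To make the constraint concrete, here is a rough userspace sketch of
splitting one large range into regions whose size fits a 32-bit field.
MAX_REGION_SIZE, struct region and split_regions() are hypothetical
names and values chosen for illustration, not the ones used in the
patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-region cap, chosen so that it fits in 32 bits. */
#define MAX_REGION_SIZE 0x80000000ULL

struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Split [base, base + size) into chunks no larger than MAX_REGION_SIZE.
 * Returns the number of regions needed and fills at most 'max' entries
 * of 'out', so the caller can detect that its table was too small.
 */
static size_t split_regions(uint64_t base, uint64_t size,
			    struct region *out, size_t max)
{
	size_t n = 0;

	while (size > 0) {
		uint64_t chunk = size < MAX_REGION_SIZE ? size : MAX_REGION_SIZE;

		if (n < max) {
			out[n].base = base;
			out[n].size = chunk;
		}
		base += chunk;
		size -= chunk;
		n++;
	}

	return n;
}
```

If the cap ever grew, only the constant would change in this sketch,
but as noted above the real kernel would still need a code change to
learn about it.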

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 19/31] powerpc/fadump: Update documentation about OPAL platform support
  2019-09-04 12:08     ` Oliver O'Halloran
@ 2019-09-05  3:15       ` Michael Ellerman
  0 siblings, 0 replies; 74+ messages in thread
From: Michael Ellerman @ 2019-09-05  3:15 UTC (permalink / raw)
  To: Oliver O'Halloran
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	linuxppc-dev, Nicholas Piggin, Hari Bathini, Daniel Axtens

"Oliver O'Halloran" <oohall@gmail.com> writes:
> On Wed, Sep 4, 2019 at 9:51 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>> Hari Bathini <hbathini@linux.ibm.com> writes:
...
>> > diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
>> > index d912755..2c3342c 100644
>> > --- a/Documentation/powerpc/firmware-assisted-dump.rst
>> > +++ b/Documentation/powerpc/firmware-assisted-dump.rst
>> > @@ -96,7 +97,9 @@ as follows:
>> >
>> >  Please note that the firmware-assisted dump feature
>> >  is only available on Power6 and above systems with recent
>> > -firmware versions.
>>
>> Notice how "recent" has bit rotted.
>>
>> > +firmware versions on PSeries (PowerVM) platform and Power9
>> > +and above systems with recent firmware versions on PowerNV
>> > +(OPAL) platform.
>>
>> Can we say something more helpful here, ie. "recent" is not very useful.
>> AFAIK it's actually wrong, there isn't a released firmware with the
>> support yet at all, right?
>>
>> Given all the relevant firmware is open source can't we at least point
>> to a commit or release tag or something?
>
> Even if we can quote a git sha it's not terribly useful or user
> friendly. We already gate the feature behind DT nodes / properties
> existing, so why not just say "fadump requires XYZ firmware feature,
> as indicated by <ABC> device-tree property."

But how does that help someone who's got a Talos/Blackbird and wants to
test this stuff?

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 14/31] powernv/fadump: define register/un-register callback functions
  2019-08-20 12:05 ` [PATCH v5 14/31] powernv/fadump: define register/un-register callback functions Hari Bathini
@ 2019-09-05  4:15   ` Michael Ellerman
  2019-09-05  7:23   ` Michael Ellerman
  1 sibling, 0 replies; 74+ messages in thread
From: Michael Ellerman @ 2019-09-05  4:15 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> Make OPAL calls to register and un-register with firmware for MPIPL.
>

This has the same subject as patch 6, would be good to make them
different.

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 14/31] powernv/fadump: define register/un-register callback functions
  2019-08-20 12:05 ` [PATCH v5 14/31] powernv/fadump: define register/un-register callback functions Hari Bathini
  2019-09-05  4:15   ` Michael Ellerman
@ 2019-09-05  7:23   ` Michael Ellerman
  2019-09-05  9:54     ` Hari Bathini
  1 sibling, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-05  7:23 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
> index e466f7e..91fb909 100644
> --- a/arch/powerpc/platforms/powernv/opal-fadump.c
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
> @@ -52,6 +68,8 @@ static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
>  	opal_fdm->fadumphdr_addr = (opal_fdm->rgn[0].dest +
>  				    fadump_conf->boot_memory_size);
>  
> +	opal_fadump_update_config(fadump_conf, opal_fdm);
> +
>  	return addr;
>  }
>  
> @@ -97,12 +115,69 @@ static int opal_fadump_setup_metadata(struct fw_dump *fadump_conf)
>  
>  static int opal_fadump_register(struct fw_dump *fadump_conf)
>  {
> -	return -EIO;
> +	int i, err = -EIO;
> +	s64 rc;

Some compiler versions are warning about this being used uninitialised:

arch/powerpc/platforms/powernv/opal-fadump.c:316:3: error: 'rc' may be used uninitialized in this function [-Werror=uninitialized]

http://kisskb.ellerman.id.au/kisskb/buildresult/13943984/

Which does seem like a legitimate warning.
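
A guess at the pattern behind the warning, as a standalone sketch. It
assumes rc is only assigned inside a loop over the regions, so it would
be read uninitialised when the loop runs zero times. The function names
and the fake firmware call are made up for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for the firmware call: fails for odd region indexes here. */
static int64_t fake_register_region(int idx)
{
	return (idx % 2) ? -1 : 0;
}

/*
 * Register 'cnt' regions. rc starts out as an error value, so it is
 * defined even when the loop body never executes, and an empty region
 * list is reported as a failure rather than read as garbage.
 */
static int register_regions(int cnt)
{
	int64_t rc = -1;
	int i;

	for (i = 0; i < cnt; i++) {
		rc = fake_register_region(i);
		if (rc)
			break;
	}

	return rc ? -EIO : 0;
}
```

Initialising to an error rather than to success is a design choice: it
treats "nothing to register" as a failure instead of silently passing.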

cheers

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 14/31] powernv/fadump: define register/un-register callback functions
  2019-09-05  7:23   ` Michael Ellerman
@ 2019-09-05  9:54     ` Hari Bathini
  0 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-09-05  9:54 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	Oliver, Vasant Hegde, Daniel Axtens



On 05/09/19 12:53 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
>> index e466f7e..91fb909 100644
>> --- a/arch/powerpc/platforms/powernv/opal-fadump.c
>> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
>> @@ -52,6 +68,8 @@ static ulong opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
>>  	opal_fdm->fadumphdr_addr = (opal_fdm->rgn[0].dest +
>>  				    fadump_conf->boot_memory_size);
>>  
>> +	opal_fadump_update_config(fadump_conf, opal_fdm);
>> +
>>  	return addr;
>>  }
>>  
>> @@ -97,12 +115,69 @@ static int opal_fadump_setup_metadata(struct fw_dump *fadump_conf)
>>  
>>  static int opal_fadump_register(struct fw_dump *fadump_conf)
>>  {
>> -	return -EIO;
>> +	int i, err = -EIO;
>> +	s64 rc;
> 
> Some compiler versions are warning about this being used uninitialised:
> 
> arch/powerpc/platforms/powernv/opal-fadump.c:316:3: error: 'rc' may be used uninitialized in this function [-Werror=uninitialized]
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/13943984/
> 
> Which does seem like a legitimate warning.

fixed...


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 05/31] pseries/fadump: introduce callbacks for platform specific operations
  2019-09-03 16:06     ` Hari Bathini
@ 2019-09-06  6:39       ` Hari Bathini
  0 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-09-06  6:39 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens



On 03/09/19 9:36 PM, Hari Bathini wrote:
> 
> 
> On 03/09/19 4:40 PM, Michael Ellerman wrote:
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>> Introduce callback functions for platform specific operations like
>>> register, unregister, invalidate & such. Also, define place-holders
>>> for the same on pSeries platform.
>>
>> We already have an ops structure for machine specific calls, it's
>> ppc_md. Is there a good reason why these aren't just in machdep_calls
>> under #ifdef CONFIG_FA_DUMP ?
> 
> Not really. We move this callbacks to 'struct machdep_calls'

Actually, we can't move these callbacks to 'struct machdep_calls' as we need
these callbacks much before machine_probe().

So, if we have to use ppc_md callbacks for fadump... we need to set the
callbacks twice. Once during dt_scan and again in setup_fadump. Not worth it, I guess..

- Hari


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware
  2019-09-04 12:20   ` Michael Ellerman
@ 2019-09-09 13:23     ` Hari Bathini
  2019-09-09 15:33       ` Oliver O'Halloran
  0 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-09-09 13:23 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Oliver, Nicholas Piggin, Daniel Axtens



On 04/09/19 5:50 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>

[...]

>> +/*
>> + * CPU state data is provided by f/w. Below are the definitions
>> + * provided in HDAT spec. Refer to latest HDAT specification for
>> + * any update to this format.
>> + */
> 
> How is this meant to work? If HDAT ever changes the format they will
> break all existing kernels in the field.
> 
>> +#define HDAT_FADUMP_CPU_DATA_VERSION		1

Changes are not expected here. But this is just to cover for such a scenario,
if that ever happens.

Also, I think it is a bit far-fetched to error out if versions mismatch.
Warning and proceeding sounds worthier because the changes are usually
backward compatible, if and when there are any. Will update accordingly... 

- Hari


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware
  2019-09-09 13:23     ` Hari Bathini
@ 2019-09-09 15:33       ` Oliver O'Halloran
  2019-09-10  8:48         ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Oliver O'Halloran @ 2019-09-09 15:33 UTC (permalink / raw)
  To: Hari Bathini
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	linuxppc-dev, Nicholas Piggin, Daniel Axtens

On Mon, Sep 9, 2019 at 11:23 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
> On 04/09/19 5:50 PM, Michael Ellerman wrote:
> > Hari Bathini <hbathini@linux.ibm.com> writes:
> >
>
> [...]
>
> >> +/*
> >> + * CPU state data is provided by f/w. Below are the definitions
> >> + * provided in HDAT spec. Refer to latest HDAT specification for
> >> + * any update to this format.
> >> + */
> >
> > How is this meant to work? If HDAT ever changes the format they will
> > break all existing kernels in the field.
> >
> >> +#define HDAT_FADUMP_CPU_DATA_VERSION                1
>
> Changes are not expected here. But this is just to cover for such a scenario,
> if that ever happens.

The HDAT spec doesn't define the SPR numbers for NIA, MSR and the CR.
As far as I can tell the values you've assumed here are chip-specific,
non-architected SPR numbers that come from an array buried somewhere
in the SBE codebase. I don't believe you for a second when you say
that this will never change.

> Also, I think it is a bit far-fetched to error out if versions mismatch.
> Warning and proceeding sounds worthier because the changes are usually
> backward compatible, if and when there are any. Will update accordingly...

Literally the only reason I didn't drop the CPU DATA parts of the OPAL
MPIPL series was because I assumed the kernel would do the sensible
thing and reject or ignore the structure if it did not know how to
parse the data.

Oliver


* Re: [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware
  2019-09-09 15:33       ` Oliver O'Halloran
@ 2019-09-10  8:48         ` Hari Bathini
  2019-09-10 14:05           ` Michael Ellerman
  0 siblings, 1 reply; 74+ messages in thread
From: Hari Bathini @ 2019-09-10  8:48 UTC (permalink / raw)
  To: Oliver O'Halloran
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	linuxppc-dev, Nicholas Piggin, Daniel Axtens



On 09/09/19 9:03 PM, Oliver O'Halloran wrote:
> On Mon, Sep 9, 2019 at 11:23 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>
>> On 04/09/19 5:50 PM, Michael Ellerman wrote:
>>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>>
>>
>> [...]
>>
>>>> +/*
>>>> + * CPU state data is provided by f/w. Below are the definitions
>>>> + * provided in HDAT spec. Refer to latest HDAT specification for
>>>> + * any update to this format.
>>>> + */
>>>
>>> How is this meant to work? If HDAT ever changes the format they will
>>> break all existing kernels in the field.
>>>
>>>> +#define HDAT_FADUMP_CPU_DATA_VERSION                1
>>
>> Changes are not expected here. But this is just to cover for such scenario,
>> if that ever happens.
> 
> The HDAT spec doesn't define the SPR numbers for NIA, MSR and the CR.
> As far as I can tell the values you've assumed here are chip-specific,
> non-architected SPR numbers that come from an array buried somewhere
> in the SBE codebase. I don't believe you for a second when you say
> that this will never change.

At least, the understanding is that these numbers do not change across processor
generations. If something changes, it is supposed to be handled in SBE. Also,
I am told these numbers would be listed in the HDAT spec. Not sure if that
has happened yet though. Vasant, do you have anything to add?

>> Also, I think it is a bit far-fetched to error out if versions mismatch.
>> Warning and proceeding sounds worthier because the changes are usually
>> backward compatible, if and when there are any. Will update accordingly...
> 
> Literally the only reason I didn't drop the CPU DATA parts of the OPAL
> MPIPL series was because I assumed the kernel would do the sensible
> thing and reject or ignore the structure if it did not know how to
> parse the data.

I think the changes, if any, would have to be backward compatible for the sake
of sanity. Even if they are not, aren't we better off exporting /proc/vmcore
with a warning and some crazy CPU register data (if parsing goes alright) than
no dump at all?

- Hari



* Re: [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware
  2019-09-10  8:48         ` Hari Bathini
@ 2019-09-10 14:05           ` Michael Ellerman
  2019-09-10 16:10             ` Hari Bathini
  0 siblings, 1 reply; 74+ messages in thread
From: Michael Ellerman @ 2019-09-10 14:05 UTC (permalink / raw)
  To: Hari Bathini, Oliver O'Halloran
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	linuxppc-dev, Nicholas Piggin, Daniel Axtens

Hari Bathini <hbathini@linux.ibm.com> writes:
> On 09/09/19 9:03 PM, Oliver O'Halloran wrote:
>> On Mon, Sep 9, 2019 at 11:23 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>> On 04/09/19 5:50 PM, Michael Ellerman wrote:
>>>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>> [...]
>>>
>>>>> +/*
>>>>> + * CPU state data is provided by f/w. Below are the definitions
>>>>> + * provided in HDAT spec. Refer to latest HDAT specification for
>>>>> + * any update to this format.
>>>>> + */
>>>>
>>>> How is this meant to work? If HDAT ever changes the format they will
>>>> break all existing kernels in the field.
>>>>
>>>>> +#define HDAT_FADUMP_CPU_DATA_VERSION                1
>>>
>>> Changes are not expected here. But this is just to cover for such scenario,
>>> if that ever happens.
>> 
>> The HDAT spec doesn't define the SPR numbers for NIA, MSR and the CR.
>> As far as I can tell the values you've assumed here are chip-specific,
>> non-architected SPR numbers that come from an array buried somewhere
>> in the SBE codebase. I don't believe you for a second when you say
>> that this will never change.
>
> At least, the understanding is that this numbers not change across processor
> generations. If something changes, it is supposed to be handled in SBE. Also,
> I am told this numbers would be listed in the HDAT Spec. Not sure if that
> happened yet though. Vasant, you have anything to add?

That doesn't help much because the HDAT spec is not public.

The point is that, with the code written the way it is, these values *must
not* change, or else all existing kernels will be broken, which is not
acceptable.

>>> Also, I think it is a bit far-fetched to error out if versions mismatch.
>>> Warning and proceeding sounds worthier because the changes are usually
>>> backward compatible, if and when there are any. Will update accordingly...
>> 
>> Literally the only reason I didn't drop the CPU DATA parts of the OPAL
>> MPIPL series was because I assumed the kernel would do the sensible
>> thing and reject or ignore the structure if it did not know how to
>> parse the data.
>
> I think, the changes if any, would have to be backward compatible for the sake
> of sanity.

People need to understand that this is an ABI between firmware and
in-the-field distribution kernels which are only updated at customer
discretion, or possibly never.

Any changes *must be* backward compatible.

Looking at the header struct:

+struct hdat_fadump_thread_hdr {
+	__be32  pir;
+	/* 0x00 - 0x0F - The corresponding stop state of the core */
+	u8      core_state;
+	u8      reserved[3];

You have those 3 reserved bytes, so a future revision could repurpose
one of those as a flag to indicate a new format. And/or the hdr could be
made bigger and new kernels could be taught to look for new things in
the space after the hdr but before the reg entries.

So I think there is a reasonable mechanism for extending the format in
future, but my point is people must understand that this is an ABI and
changes must be made accordingly.
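
As a rough sketch of the first option, assuming a future revision reuses
reserved[0] as a format flag (that flag, the THREAD_HDR_FMT_V1 name and both
helpers are hypothetical, and only the header fields quoted above are
modelled), a kernel could detect a format it does not know like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Leading fields of the thread header as quoted above; later fields omitted. */
struct thread_hdr_prefix {
	uint8_t pir[4];		/* big-endian PIR */
	uint8_t core_state;	/* 0x00 - 0x0F: stop state of the core */
	uint8_t reserved[3];
};

/* The layout is an ABI, so make sure the compiler added no padding. */
static_assert(sizeof(struct thread_hdr_prefix) == 8, "unexpected padding");

/* Hypothetical: zero means "original format", anything else is newer. */
enum { THREAD_HDR_FMT_V1 = 0 };

/* True only if the header is in the one format this code knows how to parse. */
static bool thread_hdr_format_known(const struct thread_hdr_prefix *hdr)
{
	return hdr->reserved[0] == THREAD_HDR_FMT_V1;
}

/* Decode the big-endian PIR without relying on host endianness. */
static uint32_t thread_hdr_pir(const struct thread_hdr_prefix *hdr)
{
	return ((uint32_t)hdr->pir[0] << 24) | ((uint32_t)hdr->pir[1] << 16) |
	       ((uint32_t)hdr->pir[2] << 8) | (uint32_t)hdr->pir[3];
}
```

This only works if today's firmware really does write zeroes into the
reserved bytes, which would need to be confirmed before relying on it.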

> Even if they are not, we are better off exporting the /proc/vmcore
> with a warning and some crazy CPU register data (if parsing goes alright) than
> no dump at all? 

If it's just a case of reg entries that we don't recognise then yes I
think it would be OK to just skip them and continue exporting. But if
there's any deeper misunderstanding of the format then we should bail
out.
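
To illustrate the distinction, here is a minimal sketch, not the patch's
code. The 16-byte entry layout (4-byte type, 4-byte number, 8-byte value),
reg_is_known() and parse_reg_entries() are all assumptions made for the
example. Unknown entries are counted and skipped, while an entry size or
count that does not fit the buffer makes the parser bail out:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout: 4-byte type, 4-byte number, 8-byte value. */
#define REG_ENTRY_MIN_SIZE 16

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Illustrative only: treat type 1, numbers 0..31 as registers we understand. */
static int reg_is_known(uint32_t type, uint32_t num)
{
	return type == 1 && num < 32;
}

/*
 * Walk 'count' entries of 'esize' bytes each in 'buf'.
 * Returns the number of recognised entries, or -1 if the layout cannot be
 * trusted (entry too small, or the entries would run past the buffer).
 * Unrecognised entries are skipped and counted in '*skipped'.
 */
static int parse_reg_entries(const uint8_t *buf, size_t len,
			     uint32_t count, uint32_t esize, uint32_t *skipped)
{
	int used = 0;

	*skipped = 0;
	if (esize < REG_ENTRY_MIN_SIZE)
		return -1;	/* deeper misunderstanding: bail out */
	if (count > len / esize)
		return -1;	/* entries would overrun the buffer */

	for (uint32_t i = 0; i < count; i++) {
		const uint8_t *e = buf + (size_t)i * esize;

		if (reg_is_known(be32(e), be32(e + 4)))
			used++;
		else
			(*skipped)++;	/* unknown register: skip, keep going */
	}
	return used;
}
```

Stepping by the entry size that firmware reports, rather than by
sizeof() of a local struct, is what would let a larger future entry be
skipped over safely.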

I notice now that you don't do anything in opal_fadump_set_regval_regnum()
if you are passed a register we don't understand, so that probably needs
fixing.

cheers


* Re: [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware
  2019-09-10 14:05           ` Michael Ellerman
@ 2019-09-10 16:10             ` Hari Bathini
  0 siblings, 0 replies; 74+ messages in thread
From: Hari Bathini @ 2019-09-10 16:10 UTC (permalink / raw)
  To: Michael Ellerman, Oliver O'Halloran
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Nicholas Piggin,
	linuxppc-dev, Vasant Hegde, Daniel Axtens



On 10/09/19 7:35 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> On 09/09/19 9:03 PM, Oliver O'Halloran wrote:
>>> On Mon, Sep 9, 2019 at 11:23 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>>> On 04/09/19 5:50 PM, Michael Ellerman wrote:
>>>>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>>> [...]
>>>>
>>>>>> +/*
>>>>>> + * CPU state data is provided by f/w. Below are the definitions
>>>>>> + * provided in HDAT spec. Refer to latest HDAT specification for
>>>>>> + * any update to this format.
>>>>>> + */
>>>>>
>>>>> How is this meant to work? If HDAT ever changes the format they will
>>>>> break all existing kernels in the field.
>>>>>
>>>>>> +#define HDAT_FADUMP_CPU_DATA_VERSION                1
>>>>
>>>> Changes are not expected here. But this is just to cover for such scenario,
>>>> if that ever happens.
>>>
>>> The HDAT spec doesn't define the SPR numbers for NIA, MSR and the CR.
>>> As far as I can tell the values you've assumed here are chip-specific,
>>> non-architected SPR numbers that come from an array buried somewhere
>>> in the SBE codebase. I don't believe you for a second when you say
>>> that this will never change.
>>
>> At least, the understanding is that this numbers not change across processor
>> generations. If something changes, it is supposed to be handled in SBE. Also,
>> I am told this numbers would be listed in the HDAT Spec. Not sure if that
>> happened yet though. Vasant, you have anything to add?
> 
> That doesn't help much because the HDAT spec is not public.
> 
> The point is with the code written the way it is, these values *must
> not* change, or else all existing kernels will be broken, which is not
> acceptable.

Yeah. It is absurd to error out just by looking at version number...

> 
>>>> Also, I think it is a bit far-fetched to error out if versions mismatch.
>>>> Warning and proceeding sounds worthier because the changes are usually
>>>> backward compatible, if and when there are any. Will update accordingly...
>>>
>>> Literally the only reason I didn't drop the CPU DATA parts of the OPAL
>>> MPIPL series was because I assumed the kernel would do the sensible
>>> thing and reject or ignore the structure if it did not know how to
>>> parse the data.
>>
>> I think, the changes if any, would have to be backward compatible for the sake
>> of sanity.
> 
> People need to understand that this is an ABI between firmware and
> in-the-field distribution kernels which are only updated at customer
> discretion, or possibly never.
> 
> Any changes *must be* backward compatible.
> 
> Looking at the header struct:
> 
> +struct hdat_fadump_thread_hdr {
> +	__be32  pir;
> +	/* 0x00 - 0x0F - The corresponding stop state of the core */
> +	u8      core_state;
> +	u8      reserved[3];
> 
> You have those 3 reserved bytes, so a future revision could repurpose
> one of those as a flag to indicate a new format. And/or the hdr could be
> made bigger and new kernels could be taught to look for new things in
> the space after the hdr but before the reg entries.
> 
> So I think there is a reasonable mechanism for extending the format in
> future, but my point is people must understand that this is an ABI and
> changes must be made accordingly.

True. The folks who make the changes to this format should be aware that
breaking kernel ABI is not going to be pretty and I think they are :)

> 
>> Even if they are not, we are better off exporting the /proc/vmcore
>> with a warning and some crazy CPU register data (if parsing goes alright) than
>> no dump at all? 
> 
> If it's just a case of reg entries that we don't recognise then yes I
> think it would be OK to just skip them and continue exporting. But if
> there's any deeper misunderstanding of the format then we should bail
> out.

Sure. Will try to fix that by first doing a sanity check on the fields
needed for parsing the data. If nothing weird is detected, proceed with a
warning; if anything weird is observed, fall back to just appending the
crashing CPU as done in patch 16/31. That should hopefully take care of all
cases in the best possible way.

> 
> I notice now that you don't do anything in opal_fadump_set_regval_regnum()
> if you are passed a register we don't understand, so that probably needs
> fixing.

f/w provides about 100-odd registers in the CPU state data, most of which
pt_regs doesn't care about. So opal_fadump_set_regval_regnum() is happy as
long as it finds the registers listed in it. Unless pt_regs changes, we
can stick to this and ignore the rest of them?
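
For illustration, a cut-down sketch of that "take what pt_regs needs,
ignore the rest" behaviour. It is not the code from the patch: struct
mini_regs, the REG_TYPE_* values and set_regval() are hypothetical
stand-ins, and only the architected SPR numbers for XER (1), LR (8) and
CTR (9) are used. Returning a bool would let the caller count or log
ignored registers instead of dropping them silently:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the pt_regs fields of interest. */
struct mini_regs {
	uint64_t gpr[32];
	uint64_t xer, link, ctr;
};

/* Hypothetical register type codes, for this example only. */
#define REG_TYPE_GPR 1u
#define REG_TYPE_SPR 2u

/*
 * Store 'val' if (type, num) names a register we keep.
 * Returns false for anything else so the caller can account for it.
 */
static bool set_regval(struct mini_regs *regs, uint32_t type,
		       uint32_t num, uint64_t val)
{
	if (type == REG_TYPE_GPR) {
		if (num >= 32)
			return false;
		regs->gpr[num] = val;
		return true;
	}
	if (type != REG_TYPE_SPR)
		return false;

	switch (num) {
	case 1:	/* XER */
		regs->xer = val;
		return true;
	case 8:	/* LR */
		regs->link = val;
		return true;
	case 9:	/* CTR */
		regs->ctr = val;
		return true;
	default:	/* a register we do not keep: ignore it */
		return false;
	}
}
```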

- Hari



Thread overview: 74+ messages
2019-08-20 12:04 [PATCH v5 00/31] Add FADump support on PowerNV platform Hari Bathini
2019-08-20 12:04 ` [PATCH v5 01/31] powerpc/fadump: move internal macros/definitions to a new header Hari Bathini
2019-09-03 11:09   ` Michael Ellerman
2019-09-03 16:05     ` Hari Bathini
2019-08-20 12:04 ` [PATCH v5 02/31] powerpc/fadump: move internal code to a new file Hari Bathini
2019-09-03 11:09   ` Michael Ellerman
2019-09-03 16:05     ` Hari Bathini
2019-09-04  9:02       ` Mahesh Jagannath Salgaonkar
2019-09-04 18:26         ` Hari Bathini
2019-08-20 12:04 ` [PATCH v5 03/31] powerpc/fadump: Improve fadump documentation Hari Bathini
2019-08-20 12:04 ` [PATCH v5 04/31] pseries/fadump: move rtas specific definitions to platform code Hari Bathini
2019-08-20 12:04 ` [PATCH v5 05/31] pseries/fadump: introduce callbacks for platform specific operations Hari Bathini
2019-09-03 11:10   ` Michael Ellerman
2019-09-03 16:06     ` Hari Bathini
2019-09-06  6:39       ` Hari Bathini
2019-08-20 12:04 ` [PATCH v5 06/31] pseries/fadump: define register/un-register callback functions Hari Bathini
2019-09-03 11:10   ` Michael Ellerman
2019-09-03 17:15     ` Hari Bathini
2019-08-20 12:04 ` [PATCH v5 07/31] powerpc/fadump: release all the memory above boot memory size Hari Bathini
2019-09-03 11:10   ` Michael Ellerman
2019-09-03 16:27     ` Hari Bathini
2019-08-20 12:05 ` [PATCH v5 08/31] pseries/fadump: move out platform specific support from generic code Hari Bathini
2019-08-20 12:05 ` [PATCH v5 09/31] powerpc/fadump: use FADump instead of fadump for how it is pronounced Hari Bathini
2019-08-20 12:05 ` [PATCH v5 10/31] opal: add MPIPL interface definitions Hari Bathini
2019-09-03 11:10   ` Michael Ellerman
2019-09-03 16:28     ` Hari Bathini
2019-09-04 11:03       ` Michael Ellerman
2019-09-04 11:05   ` Michael Ellerman
2019-08-20 12:05 ` [PATCH v5 11/31] powernv/fadump: add fadump support on powernv Hari Bathini
2019-09-03 11:10   ` Michael Ellerman
2019-09-03 16:31     ` Hari Bathini
2019-09-04 14:33       ` Hari Bathini
2019-09-05  3:11         ` Michael Ellerman
2019-08-20 12:05 ` [PATCH v5 12/31] powernv/fadump: register kernel metadata address with opal Hari Bathini
2019-09-04 11:25   ` Michael Ellerman
2019-08-20 12:05 ` [PATCH v5 13/31] powernv/fadump: reset metadata address during clean up Hari Bathini
2019-08-27 12:00   ` Hari Bathini
2019-08-20 12:05 ` [PATCH v5 14/31] powernv/fadump: define register/un-register callback functions Hari Bathini
2019-09-05  4:15   ` Michael Ellerman
2019-09-05  7:23   ` Michael Ellerman
2019-09-05  9:54     ` Hari Bathini
2019-08-20 12:05 ` [PATCH v5 15/31] powernv/fadump: support copying multiple kernel boot memory regions Hari Bathini
2019-09-04 11:30   ` Michael Ellerman
2019-09-04 20:20     ` Hari Bathini
2019-09-05  3:13       ` Michael Ellerman
2019-08-20 12:06 ` [PATCH v5 16/31] powernv/fadump: process the crashdump by exporting it as /proc/vmcore Hari Bathini
2019-09-04 11:42   ` Michael Ellerman
2019-09-04 21:01     ` Hari Bathini
2019-08-20 12:06 ` [PATCH v5 17/31] powernv/fadump: Warn before processing partial crashdump Hari Bathini
2019-09-04 11:48   ` Michael Ellerman
2019-08-20 12:06 ` [PATCH v5 18/31] powernv/fadump: handle invalidation of crashdump and re-registraion Hari Bathini
2019-08-20 12:06 ` [PATCH v5 19/31] powerpc/fadump: Update documentation about OPAL platform support Hari Bathini
2019-09-04 11:51   ` Michael Ellerman
2019-09-04 12:08     ` Oliver O'Halloran
2019-09-05  3:15       ` Michael Ellerman
2019-08-20 12:06 ` [PATCH v5 20/31] powerpc/fadump: use smaller offset while finding memory for reservation Hari Bathini
2019-09-04 11:54   ` Michael Ellerman
2019-08-20 12:06 ` [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware Hari Bathini
2019-09-04 12:20   ` Michael Ellerman
2019-09-09 13:23     ` Hari Bathini
2019-09-09 15:33       ` Oliver O'Halloran
2019-09-10  8:48         ` Hari Bathini
2019-09-10 14:05           ` Michael Ellerman
2019-09-10 16:10             ` Hari Bathini
2019-08-20 12:06 ` [PATCH v5 22/31] powerpc/fadump: make crash memory ranges array allocation generic Hari Bathini
2019-08-20 12:06 ` [PATCH v5 23/31] powerpc/fadump: consider reserved ranges while releasing memory Hari Bathini
2019-08-20 12:07 ` [PATCH v5 24/31] powerpc/fadump: improve how crashed kernel's memory is reserved Hari Bathini
2019-08-20 12:07 ` [PATCH v5 25/31] powernv/fadump: add support to preserve crash data on FADUMP disabled kernel Hari Bathini
2019-08-20 12:07 ` [PATCH v5 26/31] powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP Hari Bathini
2019-08-20 12:07 ` [PATCH v5 27/31] powernv/opalcore: export /sys/firmware/opal/core for analysing opal crashes Hari Bathini
2019-08-20 12:07 ` [PATCH v5 28/31] powernv/opalcore: provide an option to invalidate /sys/firmware/opal/core file Hari Bathini
2019-08-20 12:07 ` [PATCH v5 29/31] powerpc/fadump: consider f/w load area Hari Bathini
2019-08-20 12:07 ` [PATCH v5 30/31] powernv/fadump: update documentation about option to release opalcore Hari Bathini
2019-08-20 12:07 ` [PATCH v5 31/31] powernv/fadump: support holes in kernel boot memory area Hari Bathini
