linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events
@ 2022-02-21  8:46 Sourabh Jain
  2022-02-21  8:46 ` [RFC PATCH 1/5] powerpc/kdump: export functions from file_load_64.c Sourabh Jain
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Sourabh Jain @ 2022-02-21  8:46 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mahesh, kexec, hbathini

On a hotplug event (CPU/memory), the CPU information prepared for the kdump
kernel becomes stale unless it is prepared again. To keep the CPU information
up-to-date, a kdump service reload is triggered via a udev rule.

The above approach has two downsides:

1) The udev rules are prone to races if hotplug events are frequent. When
   multiple CPU/memory hotplug operations are performed at the same time, it
   takes a significant amount of time for all the requested kdump service
   reloads to settle. This creates a window in which a kernel crash might not
   lead to a successful dump collection.

2) CPU cycles are wasted reloading all the kdump components, including
   initrd, vmlinux, FDT, etc., whereas only one component, the FDT, needs to
   be updated.

How does this patch series solve the above issues?
--------------------------------------------------
As mentioned above, the only kexec segment that gets updated during
the kdump service reload (due to a hotplug event) is the FDT. So, instead
of re-creating the FDT on every hotplug event, it is created once and
then updated in place on every hotplug event. This FDT is referred to as
the kexec crash FDT.


How is the kexec crash FDT managed?
-----------------------------------
During kernel boot, a hole is allocated for the kexec crash FDT in the kdump
reserved region. On kdump service start, a fresh copy of the kdump FDT
(created by the kexec tool or by the kernel, depending on which system call
is used) is copied to the pre-allocated hole. Once the kexec crash FDT is
loaded, all subsequent updates needed due to CPU hot-add events can be made
directly to it, without reloading all the kexec segments again. A hook is
added on the CPU hot-add path to update the kexec crash FDT.
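
The "copy once into a fixed hole" step can be pictured with the userspace
sketch below. It is only an illustration under assumptions: struct
crash_fdt_hole and crash_fdt_install() are hypothetical names that do not
appear in this series, and the real kernel code works on physical memory in
the crashkernel region rather than on an ordinary buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for the pre-allocated hole; not a kernel type. */
struct crash_fdt_hole {
	uint8_t *base;   /* start of the reserved hole */
	size_t capacity; /* size of the hole in bytes */
	size_t used;     /* bytes currently occupied by the FDT */
};

/*
 * Copy a freshly built FDT blob into the hole.
 * Returns 0 on success, -1 if the arguments are invalid or the blob
 * does not fit into the hole.
 */
int crash_fdt_install(struct crash_fdt_hole *hole, const void *fdt, size_t len)
{
	if (!hole || !hole->base || !fdt || len > hole->capacity)
		return -1;

	memcpy(hole->base, fdt, len);
	/* Zero the tail so bytes from a previous, larger FDT do not linger. */
	memset(hole->base + len, 0, hole->capacity - len);
	hole->used = len;
	return 0;
}
```

Because the hole has a fixed capacity, any later in-place update has to fit
within it as well; a blob that is too large is rejected here rather than
truncated.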


How is the kexec crash FDT accessed in kexec_load and kexec_file_load system calls?
-----------------------------------------------------------------------------------
Since kexec_file_load prepares all kexec segments in the kernel, it can
easily access the kexec crash FDT with the help of two global variables
that hold the start address and the size of the kexec crash FDT.

With the kexec_load system call, the kexec segments are prepared by the kexec
tool in userspace. The start address and the size of the kexec crash FDT are
provided to userspace via two sysfs files, /sys/kernel/kexec_crash_fdt and
/sys/kernel/kexec_crash_fdt_size.
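
As a rough sketch of the userspace side, the text read from the two sysfs
files could be parsed as shown here. The helper name parse_crash_fdt_info()
is hypothetical, and the number formats (hexadecimal address, decimal size)
are an assumption taken from how the kexec-tools patch in this mail calls
strtoul().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse the text read from the two sysfs files. Assumes the address is
 * printed in hexadecimal and the size in decimal.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int parse_crash_fdt_info(const char *addr_text, const char *size_text,
			 uint64_t *addr, uint32_t *size)
{
	char *end;
	unsigned long long a, s;

	errno = 0;
	a = strtoull(addr_text, &end, 16);
	if (errno || end == addr_text)
		return -1;

	errno = 0;
	s = strtoull(size_text, &end, 10);
	if (errno || end == size_text || s == 0 || s > UINT32_MAX)
		return -1;

	*addr = a;
	*size = (uint32_t)s;
	return 0;
}
```

Unlike the patch, this sketch rejects empty input and a zero size, so a
caller can tell "no pre-allocated hole" apart from a parse failure and fall
back to the old hole-finding path.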


A couple of minor changes are required to realise the benefit of the patch
series:

- Disable the udev rule:

  Comment out the line below in the kdump udev rule file:
  RHEL: /usr/lib/udev/rules.d/98-kexec.rules
  # SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"

- The kexec tool needs to be updated with the patch below for the kexec_load
  system call to work (not needed if the -s option is used during kexec
  panic load):

---
From 37aa38713c163b31d9c6e80ddc059424c9fcd66d Mon Sep 17 00:00:00 2001
From: Sourabh Jain <sourabhjain@linux.ibm.com>
Date: Mon, 22 Nov 2021 14:12:52 +0530
Subject: [PATCH] kexec/ppc64: use pre-allocated memory hole for kexec crash
 FDT

Enable kexec to use the pre-allocated memory hole for the kexec crash FDT,
which is exported via the /sys/kernel/kexec_crash_fdt and
/sys/kernel/kexec_crash_fdt_size sysfs files. Using this pre-allocated
memory hole for the kdump fdt allows the kernel to keep the kdump fdt
up-to-date with the latest CPU information.

When the pre-allocated memory hole is used for the kdump fdt, the kdump fdt
segment is not included in the SHA calculation because the kdump fdt will be
modified by the kernel.

To maintain backward compatibility, we fall back to the old approach of
finding a hole for the kdump fdt segment if the pre-allocated buffer is not
provided by the kernel.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 kexec/arch/ppc64/kexec-elf-ppc64.c | 11 +++++--
 kexec/arch/ppc64/kexec-ppc64.c     | 49 ++++++++++++++++++++++++++++++
 kexec/kexec.c                      |  9 ++++++
 kexec/kexec.h                      |  4 +++
 4 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/kexec/arch/ppc64/kexec-elf-ppc64.c b/kexec/arch/ppc64/kexec-elf-ppc64.c
index 695b8b0..8e66ef0 100644
--- a/kexec/arch/ppc64/kexec-elf-ppc64.c
+++ b/kexec/arch/ppc64/kexec-elf-ppc64.c
@@ -329,8 +329,15 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, off_t len,
 	if (result < 0)
 		return result;
 
-	my_dt_offset = add_buffer(info, seg_buf, seg_size, seg_size,
-				0, 0, max_addr, -1);
+        if (kexec_crash_fdt) {
+                my_dt_offset = kexec_crash_fdt;
+                add_segment_phys_virt(info, seg_buf, seg_size,
+				      my_dt_offset, kexec_crash_fdt_size, 0);
+        }
+        else {
+                my_dt_offset = add_buffer(info, seg_buf, seg_size, seg_size,
+                                          0, 0, max_addr, -1);
+        }
 
 #ifdef NEED_RESERVE_DTB
 	/* patch reserve map address for flattened device-tree
diff --git a/kexec/arch/ppc64/kexec-ppc64.c b/kexec/arch/ppc64/kexec-ppc64.c
index 5b17740..d4385bd 100644
--- a/kexec/arch/ppc64/kexec-ppc64.c
+++ b/kexec/arch/ppc64/kexec-ppc64.c
@@ -24,6 +24,7 @@
 #include <errno.h>
 #include <stdint.h>
 #include <string.h>
+#include <fcntl.h>
 #include <sys/stat.h>
 #include <sys/types.h>
 #include <dirent.h>
@@ -373,6 +374,52 @@ void scan_reserved_ranges(unsigned long kexec_flags, int *range_index)
 	*range_index = i;
 }
 
+void get_kexec_crash_fdt_details(unsigned long kexec_flags)
+{
+	int fd, len;
+	char buf[MAXBYTES] = { 0 };
+
+	const char * const kexec_fdt_sysfs = "/sys/kernel/kexec_crash_fdt";
+	const char * const kexec_fdt_size_sysfs = "/sys/kernel/kexec_crash_fdt_size";
+
+        fd = open(kexec_fdt_sysfs, O_RDONLY);
+        if (fd < 0)
+                return;
+
+        len = read(fd, buf, MAXBYTES);
+        if (len < 0)
+                goto err_out;
+
+        kexec_crash_fdt = strtoul(buf, NULL, 16);
+
+	fd = open(kexec_fdt_size_sysfs, O_RDONLY);
+	if (fd < 0)
+		goto err_out;
+
+	len = read(fd, buf, MAXBYTES);
+	if (len < 0)
+		goto err_out;
+
+	kexec_crash_fdt_size = strtoul(buf, NULL, 10);
+
+        exclude_range[nr_exclude_ranges].start = kexec_crash_fdt;
+        exclude_range[nr_exclude_ranges].end = kexec_crash_fdt + \
+					       kexec_crash_fdt_size;
+        nr_exclude_ranges++;
+
+        if (nr_exclude_ranges >= max_memory_ranges)
+                realloc_memory_ranges();
+
+	goto out;
+
+err_out:
+	kexec_crash_fdt = kexec_crash_fdt_size = 0;
+
+out:
+        close (fd);
+        return;
+}
+
 /* Return 0 if fname/value valid, -1 otherwise */
 int get_devtree_value(const char *fname, unsigned long long *value)
 {
@@ -804,6 +851,8 @@ int setup_memory_ranges(unsigned long kexec_flags)
 		goto out;
 	if (get_devtree_details(kexec_flags))
 		goto out;
+	if (kexec_flags & KEXEC_ON_CRASH)
+		get_kexec_crash_fdt_details(kexec_flags);
 
 	for (i = 0; i < nr_exclude_ranges; i++) {
 		/* If first exclude range does not start with 0, include the
diff --git a/kexec/kexec.c b/kexec/kexec.c
index f63b36b..89283f7 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -62,6 +62,10 @@ static unsigned long kexec_flags = 0;
 /* Flags for kexec file (fd) based syscall */
 static unsigned long kexec_file_flags = 0;
 int kexec_debug = 0;
+#if defined(__powerpc__) || defined(__powerpc64__)
+uint64_t kexec_crash_fdt;
+uint32_t kexec_crash_fdt_size;
+#endif
 
 void dbgprint_mem_range(const char *prefix, struct memory_range *mr, int nr_mr)
 {
@@ -672,6 +676,11 @@ static void update_purgatory(struct kexec_info *info)
 		if (info->segment[i].mem == (void *)info->rhdr.rel_addr) {
 			continue;
 		}
+
+#if defined(__powerpc__) || defined(__powerpc64__)
+		if (kexec_crash_fdt && (unsigned long)info->segment[i].mem == kexec_crash_fdt)
+			continue;
+#endif
 		sha256_update(&ctx, info->segment[i].buf,
 			      info->segment[i].bufsz);
 		nullsz = info->segment[i].memsz - info->segment[i].bufsz;
diff --git a/kexec/kexec.h b/kexec/kexec.h
index 595dd68..48e8b9f 100644
--- a/kexec/kexec.h
+++ b/kexec/kexec.h
@@ -205,6 +205,10 @@ struct file_type {
 
 extern struct file_type file_type[];
 extern int file_types;
+#if defined(__powerpc__) || defined(__powerpc64__)
+extern uint64_t kexec_crash_fdt;
+extern uint32_t kexec_crash_fdt_size;
+#endif
 
 #define OPT_HELP		'h'
 #define OPT_VERSION		'v'
-- 
2.34.1
---


Sourabh Jain (5):
  powerpc/kdump: export functions from file_load_64.c
  powerpc/kdump: setup kexec crash FDT
  powerpc/kdump: update kexec crash FDT on CPU hot add event
  powerpc/kdump: enable kexec_file_load system call to use kexec crash
    FDT
  powerpc/kdump: export kexec crash FDT details via sysfs

 arch/powerpc/Kconfig                         |  11 +
 arch/powerpc/include/asm/kexec.h             |  10 +
 arch/powerpc/kexec/core_64.c                 | 318 +++++++++++++++++++
 arch/powerpc/kexec/elf_64.c                  |  22 +-
 arch/powerpc/kexec/file_load_64.c            | 239 +-------------
 arch/powerpc/platforms/pseries/hotplug-cpu.c |   7 +
 6 files changed, 369 insertions(+), 238 deletions(-)

-- 
2.34.1



* [RFC PATCH 1/5] powerpc/kdump: export functions from file_load_64.c
  2022-02-21  8:46 [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events Sourabh Jain
@ 2022-02-21  8:46 ` Sourabh Jain
  2022-02-21  8:46 ` [RFC PATCH 2/5] powerpc/kdump: setup kexec crash FDT Sourabh Jain
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Sourabh Jain @ 2022-02-21  8:46 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mahesh, kexec, hbathini

The functions get_exclude_memory_ranges and locate_mem_hole_top_down_ppc64
can be shared across different kexec components, so export them as global
functions.

The definitions of locate_mem_hole_top_down_ppc64 and
get_exclude_memory_ranges are moved to core_64.c so that both the kexec_load
and kexec_file_load system calls can use them.
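
For readers unfamiliar with the top-down search, the idea behind
locate_mem_hole_top_down_ppc64 can be sketched in plain C as follows. This
is a simplified illustration, not the kernel code: struct range, fit_top()
and find_hole_top_down() are hypothetical names, the exclude list is assumed
to be sorted and non-overlapping, and the memblock walk done by
__locate_mem_hole_top_down is left out.

```c
#include <stdint.h>

/* Inclusive [start, end] range, the same convention the patch uses. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Place 'size' bytes as high as possible inside [low, high]. */
static int fit_top(uint64_t low, uint64_t high, uint64_t size,
		   uint64_t align, uint64_t *out)
{
	uint64_t cand;

	if (high < low || high - low < size - 1)
		return -1;
	cand = (high - size + 1) & ~(align - 1); /* align downwards */
	if (cand < low)
		return -1;
	*out = cand;
	return 0;
}

/*
 * Walk the exclude ranges from highest to lowest and try each gap
 * between them, starting at 'max' and never going below 'min'.
 * 'align' must be a power of two. Returns 0 and sets *out, or -1.
 */
int find_hole_top_down(const struct range *excl, int nr, uint64_t min,
		       uint64_t max, uint64_t size, uint64_t align,
		       uint64_t *out)
{
	uint64_t top = max;
	int i;

	if (size == 0 || align == 0 || (align & (align - 1)))
		return -1;

	for (i = nr - 1; i >= 0; i--) {
		if (excl[i].start > top)
			continue; /* range lies entirely above the window */
		if (excl[i].end < top) {
			uint64_t low = excl[i].end < min ? min : excl[i].end + 1;

			if (!fit_top(low, top, size, align, out))
				return 0;
		}
		if (excl[i].start <= min)
			return -1; /* nothing usable is left below this range */
		top = excl[i].start - 1;
	}
	/* Finally try the gap below the lowest exclude range. */
	return fit_top(min, top, size, align, out);
}
```

Searching from the top keeps the chosen hole as high as possible in the
permitted window, which is the behaviour the name of the exported function
suggests.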

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/include/asm/kexec.h  |   4 +
 arch/powerpc/kexec/core_64.c      | 150 ++++++++++++++++++++++++++++++
 arch/powerpc/kexec/file_load_64.c | 148 -----------------------------
 3 files changed, 154 insertions(+), 148 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 88d0d7cf3a79..c2398140aa3b 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -128,6 +128,10 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image);
 int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
 			unsigned long initrd_load_addr,
 			unsigned long initrd_len, const char *cmdline);
+int get_exclude_memory_ranges(struct crash_mem **mem_ranges);
+int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf,
+				   u64 buf_min, u64 buf_max,
+				   const struct crash_mem *emem);
 #endif /* CONFIG_PPC64 */
 
 #endif /* CONFIG_KEXEC_FILE */
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 89c069d664a5..583eb7fa3388 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -16,6 +16,7 @@
 #include <linux/kernel.h>
 #include <linux/cpu.h>
 #include <linux/hardirq.h>
+#include <linux/memblock.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -27,6 +28,7 @@
 #include <asm/sections.h>	/* _end */
 #include <asm/prom.h>
 #include <asm/smp.h>
+#include <asm/kexec_ranges.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/asm-prototypes.h>
 #include <asm/svm.h>
@@ -74,6 +76,154 @@ int default_machine_kexec_prepare(struct kimage *image)
 	return 0;
 }
 
+/**
+ * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
+ *                             regions like opal/rtas, tce-table, initrd,
+ *                             kernel, htab which should be avoided while
+ *                             setting up kexec load segments.
+ * @mem_ranges:                Range list to add the memory ranges to.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
+{
+	int ret;
+
+	ret = add_tce_mem_ranges(mem_ranges);
+	if (ret)
+		goto out;
+
+	ret = add_initrd_mem_range(mem_ranges);
+	if (ret)
+		goto out;
+
+	ret = add_htab_mem_range(mem_ranges);
+	if (ret)
+		goto out;
+
+	ret = add_kernel_mem_range(mem_ranges);
+	if (ret)
+		goto out;
+
+	ret = add_rtas_mem_range(mem_ranges);
+	if (ret)
+		goto out;
+
+	ret = add_opal_mem_range(mem_ranges);
+	if (ret)
+		goto out;
+
+	ret = add_reserved_mem_ranges(mem_ranges);
+	if (ret)
+		goto out;
+
+	/* exclude memory ranges should be sorted for easy lookup */
+	sort_memory_ranges(*mem_ranges, true);
+out:
+	if (ret)
+		pr_err("Failed to setup exclude memory ranges\n");
+	return ret;
+}
+
+/**
+ * __locate_mem_hole_top_down - Looks top down for a large enough memory hole
+ *                              in the memory regions between buf_min & buf_max
+ *                              for the buffer. If found, sets kbuf->mem.
+ * @kbuf:                       Buffer contents and memory parameters.
+ * @buf_min:                    Minimum address for the buffer.
+ * @buf_max:                    Maximum address for the buffer.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
+				      u64 buf_min, u64 buf_max)
+{
+	int ret = -EADDRNOTAVAIL;
+	phys_addr_t start, end;
+	u64 i;
+
+	for_each_mem_range_rev(i, &start, &end) {
+		/*
+		 * memblock uses [start, end) convention while it is
+		 * [start, end] here. Fix the off-by-one to have the
+		 * same convention.
+		 */
+		end -= 1;
+
+		if (start > buf_max)
+			continue;
+
+		/* Memory hole not found */
+		if (end < buf_min)
+			break;
+
+		/* Adjust memory region based on the given range */
+		if (start < buf_min)
+			start = buf_min;
+		if (end > buf_max)
+			end = buf_max;
+
+		start = ALIGN(start, kbuf->buf_align);
+		if (start < end && (end - start + 1) >= kbuf->memsz) {
+			/* Suitable memory range found. Set kbuf->mem */
+			kbuf->mem = ALIGN_DOWN(end - kbuf->memsz + 1,
+					       kbuf->buf_align);
+			ret = 0;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+/**
+ * locate_mem_hole_top_down_ppc64 - Skip special memory regions to find a
+ *                                  suitable buffer with top down approach.
+ * @kbuf:                           Buffer contents and memory parameters.
+ * @buf_min:                        Minimum address for the buffer.
+ * @buf_max:                        Maximum address for the buffer.
+ * @emem:                           Exclude memory ranges.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf,
+					  u64 buf_min, u64 buf_max,
+					  const struct crash_mem *emem)
+{
+	int i, ret = 0, err = -EADDRNOTAVAIL;
+	u64 start, end, tmin, tmax;
+
+	tmax = buf_max;
+	for (i = (emem->nr_ranges - 1); i >= 0; i--) {
+		start = emem->ranges[i].start;
+		end = emem->ranges[i].end;
+
+		if (start > tmax)
+			continue;
+
+		if (end < tmax) {
+			tmin = (end < buf_min ? buf_min : end + 1);
+			ret = __locate_mem_hole_top_down(kbuf, tmin, tmax);
+			if (!ret)
+				return 0;
+		}
+
+		tmax = start - 1;
+
+		if (tmax < buf_min) {
+			ret = err;
+			break;
+		}
+		ret = 0;
+	}
+
+	if (!ret) {
+		tmin = buf_min;
+		ret = __locate_mem_hole_top_down(kbuf, tmin, tmax);
+	}
+	return ret;
+}
+
 /* Called during kexec sequence with MMU off */
 static notrace void copy_segments(unsigned long ind)
 {
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 5056e175ca2c..194a3c72a7a9 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -42,55 +42,6 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
 	NULL
 };
 
-/**
- * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
- *                             regions like opal/rtas, tce-table, initrd,
- *                             kernel, htab which should be avoided while
- *                             setting up kexec load segments.
- * @mem_ranges:                Range list to add the memory ranges to.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
-{
-	int ret;
-
-	ret = add_tce_mem_ranges(mem_ranges);
-	if (ret)
-		goto out;
-
-	ret = add_initrd_mem_range(mem_ranges);
-	if (ret)
-		goto out;
-
-	ret = add_htab_mem_range(mem_ranges);
-	if (ret)
-		goto out;
-
-	ret = add_kernel_mem_range(mem_ranges);
-	if (ret)
-		goto out;
-
-	ret = add_rtas_mem_range(mem_ranges);
-	if (ret)
-		goto out;
-
-	ret = add_opal_mem_range(mem_ranges);
-	if (ret)
-		goto out;
-
-	ret = add_reserved_mem_ranges(mem_ranges);
-	if (ret)
-		goto out;
-
-	/* exclude memory ranges should be sorted for easy lookup */
-	sort_memory_ranges(*mem_ranges, true);
-out:
-	if (ret)
-		pr_err("Failed to setup exclude memory ranges\n");
-	return ret;
-}
-
 /**
  * get_usable_memory_ranges - Get usable memory ranges. This list includes
  *                            regions like crashkernel, opal/rtas & tce-table,
@@ -232,105 +183,6 @@ static int get_reserved_memory_ranges(struct crash_mem **mem_ranges)
 	return ret;
 }
 
-/**
- * __locate_mem_hole_top_down - Looks top down for a large enough memory hole
- *                              in the memory regions between buf_min & buf_max
- *                              for the buffer. If found, sets kbuf->mem.
- * @kbuf:                       Buffer contents and memory parameters.
- * @buf_min:                    Minimum address for the buffer.
- * @buf_max:                    Maximum address for the buffer.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
-				      u64 buf_min, u64 buf_max)
-{
-	int ret = -EADDRNOTAVAIL;
-	phys_addr_t start, end;
-	u64 i;
-
-	for_each_mem_range_rev(i, &start, &end) {
-		/*
-		 * memblock uses [start, end) convention while it is
-		 * [start, end] here. Fix the off-by-one to have the
-		 * same convention.
-		 */
-		end -= 1;
-
-		if (start > buf_max)
-			continue;
-
-		/* Memory hole not found */
-		if (end < buf_min)
-			break;
-
-		/* Adjust memory region based on the given range */
-		if (start < buf_min)
-			start = buf_min;
-		if (end > buf_max)
-			end = buf_max;
-
-		start = ALIGN(start, kbuf->buf_align);
-		if (start < end && (end - start + 1) >= kbuf->memsz) {
-			/* Suitable memory range found. Set kbuf->mem */
-			kbuf->mem = ALIGN_DOWN(end - kbuf->memsz + 1,
-					       kbuf->buf_align);
-			ret = 0;
-			break;
-		}
-	}
-
-	return ret;
-}
-
-/**
- * locate_mem_hole_top_down_ppc64 - Skip special memory regions to find a
- *                                  suitable buffer with top down approach.
- * @kbuf:                           Buffer contents and memory parameters.
- * @buf_min:                        Minimum address for the buffer.
- * @buf_max:                        Maximum address for the buffer.
- * @emem:                           Exclude memory ranges.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf,
-					  u64 buf_min, u64 buf_max,
-					  const struct crash_mem *emem)
-{
-	int i, ret = 0, err = -EADDRNOTAVAIL;
-	u64 start, end, tmin, tmax;
-
-	tmax = buf_max;
-	for (i = (emem->nr_ranges - 1); i >= 0; i--) {
-		start = emem->ranges[i].start;
-		end = emem->ranges[i].end;
-
-		if (start > tmax)
-			continue;
-
-		if (end < tmax) {
-			tmin = (end < buf_min ? buf_min : end + 1);
-			ret = __locate_mem_hole_top_down(kbuf, tmin, tmax);
-			if (!ret)
-				return 0;
-		}
-
-		tmax = start - 1;
-
-		if (tmax < buf_min) {
-			ret = err;
-			break;
-		}
-		ret = 0;
-	}
-
-	if (!ret) {
-		tmin = buf_min;
-		ret = __locate_mem_hole_top_down(kbuf, tmin, tmax);
-	}
-	return ret;
-}
-
 /**
  * __locate_mem_hole_bottom_up - Looks bottom up for a large enough memory hole
  *                               in the memory regions between buf_min & buf_max
-- 
2.34.1



* [RFC PATCH 2/5] powerpc/kdump: setup kexec crash FDT
  2022-02-21  8:46 [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events Sourabh Jain
  2022-02-21  8:46 ` [RFC PATCH 1/5] powerpc/kdump: export functions from file_load_64.c Sourabh Jain
@ 2022-02-21  8:46 ` Sourabh Jain
  2022-02-21  8:46 ` [RFC PATCH 3/5] powerpc/kdump: update kexec crash FDT on CPU hot add event Sourabh Jain
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Sourabh Jain @ 2022-02-21  8:46 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mahesh, kexec, hbathini

A DLPAR operation performed after the kexec_load or kexec_file_load system
call makes the kexec crash kernel FDT stale. Booting the kdump kernel with a
stale FDT leads to a kernel hang. To keep the kdump FDT updated, kdump is
reloaded after each DLPAR operation.

Reloading kdump after each DLPAR operation can be avoided if kexec_load
and kexec_file_load can use an FDT created and maintained by the kernel.

Set up a kexec crash FDT by allocating a memory hole in the crashkernel
reserved region. Subsequent patches take care of updating this FDT and of
using it in the kexec_load and kexec_file_load system calls.

A new config option, KEXEC_CRASH_FDT, is added to include this feature at
build time.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/Kconfig             | 11 +++++++
 arch/powerpc/include/asm/kexec.h |  5 +++
 arch/powerpc/kexec/core_64.c     | 52 ++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ba5b66189358..b11a50851d2c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -634,6 +634,17 @@ config PRESERVE_FA_DUMP
 	  memory preserving kernel boot would process this crash data.
 	  Petitboot kernel is the typical usecase for this option.
 
+config KEXEC_CRASH_FDT
+	bool "Update crash kernel FDT on hotplug event"
+	depends on CRASH_DUMP && HOTPLUG_CPU
+	help
+	  This option avoids the need to reload kdump on every hotplug event.
+	  During boot, the kernel pre-allocates the space for kdump FDT in
+	  the reserved area and uses the same space to store the FDT created
+	  during kexec_load or kexec_file_load. This helps the kernel to just
+	  update the FDT with the latest hotplug event info rather than
+	  preparing and reloading all the kexec components again.
+
 config OPAL_CORE
 	bool "Export OPAL memory as /sys/firmware/opal/core"
 	depends on PPC64 && PPC_POWERNV
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index c2398140aa3b..dc5b29eb2ddb 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -119,6 +119,11 @@ int setup_purgatory(struct kimage *image, const void *slave_code,
 #ifdef CONFIG_PPC64
 struct kexec_buf;
 
+#ifdef CONFIG_KEXEC_CRASH_FDT
+extern void *kexec_crash_fdt;
+extern u32 kexec_crash_fdt_size;
+#endif /* CONFIG_KEXEC_CRASH_FDT */
+
 int load_crashdump_segments_ppc64(struct kimage *image,
 				  struct kexec_buf *kbuf);
 int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 583eb7fa3388..be15aaf17a88 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -19,6 +19,7 @@
 #include <linux/memblock.h>
 
 #include <asm/page.h>
+#include <asm/kexec.h>
 #include <asm/current.h>
 #include <asm/machdep.h>
 #include <asm/cacheflush.h>
@@ -34,6 +35,13 @@
 #include <asm/svm.h>
 #include <asm/ultravisor.h>
 
+#ifdef CONFIG_KEXEC_CRASH_FDT
+void *kexec_crash_fdt;
+EXPORT_SYMBOL_GPL(kexec_crash_fdt);
+u32 kexec_crash_fdt_size;
+EXPORT_SYMBOL_GPL(kexec_crash_fdt_size);
+#endif
+
 int default_machine_kexec_prepare(struct kimage *image)
 {
 	int i;
@@ -571,3 +579,47 @@ static int __init export_htab_values(void)
 }
 late_initcall(export_htab_values);
 #endif /* CONFIG_PPC_BOOK3S_64 */
+
+#ifdef CONFIG_KEXEC_CRASH_FDT
+/* Calculate the size of kexec crash fdt */
+static u32 get_kexec_crash_fdt_size(void)
+{
+	// TODO: add logic to calculate it based on system configuration
+	return 1024*1024;
+}
+
+/* Setup the memory hole for kdump fdt in reserved region below RMA.
+ */
+static int __init setup_kexec_crash_fdt(void)
+{
+	int ret = 0;
+	struct crash_mem *exclude_ranges = NULL;
+	struct kexec_buf kbuf;
+
+	/* make sure kdump is enabled and necessary memory reservation
+	 * is done. The kexec crash FDT will be part of reserved memory
+	 * region.
+	 */
+	if (!crash_get_memory_size())
+		goto out;
+
+	if (get_exclude_memory_ranges(&exclude_ranges))
+		goto out;
+
+	/* locate memory hole for kdump fdt in reserved region below RMA */
+	kbuf.mem = 0;
+	kbuf.memsz = (unsigned long) get_kexec_crash_fdt_size();
+	ret = locate_mem_hole_top_down_ppc64(&kbuf, crashk_res.start,
+					     min(ppc64_rma_size, crashk_res.end),
+					     exclude_ranges);
+	if (ret)
+		return -ENOMEM;
+
+	kexec_crash_fdt = __va(kbuf.mem);
+	kexec_crash_fdt_size = kbuf.memsz;
+
+out:
+	return ret;
+}
+late_initcall(setup_kexec_crash_fdt);
+#endif /* CONFIG_KEXEC_CRASH_FDT */
-- 
2.34.1



* [RFC PATCH 3/5] powerpc/kdump: update kexec crash FDT on CPU hot add event
  2022-02-21  8:46 [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events Sourabh Jain
  2022-02-21  8:46 ` [RFC PATCH 1/5] powerpc/kdump: export functions from file_load_64.c Sourabh Jain
  2022-02-21  8:46 ` [RFC PATCH 2/5] powerpc/kdump: setup kexec crash FDT Sourabh Jain
@ 2022-02-21  8:46 ` Sourabh Jain
  2022-02-21  8:46 ` [RFC PATCH 4/5] powerpc/kdump: enable kexec_file_load system call to use kexec crash FDT Sourabh Jain
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Sourabh Jain @ 2022-02-21  8:46 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mahesh, kexec, hbathini

Add a hook in the CPU hot-add path to update the kexec crash FDT with the
latest CPU data.

To avoid code duplication, the update_cpus_node function defined in
kexec/file_load_64.c is exported and moved to core_64.c so that it is
accessible to both the kexec_load and kexec_file_load system calls.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/include/asm/kexec.h             |  1 +
 arch/powerpc/kexec/core_64.c                 | 89 ++++++++++++++++++++
 arch/powerpc/kexec/file_load_64.c            | 87 -------------------
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  7 ++
 4 files changed, 97 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index dc5b29eb2ddb..d608c661758e 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -137,6 +137,7 @@ int get_exclude_memory_ranges(struct crash_mem **mem_ranges);
 int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf,
 				   u64 buf_min, u64 buf_max,
 				   const struct crash_mem *emem);
+int update_cpus_node(void *fdt);
 #endif /* CONFIG_PPC64 */
 
 #endif /* CONFIG_KEXEC_FILE */
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index be15aaf17a88..57afceee53a6 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -16,6 +16,7 @@
 #include <linux/kernel.h>
 #include <linux/cpu.h>
 #include <linux/hardirq.h>
+#include <linux/libfdt.h>
 #include <linux/memblock.h>
 
 #include <asm/page.h>
@@ -84,6 +85,93 @@ int default_machine_kexec_prepare(struct kimage *image)
 	return 0;
 }
 
+/**
+ * add_node_props - Reads node properties from device node structure and add
+ *                  them to fdt.
+ * @fdt:            Flattened device tree of the kernel
+ * @node_offset:    offset of the node to add a property at
+ * @dn:             device node pointer
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
+{
+	int ret = 0;
+	struct property *pp;
+
+	if (!dn)
+		return -EINVAL;
+
+	for_each_property_of_node(dn, pp) {
+		ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length);
+		if (ret < 0) {
+			pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret));
+			return ret;
+		}
+	}
+	return ret;
+}
+
+/**
+ * update_cpus_node - Update cpus node of flattened device tree using of_root
+ *                    device node.
+ * @fdt:              Flattened device tree of the kernel.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int update_cpus_node(void *fdt)
+{
+	struct device_node *cpus_node, *dn;
+	int cpus_offset, cpus_subnode_offset, ret = 0;
+
+	cpus_offset = fdt_path_offset(fdt, "/cpus");
+	if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
+		pr_err("Malformed device tree: error reading /cpus node: %s\n",
+		       fdt_strerror(cpus_offset));
+		return cpus_offset;
+	}
+
+	if (cpus_offset > 0) {
+		ret = fdt_del_node(fdt, cpus_offset);
+		if (ret < 0) {
+			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
+			return -EINVAL;
+		}
+	}
+
+	/* Add cpus node to fdt */
+	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
+	if (cpus_offset < 0) {
+		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
+		return -EINVAL;
+	}
+
+	/* Add cpus node properties */
+	cpus_node = of_find_node_by_path("/cpus");
+	ret = add_node_props(fdt, cpus_offset, cpus_node);
+	of_node_put(cpus_node);
+	if (ret < 0)
+		return ret;
+
+	/* Loop through all subnodes of cpus and add them to fdt */
+	for_each_node_by_type(dn, "cpu") {
+		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
+		if (cpus_subnode_offset < 0) {
+			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
+			       fdt_strerror(cpus_subnode_offset));
+			ret = cpus_subnode_offset;
+			goto out;
+		}
+
+		ret = add_node_props(fdt, cpus_subnode_offset, dn);
+		if (ret < 0)
+			goto out;
+	}
+out:
+	of_node_put(dn);
+	return ret;
+}
+
 /**
  * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
  *                             regions like opal/rtas, tce-table, initrd,
@@ -130,6 +218,7 @@ int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
 out:
 	if (ret)
 		pr_err("Failed to setup exclude memory ranges\n");
+
 	return ret;
 }
 
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 194a3c72a7a9..02bb2adb1fe2 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -802,93 +802,6 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
 	return (unsigned int)(usm_entries * sizeof(u64));
 }
 
-/**
- * add_node_props - Reads node properties from device node structure and add
- *                  them to fdt.
- * @fdt:            Flattened device tree of the kernel
- * @node_offset:    offset of the node to add a property at
- * @dn:             device node pointer
- *
- * Returns 0 on success, negative errno on error.
- */
-static int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
-{
-	int ret = 0;
-	struct property *pp;
-
-	if (!dn)
-		return -EINVAL;
-
-	for_each_property_of_node(dn, pp) {
-		ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length);
-		if (ret < 0) {
-			pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret));
-			return ret;
-		}
-	}
-	return ret;
-}
-
-/**
- * update_cpus_node - Update cpus node of flattened device tree using of_root
- *                    device node.
- * @fdt:              Flattened device tree of the kernel.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int update_cpus_node(void *fdt)
-{
-	struct device_node *cpus_node, *dn;
-	int cpus_offset, cpus_subnode_offset, ret = 0;
-
-	cpus_offset = fdt_path_offset(fdt, "/cpus");
-	if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
-		pr_err("Malformed device tree: error reading /cpus node: %s\n",
-		       fdt_strerror(cpus_offset));
-		return cpus_offset;
-	}
-
-	if (cpus_offset > 0) {
-		ret = fdt_del_node(fdt, cpus_offset);
-		if (ret < 0) {
-			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
-			return -EINVAL;
-		}
-	}
-
-	/* Add cpus node to fdt */
-	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
-	if (cpus_offset < 0) {
-		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
-		return -EINVAL;
-	}
-
-	/* Add cpus node properties */
-	cpus_node = of_find_node_by_path("/cpus");
-	ret = add_node_props(fdt, cpus_offset, cpus_node);
-	of_node_put(cpus_node);
-	if (ret < 0)
-		return ret;
-
-	/* Loop through all subnodes of cpus and add them to fdt */
-	for_each_node_by_type(dn, "cpu") {
-		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
-		if (cpus_subnode_offset < 0) {
-			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
-			       fdt_strerror(cpus_subnode_offset));
-			ret = cpus_subnode_offset;
-			goto out;
-		}
-
-		ret = add_node_props(fdt, cpus_subnode_offset, dn);
-		if (ret < 0)
-			goto out;
-	}
-out:
-	of_node_put(dn);
-	return ret;
-}
-
 /**
  * setup_new_fdt_ppc64 - Update the flattend device-tree of the kernel
  *                       being loaded.
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index d646c22e94ab..701a34574e0b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -26,6 +26,7 @@
 #include <linux/slab.h>
 #include <asm/prom.h>
 #include <asm/rtas.h>
+#include <asm/kexec.h>
 #include <asm/firmware.h>
 #include <asm/machdep.h>
 #include <asm/vdso_datapage.h>
@@ -595,6 +596,12 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
 		return saved_rc;
 	}
 
+#ifdef CONFIG_KEXEC_CRASH_FDT
+	/* kdump should be aware of new CPU */
+	if (update_cpus_node(kexec_crash_fdt))
+		pr_err("unable to update kexec crash fdt\n");
+#endif
+
 	pr_debug("Successfully added CPU %pOFn, drc index: %x\n", dn,
 		 drc_index);
 	return rc;
-- 
2.34.1



* [RFC PATCH 4/5] powerpc/kdump: enable kexec_file_load system call to use kexec crash FDT
  2022-02-21  8:46 [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events Sourabh Jain
                   ` (2 preceding siblings ...)
  2022-02-21  8:46 ` [RFC PATCH 3/5] powerpc/kdump: update kexec crash FDT on CPU hot add event Sourabh Jain
@ 2022-02-21  8:46 ` Sourabh Jain
  2022-02-21  8:46 ` [RFC PATCH 5/5] powerpc/kdump: export kexec crash FDT details via sysfs Sourabh Jain
  2022-02-22  3:50 ` [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events Baoquan He
  5 siblings, 0 replies; 8+ messages in thread
From: Sourabh Jain @ 2022-02-21  8:46 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mahesh, kexec, hbathini

This patch enables the kexec_file_load system call to use the space
pre-allocated at boot for the kexec crash FDT.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/kexec/elf_64.c       | 22 +++++++++++++++++++---
 arch/powerpc/kexec/file_load_64.c |  4 ++++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index eeb258002d1e..3176dea0910d 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -116,13 +116,29 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 	if (ret)
 		goto out_free_fdt;
 
-	fdt_pack(fdt);
+#ifdef CONFIG_KEXEC_CRASH_FDT
+	if (kexec_crash_fdt && image->type == KEXEC_TYPE_CRASH) {
+		memcpy(kexec_crash_fdt, fdt, fdt_totalsize(fdt));
+		/* retain the original total size */
+		((struct fdt_header *)(kexec_crash_fdt))->totalsize = cpu_to_fdt32(kexec_crash_fdt_size);
+	} else
+#endif
+	{
+		kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
+	}
 
 	kbuf.buffer = fdt;
-	kbuf.bufsz = kbuf.memsz = fdt_totalsize(fdt);
+
+#ifdef CONFIG_KEXEC_CRASH_FDT
+	if (kexec_crash_fdt && image->type == KEXEC_TYPE_CRASH) {
+		kbuf.bufsz = kbuf.memsz = fdt_totalsize(kexec_crash_fdt);
+	} else
+#endif
+	{
+		kbuf.bufsz = kbuf.memsz = fdt_totalsize(fdt);
+	}
 	kbuf.buf_align = PAGE_SIZE;
 	kbuf.top_down = true;
-	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
 	ret = kexec_add_buffer(&kbuf);
 	if (ret)
 		goto out_free_fdt;
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 02bb2adb1fe2..7a320d9e2098 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -906,6 +906,10 @@ int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 	u64 buf_min, buf_max;
 	int ret;
 
+	/* kbuf->mem already points to a valid memory hole */
+	if (kbuf->mem != KEXEC_BUF_MEM_UNKNOWN)
+		return 0;
+
 	/* Look up the exclude ranges list while locating the memory hole */
 	emem = &(kbuf->image->arch.exclude_ranges);
 	if (!(*emem) || ((*emem)->nr_ranges == 0)) {
-- 
2.34.1



* [RFC PATCH 5/5] powerpc/kdump: export kexec crash FDT details via sysfs
  2022-02-21  8:46 [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events Sourabh Jain
                   ` (3 preceding siblings ...)
  2022-02-21  8:46 ` [RFC PATCH 4/5] powerpc/kdump: enable kexec_file_load system call to use kexec crash FDT Sourabh Jain
@ 2022-02-21  8:46 ` Sourabh Jain
  2022-02-22  3:50 ` [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events Baoquan He
  5 siblings, 0 replies; 8+ messages in thread
From: Sourabh Jain @ 2022-02-21  8:46 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mahesh, kexec, hbathini

Export the kexec crash FDT address and size via the
/sys/kernel/kexec_crash_fdt and /sys/kernel/kexec_crash_fdt_size files so
that the kexec tool can use the pre-allocated space for the kdump FDT.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/kexec/core_64.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 57afceee53a6..9bc9973ab3d3 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -677,6 +677,23 @@ static u32 get_kexec_crash_fdt_size(void)
 	return 1024*1024;
 }
 
+static ssize_t kexec_crash_fdt_show(struct kobject *kobj,
+				    struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%lx\n", __pa(kexec_crash_fdt));
+}
+static struct kobj_attribute kexec_crash_fdt_attr = __ATTR_RO(kexec_crash_fdt);
+
+static ssize_t kexec_crash_fdt_size_show(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 char *buf)
+{
+	return sprintf(buf, "%d\n", kexec_crash_fdt_size);
+}
+static struct kobj_attribute kexec_crash_fdt_size_attr = \
+			__ATTR_RO(kexec_crash_fdt_size);
+
+
 /* Setup the memory hole for kdump fdt in reserved region below RMA.
  */
 static int __init setup_kexec_crash_fdt(void)
@@ -707,6 +724,16 @@ static int __init setup_kexec_crash_fdt(void)
 	kexec_crash_fdt = __va(kbuf.mem);
 	kexec_crash_fdt_size = kbuf.memsz;
 
+	if (sysfs_create_file(kernel_kobj, &kexec_crash_fdt_attr.attr)) {
+		pr_err("unable to create kexec_crash_fdt sysfs file.\n");
+		return -1;
+	}
+
+	if (sysfs_create_file(kernel_kobj, &kexec_crash_fdt_size_attr.attr)) {
+		pr_err("unable to create kexec_crash_fdt_size sysfs file.\n");
+		return -1;
+	}
+
 out:
 	return ret;
 }
-- 
2.34.1



* Re: [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events
  2022-02-21  8:46 [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events Sourabh Jain
                   ` (4 preceding siblings ...)
  2022-02-21  8:46 ` [RFC PATCH 5/5] powerpc/kdump: export kexec crash FDT details via sysfs Sourabh Jain
@ 2022-02-22  3:50 ` Baoquan He
  2022-02-24  7:51   ` Sourabh Jain
  5 siblings, 1 reply; 8+ messages in thread
From: Baoquan He @ 2022-02-22  3:50 UTC (permalink / raw)
  To: Sourabh Jain; +Cc: linuxppc-dev, kexec, hbathini

Hi,

On 02/21/22 at 02:16pm, Sourabh Jain wrote:
> On a hotplug event (CPU/memory) the CPU information prepared for the kdump
> kernel becomes stale unless it is prepared again. To keep the CPU information
> up-to-date, a kdump service reload is triggered via a udev rule.
> 
> The above approach has two downsides:
> 
> 1) The udev rules are prone to races if hotplug events are frequent. The time
>    taken to settle all the requested kdump service reloads is significant
>    when multiple CPU/memory hotplug operations are performed at the same time.
>    This creates a window where a kernel crash might not lead to a successful
>    dump collection.
> 
> 2) Unnecessary CPU cycles are consumed reloading all the kdump components,
>    including initrd, vmlinux, FDT, etc., whereas only one component, the FDT,
>    needs to be updated.

I went through this series roughly, though I haven't read the code
carefully. The issue and the approach seem similar to what the patchset
below is doing. Have you seen that patchset from an Oracle engineer?
And is there anything in it that the ppc code can be rebased on and reuse?

[PATCH v4 00/10] crash: Kernel handling of CPU and memory hot un/plug
https://lore.kernel.org/all/20220209195706.51522-1-eric.devolder@oracle.com/T/#u
> 
> How does this patch series solve the above issue?
> -------------------------------------------------
> As mentioned above, the only kexec segment that gets updated during
> the kdump service reload (due to a hotplug event) is the FDT. So, instead
> of re-creating the FDT on every hotplug event, it is created just
> once and updated on every hotplug event. This FDT is referred to as the
> kexec crash FDT.
> 
> 
> How is the kexec crash FDT managed?
> -----------------------------------
> During kernel boot, a hole is allocated for the kexec crash FDT in the kdump
> reserved region. On kdump service start, a fresh copy of the kdump FDT
> (created by the kexec tool or by the kernel, depending on which system call
> is used) is copied to the pre-allocated hole for the kexec crash FDT. Once a
> kexec crash FDT is loaded, all subsequent updates needed due to CPU hot-add
> events can be made directly to the kexec crash FDT without reloading all the
> kexec segments again. A hook is added on the CPU hot-add path to update the
> kexec crash FDT.
> 
> 
> How is the kexec crash FDT accessed in the kexec_load and kexec_file_load system calls?
> ---------------------------------------------------------------------------------------
> Since with kexec_file_load all kexec segments are prepared in the kernel,
> it can easily access the kexec crash FDT with the help of two global variables
> that hold the start address and the size of the kexec crash FDT.
> 
> With the kexec_load system call, the kexec segments are prepared by the kexec
> tool in userspace. The start address and the size of the kexec crash FDT are
> provided to userspace via two sysfs files, /sys/kernel/kexec_crash_fdt and
> /sys/kernel/kexec_crash_fdt_size.
> 
> 
> A couple of minor changes are required to realise the benefit of the patch
> series:
> 
> - disable the udev rule:
> 
>   comment out the line below in the kdump udev rule file:
>   RHEL: /usr/lib/udev/rules.d/98-kexec.rules
>   # SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"
> 
> - the kexec tool needs to be updated with the patch below for the kexec_load
>   system call to work (not needed if the -s option is used during kexec panic
>   load):
> 
> ---
> From 37aa38713c163b31d9c6e80ddc059424c9fcd66d Mon Sep 17 00:00:00 2001
> From: Sourabh Jain <sourabhjain@linux.ibm.com>
> Date: Mon, 22 Nov 2021 14:12:52 +0530
> Subject: [PATCH] kexec/ppc64: use pre-allocated memory hole for kexec crash
>  FDT
> 
> Enable kexec to use the pre-allocated memory hole for the kexec crash FDT,
> which is exported via the /sys/kernel/kexec_crash_fdt and
> /sys/kernel/kexec_crash_fdt_size sysfs files. Using this pre-allocated
> memory hole for the kdump fdt allows the kernel to keep the kdump fdt
> up-to-date with the latest CPU information.
> 
> If a pre-allocated memory hole is used for the kdump fdt, the kdump fdt
> segment is not included in the SHA calculation because the kdump fdt will be
> modified by the kernel.
> 
> To maintain backward compatibility, we fall back to the old behaviour of
> finding a hole for the kdump fdt segment if the pre-allocated buffer is not
> provided by the kernel.
> 
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> ---
>  kexec/arch/ppc64/kexec-elf-ppc64.c | 11 +++++--
>  kexec/arch/ppc64/kexec-ppc64.c     | 49 ++++++++++++++++++++++++++++++
>  kexec/kexec.c                      |  9 ++++++
>  kexec/kexec.h                      |  4 +++
>  4 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/kexec/arch/ppc64/kexec-elf-ppc64.c b/kexec/arch/ppc64/kexec-elf-ppc64.c
> index 695b8b0..8e66ef0 100644
> --- a/kexec/arch/ppc64/kexec-elf-ppc64.c
> +++ b/kexec/arch/ppc64/kexec-elf-ppc64.c
> @@ -329,8 +329,15 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, off_t len,
>  	if (result < 0)
>  		return result;
>  
> -	my_dt_offset = add_buffer(info, seg_buf, seg_size, seg_size,
> -				0, 0, max_addr, -1);
> +        if (kexec_crash_fdt) {
> +                my_dt_offset = kexec_crash_fdt;
> +                add_segment_phys_virt(info, seg_buf, seg_size,
> +				      my_dt_offset, kexec_crash_fdt_size, 0);
> +        }
> +        else {
> +                my_dt_offset = add_buffer(info, seg_buf, seg_size, seg_size,
> +                                          0, 0, max_addr, -1);
> +        }
>  
>  #ifdef NEED_RESERVE_DTB
>  	/* patch reserve map address for flattened device-tree
> diff --git a/kexec/arch/ppc64/kexec-ppc64.c b/kexec/arch/ppc64/kexec-ppc64.c
> index 5b17740..d4385bd 100644
> --- a/kexec/arch/ppc64/kexec-ppc64.c
> +++ b/kexec/arch/ppc64/kexec-ppc64.c
> @@ -24,6 +24,7 @@
>  #include <errno.h>
>  #include <stdint.h>
>  #include <string.h>
> +#include <fcntl.h>
>  #include <sys/stat.h>
>  #include <sys/types.h>
>  #include <dirent.h>
> @@ -373,6 +374,52 @@ void scan_reserved_ranges(unsigned long kexec_flags, int *range_index)
>  	*range_index = i;
>  }
>  
> +void get_kexec_crash_fdt_details(unsigned long kexec_flags)
> +{
> +	int fd, len;
> +	char buf[MAXBYTES] = { 0 };
> +
> +	const char * const kexec_fdt_sysfs = "/sys/kernel/kexec_crash_fdt";
> +	const char * const kexec_fdt_size_sysfs = "/sys/kernel/kexec_crash_fdt_size";
> +
> +        fd = open(kexec_fdt_sysfs, O_RDONLY);
> +        if (fd < 0)
> +                return;
> +
> +        len = read(fd, buf, MAXBYTES);
> +        if (len < 0)
> +                goto err_out;
> +
> +        kexec_crash_fdt = strtoul(buf, NULL, 16);
> +
> +	fd = open(kexec_fdt_size_sysfs, O_RDONLY);
> +	if (fd < 0)
> +		goto err_out;
> +
> +	len = read(fd, buf, MAXBYTES);
> +	if (len < 0)
> +		goto err_out;
> +
> +	kexec_crash_fdt_size = strtoul(buf, NULL, 10);
> +
> +        exclude_range[nr_exclude_ranges].start = kexec_crash_fdt;
> +        exclude_range[nr_exclude_ranges].end = kexec_crash_fdt + \
> +					       kexec_crash_fdt_size;
> +        nr_exclude_ranges++;
> +
> +        if (nr_exclude_ranges >= max_memory_ranges)
> +                realloc_memory_ranges();
> +
> +	goto out;
> +
> +err_out:
> +	kexec_crash_fdt = kexec_crash_fdt_size = 0;
> +
> +out:
> +        close (fd);
> +        return;
> +}
> +
>  /* Return 0 if fname/value valid, -1 otherwise */
>  int get_devtree_value(const char *fname, unsigned long long *value)
>  {
> @@ -804,6 +851,8 @@ int setup_memory_ranges(unsigned long kexec_flags)
>  		goto out;
>  	if (get_devtree_details(kexec_flags))
>  		goto out;
> +	if (kexec_flags & KEXEC_ON_CRASH)
> +		get_kexec_crash_fdt_details(kexec_flags);
>  
>  	for (i = 0; i < nr_exclude_ranges; i++) {
>  		/* If first exclude range does not start with 0, include the
> diff --git a/kexec/kexec.c b/kexec/kexec.c
> index f63b36b..89283f7 100644
> --- a/kexec/kexec.c
> +++ b/kexec/kexec.c
> @@ -62,6 +62,10 @@ static unsigned long kexec_flags = 0;
>  /* Flags for kexec file (fd) based syscall */
>  static unsigned long kexec_file_flags = 0;
>  int kexec_debug = 0;
> +#if defined(__powerpc__) || defined(__powerpc64__)
> +uint64_t kexec_crash_fdt;
> +uint32_t kexec_crash_fdt_size;
> +#endif
>  
>  void dbgprint_mem_range(const char *prefix, struct memory_range *mr, int nr_mr)
>  {
> @@ -672,6 +676,11 @@ static void update_purgatory(struct kexec_info *info)
>  		if (info->segment[i].mem == (void *)info->rhdr.rel_addr) {
>  			continue;
>  		}
> +
> +#if defined(__powerpc__) || defined(__powerpc64__)
> +		if (kexec_crash_fdt && (unsigned long)info->segment[i].mem == kexec_crash_fdt)
> +			continue;
> +#endif
>  		sha256_update(&ctx, info->segment[i].buf,
>  			      info->segment[i].bufsz);
>  		nullsz = info->segment[i].memsz - info->segment[i].bufsz;
> diff --git a/kexec/kexec.h b/kexec/kexec.h
> index 595dd68..48e8b9f 100644
> --- a/kexec/kexec.h
> +++ b/kexec/kexec.h
> @@ -205,6 +205,10 @@ struct file_type {
>  
>  extern struct file_type file_type[];
>  extern int file_types;
> +#if defined(__powerpc__) || defined(__powerpc64__)
> +extern uint64_t kexec_crash_fdt;
> +extern uint32_t kexec_crash_fdt_size;
> +#endif
>  
>  #define OPT_HELP		'h'
>  #define OPT_VERSION		'v'
> -- 
> 2.34.1
> ---
> 
> 
> Sourabh Jain (5):
>   powerpc/kdump: export functions from file_load_64.c
>   powerpc/kdump: setup kexec crash FDT
>   powerpc/kdump: update kexec crash FDT on CPU hot add event
>   powerpc/kdump: enable kexec_file_load system call to use kexec crash
>     FDT
>   powerpc/kdump: export kexec crash FDT details via sysfs
> 
>  arch/powerpc/Kconfig                         |  11 +
>  arch/powerpc/include/asm/kexec.h             |  10 +
>  arch/powerpc/kexec/core_64.c                 | 318 +++++++++++++++++++
>  arch/powerpc/kexec/elf_64.c                  |  22 +-
>  arch/powerpc/kexec/file_load_64.c            | 239 +-------------
>  arch/powerpc/platforms/pseries/hotplug-cpu.c |   7 +
>  6 files changed, 369 insertions(+), 238 deletions(-)
> 
> -- 
> 2.34.1
> 
> 



* Re: [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events
  2022-02-22  3:50 ` [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events Baoquan He
@ 2022-02-24  7:51   ` Sourabh Jain
  0 siblings, 0 replies; 8+ messages in thread
From: Sourabh Jain @ 2022-02-24  7:51 UTC (permalink / raw)
  To: Baoquan He; +Cc: linuxppc-dev, kexec, hbathini

Hello Baoquan,

> Hi,
>
> On 02/21/22 at 02:16pm, Sourabh Jain wrote:
>> On a hotplug event (CPU/memory) the CPU information prepared for the kdump
>> kernel becomes stale unless it is prepared again. To keep the CPU information
>> up-to-date, a kdump service reload is triggered via a udev rule.
>>
>> The above approach has two downsides:
>>
>> 1) The udev rules are prone to races if hotplug events are frequent. The time
>>     taken to settle all the requested kdump service reloads is significant
>>     when multiple CPU/memory hotplug operations are performed at the same
>>     time. This creates a window where a kernel crash might not lead to a
>>     successful dump collection.
>>
>> 2) Unnecessary CPU cycles are consumed reloading all the kdump components,
>>     including initrd, vmlinux, FDT, etc., whereas only one component, the
>>     FDT, needs to be updated.
> I went through this series roughly, though I haven't read the code
> carefully. The issue and the approach seem similar to what the patchset
> below is doing. Have you seen that patchset from an Oracle engineer?
> And is there anything in it that the ppc code can be rebased on and reuse?
>
> [PATCH v4 00/10] crash: Kernel handling of CPU and memory hot un/plug
> https://lore.kernel.org/all/20220209195706.51522-1-eric.devolder@oracle.com/T/#u

Thanks for the suggestion. I have seen earlier versions of this patch series,
but since it did not have support for the kexec_load system call we tried
implementing something from scratch.

Since Eric has added support for kexec_load and has a generic handler for CPU
and memory hotplug, let me see if I can rebase my PowerPC changes on top of his
patches. The major difference is that on PowerPC we need to update the FDT
instead of the elfcorehdr on a hotplug event.

Thanks,
Sourabh Jain

