linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel
@ 2023-03-12 18:11 Sourabh Jain
  2023-03-12 18:11 ` [PATCH v9 1/6] powerpc/kexec: turn some static helper functions public Sourabh Jain
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Sourabh Jain @ 2023-03-12 18:11 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: eric.devolder, bhe, mahesh, kexec, ldufour, hbathini

The Problem:
============
Post hotplug/DLPAR events the capture kernel holds stale information about the
system. Dump collection with stale capture kernel might end up in dump capture
failure or an inaccurate dump collection.

Existing solution:
==================
The existing solution to keep the capture kernel up-to-date by monitoring
hotplug event via udev rule and trigger a full capture kernel reload for
every hotplug event.

Shortcomings:
------------------------------------------------
- Leaves a window where kernel crash might not lead to a successful dump
  collection.
- Reloading all kexec components for each hotplug is inefficient.
- udev rules are prone to races if hotplug events are frequent.

More about issues with an existing solution is posted here:
 - https://lkml.org/lkml/2020/12/14/532
 - https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html

Proposed Solution:
==================
Instead of reloading all kexec segments on hotplug event, this patch series
focuses on updating only the relevant kexec segment. Once the kexec segments
are loaded in the kernel reserved area then an arch-specific hotplug handler
will update the relevant kexec segment based on hotplug event type.

Series Dependecies
==================
This patch series implements the crash hotplug handler on PowerPC. The generic
crash hotplug update is introduced by https://lkml.org/lkml/2023/3/6/1358 patch
series.

Git tree for testing:
=====================
The below git tree has this patch series applied on top of dependent patch
series.
https://github.com/sourabhjains/linux/tree/in-kernel-crash-update-v9

To realise the feature the kdump udev rules must be disabled for CPU/Memory
hotplug events. Comment out the below line in kdump udev rule file:

  RHEL: /usr/lib/udev/rules.d/98-kexec.rules

  	#SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"
	#SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload_mem"
	#SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload_mem"

  SLES: /usr/lib/kdump/70-kdump.rules

	#SUBSYSTEM=="memory", ACTION=="add|remove", GOTO="kdump_try_restart"
	#SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_try_restart"

Note: only kexec_file_load syscall will work. For kexec_load minor
changes are required in kexec tool.

---
Changelog:

v9:
  - Removed patch to prepare elfcorehdr crash notes for possible CPUs.
    The patch is moved to generic patch series that introduces generic
    infrastructure for in kernel crash update.
  - Removed patch to pass the hotplug action type to the arch crash
    hotplug handler function. The generic patch series has introduced
    the hotplug action type in kimage struct.
  - Add detail commit message for better understanding.

v8:
  - Restrict fdt_index initialization to machine_kexec_post_load
    it work for both kexec_load and kexec_file_load.[3/8] Laurent Dufour

  - Updated the logic to find the number of offline core. [6/8]

  - Changed the logic to find the elfcore program header to accommodate
    future memory ranges due memory hotplug events. [8/8]

v7
  - added a new config to configure this feature
  - pass hotplug action type to arch specific handler

v6
  - Added crash memory hotplug support

v5:
  - Replace COFNIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
  - Move fdt segment identification for kexec_load case to load path
    instead of crash hotplug handler
  - Keep new attribute defined under kimage_arch to track FDT segment
    under CONFIG_HOTPLUG_CPU config.

v4:
  - Update the logic to find the additional space needed for hotadd CPUs post
    kexec load. Refer "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug
    support for kexec_file_load" patch to know more about the change.
  - Fix a couple of typo.
  - Replace pr_err to pr_info_once to warn user about memory hotplug
    support.
  - In crash hotplug handle exit the for loop if FDT segment is found.

v3
  - Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
  - Rebase patche on top of https://lkml.org/lkml/2022/3/3/674 [v5]
  - Fixed warning reported by checpatch script

v2:
  - Use generic hotplug handler introduced by https://lkml.org/lkml/2022/2/9/1406, a
    significant change from v1.

Sourabh Jain (6):
  powerpc/kexec: turn some static helper functions public
  powerpc/crash: introduce a new config option CRASH_HOTPLUG
  powerpc/crash: add a new member to the kimage_arch struct
  powerpc/crash: add crash CPU hotplug support
  crash: forward memory_notify args to arch crash hotplug handler
  powerpc/kexec: add crash memory hotplug support

 arch/powerpc/Kconfig                    |  12 +
 arch/powerpc/include/asm/kexec.h        |  15 ++
 arch/powerpc/include/asm/kexec_ranges.h |   1 +
 arch/powerpc/kexec/core_64.c            | 322 ++++++++++++++++++++++++
 arch/powerpc/kexec/elf_64.c             |  13 +-
 arch/powerpc/kexec/file_load_64.c       | 212 ++++------------
 arch/powerpc/kexec/ranges.c             |  85 +++++++
 arch/x86/include/asm/kexec.h            |   2 +-
 arch/x86/kernel/crash.c                 |   3 +-
 include/linux/kexec.h                   |   2 +-
 kernel/crash_core.c                     |  14 +-
 11 files changed, 506 insertions(+), 175 deletions(-)

-- 
2.39.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v9 1/6] powerpc/kexec: turn some static helper functions public
  2023-03-12 18:11 [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Sourabh Jain
@ 2023-03-12 18:11 ` Sourabh Jain
  2023-03-13 16:18   ` Laurent Dufour
  2023-03-12 18:11 ` [PATCH v9 2/6] powerpc/crash: introduce a new config option CRASH_HOTPLUG Sourabh Jain
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Sourabh Jain @ 2023-03-12 18:11 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: eric.devolder, bhe, mahesh, kexec, ldufour, hbathini

Move update_cpus_node and get_crash_memory_ranges functions from
kexec/file_load.c to kexec/core_64.c to make these functions usable
by other kexec compoenets.

Later in the series, these functions are utilized to do in-kernel update to
kexec segments on CPU/Memory hotplug events for both kexec_load and
kexec_file_load syscalls.

No functional change intended.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/include/asm/kexec.h  |   6 ++
 arch/powerpc/kexec/core_64.c      | 166 ++++++++++++++++++++++++++++++
 arch/powerpc/kexec/file_load_64.c | 162 -----------------------------
 3 files changed, 172 insertions(+), 162 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index a1ddba01e7d13..8090ad7d97d9d 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -99,6 +99,12 @@ void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_co
 
 void kexec_copy_flush(struct kimage *image);
 
+#ifdef CONFIG_PPC64
+struct crash_mem;
+int update_cpus_node(void *fdt);
+int get_crash_memory_ranges(struct crash_mem **mem_ranges);
+#endif
+
 #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
 #define crash_free_reserved_phys_range crash_free_reserved_phys_range
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index a79e28c91e2be..0b292f93a74cc 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -17,6 +17,8 @@
 #include <linux/cpu.h>
 #include <linux/hardirq.h>
 #include <linux/of.h>
+#include <linux/libfdt.h>
+#include <linux/memblock.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -30,6 +32,8 @@
 #include <asm/hw_breakpoint.h>
 #include <asm/svm.h>
 #include <asm/ultravisor.h>
+#include <asm/kexec_ranges.h>
+#include <asm/crashdump-ppc64.h>
 
 int machine_kexec_prepare(struct kimage *image)
 {
@@ -377,6 +381,168 @@ void default_machine_kexec(struct kimage *image)
 	/* NOTREACHED */
 }
 
+/**
+ * get_crash_memory_ranges - Get crash memory ranges. This list includes
+ *                           first/crashing kernel's memory regions that
+ *                           would be exported via an elfcore.
+ * @mem_ranges:              Range list to add the memory ranges to.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int get_crash_memory_ranges(struct crash_mem **mem_ranges)
+{
+	phys_addr_t base, end;
+	struct crash_mem *tmem;
+	u64 i;
+	int ret;
+
+	for_each_mem_range(i, &base, &end) {
+		u64 size = end - base;
+
+		/* Skip backup memory region, which needs a separate entry */
+		if (base == BACKUP_SRC_START) {
+			if (size > BACKUP_SRC_SIZE) {
+				base = BACKUP_SRC_END + 1;
+				size -= BACKUP_SRC_SIZE;
+			} else
+				continue;
+		}
+
+		ret = add_mem_range(mem_ranges, base, size);
+		if (ret)
+			goto out;
+
+		/* Try merging adjacent ranges before reallocation attempt */
+		if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
+			sort_memory_ranges(*mem_ranges, true);
+	}
+
+	/* Reallocate memory ranges if there is no space to split ranges */
+	tmem = *mem_ranges;
+	if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
+		tmem = realloc_mem_ranges(mem_ranges);
+		if (!tmem)
+			goto out;
+	}
+
+	/* Exclude crashkernel region */
+	ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		goto out;
+
+	/*
+	 * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL
+	 *        regions are exported to save their context at the time of
+	 *        crash, they should actually be backed up just like the
+	 *        first 64K bytes of memory.
+	 */
+	ret = add_rtas_mem_range(mem_ranges);
+	if (ret)
+		goto out;
+
+	ret = add_opal_mem_range(mem_ranges);
+	if (ret)
+		goto out;
+
+	/* create a separate program header for the backup region */
+	ret = add_mem_range(mem_ranges, BACKUP_SRC_START, BACKUP_SRC_SIZE);
+	if (ret)
+		goto out;
+
+	sort_memory_ranges(*mem_ranges, false);
+out:
+	if (ret)
+		pr_err("Failed to setup crash memory ranges\n");
+	return ret;
+}
+
+/**
+ * add_node_props - Reads node properties from device node structure and add
+ *                  them to fdt.
+ * @fdt:            Flattened device tree of the kernel
+ * @node_offset:    offset of the node to add a property at
+ * @dn:             device node pointer
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
+{
+	int ret = 0;
+	struct property *pp;
+
+	if (!dn)
+		return -EINVAL;
+
+	for_each_property_of_node(dn, pp) {
+		ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length);
+		if (ret < 0) {
+			pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret));
+			return ret;
+		}
+	}
+	return ret;
+}
+
+/**
+ * update_cpus_node - Update cpus node of flattened device tree using of_root
+ *                    device node.
+ * @fdt:              Flattened device tree of the kernel.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int update_cpus_node(void *fdt)
+{
+	struct device_node *cpus_node, *dn;
+	int cpus_offset, cpus_subnode_offset, ret = 0;
+
+	cpus_offset = fdt_path_offset(fdt, "/cpus");
+	if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
+		pr_err("Malformed device tree: error reading /cpus node: %s\n",
+		       fdt_strerror(cpus_offset));
+		return cpus_offset;
+	}
+
+	if (cpus_offset > 0) {
+		ret = fdt_del_node(fdt, cpus_offset);
+		if (ret < 0) {
+			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
+			return -EINVAL;
+		}
+	}
+
+	/* Add cpus node to fdt */
+	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
+	if (cpus_offset < 0) {
+		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
+		return -EINVAL;
+	}
+
+	/* Add cpus node properties */
+	cpus_node = of_find_node_by_path("/cpus");
+	ret = add_node_props(fdt, cpus_offset, cpus_node);
+	of_node_put(cpus_node);
+	if (ret < 0)
+		return ret;
+
+	/* Loop through all subnodes of cpus and add them to fdt */
+	for_each_node_by_type(dn, "cpu") {
+		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
+		if (cpus_subnode_offset < 0) {
+			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
+			       fdt_strerror(cpus_subnode_offset));
+			ret = cpus_subnode_offset;
+			goto out;
+		}
+
+		ret = add_node_props(fdt, cpus_subnode_offset, dn);
+		if (ret < 0)
+			goto out;
+	}
+out:
+	of_node_put(dn);
+	return ret;
+}
+
 #ifdef CONFIG_PPC_64S_HASH_MMU
 /* Values we need to export to the second kernel via the device tree. */
 static unsigned long htab_base;
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 110d28bede2a7..5b0b3f61e0e72 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -133,81 +133,6 @@ static int get_usable_memory_ranges(struct crash_mem **mem_ranges)
 	return ret;
 }
 
-/**
- * get_crash_memory_ranges - Get crash memory ranges. This list includes
- *                           first/crashing kernel's memory regions that
- *                           would be exported via an elfcore.
- * @mem_ranges:              Range list to add the memory ranges to.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int get_crash_memory_ranges(struct crash_mem **mem_ranges)
-{
-	phys_addr_t base, end;
-	struct crash_mem *tmem;
-	u64 i;
-	int ret;
-
-	for_each_mem_range(i, &base, &end) {
-		u64 size = end - base;
-
-		/* Skip backup memory region, which needs a separate entry */
-		if (base == BACKUP_SRC_START) {
-			if (size > BACKUP_SRC_SIZE) {
-				base = BACKUP_SRC_END + 1;
-				size -= BACKUP_SRC_SIZE;
-			} else
-				continue;
-		}
-
-		ret = add_mem_range(mem_ranges, base, size);
-		if (ret)
-			goto out;
-
-		/* Try merging adjacent ranges before reallocation attempt */
-		if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
-			sort_memory_ranges(*mem_ranges, true);
-	}
-
-	/* Reallocate memory ranges if there is no space to split ranges */
-	tmem = *mem_ranges;
-	if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
-		tmem = realloc_mem_ranges(mem_ranges);
-		if (!tmem)
-			goto out;
-	}
-
-	/* Exclude crashkernel region */
-	ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
-	if (ret)
-		goto out;
-
-	/*
-	 * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL
-	 *        regions are exported to save their context at the time of
-	 *        crash, they should actually be backed up just like the
-	 *        first 64K bytes of memory.
-	 */
-	ret = add_rtas_mem_range(mem_ranges);
-	if (ret)
-		goto out;
-
-	ret = add_opal_mem_range(mem_ranges);
-	if (ret)
-		goto out;
-
-	/* create a separate program header for the backup region */
-	ret = add_mem_range(mem_ranges, BACKUP_SRC_START, BACKUP_SRC_SIZE);
-	if (ret)
-		goto out;
-
-	sort_memory_ranges(*mem_ranges, false);
-out:
-	if (ret)
-		pr_err("Failed to setup crash memory ranges\n");
-	return ret;
-}
-
 /**
  * get_reserved_memory_ranges - Get reserve memory ranges. This list includes
  *                              memory regions that should be added to the
@@ -1018,93 +943,6 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
 	return extra_size;
 }
 
-/**
- * add_node_props - Reads node properties from device node structure and add
- *                  them to fdt.
- * @fdt:            Flattened device tree of the kernel
- * @node_offset:    offset of the node to add a property at
- * @dn:             device node pointer
- *
- * Returns 0 on success, negative errno on error.
- */
-static int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
-{
-	int ret = 0;
-	struct property *pp;
-
-	if (!dn)
-		return -EINVAL;
-
-	for_each_property_of_node(dn, pp) {
-		ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length);
-		if (ret < 0) {
-			pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret));
-			return ret;
-		}
-	}
-	return ret;
-}
-
-/**
- * update_cpus_node - Update cpus node of flattened device tree using of_root
- *                    device node.
- * @fdt:              Flattened device tree of the kernel.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int update_cpus_node(void *fdt)
-{
-	struct device_node *cpus_node, *dn;
-	int cpus_offset, cpus_subnode_offset, ret = 0;
-
-	cpus_offset = fdt_path_offset(fdt, "/cpus");
-	if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
-		pr_err("Malformed device tree: error reading /cpus node: %s\n",
-		       fdt_strerror(cpus_offset));
-		return cpus_offset;
-	}
-
-	if (cpus_offset > 0) {
-		ret = fdt_del_node(fdt, cpus_offset);
-		if (ret < 0) {
-			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
-			return -EINVAL;
-		}
-	}
-
-	/* Add cpus node to fdt */
-	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
-	if (cpus_offset < 0) {
-		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
-		return -EINVAL;
-	}
-
-	/* Add cpus node properties */
-	cpus_node = of_find_node_by_path("/cpus");
-	ret = add_node_props(fdt, cpus_offset, cpus_node);
-	of_node_put(cpus_node);
-	if (ret < 0)
-		return ret;
-
-	/* Loop through all subnodes of cpus and add them to fdt */
-	for_each_node_by_type(dn, "cpu") {
-		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
-		if (cpus_subnode_offset < 0) {
-			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
-			       fdt_strerror(cpus_subnode_offset));
-			ret = cpus_subnode_offset;
-			goto out;
-		}
-
-		ret = add_node_props(fdt, cpus_subnode_offset, dn);
-		if (ret < 0)
-			goto out;
-	}
-out:
-	of_node_put(dn);
-	return ret;
-}
-
 static int copy_property(void *fdt, int node_offset, const struct device_node *dn,
 			 const char *propname)
 {
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v9 2/6] powerpc/crash: introduce a new config option CRASH_HOTPLUG
  2023-03-12 18:11 [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Sourabh Jain
  2023-03-12 18:11 ` [PATCH v9 1/6] powerpc/kexec: turn some static helper functions public Sourabh Jain
@ 2023-03-12 18:11 ` Sourabh Jain
  2023-03-13 15:46   ` Eric DeVolder
  2023-03-12 18:11 ` [PATCH v9 3/6] powerpc/crash: add a new member to the kimage_arch struct Sourabh Jain
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Sourabh Jain @ 2023-03-12 18:11 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: eric.devolder, bhe, mahesh, kexec, ldufour, hbathini

Due to CPU/Memory hotplug events the system resources changes. A similar
change should reflect in the loaded kdump kernel image that describes
the state of the CPU and memory of the running kernel.

If the kdump kernel image is not updated after the CPU or Memory hotplug
events and it tries to collect the dump with the stale system resource
data this might lead to dump collection failure or an inaccurate dump
collection.

The current method to keep the kdump kernel up to date is by triggering
reload (i.e unload and load) the entire kdump kernel image whenever a
CPU or Memory hotplug event is observed by udev in the userspace.
Reloading the complete kdump kernel image is an expensive task. It can
be easily avoided by doing the in-kernel updates to specific kdump
kernel image components which are responsible for describing CPU and
Memory resources of the running kernel to the kdump kernel.

The kernel changes related to in-kernel update to the kdump kernel image
on CPU/Memory hotplug events are kept under the CRASH_HOTPLUG config
option.

Later in the series, a powerpc crash hotplug handler is introduced to
update the kdump kernel image on CPU/Memory hotplug events. This arch
specific handler is trigger from a generic crash handler that registers
with the CPU and memory notifiers.

The CRASH_HOTPLUG config option is enabled by default.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/Kconfig | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a6c4407d3ec83..2f45b3f5175cb 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -681,6 +681,18 @@ config CRASH_DUMP
 	  The same kernel binary can be used as production kernel and dump
 	  capture kernel.
 
+config CRASH_HOTPLUG
+	bool "Update crash capture system on CPU/Memory hotplug event"
+	default y
+	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+	help
+	  In kernel update to relevant kexec segments due to change
+	  in the system configuration, rather reloading all the kexec
+	  segments again from userspace by monitoring CPU/Memory
+	  hotplug events in the userspace using udev.
+
+	  If unsure, say N.
+
 config FA_DUMP
 	bool "Firmware-assisted dump"
 	depends on PPC64 && (PPC_RTAS || PPC_POWERNV)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v9 3/6] powerpc/crash: add a new member to the kimage_arch struct
  2023-03-12 18:11 [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Sourabh Jain
  2023-03-12 18:11 ` [PATCH v9 1/6] powerpc/kexec: turn some static helper functions public Sourabh Jain
  2023-03-12 18:11 ` [PATCH v9 2/6] powerpc/crash: introduce a new config option CRASH_HOTPLUG Sourabh Jain
@ 2023-03-12 18:11 ` Sourabh Jain
  2023-03-13 16:25   ` Laurent Dufour
  2023-03-12 18:11 ` [PATCH v9 4/6] powerpc/crash: add crash CPU hotplug support Sourabh Jain
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Sourabh Jain @ 2023-03-12 18:11 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: eric.devolder, bhe, mahesh, kexec, ldufour, hbathini

A new member "fdt_index" is added to the kimage_arch struct to hold
the index of the FDT (Flattened Device Tree) segment from the kexec
the segment array.

fdt_index will provide direct access to the FDT segment in the kexec
segment array after the kdump kernel is loaded.

The new attribute will be used in the arch crash hotplug handler
(added in upcoming patches) on every CPU and memory hotplug event.

The fdt_index is populated for both kexec_load and kexec_file_load
case.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/include/asm/kexec.h |  5 +++++
 arch/powerpc/kexec/core_64.c     | 31 +++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 8090ad7d97d9d..348eb96e8ca67 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -103,6 +103,8 @@ void kexec_copy_flush(struct kimage *image);
 struct crash_mem;
 int update_cpus_node(void *fdt);
 int get_crash_memory_ranges(struct crash_mem **mem_ranges);
+int machine_kexec_post_load(struct kimage *image);
+#define machine_kexec_post_load machine_kexec_post_load
 #endif
 
 #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
@@ -118,6 +120,9 @@ extern const struct kexec_file_ops kexec_elf64_ops;
 struct kimage_arch {
 	struct crash_mem *exclude_ranges;
 
+#if defined(CONFIG_CRASH_HOTPLUG)
+	int fdt_index;
+#endif
 	unsigned long backup_start;
 	void *backup_buf;
 	void *fdt;
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 0b292f93a74cc..531486c973988 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -77,6 +77,37 @@ int machine_kexec_prepare(struct kimage *image)
 	return 0;
 }
 
+int machine_kexec_post_load(struct kimage *kimage)
+{
+#if defined(CONFIG_CRASH_HOTPLUG)
+	int i;
+	void *ptr;
+	unsigned long mem;
+
+	/* Mark fdt_index invalid */
+	kimage->arch.fdt_index = -1;
+
+	/* fdt_index remains invalid if it is not a crash kernel load */
+	if (kimage->type != KEXEC_TYPE_CRASH)
+		return 0;
+	/*
+	 * Find the FDT segment index in kexec segment array and
+	 * assign it to kimage's member fdt_index to enable direct
+	 * access to FDT segment later on.
+	 */
+	for (i = 0; i < kimage->nr_segments; i++) {
+		mem = kimage->segment[i].mem;
+		ptr = __va(mem);
+
+		if (ptr && fdt_magic(ptr) == FDT_MAGIC) {
+			kimage->arch.fdt_index = i;
+			break;
+		}
+	}
+#endif
+	return 0;
+}
+
 /* Called during kexec sequence with MMU off */
 static notrace void copy_segments(unsigned long ind)
 {
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v9 4/6] powerpc/crash: add crash CPU hotplug support
  2023-03-12 18:11 [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Sourabh Jain
                   ` (2 preceding siblings ...)
  2023-03-12 18:11 ` [PATCH v9 3/6] powerpc/crash: add a new member to the kimage_arch struct Sourabh Jain
@ 2023-03-12 18:11 ` Sourabh Jain
  2023-03-12 18:11 ` [PATCH v9 5/6] crash: forward memory_notify args to arch crash hotplug handler Sourabh Jain
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Sourabh Jain @ 2023-03-12 18:11 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: eric.devolder, bhe, mahesh, kexec, ldufour, hbathini

Introduce PowerPC crash hotplug handler to update the necessary kexec
segments in the kernel on CPU/Memory hotplug events. Currently, these
updates are done by monitoring CPU/Memory hotplug events in userspace.

A common crash hotplug handler is triggered from generic infrastructure
for both CPU/Memory hotplug events. But in this patch, crash updates are
handled only for CPU hotplug events. Support for the crash update on
memory hotplug events is added in upcoming patches.

The elfcorehdr segment is used to exchange the CPU and other
dump-related information between the kernels. Ideally, the elfcorehdr
segment needs to be recreated on CPU hotplug events to reflect the
changes. But on PowerPC, the elfcorehdr is built for possible CPUs hence
there is no need to update/recreate the elfcorehdr on CPU hotplug
events.

In addition to elfcorehdr, there is another kexec segment that holds CPU
data on PowerPC FDT (Flattened Device Tree). During the kdump kernel
boot, it is expected that the crashing CPU must be present in FDT, else
kdump kernel boot fails.

Now the only action needed on PowerPC to handle the crash CPU hotplug
events are to add hot added CPUs in the FDT segment to avoid the kdump
kernel boot failure in case the system crashes on hot added CPU. So for
the CPU hot add events, the FDT segment is updated with hot added CPU
and Since there is no need to remove the hot unplugged CPUs from the FDT
segment hence no action was taken on the CPU hot remove event in the
PowerPC arch crash hotplug handler.

To accommodate a growing number of CPUs, FDT is built with additional
buffer space to ensure that it can hold possible CPU nodes.

The changes done here will also work for the kexec_load system call
given that the kexec tool builds the FDT segment with additional space
to accommodate possible CPU nodes.

Since memory crash hotplug support is not there yet the crash hotplug
the handler simply warns the user and returns.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/include/asm/kexec.h  |  4 +++
 arch/powerpc/kexec/core_64.c      | 52 +++++++++++++++++++++++++++++++
 arch/powerpc/kexec/elf_64.c       | 13 +++++++-
 arch/powerpc/kexec/file_load_64.c | 14 +++++++++
 4 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 348eb96e8ca67..826671bd8ff97 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -105,6 +105,10 @@ int update_cpus_node(void *fdt);
 int get_crash_memory_ranges(struct crash_mem **mem_ranges);
 int machine_kexec_post_load(struct kimage *image);
 #define machine_kexec_post_load machine_kexec_post_load
+#if defined(CONFIG_CRASH_HOTPLUG)
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+#endif
 #endif
 
 #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 531486c973988..35de0aa0fe375 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -574,6 +574,58 @@ int update_cpus_node(void *fdt)
 	return ret;
 }
 
+#if defined(CONFIG_CRASH_HOTPLUG)
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+
+/**
+ * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to update the
+ *                                necessary kexec segments based on the hotplug event.
+ * @image: the active struct kimage
+ */
+void arch_crash_handle_hotplug_event(struct kimage *image)
+{
+	void *fdt;
+	int fdt_index;
+	unsigned int hp_action = image->hp_action;
+
+	/*
+	 * Since the hot-unplugged CPU is already part of crash FDT,
+	 * no action is needed for CPU remove case.
+	 */
+	if (hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
+		return;
+
+	/* crash update on memory hotplug events is not supported yet */
+	if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
+		pr_info_once("Crash update is not supported for memory hotplug\n");
+		return;
+	}
+
+	fdt_index = image->arch.fdt_index;
+	/* Must have valid FDT index, it is been discovered during
+	 * kdump kernel load for both kexec_load and kexec_file_load
+	 * system call.
+	 */
+	if (fdt_index < 0) {
+		pr_err("Invalid FDT index");
+		return;
+	}
+
+	fdt = __va((void *)image->segment[fdt_index].mem);
+
+	/* Temporarily invalidate the crash image while it is replaced */
+	xchg(&kexec_crash_image, NULL);
+
+	/* update FDT to refelect changes in CPU resrouces */
+	if (update_cpus_node(fdt))
+		pr_err("Failed to update crash FDT");
+
+	/* The crash image is now valid once again */
+	xchg(&kexec_crash_image, image);
+}
+#endif
+
 #ifdef CONFIG_PPC_64S_HASH_MMU
 /* Values we need to export to the second kernel via the device tree. */
 static unsigned long htab_base;
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index eeb258002d1e0..486087e0c2ef8 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -30,6 +30,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 			unsigned long cmdline_len)
 {
 	int ret;
+	bool do_pack_fdt = true;
 	unsigned long kernel_load_addr;
 	unsigned long initrd_load_addr = 0, fdt_load_addr;
 	void *fdt;
@@ -116,13 +117,23 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 	if (ret)
 		goto out_free_fdt;
 
-	fdt_pack(fdt);
+#if defined(CONFIG_CRASH_HOTPLUG)
+	/*
+	 * Do not pack FDT, additional space is reserved to accommodate
+	 * possible CPU nodes which not yet present in the system.
+	 */
+	if (image->type == KEXEC_TYPE_CRASH)
+		do_pack_fdt = false;
+#endif
+	if (do_pack_fdt)
+		fdt_pack(fdt);
 
 	kbuf.buffer = fdt;
 	kbuf.bufsz = kbuf.memsz = fdt_totalsize(fdt);
 	kbuf.buf_align = PAGE_SIZE;
 	kbuf.top_down = true;
 	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
+
 	ret = kexec_add_buffer(&kbuf);
 	if (ret)
 		goto out_free_fdt;
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 5b0b3f61e0e72..3554168687869 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -908,6 +908,9 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
 	unsigned int cpu_nodes, extra_size = 0;
 	struct device_node *dn;
 	u64 usm_entries;
+#if defined(CONFIG_CRASH_HOTPLUG)
+	unsigned int possible_cpu_nodes;
+#endif
 
 	// Budget some space for the password blob. There's already extra space
 	// for the key name
@@ -940,6 +943,17 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
 	if (cpu_nodes > boot_cpu_node_count)
 		extra_size += (cpu_nodes - boot_cpu_node_count) * cpu_node_size();
 
+#if defined(CONFIG_CRASH_HOTPLUG)
+	/*
+	 * Make sure enough space is reserved to accommodate possible CPU nodes
+	 * in the crash FDT. This allows packing possible CPU nodes which are
+	 * not yet present in the system without regenerating the entire FDT.
+	 */
+	possible_cpu_nodes = num_possible_cpus() / threads_per_core;
+	if (image->type == KEXEC_TYPE_CRASH && possible_cpu_nodes > cpu_nodes)
+		extra_size += (possible_cpu_nodes - cpu_nodes) * cpu_node_size();
+#endif
+
 	return extra_size;
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v9 5/6] crash: forward memory_notify args to arch crash hotplug handler
  2023-03-12 18:11 [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Sourabh Jain
                   ` (3 preceding siblings ...)
  2023-03-12 18:11 ` [PATCH v9 4/6] powerpc/crash: add crash CPU hotplug support Sourabh Jain
@ 2023-03-12 18:11 ` Sourabh Jain
  2023-03-12 18:11 ` [PATCH v9 6/6] powerpc/kexec: add crash memory hotplug support Sourabh Jain
  2023-03-13 15:42 ` [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Eric DeVolder
  6 siblings, 0 replies; 15+ messages in thread
From: Sourabh Jain @ 2023-03-12 18:11 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: eric.devolder, bhe, mahesh, kexec, ldufour, hbathini

On PowePC memblock regions are used to prepare elfcorehdr which
describes the memory regions of the running kernel to the kdump kernel.
Since the notifier used for the memory hotplug crash handler gets
initiated before the update of the memblock region happens (as depicted
below) for the memory remove case, the newly prepared elfcorehdr still
holds the old memory regions. If the elfcorehdr is prepared with stale
memblock regions then the newly prepared elfcorehdr will still be
holding stale memory regions. And dump collection with stale elfcorehdr
will lead to dump collection failure or incomplete dump collection.

The sequence of actions done on PowerPC when an LMB memory hot removed:

 Initiate memory hot remove
          |
          v
 offline pages
          |
          v
 initiate memory notify call
 chain for MEM_OFFLINE event.  <---> Prepare new elfcorehdr and update
                                     the kdump kernel image.
          |
          v
 update memblock regions

Such challenges only exist for memory remove case. For the memory add
case the memory regions are updated first and then memory notify calls
the arch crash hotplug handler to update the elfcorehdr.

This patch passes additional information about the hot removed LMB to
the arch crash hotplug handler in the form of memory_notify object.

How passing memory_notify to arch crash hotplug handler will help?

memory_notify holds the start PFN and page count of the hot removed
memory. With that base address and the size of the hot removed memory
can be calculated and same can be used to avoid adding hot removed
memory region to get added in the elfcorehdr.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/include/asm/kexec.h |  2 +-
 arch/powerpc/kexec/core_64.c     |  3 ++-
 arch/x86/include/asm/kexec.h     |  2 +-
 arch/x86/kernel/crash.c          |  3 ++-
 include/linux/kexec.h            |  2 +-
 kernel/crash_core.c              | 14 +++++++-------
 6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 826671bd8ff97..92f7a03c4af46 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -106,7 +106,7 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges);
 int machine_kexec_post_load(struct kimage *image);
 #define machine_kexec_post_load machine_kexec_post_load
 #if defined(CONFIG_CRASH_HOTPLUG)
-void arch_crash_handle_hotplug_event(struct kimage *image);
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 #endif
 #endif
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 35de0aa0fe375..863d9261908cd 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -582,8 +582,9 @@ int update_cpus_node(void *fdt)
  * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to update the
  *                                necessary kexec segments based on the hotplug event.
  * @image: the active struct kimage
+ * @arg: struct memory_notify data handler for memory hotplug and NULL for CPU hotplug
  */
-void arch_crash_handle_hotplug_event(struct kimage *image)
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
 	void *fdt;
 	int fdt_index;
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 1bc852ce347d4..70c3b23b468b6 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -213,7 +213,7 @@ extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
 #ifdef CONFIG_CRASH_HOTPLUG
-void arch_crash_handle_hotplug_event(struct kimage *image);
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 8f7b0b7fba909..a57c05b43c740 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -445,11 +445,12 @@ int crash_load_segments(struct kimage *image)
 /**
  * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
  * @image: the active struct kimage
+ * @arg: struct memory_notify data handler for memory hotplug and NULL for CPU hotplug
  *
  * The new elfcorehdr is prepared in a kernel buffer, and then it is
  * written on top of the existing/old elfcorehdr.
  */
-void arch_crash_handle_hotplug_event(struct kimage *image)
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
 	void *elfbuf = NULL, *old_elfcorehdr;
 	unsigned long nr_mem_ranges;
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0ac41f48de0b1..69765e6a92d0d 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -506,7 +506,7 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
 #endif
 
 #ifndef arch_crash_handle_hotplug_event
-static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { }
 #endif
 
 #ifndef crash_hotplug_cpu_support
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index d5cd0f67e8567..e3c2e791aed87 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -718,7 +718,7 @@ subsys_initcall(crash_save_vmcoreinfo_init);
  * list of segments it checks (since the elfcorehdr changes and thus
  * would require an update to purgatory itself to update the digest).
  */
-static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg)
 {
 	/* Obtain lock while changing crash information */
 	if (kexec_trylock()) {
@@ -769,7 +769,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 			image->hp_action = hp_action;
 
 			/* Now invoke arch-specific update handler */
-			arch_crash_handle_hotplug_event(image);
+			arch_crash_handle_hotplug_event(image, arg);
 
 			/* No longer handling a hotplug event */
 			image->hp_action = KEXEC_CRASH_HP_NONE;
@@ -785,17 +785,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 	}
 }
 
-static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg)
 {
 	switch (val) {
 	case MEM_ONLINE:
 		crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
-			KEXEC_CRASH_HP_INVALID_CPU);
+			KEXEC_CRASH_HP_INVALID_CPU, arg);
 		break;
 
 	case MEM_OFFLINE:
 		crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY,
-			KEXEC_CRASH_HP_INVALID_CPU);
+			KEXEC_CRASH_HP_INVALID_CPU, arg);
 		break;
 	}
 	return NOTIFY_OK;
@@ -808,13 +808,13 @@ static struct notifier_block crash_memhp_nb = {
 
 static int crash_cpuhp_online(unsigned int cpu)
 {
-	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
+	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu, NULL);
 	return 0;
 }
 
 static int crash_cpuhp_offline(unsigned int cpu)
 {
-	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
+	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu, NULL);
 	return 0;
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v9 6/6] powerpc/kexec: add crash memory hotplug support
  2023-03-12 18:11 [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Sourabh Jain
                   ` (4 preceding siblings ...)
  2023-03-12 18:11 ` [PATCH v9 5/6] crash: forward memory_notify args to arch crash hotplug handler Sourabh Jain
@ 2023-03-12 18:11 ` Sourabh Jain
  2023-03-13 15:42 ` [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Eric DeVolder
  6 siblings, 0 replies; 15+ messages in thread
From: Sourabh Jain @ 2023-03-12 18:11 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: eric.devolder, bhe, mahesh, kexec, ldufour, hbathini

Extend PowerPC arch crash hotplug handler to support memory hotplug
events. Since elfcorehdr is used to exchange the memory info between the
kernels hence it needs to be recreated to reflect the changes due to
memory hotplug events.

The way memory hotplug events are handled on PowerPC and the notifier call
chain used in generic code to trigger the arch crash handler, the process
to recreate the elfcorehdr is different for memory add and remove events.

In the memory hot remove case, the memory region is first marked offline
then the notifier call chain is triggered (same is used to initiate arch
crash hotplug handler) and at last the memblock structure is updated.
Whereas in the memory hot add case, memblock regions is updated before
invoking the notifier call chain.

On PowerPC, memblock regions list is used to prepare the elfcorehdr. In
case of memory hot remove the memblock structure is updated after the
arch crash hotplug handler is triggered, hence an additional step is
taken to ensure that memory ranges used to prepare elfcorehdr do not
include hot removed memory. Whereas no such extra step is taken for
while preparing elfcorehdr for memory add case.

The hot removed memory is explicitly removed from the crash memory ranges
and then elfcorehdr is prepared.

When memory is hot removed it possible that memory regions count may
increase. So to accommodate a growing number of memory regions, the
elfcorehdr kexec segment is built with additional buffer space.

The changes done here will also work for the kexec_load system call given
that the kexec tool builds the elfcoredhr with additional space to
accommodate future memory regions as it is done for kexec_file_load
system call in the kernel.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/include/asm/kexec_ranges.h |  1 +
 arch/powerpc/kexec/core_64.c            | 76 +++++++++++++++++++++-
 arch/powerpc/kexec/file_load_64.c       | 36 ++++++++++-
 arch/powerpc/kexec/ranges.c             | 85 +++++++++++++++++++++++++
 4 files changed, 194 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
index f83866a19e870..802abf580cf0f 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,6 +7,7 @@
 void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
 struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
 int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
 int add_tce_mem_ranges(struct crash_mem **mem_ranges);
 int add_initrd_mem_range(struct crash_mem **mem_ranges);
 #ifdef CONFIG_PPC_64S_HASH_MMU
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 863d9261908cd..5453d4d98209d 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -19,6 +19,7 @@
 #include <linux/of.h>
 #include <linux/libfdt.h>
 #include <linux/memblock.h>
+#include <linux/memory.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -578,6 +579,76 @@ int update_cpus_node(void *fdt)
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
+/**
+ * update_crash_elfcorehdr() - Recreate the elfcorehdr and replace it with old
+ *			       elfcorehdr in the kexec segment array.
+ * @image: the active struct kimage
+ * @arg: struct memory_notify data handler
+ */
+static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *mn)
+{
+	int ret;
+	struct crash_mem *cmem = NULL;
+	struct kexec_segment *ksegment;
+	void *ptr, *mem, *elfbuf = NULL;
+	unsigned long elfsz, memsz, base_addr, size;
+
+	ksegment = &image->segment[image->elfcorehdr_index];
+	mem = (void *) ksegment->mem;
+	memsz = ksegment->memsz;
+
+	ret = get_crash_memory_ranges(&cmem);
+	if (ret) {
+		pr_err("Failed to get crash mem range\n");
+		return;
+	}
+
+	/*
+	 * The hot unplugged memory is not yet removed from crash memory
+	 * ranges, remove it here.
+	 */
+	if (image->hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY) {
+		base_addr = PFN_PHYS(mn->start_pfn);
+		size = mn->nr_pages * PAGE_SIZE;
+		ret = remove_mem_range(&cmem, base_addr, size);
+		if (ret) {
+			pr_err("Failed to remove hot-unplugged from crash memory ranges.\n");
+			return;
+		}
+	}
+
+	ret = crash_prepare_elf64_headers(cmem, false, &elfbuf, &elfsz);
+	if (ret) {
+		pr_err("Failed to prepare elf header\n");
+		return;
+	}
+
+	/*
+	 * It is unlikely that kernel hit this because elfcorehdr kexec
+	 * segment (memsz) is built with addition space to accommodate growing
+	 * number of crash memory ranges while loading the kdump kernel. It is
+	 * Just to avoid any unforeseen case.
+	 */
+	if (elfsz > memsz) {
+		pr_err("Updated crash elfcorehdr elfsz %lu > memsz %lu", elfsz, memsz);
+		goto out;
+	}
+
+	ptr = __va(mem);
+	if (ptr) {
+		/* Temporarily invalidate the crash image while it is replaced */
+		xchg(&kexec_crash_image, NULL);
+
+		/* Replace the old elfcorehdr with newly prepared elfcorehdr */
+		memcpy((void *)ptr, elfbuf, elfsz);
+
+		/* The crash image is now valid once again */
+		xchg(&kexec_crash_image, image);
+	}
+out:
+	vfree(elfbuf);
+}
+
 /**
  * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to update the
  *                                necessary kexec segments based on the hotplug event.
@@ -588,6 +659,7 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
 	void *fdt;
 	int fdt_index;
+	struct memory_notify *mn;
 	unsigned int hp_action = image->hp_action;
 
 	/*
@@ -597,9 +669,9 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 	if (hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
 		return;
 
-	/* crash update on memory hotplug events is not supported yet */
 	if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
-		pr_info_once("Crash update is not supported for memory hotplug\n");
+		mn = (struct memory_notify *) arg;
+		update_crash_elfcorehdr(image, mn);
 		return;
 	}
 
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 3554168687869..db7ba1c8c5e0b 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -21,6 +21,8 @@
 #include <linux/memblock.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/elf.h>
+
 #include <asm/setup.h>
 #include <asm/drmem.h>
 #include <asm/firmware.h>
@@ -707,6 +709,30 @@ static void update_backup_region_phdr(struct kimage *image, Elf64_Ehdr *ehdr)
 	}
 }
 
+/* get_max_phdr - Find the total number of Phdr needed to represent the
+ *		  max memory in the kdump elfcorehdr.
+ *
+ * @cmem: crash memory ranges in the system.
+ */
+static int get_max_phdr(struct crash_mem *cmem)
+{
+	int max_lmb;
+
+	/* In the worst case, a Phdr is needed for every other LMB to be represented
+	 * as an individual crash range.
+	 */
+	max_lmb = memory_hotplug_max() / 2 * drmem_lmb_size();
+
+	/* Do not cross the Phdr max limit of the elf header.
+	 * Avoid counting Phdr for crash ranges (cmem->nr_ranges) which
+	 * are already part of elfcorehdr.
+	 */
+	if (max_lmb > PN_XNUM)
+		return PN_XNUM - cmem->nr_ranges;
+
+	return max_lmb - cmem->nr_ranges;
+}
+
 /**
  * load_elfcorehdr_segment - Setup crash memory ranges and initialize elfcorehdr
  *                           segment needed to load kdump kernel.
@@ -738,7 +764,13 @@ static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
 
 	kbuf->buffer = headers;
 	kbuf->mem = KEXEC_BUF_MEM_UNKNOWN;
-	kbuf->bufsz = kbuf->memsz = headers_sz;
+	kbuf->bufsz = headers_sz;
+/* Additional buffer space to accommodate future memory ranges */
+#if defined(CONFIG_MEMORY_HOTPLUG)
+	kbuf->memsz = headers_sz + get_max_phdr(cmem) * sizeof(Elf64_Phdr);
+#else
+	kbuf->memsz = headers_sz;
+#endif
 	kbuf->top_down = false;
 
 	ret = kexec_add_buffer(kbuf);
@@ -748,7 +780,7 @@ static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
 	}
 
 	image->elf_load_addr = kbuf->mem;
-	image->elf_headers_sz = headers_sz;
+	image->elf_headers_sz = kbuf->memsz;
 	image->elf_headers = headers;
 out:
 	kfree(cmem);
diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
index 5fc53a5fcfdf6..d8007363cdc11 100644
--- a/arch/powerpc/kexec/ranges.c
+++ b/arch/powerpc/kexec/ranges.c
@@ -234,6 +234,91 @@ int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size)
 	return __add_mem_range(mem_ranges, base, size);
 }
 
+/**
+ * remove_mem_range - Removes the given memory range from the range list.
+ * @mem_ranges:    Range list to remove the memory range to.
+ * @base:          Base address of the range to remove.
+ * @size:          Size of the memory range to remove.
+ *
+ * (Re)allocates memory, if needed.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size)
+{
+	u64 end;
+	int ret = 0;
+	unsigned int i;
+	u64 mstart, mend;
+	struct crash_mem *mem_rngs = *mem_ranges;
+
+	if (!size)
+		return 0;
+
+	/*
+	 * Memory range are stored as start and end address, use
+	 * the same format to do remove operation.
+	 */
+	end = base + size - 1;
+
+	for (i = 0; i < mem_rngs->nr_ranges; i++) {
+		mstart = mem_rngs->ranges[i].start;
+		mend = mem_rngs->ranges[i].end;
+
+		/*
+		 * Memory range to remove is not part of this range entry
+		 * in the memory range list
+		 */
+		if (!(base >= mstart && end <= mend))
+			continue;
+
+		/*
+		 * Memory range to remove is equivalent to this entry in the
+		 * memory range list. Remove the range entry from the list.
+		 */
+		if (base == mstart && end == mend) {
+			for (; i < mem_rngs->nr_ranges - 1; i++) {
+				mem_rngs->ranges[i].start = mem_rngs->ranges[i+1].start;
+				mem_rngs->ranges[i].end = mem_rngs->ranges[i+1].end;
+			}
+			mem_rngs->nr_ranges--;
+			goto out;
+		}
+		/*
+		 * Start address of the memory range to remove and the
+		 * current memory range entry in the list is same. Just
+		 * move the start address of the current memory range
+		 * entry in the list to end + 1.
+		 */
+		else if (base == mstart) {
+			mem_rngs->ranges[i].start = end + 1;
+			goto out;
+		}
+		/*
+		 * End address of the memory range to remove and the
+		 * current memory range entry in the list is same.
+		 * Just move the end address of the current memory
+		 * range entry in the list to base - 1.
+		 */
+		else if (end == mend)  {
+			mem_rngs->ranges[i].end = base - 1;
+			goto out;
+		}
+		/*
+		 * Memory range to remove is not at the edge of current
+		 * memory range entry. Split the current memory entry into
+		 * two half.
+		 */
+		else {
+			mem_rngs->ranges[i].end = base - 1;
+			size = mem_rngs->ranges[i].end - end;
+			ret = add_mem_range(mem_ranges, end + 1, size);
+		}
+	}
+out:
+	return ret;
+}
+
 /**
  * add_tce_mem_ranges - Adds tce-table range to the given memory ranges list.
  * @mem_ranges:         Range list to add the memory range(s) to.
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel
  2023-03-12 18:11 [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Sourabh Jain
                   ` (5 preceding siblings ...)
  2023-03-12 18:11 ` [PATCH v9 6/6] powerpc/kexec: add crash memory hotplug support Sourabh Jain
@ 2023-03-13 15:42 ` Eric DeVolder
  2023-03-14  3:38   ` Sourabh Jain
  6 siblings, 1 reply; 15+ messages in thread
From: Eric DeVolder @ 2023-03-13 15:42 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe; +Cc: mahesh, ldufour, kexec, bhe, hbathini



On 3/12/23 13:11, Sourabh Jain wrote:
> The Problem:
> ============
> Post hotplug/DLPAR events the capture kernel holds stale information about the
> system. Dump collection with stale capture kernel might end up in dump capture
> failure or an inaccurate dump collection.
> 
> Existing solution:
> ==================
> The existing solution to keep the capture kernel up-to-date by monitoring
> hotplug event via udev rule and trigger a full capture kernel reload for
> every hotplug event.
> 
> Shortcomings:
> ------------------------------------------------
> - Leaves a window where kernel crash might not lead to a successful dump
>    collection.
> - Reloading all kexec components for each hotplug is inefficient.
> - udev rules are prone to races if hotplug events are frequent.
> 
> More about issues with an existing solution is posted here:
>   - https://lkml.org/lkml/2020/12/14/532
>   - https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html
> 
> Proposed Solution:
> ==================
> Instead of reloading all kexec segments on hotplug event, this patch series
> focuses on updating only the relevant kexec segment. Once the kexec segments
> are loaded in the kernel reserved area then an arch-specific hotplug handler
> will update the relevant kexec segment based on hotplug event type.
> 
> Series Dependecies
> ==================
> This patch series implements the crash hotplug handler on PowerPC. The generic
> crash hotplug update is introduced by https://lkml.org/lkml/2023/3/6/1358 patch
> series.
> 
> Git tree for testing:
> =====================
> The below git tree has this patch series applied on top of dependent patch
> series.
> https://github.com/sourabhjains/linux/tree/in-kernel-crash-update-v9
> 
> To realise the feature the kdump udev rules must be disabled for CPU/Memory
> hotplug events. Comment out the below line in kdump udev rule file:
> 
>    RHEL: /usr/lib/udev/rules.d/98-kexec.rules
> 
>    	#SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"
> 	#SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload_mem"
> 	#SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload_mem"
> 
>    SLES: /usr/lib/kdump/70-kdump.rules
> 
> 	#SUBSYSTEM=="memory", ACTION=="add|remove", GOTO="kdump_try_restart"
> 	#SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_try_restart"
> 

Sourabh,

The above seems to contradict what I anticipate to be udev rules changes once the base series is 
accepted. Specifically I'm suggesting the following:

  - Prevent udev from updating kdump crash kernel on hot un/plug changes.
    Add the following as the first lines to the RHEL udev rule file
    /usr/lib/udev/rules.d/98-kexec.rules:

    # The kernel handles updates to crash elfcorehdr for cpu and memory changes
    SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
    SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

    With this changeset applied, the two rules evaluate to false for
    cpu and memory change events and thus skip the userspace
    unload-then-reload of kdump.

The above additions allow distros to deploy the udev rule immediately and work properly even if the 
base patchset isn't yet merged, or down the road, enabled/configured.

Am I missing something such that your recommendation is different than mine?

> Note: only kexec_file_load syscall will work. For kexec_load minor
> changes are required in kexec tool.

Will this be the same/similar change as I have posted, or do you envision something different?

Thanks,
eric

> 
> ---
> Changelog:
> 
> v9:
>    - Removed patch to prepare elfcorehdr crash notes for possible CPUs.
>      The patch is moved to generic patch series that introduces generic
>      infrastructure for in kernel crash update.
>    - Removed patch to pass the hotplug action type to the arch crash
>      hotplug handler function. The generic patch series has introduced
>      the hotplug action type in kimage struct.
>    - Add detail commit message for better understanding.
> 
> v8:
>    - Restrict fdt_index initialization to machine_kexec_post_load
>      it work for both kexec_load and kexec_file_load.[3/8] Laurent Dufour
> 
>    - Updated the logic to find the number of offline core. [6/8]
> 
>    - Changed the logic to find the elfcore program header to accommodate
>      future memory ranges due memory hotplug events. [8/8]
> 
> v7
>    - added a new config to configure this feature
>    - pass hotplug action type to arch specific handler
> 
> v6
>    - Added crash memory hotplug support
> 
> v5:
>    - Replace COFNIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
>    - Move fdt segment identification for kexec_load case to load path
>      instead of crash hotplug handler
>    - Keep new attribute defined under kimage_arch to track FDT segment
>      under CONFIG_HOTPLUG_CPU config.
> 
> v4:
>    - Update the logic to find the additional space needed for hotadd CPUs post
>      kexec load. Refer "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug
>      support for kexec_file_load" patch to know more about the change.
>    - Fix a couple of typo.
>    - Replace pr_err to pr_info_once to warn user about memory hotplug
>      support.
>    - In crash hotplug handle exit the for loop if FDT segment is found.
> 
> v3
>    - Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
>    - Rebase patche on top of https://lkml.org/lkml/2022/3/3/674 [v5]
>    - Fixed warning reported by checpatch script
> 
> v2:
>    - Use generic hotplug handler introduced by https://lkml.org/lkml/2022/2/9/1406, a
>      significant change from v1.
> 
> Sourabh Jain (6):
>    powerpc/kexec: turn some static helper functions public
>    powerpc/crash: introduce a new config option CRASH_HOTPLUG
>    powerpc/crash: add a new member to the kimage_arch struct
>    powerpc/crash: add crash CPU hotplug support
>    crash: forward memory_notify args to arch crash hotplug handler
>    powerpc/kexec: add crash memory hotplug support
> 
>   arch/powerpc/Kconfig                    |  12 +
>   arch/powerpc/include/asm/kexec.h        |  15 ++
>   arch/powerpc/include/asm/kexec_ranges.h |   1 +
>   arch/powerpc/kexec/core_64.c            | 322 ++++++++++++++++++++++++
>   arch/powerpc/kexec/elf_64.c             |  13 +-
>   arch/powerpc/kexec/file_load_64.c       | 212 ++++------------
>   arch/powerpc/kexec/ranges.c             |  85 +++++++
>   arch/x86/include/asm/kexec.h            |   2 +-
>   arch/x86/kernel/crash.c                 |   3 +-
>   include/linux/kexec.h                   |   2 +-
>   kernel/crash_core.c                     |  14 +-
>   11 files changed, 506 insertions(+), 175 deletions(-)
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 2/6] powerpc/crash: introduce a new config option CRASH_HOTPLUG
  2023-03-12 18:11 ` [PATCH v9 2/6] powerpc/crash: introduce a new config option CRASH_HOTPLUG Sourabh Jain
@ 2023-03-13 15:46   ` Eric DeVolder
  2023-03-14  5:17     ` Sourabh Jain
  0 siblings, 1 reply; 15+ messages in thread
From: Eric DeVolder @ 2023-03-13 15:46 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe; +Cc: mahesh, ldufour, kexec, bhe, hbathini



On 3/12/23 13:11, Sourabh Jain wrote:
> Due to CPU/Memory hotplug events the system resources changes. A similar
> change should reflect in the loaded kdump kernel image that describes
> the state of the CPU and memory of the running kernel.
> 
> If the kdump kernel image is not updated after the CPU or Memory hotplug
> events and it tries to collect the dump with the stale system resource
> data this might lead to dump collection failure or an inaccurate dump
> collection.
> 
> The current method to keep the kdump kernel up to date is by triggering
> reload (i.e unload and load) the entire kdump kernel image whenever a
> CPU or Memory hotplug event is observed by udev in the userspace.
> Reloading the complete kdump kernel image is an expensive task. It can
> be easily avoided by doing the in-kernel updates to specific kdump
> kernel image components which are responsible for describing CPU and
> Memory resources of the running kernel to the kdump kernel.
> 
> The kernel changes related to in-kernel update to the kdump kernel image
> on CPU/Memory hotplug events are kept under the CRASH_HOTPLUG config
> option.
> 
> Later in the series, a powerpc crash hotplug handler is introduced to
> update the kdump kernel image on CPU/Memory hotplug events. This arch
> specific handler is trigger from a generic crash handler that registers
> with the CPU and memory notifiers.
> 
> The CRASH_HOTPLUG config option is enabled by default.
> 
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> ---
>   arch/powerpc/Kconfig | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index a6c4407d3ec83..2f45b3f5175cb 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -681,6 +681,18 @@ config CRASH_DUMP
>   	  The same kernel binary can be used as production kernel and dump
>   	  capture kernel.
>   
> +config CRASH_HOTPLUG
> +	bool "Update crash capture system on CPU/Memory hotplug event"
Fwiw, online/offline changes also flow through this infrastructure...
eric

> +	default y
> +	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
> +	help
> +	  In kernel update to relevant kexec segments due to change
> +	  in the system configuration, rather reloading all the kexec
> +	  segments again from userspace by monitoring CPU/Memory
> +	  hotplug events in the userspace using udev.
> +
> +	  If unsure, say N.
> +
>   config FA_DUMP
>   	bool "Firmware-assisted dump"
>   	depends on PPC64 && (PPC_RTAS || PPC_POWERNV)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 1/6] powerpc/kexec: turn some static helper functions public
  2023-03-12 18:11 ` [PATCH v9 1/6] powerpc/kexec: turn some static helper functions public Sourabh Jain
@ 2023-03-13 16:18   ` Laurent Dufour
  2023-03-14  3:41     ` Sourabh Jain
  0 siblings, 1 reply; 15+ messages in thread
From: Laurent Dufour @ 2023-03-13 16:18 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe
  Cc: mahesh, eric.devolder, kexec, bhe, hbathini

On 12/03/2023 19:11:49, Sourabh Jain wrote:
> Move update_cpus_node and get_crash_memory_ranges functions from
> kexec/file_load.c to kexec/core_64.c to make these functions usable
        file_load_64.c
> by other kexec compoenets.
                 components
> 
> Later in the series, these functions are utilized to do in-kernel update to
> kexec segments on CPU/Memory hotplug events for both kexec_load and
> kexec_file_load syscalls.
> 
> No functional change intended.
>

FWIW, despite the 2 minor typos above,

Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>

> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/kexec.h  |   6 ++
>  arch/powerpc/kexec/core_64.c      | 166 ++++++++++++++++++++++++++++++
>  arch/powerpc/kexec/file_load_64.c | 162 -----------------------------
>  3 files changed, 172 insertions(+), 162 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index a1ddba01e7d13..8090ad7d97d9d 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -99,6 +99,12 @@ void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_co
>  
>  void kexec_copy_flush(struct kimage *image);
>  
> +#ifdef CONFIG_PPC64
> +struct crash_mem;
> +int update_cpus_node(void *fdt);
> +int get_crash_memory_ranges(struct crash_mem **mem_ranges);
> +#endif
> +
>  #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
>  void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
>  #define crash_free_reserved_phys_range crash_free_reserved_phys_range
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index a79e28c91e2be..0b292f93a74cc 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -17,6 +17,8 @@
>  #include <linux/cpu.h>
>  #include <linux/hardirq.h>
>  #include <linux/of.h>
> +#include <linux/libfdt.h>
> +#include <linux/memblock.h>
>  
>  #include <asm/page.h>
>  #include <asm/current.h>
> @@ -30,6 +32,8 @@
>  #include <asm/hw_breakpoint.h>
>  #include <asm/svm.h>
>  #include <asm/ultravisor.h>
> +#include <asm/kexec_ranges.h>
> +#include <asm/crashdump-ppc64.h>
>  
>  int machine_kexec_prepare(struct kimage *image)
>  {
> @@ -377,6 +381,168 @@ void default_machine_kexec(struct kimage *image)
>  	/* NOTREACHED */
>  }
>  
> +/**
> + * get_crash_memory_ranges - Get crash memory ranges. This list includes
> + *                           first/crashing kernel's memory regions that
> + *                           would be exported via an elfcore.
> + * @mem_ranges:              Range list to add the memory ranges to.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +int get_crash_memory_ranges(struct crash_mem **mem_ranges)
> +{
> +	phys_addr_t base, end;
> +	struct crash_mem *tmem;
> +	u64 i;
> +	int ret;
> +
> +	for_each_mem_range(i, &base, &end) {
> +		u64 size = end - base;
> +
> +		/* Skip backup memory region, which needs a separate entry */
> +		if (base == BACKUP_SRC_START) {
> +			if (size > BACKUP_SRC_SIZE) {
> +				base = BACKUP_SRC_END + 1;
> +				size -= BACKUP_SRC_SIZE;
> +			} else
> +				continue;
> +		}
> +
> +		ret = add_mem_range(mem_ranges, base, size);
> +		if (ret)
> +			goto out;
> +
> +		/* Try merging adjacent ranges before reallocation attempt */
> +		if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
> +			sort_memory_ranges(*mem_ranges, true);
> +	}
> +
> +	/* Reallocate memory ranges if there is no space to split ranges */
> +	tmem = *mem_ranges;
> +	if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
> +		tmem = realloc_mem_ranges(mem_ranges);
> +		if (!tmem)
> +			goto out;
> +	}
> +
> +	/* Exclude crashkernel region */
> +	ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
> +	if (ret)
> +		goto out;
> +
> +	/*
> +	 * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL
> +	 *        regions are exported to save their context at the time of
> +	 *        crash, they should actually be backed up just like the
> +	 *        first 64K bytes of memory.
> +	 */
> +	ret = add_rtas_mem_range(mem_ranges);
> +	if (ret)
> +		goto out;
> +
> +	ret = add_opal_mem_range(mem_ranges);
> +	if (ret)
> +		goto out;
> +
> +	/* create a separate program header for the backup region */
> +	ret = add_mem_range(mem_ranges, BACKUP_SRC_START, BACKUP_SRC_SIZE);
> +	if (ret)
> +		goto out;
> +
> +	sort_memory_ranges(*mem_ranges, false);
> +out:
> +	if (ret)
> +		pr_err("Failed to setup crash memory ranges\n");
> +	return ret;
> +}
> +
> +/**
> + * add_node_props - Reads node properties from device node structure and add
> + *                  them to fdt.
> + * @fdt:            Flattened device tree of the kernel
> + * @node_offset:    offset of the node to add a property at
> + * @dn:             device node pointer
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +static int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
> +{
> +	int ret = 0;
> +	struct property *pp;
> +
> +	if (!dn)
> +		return -EINVAL;
> +
> +	for_each_property_of_node(dn, pp) {
> +		ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length);
> +		if (ret < 0) {
> +			pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret));
> +			return ret;
> +		}
> +	}
> +	return ret;
> +}
> +
> +/**
> + * update_cpus_node - Update cpus node of flattened device tree using of_root
> + *                    device node.
> + * @fdt:              Flattened device tree of the kernel.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +int update_cpus_node(void *fdt)
> +{
> +	struct device_node *cpus_node, *dn;
> +	int cpus_offset, cpus_subnode_offset, ret = 0;
> +
> +	cpus_offset = fdt_path_offset(fdt, "/cpus");
> +	if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
> +		pr_err("Malformed device tree: error reading /cpus node: %s\n",
> +		       fdt_strerror(cpus_offset));
> +		return cpus_offset;
> +	}
> +
> +	if (cpus_offset > 0) {
> +		ret = fdt_del_node(fdt, cpus_offset);
> +		if (ret < 0) {
> +			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
> +			return -EINVAL;
> +		}
> +	}
> +
> +	/* Add cpus node to fdt */
> +	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
> +	if (cpus_offset < 0) {
> +		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
> +		return -EINVAL;
> +	}
> +
> +	/* Add cpus node properties */
> +	cpus_node = of_find_node_by_path("/cpus");
> +	ret = add_node_props(fdt, cpus_offset, cpus_node);
> +	of_node_put(cpus_node);
> +	if (ret < 0)
> +		return ret;
> +
> +	/* Loop through all subnodes of cpus and add them to fdt */
> +	for_each_node_by_type(dn, "cpu") {
> +		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
> +		if (cpus_subnode_offset < 0) {
> +			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
> +			       fdt_strerror(cpus_subnode_offset));
> +			ret = cpus_subnode_offset;
> +			goto out;
> +		}
> +
> +		ret = add_node_props(fdt, cpus_subnode_offset, dn);
> +		if (ret < 0)
> +			goto out;
> +	}
> +out:
> +	of_node_put(dn);
> +	return ret;
> +}
> +
>  #ifdef CONFIG_PPC_64S_HASH_MMU
>  /* Values we need to export to the second kernel via the device tree. */
>  static unsigned long htab_base;
> diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
> index 110d28bede2a7..5b0b3f61e0e72 100644
> --- a/arch/powerpc/kexec/file_load_64.c
> +++ b/arch/powerpc/kexec/file_load_64.c
> @@ -133,81 +133,6 @@ static int get_usable_memory_ranges(struct crash_mem **mem_ranges)
>  	return ret;
>  }
>  
> -/**
> - * get_crash_memory_ranges - Get crash memory ranges. This list includes
> - *                           first/crashing kernel's memory regions that
> - *                           would be exported via an elfcore.
> - * @mem_ranges:              Range list to add the memory ranges to.
> - *
> - * Returns 0 on success, negative errno on error.
> - */
> -static int get_crash_memory_ranges(struct crash_mem **mem_ranges)
> -{
> -	phys_addr_t base, end;
> -	struct crash_mem *tmem;
> -	u64 i;
> -	int ret;
> -
> -	for_each_mem_range(i, &base, &end) {
> -		u64 size = end - base;
> -
> -		/* Skip backup memory region, which needs a separate entry */
> -		if (base == BACKUP_SRC_START) {
> -			if (size > BACKUP_SRC_SIZE) {
> -				base = BACKUP_SRC_END + 1;
> -				size -= BACKUP_SRC_SIZE;
> -			} else
> -				continue;
> -		}
> -
> -		ret = add_mem_range(mem_ranges, base, size);
> -		if (ret)
> -			goto out;
> -
> -		/* Try merging adjacent ranges before reallocation attempt */
> -		if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
> -			sort_memory_ranges(*mem_ranges, true);
> -	}
> -
> -	/* Reallocate memory ranges if there is no space to split ranges */
> -	tmem = *mem_ranges;
> -	if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
> -		tmem = realloc_mem_ranges(mem_ranges);
> -		if (!tmem)
> -			goto out;
> -	}
> -
> -	/* Exclude crashkernel region */
> -	ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
> -	if (ret)
> -		goto out;
> -
> -	/*
> -	 * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL
> -	 *        regions are exported to save their context at the time of
> -	 *        crash, they should actually be backed up just like the
> -	 *        first 64K bytes of memory.
> -	 */
> -	ret = add_rtas_mem_range(mem_ranges);
> -	if (ret)
> -		goto out;
> -
> -	ret = add_opal_mem_range(mem_ranges);
> -	if (ret)
> -		goto out;
> -
> -	/* create a separate program header for the backup region */
> -	ret = add_mem_range(mem_ranges, BACKUP_SRC_START, BACKUP_SRC_SIZE);
> -	if (ret)
> -		goto out;
> -
> -	sort_memory_ranges(*mem_ranges, false);
> -out:
> -	if (ret)
> -		pr_err("Failed to setup crash memory ranges\n");
> -	return ret;
> -}
> -
>  /**
>   * get_reserved_memory_ranges - Get reserve memory ranges. This list includes
>   *                              memory regions that should be added to the
> @@ -1018,93 +943,6 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
>  	return extra_size;
>  }
>  
> -/**
> - * add_node_props - Reads node properties from device node structure and add
> - *                  them to fdt.
> - * @fdt:            Flattened device tree of the kernel
> - * @node_offset:    offset of the node to add a property at
> - * @dn:             device node pointer
> - *
> - * Returns 0 on success, negative errno on error.
> - */
> -static int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
> -{
> -	int ret = 0;
> -	struct property *pp;
> -
> -	if (!dn)
> -		return -EINVAL;
> -
> -	for_each_property_of_node(dn, pp) {
> -		ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length);
> -		if (ret < 0) {
> -			pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret));
> -			return ret;
> -		}
> -	}
> -	return ret;
> -}
> -
> -/**
> - * update_cpus_node - Update cpus node of flattened device tree using of_root
> - *                    device node.
> - * @fdt:              Flattened device tree of the kernel.
> - *
> - * Returns 0 on success, negative errno on error.
> - */
> -static int update_cpus_node(void *fdt)
> -{
> -	struct device_node *cpus_node, *dn;
> -	int cpus_offset, cpus_subnode_offset, ret = 0;
> -
> -	cpus_offset = fdt_path_offset(fdt, "/cpus");
> -	if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
> -		pr_err("Malformed device tree: error reading /cpus node: %s\n",
> -		       fdt_strerror(cpus_offset));
> -		return cpus_offset;
> -	}
> -
> -	if (cpus_offset > 0) {
> -		ret = fdt_del_node(fdt, cpus_offset);
> -		if (ret < 0) {
> -			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
> -			return -EINVAL;
> -		}
> -	}
> -
> -	/* Add cpus node to fdt */
> -	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
> -	if (cpus_offset < 0) {
> -		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
> -		return -EINVAL;
> -	}
> -
> -	/* Add cpus node properties */
> -	cpus_node = of_find_node_by_path("/cpus");
> -	ret = add_node_props(fdt, cpus_offset, cpus_node);
> -	of_node_put(cpus_node);
> -	if (ret < 0)
> -		return ret;
> -
> -	/* Loop through all subnodes of cpus and add them to fdt */
> -	for_each_node_by_type(dn, "cpu") {
> -		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
> -		if (cpus_subnode_offset < 0) {
> -			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
> -			       fdt_strerror(cpus_subnode_offset));
> -			ret = cpus_subnode_offset;
> -			goto out;
> -		}
> -
> -		ret = add_node_props(fdt, cpus_subnode_offset, dn);
> -		if (ret < 0)
> -			goto out;
> -	}
> -out:
> -	of_node_put(dn);
> -	return ret;
> -}
> -
>  static int copy_property(void *fdt, int node_offset, const struct device_node *dn,
>  			 const char *propname)
>  {


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 3/6] powerpc/crash: add a new member to the kimage_arch struct
  2023-03-12 18:11 ` [PATCH v9 3/6] powerpc/crash: add a new member to the kimage_arch struct Sourabh Jain
@ 2023-03-13 16:25   ` Laurent Dufour
  2023-03-14  5:01     ` Sourabh Jain
  0 siblings, 1 reply; 15+ messages in thread
From: Laurent Dufour @ 2023-03-13 16:25 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe
  Cc: mahesh, eric.devolder, kexec, bhe, hbathini

On 12/03/2023 19:11:51, Sourabh Jain wrote:
> A new member "fdt_index" is added to the kimage_arch struct to hold
> the index of the FDT (Flattened Device Tree) segment from the kexec
> the segment array.
> 
> fdt_index will provide direct access to the FDT segment in the kexec
> segment array after the kdump kernel is loaded.
> 
> The new attribute will be used in the arch crash hotplug handler
> (added in upcoming patches) on every CPU and memory hotplug event.
> 
> The fdt_index is populated for both kexec_load and kexec_file_load
> case.
> 
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/kexec.h |  5 +++++
>  arch/powerpc/kexec/core_64.c     | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index 8090ad7d97d9d..348eb96e8ca67 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -103,6 +103,8 @@ void kexec_copy_flush(struct kimage *image);
>  struct crash_mem;
>  int update_cpus_node(void *fdt);
>  int get_crash_memory_ranges(struct crash_mem **mem_ranges);
> +int machine_kexec_post_load(struct kimage *image);
> +#define machine_kexec_post_load machine_kexec_post_load

Minor comment, when CONFIG_CRASH_HOTPLUG is not set the function is simply
returning 0, why not defining it has an inline in that case?

>  #endif
>  
>  #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
> @@ -118,6 +120,9 @@ extern const struct kexec_file_ops kexec_elf64_ops;
>  struct kimage_arch {
>  	struct crash_mem *exclude_ranges;
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +	int fdt_index;
> +#endif
>  	unsigned long backup_start;
>  	void *backup_buf;
>  	void *fdt;
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 0b292f93a74cc..531486c973988 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -77,6 +77,37 @@ int machine_kexec_prepare(struct kimage *image)
>  	return 0;
>  }
>  
> +int machine_kexec_post_load(struct kimage *kimage)
> +{
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +	int i;
> +	void *ptr;
> +	unsigned long mem;
> +
> +	/* Mark fdt_index invalid */
> +	kimage->arch.fdt_index = -1;

Why is that not done in the series introducing the generic
crash hotplug update, in do_kimage_alloc_init() ?
This way there is a guarantee that the field will not be used while set by
default to 0.

> +
> +	/* fdt_index remains invalid if it is not a crash kernel load */
> +	if (kimage->type != KEXEC_TYPE_CRASH)
> +		return 0;
> +	/*
> +	 * Find the FDT segment index in kexec segment array and
> +	 * assign it to kimage's member fdt_index to enable direct
> +	 * access to FDT segment later on.
> +	 */
> +	for (i = 0; i < kimage->nr_segments; i++) {
> +		mem = kimage->segment[i].mem;
> +		ptr = __va(mem);
> +
> +		if (ptr && fdt_magic(ptr) == FDT_MAGIC) {
> +			kimage->arch.fdt_index = i;
> +			break;
> +		}
> +	}
> +#endif
> +	return 0;
> +}
> +
>  /* Called during kexec sequence with MMU off */
>  static notrace void copy_segments(unsigned long ind)
>  {


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel
  2023-03-13 15:42 ` [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Eric DeVolder
@ 2023-03-14  3:38   ` Sourabh Jain
  0 siblings, 0 replies; 15+ messages in thread
From: Sourabh Jain @ 2023-03-14  3:38 UTC (permalink / raw)
  To: Eric DeVolder, linuxppc-dev; +Cc: bhe, mahesh, kexec, ldufour, hbathini

Hello Eric,

On 13/03/23 21:12, Eric DeVolder wrote:
>
>
> On 3/12/23 13:11, Sourabh Jain wrote:
>> The Problem:
>> ============
>> Post hotplug/DLPAR events the capture kernel holds stale information 
>> about the
>> system. Dump collection with stale capture kernel might end up in 
>> dump capture
>> failure or an inaccurate dump collection.
>>
>> Existing solution:
>> ==================
>> The existing solution to keep the capture kernel up-to-date by 
>> monitoring
>> hotplug event via udev rule and trigger a full capture kernel reload for
>> every hotplug event.
>>
>> Shortcomings:
>> ------------------------------------------------
>> - Leaves a window where kernel crash might not lead to a successful dump
>>    collection.
>> - Reloading all kexec components for each hotplug is inefficient.
>> - udev rules are prone to races if hotplug events are frequent.
>>
>> More about issues with an existing solution is posted here:
>>   - https://lkml.org/lkml/2020/12/14/532
>>   - 
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html
>>
>> Proposed Solution:
>> ==================
>> Instead of reloading all kexec segments on hotplug event, this patch 
>> series
>> focuses on updating only the relevant kexec segment. Once the kexec 
>> segments
>> are loaded in the kernel reserved area then an arch-specific hotplug 
>> handler
>> will update the relevant kexec segment based on hotplug event type.
>>
>> Series Dependecies
>> ==================
>> This patch series implements the crash hotplug handler on PowerPC. 
>> The generic
>> crash hotplug update is introduced by 
>> https://lkml.org/lkml/2023/3/6/1358 patch
>> series.
>>
>> Git tree for testing:
>> =====================
>> The below git tree has this patch series applied on top of dependent 
>> patch
>> series.
>> https://github.com/sourabhjains/linux/tree/in-kernel-crash-update-v9
>>
>> To realise the feature the kdump udev rules must be disabled for 
>> CPU/Memory
>> hotplug events. Comment out the below line in kdump udev rule file:
>>
>>    RHEL: /usr/lib/udev/rules.d/98-kexec.rules
>>
>>        #SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"
>>     #SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload_mem"
>>     #SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload_mem"
>>
>>    SLES: /usr/lib/kdump/70-kdump.rules
>>
>>     #SUBSYSTEM=="memory", ACTION=="add|remove", GOTO="kdump_try_restart"
>>     #SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_try_restart"
>>
>
> Sourabh,
>
> The above seems to contradict what I anticipate to be udev rules 
> changes once the base series is accepted. Specifically I'm suggesting 
> the following:
>
>  - Prevent udev from updating kdump crash kernel on hot un/plug changes.
>    Add the following as the first lines to the RHEL udev rule file
>    /usr/lib/udev/rules.d/98-kexec.rules:
>
>    # The kernel handles updates to crash elfcorehdr for cpu and memory 
> changes
>    SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>    SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", 
> GOTO="kdump_reload_end"
>
>    With this changeset applied, the two rules evaluate to false for
>    cpu and memory change events and thus skip the userspace
>    unload-then-reload of kdump.
>
> The above additions allow distros to deploy the udev rule immediately 
> and work properly even if the base patchset isn't yet merged, or down 
> the road, enabled/configured.
>
> Am I missing something such that your recommendation is different than 
> mine? ]

It is just for the test I have been suggesting to disable the udev 
rules, but your udev rules changes is the way forward.

I will use the above changes to control kdump service reload.

>> Note: only kexec_file_load syscall will work. For kexec_load minor
>> changes are required in kexec tool.
>
> Will this be the same/similar change as I have posted, or do you 
> envision something different?

I think the generic changes will be same. I might need to add some 
PowerPC specific changes to
make sure elfcorehdr and FDT kexec segment should have additional buffer 
space to accommodate
additional memory ranges.

Thanks for the suggestion, I will align the PowerPC kexec tool changes 
with your changes.

- Souarbh


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 1/6] powerpc/kexec: turn some static helper functions public
  2023-03-13 16:18   ` Laurent Dufour
@ 2023-03-14  3:41     ` Sourabh Jain
  0 siblings, 0 replies; 15+ messages in thread
From: Sourabh Jain @ 2023-03-14  3:41 UTC (permalink / raw)
  To: Laurent Dufour, linuxppc-dev, mpe
  Cc: mahesh, eric.devolder, kexec, bhe, hbathini


On 13/03/23 21:48, Laurent Dufour wrote:
> On 12/03/2023 19:11:49, Sourabh Jain wrote:
>> Move update_cpus_node and get_crash_memory_ranges functions from
>> kexec/file_load.c to kexec/core_64.c to make these functions usable
>          file_load_64.c
>> by other kexec compoenets.
>                   components]
Thanks for points it out. I will fix them in next version.
>> Later in the series, these functions are utilized to do in-kernel update to
>> kexec segments on CPU/Memory hotplug events for both kexec_load and
>> kexec_file_load syscalls.
>>
>> No functional change intended.
>>
> FWIW, despite the 2 minor typos above,
>
> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Thanks for the review Laurent.

- Sourabh Jain

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 3/6] powerpc/crash: add a new member to the kimage_arch struct
  2023-03-13 16:25   ` Laurent Dufour
@ 2023-03-14  5:01     ` Sourabh Jain
  0 siblings, 0 replies; 15+ messages in thread
From: Sourabh Jain @ 2023-03-14  5:01 UTC (permalink / raw)
  To: Laurent Dufour, linuxppc-dev, mpe
  Cc: mahesh, eric.devolder, kexec, bhe, hbathini


On 13/03/23 21:55, Laurent Dufour wrote:
> On 12/03/2023 19:11:51, Sourabh Jain wrote:
>> A new member "fdt_index" is added to the kimage_arch struct to hold
>> the index of the FDT (Flattened Device Tree) segment from the kexec
>> the segment array.
>>
>> fdt_index will provide direct access to the FDT segment in the kexec
>> segment array after the kdump kernel is loaded.
>>
>> The new attribute will be used in the arch crash hotplug handler
>> (added in upcoming patches) on every CPU and memory hotplug event.
>>
>> The fdt_index is populated for both kexec_load and kexec_file_load
>> case.
>>
>> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>> ---
>>   arch/powerpc/include/asm/kexec.h |  5 +++++
>>   arch/powerpc/kexec/core_64.c     | 31 +++++++++++++++++++++++++++++++
>>   2 files changed, 36 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
>> index 8090ad7d97d9d..348eb96e8ca67 100644
>> --- a/arch/powerpc/include/asm/kexec.h
>> +++ b/arch/powerpc/include/asm/kexec.h
>> @@ -103,6 +103,8 @@ void kexec_copy_flush(struct kimage *image);
>>   struct crash_mem;
>>   int update_cpus_node(void *fdt);
>>   int get_crash_memory_ranges(struct crash_mem **mem_ranges);
>> +int machine_kexec_post_load(struct kimage *image);
>> +#define machine_kexec_post_load machine_kexec_post_load
> Minor comment, when CONFIG_CRASH_HOTPLUG is not set the function is simply
> returning 0, why not defining it has an inline in that case?

We can, but if the initialization of fdt_index is taken care during the 
allocation
of kimage struct, I will move the fdt_index discovery logic to arch crash
hotplug handler. More on fdt_index initialization in the next comment.


>>   #endif
>>   
>>   #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
>> @@ -118,6 +120,9 @@ extern const struct kexec_file_ops kexec_elf64_ops;
>>   struct kimage_arch {
>>   	struct crash_mem *exclude_ranges;
>>   
>> +#if defined(CONFIG_CRASH_HOTPLUG)
>> +	int fdt_index;
>> +#endif
>>   	unsigned long backup_start;
>>   	void *backup_buf;
>>   	void *fdt;
>> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
>> index 0b292f93a74cc..531486c973988 100644
>> --- a/arch/powerpc/kexec/core_64.c
>> +++ b/arch/powerpc/kexec/core_64.c
>> @@ -77,6 +77,37 @@ int machine_kexec_prepare(struct kimage *image)
>>   	return 0;
>>   }
>>   
>> +int machine_kexec_post_load(struct kimage *kimage)
>> +{
>> +#if defined(CONFIG_CRASH_HOTPLUG)
>> +	int i;
>> +	void *ptr;
>> +	unsigned long mem;
>> +
>> +	/* Mark fdt_index invalid */
>> +	kimage->arch.fdt_index = -1;
> Why is that not done in the series introducing the generic
> crash hotplug update, in do_kimage_alloc_init() ?

do_kimage_alloc_init is a generic function where as fdt_index is a ppc 
specific
attribute. If fdt_index is initialized in do_kimage_alloc_init then 
other architectures
will have build issues.

> This way there is a guarantee that the field will not be used while set by
> default to 0.

I agree that until the kernel hits the machine_kexec_post_load function 
on the
kdump kernel load path there is no way to identify the fdt_index is holding
a valid index or not.

Since there is no consumer of kimage's fdt_index attribute from the 
point of its
allocation to until it is initialized, we don't have any impact of 
fdt_index holding
value 0 (which is not valid) for sometime.

But still we can do few things to allow fdt_index attribute consumers to 
check
the sanity of fdt_index.

1. Introduce arch specific function call to initialize kimage_arch 
struct (basically fdt_index for now).
and call it inside do_kimage_alloc_init and initialize the fdt_index 
with -1 there.

2. Add another attribute in kimage_arch struct to indicate the sanity of the
fdt_index attribute. For example fdt_index_valid, if this holds zero then
the fdt_index holds invalid index. (looks inefficient to me)

Not sure is it worth doing but please let me know your opinion.

Thanks,
Sourabh Jain


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 2/6] powerpc/crash: introduce a new config option CRASH_HOTPLUG
  2023-03-13 15:46   ` Eric DeVolder
@ 2023-03-14  5:17     ` Sourabh Jain
  0 siblings, 0 replies; 15+ messages in thread
From: Sourabh Jain @ 2023-03-14  5:17 UTC (permalink / raw)
  To: Eric DeVolder, linuxppc-dev, mpe; +Cc: ldufour, kexec, hbathini, bhe


On 13/03/23 21:16, Eric DeVolder wrote:
>
>
> On 3/12/23 13:11, Sourabh Jain wrote:
>> Due to CPU/Memory hotplug events the system resources changes. A similar
>> change should reflect in the loaded kdump kernel image that describes
>> the state of the CPU and memory of the running kernel.
>>
>> If the kdump kernel image is not updated after the CPU or Memory hotplug
>> events and it tries to collect the dump with the stale system resource
>> data this might lead to dump collection failure or an inaccurate dump
>> collection.
>>
>> The current method to keep the kdump kernel up to date is by triggering
>> reload (i.e unload and load) the entire kdump kernel image whenever a
>> CPU or Memory hotplug event is observed by udev in the userspace.
>> Reloading the complete kdump kernel image is an expensive task. It can
>> be easily avoided by doing the in-kernel updates to specific kdump
>> kernel image components which are responsible for describing CPU and
>> Memory resources of the running kernel to the kdump kernel.
>>
>> The kernel changes related to in-kernel update to the kdump kernel image
>> on CPU/Memory hotplug events are kept under the CRASH_HOTPLUG config
>> option.
>>
>> Later in the series, a powerpc crash hotplug handler is introduced to
>> update the kdump kernel image on CPU/Memory hotplug events. This arch
>> specific handler is trigger from a generic crash handler that registers
>> with the CPU and memory notifiers.
>>
>> The CRASH_HOTPLUG config option is enabled by default.
>>
>> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>> ---
>>   arch/powerpc/Kconfig | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index a6c4407d3ec83..2f45b3f5175cb 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -681,6 +681,18 @@ config CRASH_DUMP
>>         The same kernel binary can be used as production kernel and dump
>>         capture kernel.
>>   +config CRASH_HOTPLUG
>> +    bool "Update crash capture system on CPU/Memory hotplug event"
> Fwiw, online/offline changes also flow through this infrastructure...
> eric
Yes I will update the CONFIG summary and commit message to convey
the same.

Thanks for the review.

Thanks,
Sourabh Jain

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-03-14  5:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-12 18:11 [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Sourabh Jain
2023-03-12 18:11 ` [PATCH v9 1/6] powerpc/kexec: turn some static helper functions public Sourabh Jain
2023-03-13 16:18   ` Laurent Dufour
2023-03-14  3:41     ` Sourabh Jain
2023-03-12 18:11 ` [PATCH v9 2/6] powerpc/crash: introduce a new config option CRASH_HOTPLUG Sourabh Jain
2023-03-13 15:46   ` Eric DeVolder
2023-03-14  5:17     ` Sourabh Jain
2023-03-12 18:11 ` [PATCH v9 3/6] powerpc/crash: add a new member to the kimage_arch struct Sourabh Jain
2023-03-13 16:25   ` Laurent Dufour
2023-03-14  5:01     ` Sourabh Jain
2023-03-12 18:11 ` [PATCH v9 4/6] powerpc/crash: add crash CPU hotplug support Sourabh Jain
2023-03-12 18:11 ` [PATCH v9 5/6] crash: forward memory_notify args to arch crash hotplug handler Sourabh Jain
2023-03-12 18:11 ` [PATCH v9 6/6] powerpc/kexec: add crash memory hotplug support Sourabh Jain
2023-03-13 15:42 ` [PATCH v9 0/6] PowerPC: in kernel handling of CPU hotplug events for crash kernel Eric DeVolder
2023-03-14  3:38   ` Sourabh Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).