All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v22 0/8] crash: Kernel handling of CPU and memory hot un/plug
@ 2023-05-03 22:41 ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

Once the kdump service is loaded, if changes to CPUs or memory occur,
either by hot un/plug or off/onlining, the crash elfcorehdr must also
be updated.

The elfcorehdr describes to kdump the CPUs and memory in the system,
and any inaccuracies can result in a vmcore with missing CPU context
or memory regions.

The current solution utilizes udev to initiate an unload-then-reload
of the kdump image (eg. kernel, initrd, boot_params, purgatory and
elfcorehdr) by the userspace kexec utility. In the original post I
outlined the significant performance problems related to offloading
this activity to userspace.

This patchset introduces a generic crash handler that registers with
the CPU and memory notifiers. Upon CPU or memory changes, from either
hot un/plug or off/onlining, this generic handler is invoked and
performs important housekeeping, for example obtaining the appropriate
lock, and then invokes an architecture specific handler to do the
appropriate elfcorehdr update.

Note the description in patch 'crash: change crash_prepare_elf64_headers()
to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
enables further optimizations related to CPU plug/unplug/online/offline
performance of elfcorehdr updates.

In the case of x86_64, the arch specific handler generates a new
elfcorehdr, and overwrites the old one in memory; thus no involvement
with userspace needed.

To realize the benefits/test this patchset, one must make a couple
of minor changes to userspace:

 - Prevent udev from updating kdump crash kernel on hot un/plug changes.
   Add the following as the first lines to the RHEL udev rule file
   /usr/lib/udev/rules.d/98-kexec.rules:

   # The kernel updates the crash elfcorehdr for CPU and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

   With this changeset applied, the two rules evaluate to false for
   CPU and memory change events and thus skip the userspace
   unload-then-reload of kdump.

 - Change to the kexec_file_load for loading the kdump kernel:
   Eg. on RHEL: in /usr/bin/kdumpctl, change to:
    standard_kexec_args="-p -d -s"
   which adds the -s to select kexec_file_load() syscall.

This kernel patchset also supports kexec_load() with a modified kexec
userspace utility. A working changeset to the kexec userspace utility
is posted to the kexec-tools mailing list here:

 http://lists.infradead.org/pipermail/kexec/2023-May/027049.html

To use the kexec-tools patch, apply, build and install kexec-tools,
then change the kdumpctl's standard_kexec_args to replace the -s with
--hotplug. The removal of -s reverts to the kexec_load syscall and
the addition of --hotplug invokes the changes put forth in the
kexec-tools patch.

Regards,
eric
---
v22: 3may2023
 - Rebased onto 6.3.0
 - Improved support for kexec_load(), per Hari Bathini. See
   "crash: hotplug support for kexec_load()" which is the only
   change to this series.
 - Applied Baoquan He's Acked-by for all other patches.

v21: 4apr2023
 https://lkml.org/lkml/2023/4/4/1136
 https://lore.kernel.org/lkml/20230404180326.6890-1-eric.devolder@oracle.com/
 - Rebased onto 6.3.0-rc5
 - Additional simplification of indentation in crash_handle_hotplug_event(),
   per Baoquan.

v20: 17mar2023
 https://lkml.org/lkml/2023/3/17/1169
 https://lore.kernel.org/lkml/20230317212128.21424-1-eric.devolder@oracle.com/
 - Rebased onto 6.3.0-rc2
 - Defaulting CRASH_HOTPLUG for x86 to Y, per Sourabh.
 - Explicitly initializing image->hp_action, per Baoquan.
 - Simplified kexec_trylock() in crash_handle_hotplug_event(),
   per Baoquan.
 - Applied Sourabh's Reviewed-by to the series.

v19: 6mar2023
 https://lkml.org/lkml/2023/3/6/1358
 https://lore.kernel.org/lkml/20230306162228.8277-1-eric.devolder@oracle.com/
 - Rebased onto 6.2.0
 - Did away with offlinecpu, per Thomas Gleixner.
 - Changed to CPUHP_BP_PREPARE_DYN instead of CPUHP_AP_ONLINE_DYN.
 - Did away with elfcorehdr_index_valid, per Sourabh.
 - Convert to for_each_possible_cpu() in crash_prepare_elf64_headers()
   per Sourabh.
 - Small optimization for x86 cpu changes.

v18: 31jan2023
 https://lkml.org/lkml/2023/1/31/1356
 https://lore.kernel.org/lkml/20230131224236.122805-1-eric.devolder@oracle.com/
 - Rebased onto 6.2.0-rc6
 - Renamed struct kimage member hotplug_event to hp_action, and
   re-enumerated the KEXEC_CRASH_HP_x items, adding _NONE at 0.
 - Moved to cpuhp state CPUHP_BP_PREPARE_DYN instead of
   CPUHP_AP_ONLINE_DYN in order to minimize window of time CPU
   is not reflected in elfcorehdr.
 - Reworked some of the comments and commit messages to offer
   more of the why, than what, per Thomas Gleixner.

v17: 18jan2023
 https://lkml.org/lkml/2023/1/18/1420
 https://lore.kernel.org/lkml/20230118213544.2128-1-eric.devolder@oracle.com/
 - Rebased onto 6.2.0-rc4
 - Moved a bit of code around so that kexec_load()-only builds
   work, per Sourabh.
 - Corrected computation of number of memory region Phdrs needed
   when x86 memory hotplug is not enabled, per Baoquan.

v16: 5jan2023
 https://lkml.org/lkml/2023/1/5/673
 https://lore.kernel.org/lkml/20230105151709.1845-1-eric.devolder@oracle.com/
 - Rebased onto 6.2.0-rc2
 - Corrected error identified by Baoquan.

v15: 9dec2022
 https://lkml.org/lkml/2022/12/9/520
 https://lore.kernel.org/lkml/20221209153656.3284-1-eric.devolder@oracle.com/
 - Rebased onto 6.1.0-rc8
 - Replaced arch_un/map_crash_pages() with direct use of
   kun/map_local_pages(), per Boris.
 - Some x86 changes, per Boris.

v14: 16nov2022
 https://lkml.org/lkml/2022/11/16/1645
 https://lore.kernel.org/lkml/20221116214643.6384-1-eric.devolder@oracle.com/
 - Rebased onto 6.1.0-rc5
 - Introduced CRASH_HOTPLUG Kconfig item to better fine tune
   compilation of feature components, per Boris.
 - Removed hp_action parameter to arch_crash_handle_hotplug_event()
   as it is unused.

v13: 31oct2022
 https://lkml.org/lkml/2022/10/31/854
 https://lore.kernel.org/lkml/20221031193604.28779-1-eric.devolder@oracle.com/
 - Rebased onto 6.1.0-rc3, which means converting to use the new
   kexec_trylock() away from mutex_lock(kexec_mutex).
 - Moved arch_un/map_crash_pages() into kexec.h and default
   implementation using k/unmap_local_pages().
 - Changed more #ifdef's into IS_ENABLED()
 - Changed CRASH_MAX_MEMORY_RANGES to 8192 from 32768, and it moved
   into x86 crash.c as #define rather Kconfig item, per Boris.
 - Check number of Phdrs against PN_XNUM, max possible.

v12: 9sep2022
 https://lkml.org/lkml/2022/9/9/1358
 https://lore.kernel.org/lkml/20220909210509.6286-1-eric.devolder@oracle.com/
 - Rebased onto 6.0-rc4
 - Addressed some minor formatting items, per Baoquan

v11: 26aug2022
 https://lkml.org/lkml/2022/8/26/963
 https://lore.kernel.org/lkml/20220826173704.1895-1-eric.devolder@oracle.com/
 - Rebased onto 6.0-rc2
 - Redid the rework of __weak to use asm/kexec.h, per Baoquan
 - Reworked some comments and minor items, per Baoquan

v10: 21jul2022
 https://lkml.org/lkml/2022/7/21/1007
 https://lore.kernel.org/lkml/20220721181747.1640-1-eric.devolder@oracle.com/
 - Rebased to 5.19.0-rc7
 - Per Sourabh, corrected build issue with arch_un/map_crash_pages()
   for architectures not supporting this feature.
 - Per David Hildebrand, removed the WARN_ONCE() altogether.
 - Per David Hansen, converted to use of kmap_local_page().
 - Per Baoquan He, replaced use of __weak with the kexec technique.

v9: 13jun2022
 https://lkml.org/lkml/2022/6/13/3382
 https://lore.kernel.org/lkml/20220613224240.79400-1-eric.devolder@oracle.com/
 - Rebased to 5.18.0
 - Per Sourabh, moved crash_prepare_elf64_headers() into common
   crash_core.c to avoid compile issues with kexec_load only path.
 - Per David Hildebrand, replaced mutex_trylock() with mutex_lock().
 - Changed the __weak arch_crash_handle_hotplug_event() to utilize
   WARN_ONCE() instead of WARN(). Fix some formatting issues.
 - Per Sourabh, introduced sysfs attribute crash_hotplug for memory
   and CPUs; for use by userspace (udev) to determine if the kernel
   performs crash hot un/plug support.
 - Per Sourabh, moved the code detecting the elfcorehdr segment from
   arch/x86 into crash_core:handle_hotplug_event() so both kexec_load
   and kexec_file_load can benefit.
 - Updated userspace kexec-tools kexec utility to reflect change to
   using CRASH_MAX_MEMORY_RANGES and get_nr_cpus().
 - Updated the new proposed udev rules to reflect using the sysfs
   attributes crash_hotplug.

v8: 5may2022
 https://lkml.org/lkml/2022/5/5/1133
 https://lore.kernel.org/lkml/20220505184603.1548-1-eric.devolder@oracle.com/
 - Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
   of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, ie a new define
   is not needed. Also use of IS_ENABLED() rather than #ifdef's.
   Renamed crash_hotplug_handler() to handle_hotplug_event().
   And other corrections.
 - Per Baoquan, minimized the parameters to the arch_crash_
   handle_hotplug_event() to hp_action and cpu.
 - Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
 - Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
   to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
   by David Hildebrand. Folded this patch into the x86
   kexec_file_load support patch.

v7: 13apr2022
 https://lkml.org/lkml/2022/4/13/850
 https://lore.kernel.org/lkml/20220413164237.20845-1-eric.devolder@oracle.com/
 - Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
 https://lkml.org/lkml/2022/4/1/1203
 https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devolder@oracle.com/
 - Reword commit messages and some comment cleanup per Baoquan.
 - Changed elf_index to elfcorehdr_index for clarity.
 - Minor code changes per Baoquan.

v5: 3mar2022
 https://lkml.org/lkml/2022/3/3/674
 https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devolder@oracle.com/
 - Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
   David Hildenbrand.
 - Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
 https://lkml.org/lkml/2022/2/9/1406
 https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devolder@oracle.com/
 - Refactored patches per Baoquan suggestsions.
 - A few corrections, per Baoquan.

v3: 10jan2022
 https://lkml.org/lkml/2022/1/10/1212
 https://lore.kernel.org/lkml/20220110195727.1682-1-eric.devolder@oracle.com/
 - Rebasing per Baoquan He request.
 - Changed memory notifier per David Hildenbrand.
 - Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
 https://lkml.org/lkml/2021/12/7/1088
 https://lore.kernel.org/lkml/20211207195204.1582-1-eric.devolder@oracle.com/
 - Acting upon Baoquan He suggestion of removing elfcorehdr from
   the purgatory list of segments, removed purgatory code from
   patchset, and it is signficiantly simpler now.

RFC v1: 18nov2021
 https://lkml.org/lkml/2021/11/18/845
 https://lore.kernel.org/lkml/20211118174948.37435-1-eric.devolder@oracle.com/
 - working patchset demonstrating kernel handling of hotplug
   updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
 https://lkml.org/lkml/2020/12/14/532
 https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d9096a@oracle.com/
 - proposed concept of allowing kernel to handle hotplug update
   of elfcorehdr
---

Eric DeVolder (8):
  crash: move a few code bits to setup support of crash hotplug
  crash: add generic infrastructure for crash hotplug support
  kexec: exclude elfcorehdr from the segment digest
  crash: memory and CPU hotplug sysfs attributes
  x86/crash: add x86 crash hotplug support
  crash: hotplug support for kexec_load()
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  x86/crash: optimize CPU changes

 .../admin-guide/mm/memory-hotplug.rst         |   8 +
 Documentation/core-api/cpu_hotplug.rst        |  18 +
 arch/x86/Kconfig                              |  13 +
 arch/x86/include/asm/kexec.h                  |  18 +
 arch/x86/kernel/crash.c                       | 156 +++++++-
 drivers/base/cpu.c                            |  14 +
 drivers/base/memory.c                         |  13 +
 include/linux/crash_core.h                    |   9 +
 include/linux/kexec.h                         |  63 +++-
 include/uapi/linux/kexec.h                    |   1 +
 kernel/crash_core.c                           | 355 ++++++++++++++++++
 kernel/kexec.c                                |   3 +
 kernel/kexec_core.c                           |   6 +
 kernel/kexec_file.c                           | 187 +--------
 kernel/ksysfs.c                               |  15 +
 15 files changed, 674 insertions(+), 205 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v22 0/8] crash: Kernel handling of CPU and memory hot un/plug
@ 2023-05-03 22:41 ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

Once the kdump service is loaded, if changes to CPUs or memory occur,
either by hot un/plug or off/onlining, the crash elfcorehdr must also
be updated.

The elfcorehdr describes to kdump the CPUs and memory in the system,
and any inaccuracies can result in a vmcore with missing CPU context
or memory regions.

The current solution utilizes udev to initiate an unload-then-reload
of the kdump image (eg. kernel, initrd, boot_params, purgatory and
elfcorehdr) by the userspace kexec utility. In the original post I
outlined the significant performance problems related to offloading
this activity to userspace.

This patchset introduces a generic crash handler that registers with
the CPU and memory notifiers. Upon CPU or memory changes, from either
hot un/plug or off/onlining, this generic handler is invoked and
performs important housekeeping, for example obtaining the appropriate
lock, and then invokes an architecture specific handler to do the
appropriate elfcorehdr update.

Note the description in patch 'crash: change crash_prepare_elf64_headers()
to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
enables further optimizations related to CPU plug/unplug/online/offline
performance of elfcorehdr updates.

In the case of x86_64, the arch specific handler generates a new
elfcorehdr, and overwrites the old one in memory; thus no involvement
with userspace needed.

To realize the benefits/test this patchset, one must make a couple
of minor changes to userspace:

 - Prevent udev from updating kdump crash kernel on hot un/plug changes.
   Add the following as the first lines to the RHEL udev rule file
   /usr/lib/udev/rules.d/98-kexec.rules:

   # The kernel updates the crash elfcorehdr for CPU and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

   With this changeset applied, the two rules evaluate to false for
   CPU and memory change events and thus skip the userspace
   unload-then-reload of kdump.

 - Change to the kexec_file_load for loading the kdump kernel:
   Eg. on RHEL: in /usr/bin/kdumpctl, change to:
    standard_kexec_args="-p -d -s"
   which adds the -s to select kexec_file_load() syscall.

This kernel patchset also supports kexec_load() with a modified kexec
userspace utility. A working changeset to the kexec userspace utility
is posted to the kexec-tools mailing list here:

 http://lists.infradead.org/pipermail/kexec/2023-May/027049.html

To use the kexec-tools patch, apply, build and install kexec-tools,
then change the kdumpctl's standard_kexec_args to replace the -s with
--hotplug. The removal of -s reverts to the kexec_load syscall and
the addition of --hotplug invokes the changes put forth in the
kexec-tools patch.

Regards,
eric
---
v22: 3may2023
 - Rebased onto 6.3.0
 - Improved support for kexec_load(), per Hari Bathini. See
   "crash: hotplug support for kexec_load()" which is the only
   change to this series.
 - Applied Baoquan He's Acked-by for all other patches.

v21: 4apr2023
 https://lkml.org/lkml/2023/4/4/1136
 https://lore.kernel.org/lkml/20230404180326.6890-1-eric.devolder@oracle.com/
 - Rebased onto 6.3.0-rc5
 - Additional simplification of indentation in crash_handle_hotplug_event(),
   per Baoquan.

v20: 17mar2023
 https://lkml.org/lkml/2023/3/17/1169
 https://lore.kernel.org/lkml/20230317212128.21424-1-eric.devolder@oracle.com/
 - Rebased onto 6.3.0-rc2
 - Defaulting CRASH_HOTPLUG for x86 to Y, per Sourabh.
 - Explicitly initializing image->hp_action, per Baoquan.
 - Simplified kexec_trylock() in crash_handle_hotplug_event(),
   per Baoquan.
 - Applied Sourabh's Reviewed-by to the series.

v19: 6mar2023
 https://lkml.org/lkml/2023/3/6/1358
 https://lore.kernel.org/lkml/20230306162228.8277-1-eric.devolder@oracle.com/
 - Rebased onto 6.2.0
 - Did away with offlinecpu, per Thomas Gleixner.
 - Changed to CPUHP_BP_PREPARE_DYN instead of CPUHP_AP_ONLINE_DYN.
 - Did away with elfcorehdr_index_valid, per Sourabh.
 - Convert to for_each_possible_cpu() in crash_prepare_elf64_headers()
   per Sourabh.
 - Small optimization for x86 cpu changes.

v18: 31jan2023
 https://lkml.org/lkml/2023/1/31/1356
 https://lore.kernel.org/lkml/20230131224236.122805-1-eric.devolder@oracle.com/
 - Rebased onto 6.2.0-rc6
 - Renamed struct kimage member hotplug_event to hp_action, and
   re-enumerated the KEXEC_CRASH_HP_x items, adding _NONE at 0.
 - Moved to cpuhp state CPUHP_BP_PREPARE_DYN instead of
   CPUHP_AP_ONLINE_DYN in order to minimize window of time CPU
   is not reflected in elfcorehdr.
 - Reworked some of the comments and commit messages to offer
   more of the why, than what, per Thomas Gleixner.

v17: 18jan2023
 https://lkml.org/lkml/2023/1/18/1420
 https://lore.kernel.org/lkml/20230118213544.2128-1-eric.devolder@oracle.com/
 - Rebased onto 6.2.0-rc4
 - Moved a bit of code around so that kexec_load()-only builds
   work, per Sourabh.
 - Corrected computation of number of memory region Phdrs needed
   when x86 memory hotplug is not enabled, per Baoquan.

v16: 5jan2023
 https://lkml.org/lkml/2023/1/5/673
 https://lore.kernel.org/lkml/20230105151709.1845-1-eric.devolder@oracle.com/
 - Rebased onto 6.2.0-rc2
 - Corrected error identified by Baoquan.

v15: 9dec2022
 https://lkml.org/lkml/2022/12/9/520
 https://lore.kernel.org/lkml/20221209153656.3284-1-eric.devolder@oracle.com/
 - Rebased onto 6.1.0-rc8
 - Replaced arch_un/map_crash_pages() with direct use of
   kun/map_local_pages(), per Boris.
 - Some x86 changes, per Boris.

v14: 16nov2022
 https://lkml.org/lkml/2022/11/16/1645
 https://lore.kernel.org/lkml/20221116214643.6384-1-eric.devolder@oracle.com/
 - Rebased onto 6.1.0-rc5
 - Introduced CRASH_HOTPLUG Kconfig item to better fine tune
   compilation of feature components, per Boris.
 - Removed hp_action parameter to arch_crash_handle_hotplug_event()
   as it is unused.

v13: 31oct2022
 https://lkml.org/lkml/2022/10/31/854
 https://lore.kernel.org/lkml/20221031193604.28779-1-eric.devolder@oracle.com/
 - Rebased onto 6.1.0-rc3, which means converting to use the new
   kexec_trylock() away from mutex_lock(kexec_mutex).
 - Moved arch_un/map_crash_pages() into kexec.h and default
   implementation using k/unmap_local_pages().
 - Changed more #ifdef's into IS_ENABLED()
 - Changed CRASH_MAX_MEMORY_RANGES to 8192 from 32768, and it moved
   into x86 crash.c as #define rather Kconfig item, per Boris.
 - Check number of Phdrs against PN_XNUM, max possible.

v12: 9sep2022
 https://lkml.org/lkml/2022/9/9/1358
 https://lore.kernel.org/lkml/20220909210509.6286-1-eric.devolder@oracle.com/
 - Rebased onto 6.0-rc4
 - Addressed some minor formatting items, per Baoquan

v11: 26aug2022
 https://lkml.org/lkml/2022/8/26/963
 https://lore.kernel.org/lkml/20220826173704.1895-1-eric.devolder@oracle.com/
 - Rebased onto 6.0-rc2
 - Redid the rework of __weak to use asm/kexec.h, per Baoquan
 - Reworked some comments and minor items, per Baoquan

v10: 21jul2022
 https://lkml.org/lkml/2022/7/21/1007
 https://lore.kernel.org/lkml/20220721181747.1640-1-eric.devolder@oracle.com/
 - Rebased to 5.19.0-rc7
 - Per Sourabh, corrected build issue with arch_un/map_crash_pages()
   for architectures not supporting this feature.
 - Per David Hildebrand, removed the WARN_ONCE() altogether.
 - Per David Hansen, converted to use of kmap_local_page().
 - Per Baoquan He, replaced use of __weak with the kexec technique.

v9: 13jun2022
 https://lkml.org/lkml/2022/6/13/3382
 https://lore.kernel.org/lkml/20220613224240.79400-1-eric.devolder@oracle.com/
 - Rebased to 5.18.0
 - Per Sourabh, moved crash_prepare_elf64_headers() into common
   crash_core.c to avoid compile issues with kexec_load only path.
 - Per David Hildebrand, replaced mutex_trylock() with mutex_lock().
 - Changed the __weak arch_crash_handle_hotplug_event() to utilize
   WARN_ONCE() instead of WARN(). Fix some formatting issues.
 - Per Sourabh, introduced sysfs attribute crash_hotplug for memory
   and CPUs; for use by userspace (udev) to determine if the kernel
   performs crash hot un/plug support.
 - Per Sourabh, moved the code detecting the elfcorehdr segment from
   arch/x86 into crash_core:handle_hotplug_event() so both kexec_load
   and kexec_file_load can benefit.
 - Updated userspace kexec-tools kexec utility to reflect change to
   using CRASH_MAX_MEMORY_RANGES and get_nr_cpus().
 - Updated the new proposed udev rules to reflect using the sysfs
   attributes crash_hotplug.

v8: 5may2022
 https://lkml.org/lkml/2022/5/5/1133
 https://lore.kernel.org/lkml/20220505184603.1548-1-eric.devolder@oracle.com/
 - Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
   of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, ie a new define
   is not needed. Also use of IS_ENABLED() rather than #ifdef's.
   Renamed crash_hotplug_handler() to handle_hotplug_event().
   And other corrections.
 - Per Baoquan, minimized the parameters to the arch_crash_
   handle_hotplug_event() to hp_action and cpu.
 - Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
 - Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
   to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
   by David Hildebrand. Folded this patch into the x86
   kexec_file_load support patch.

v7: 13apr2022
 https://lkml.org/lkml/2022/4/13/850
 https://lore.kernel.org/lkml/20220413164237.20845-1-eric.devolder@oracle.com/
 - Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
 https://lkml.org/lkml/2022/4/1/1203
 https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devolder@oracle.com/
 - Reword commit messages and some comment cleanup per Baoquan.
 - Changed elf_index to elfcorehdr_index for clarity.
 - Minor code changes per Baoquan.

v5: 3mar2022
 https://lkml.org/lkml/2022/3/3/674
 https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devolder@oracle.com/
 - Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
   David Hildenbrand.
 - Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
 https://lkml.org/lkml/2022/2/9/1406
 https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devolder@oracle.com/
 - Refactored patches per Baoquan suggestsions.
 - A few corrections, per Baoquan.

v3: 10jan2022
 https://lkml.org/lkml/2022/1/10/1212
 https://lore.kernel.org/lkml/20220110195727.1682-1-eric.devolder@oracle.com/
 - Rebasing per Baoquan He request.
 - Changed memory notifier per David Hildenbrand.
 - Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
 https://lkml.org/lkml/2021/12/7/1088
 https://lore.kernel.org/lkml/20211207195204.1582-1-eric.devolder@oracle.com/
 - Acting upon Baoquan He suggestion of removing elfcorehdr from
   the purgatory list of segments, removed purgatory code from
   patchset, and it is signficiantly simpler now.

RFC v1: 18nov2021
 https://lkml.org/lkml/2021/11/18/845
 https://lore.kernel.org/lkml/20211118174948.37435-1-eric.devolder@oracle.com/
 - working patchset demonstrating kernel handling of hotplug
   updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
 https://lkml.org/lkml/2020/12/14/532
 https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d9096a@oracle.com/
 - proposed concept of allowing kernel to handle hotplug update
   of elfcorehdr
---

Eric DeVolder (8):
  crash: move a few code bits to setup support of crash hotplug
  crash: add generic infrastructure for crash hotplug support
  kexec: exclude elfcorehdr from the segment digest
  crash: memory and CPU hotplug sysfs attributes
  x86/crash: add x86 crash hotplug support
  crash: hotplug support for kexec_load()
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  x86/crash: optimize CPU changes

 .../admin-guide/mm/memory-hotplug.rst         |   8 +
 Documentation/core-api/cpu_hotplug.rst        |  18 +
 arch/x86/Kconfig                              |  13 +
 arch/x86/include/asm/kexec.h                  |  18 +
 arch/x86/kernel/crash.c                       | 156 +++++++-
 drivers/base/cpu.c                            |  14 +
 drivers/base/memory.c                         |  13 +
 include/linux/crash_core.h                    |   9 +
 include/linux/kexec.h                         |  63 +++-
 include/uapi/linux/kexec.h                    |   1 +
 kernel/crash_core.c                           | 355 ++++++++++++++++++
 kernel/kexec.c                                |   3 +
 kernel/kexec_core.c                           |   6 +
 kernel/kexec_file.c                           | 187 +--------
 kernel/ksysfs.c                               |  15 +
 15 files changed, 674 insertions(+), 205 deletions(-)

-- 
2.31.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v22 1/8] crash: move a few code bits to setup support of crash hotplug
  2023-05-03 22:41 ` Eric DeVolder
@ 2023-05-03 22:41   ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

The crash hotplug support leans on the work for the kexec_file_load()
syscall. To also support the kexec_load() syscall, a few bits of code
need to be move outside of CONFIG_KEXEC_FILE. As such, these bits are
moved out of kexec_file.c and into a common location crash_core.c.

No functionality change intended.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 include/linux/kexec.h |  30 +++----
 kernel/crash_core.c   | 182 ++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec_file.c   | 181 -----------------------------------------
 3 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 22b5cd24f581..811a90e09698 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -105,6 +105,21 @@ struct compat_kexec_segment {
 };
 #endif
 
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN   4096
+
+struct crash_mem {
+	unsigned int max_nr_ranges;
+	unsigned int nr_ranges;
+	struct range ranges[];
+};
+
+extern int crash_exclude_mem_range(struct crash_mem *mem,
+				   unsigned long long mstart,
+				   unsigned long long mend);
+extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+				       void **addr, unsigned long *sz);
+
 #ifdef CONFIG_KEXEC_FILE
 struct purgatory_info {
 	/*
@@ -230,21 +245,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }
 #endif
 
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN   4096
-
-struct crash_mem {
-	unsigned int max_nr_ranges;
-	unsigned int nr_ranges;
-	struct range ranges[];
-};
-
-extern int crash_exclude_mem_range(struct crash_mem *mem,
-				   unsigned long long mstart,
-				   unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
-				       void **addr, unsigned long *sz);
-
 #ifndef arch_kexec_apply_relocations_add
 /*
  * arch_kexec_apply_relocations_add - apply relocations of type RELA
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..b7c30b748a16 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -10,6 +10,7 @@
 #include <linux/utsname.h>
 #include <linux/vmalloc.h>
 #include <linux/sizes.h>
+#include <linux/kexec.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -314,6 +315,187 @@ static int __init parse_crashkernel_dummy(char *arg)
 }
 early_param("crashkernel", parse_crashkernel_dummy);
 
+int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+			  void **addr, unsigned long *sz)
+{
+	Elf64_Ehdr *ehdr;
+	Elf64_Phdr *phdr;
+	unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+	unsigned char *buf;
+	unsigned int cpu, i;
+	unsigned long long notes_addr;
+	unsigned long mstart, mend;
+
+	/* extra phdr for vmcoreinfo ELF note */
+	nr_phdr = nr_cpus + 1;
+	nr_phdr += mem->nr_ranges;
+
+	/*
+	 * kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+	 * area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
+	 * I think this is required by tools like gdb. So same physical
+	 * memory will be mapped in two ELF headers. One will contain kernel
+	 * text virtual addresses and other will have __va(physical) addresses.
+	 */
+
+	nr_phdr++;
+	elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+	elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+	buf = vzalloc(elf_sz);
+	if (!buf)
+		return -ENOMEM;
+
+	ehdr = (Elf64_Ehdr *)buf;
+	phdr = (Elf64_Phdr *)(ehdr + 1);
+	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+	ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+	ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
+	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+	ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+	memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
+	ehdr->e_type = ET_CORE;
+	ehdr->e_machine = ELF_ARCH;
+	ehdr->e_version = EV_CURRENT;
+	ehdr->e_phoff = sizeof(Elf64_Ehdr);
+	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+	ehdr->e_phentsize = sizeof(Elf64_Phdr);
+
+	/* Prepare one phdr of type PT_NOTE for each present CPU */
+	for_each_present_cpu(cpu) {
+		phdr->p_type = PT_NOTE;
+		notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
+		phdr->p_offset = phdr->p_paddr = notes_addr;
+		phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
+		(ehdr->e_phnum)++;
+		phdr++;
+	}
+
+	/* Prepare one PT_NOTE header for vmcoreinfo */
+	phdr->p_type = PT_NOTE;
+	phdr->p_offset = phdr->p_paddr = paddr_vmcoreinfo_note();
+	phdr->p_filesz = phdr->p_memsz = VMCOREINFO_NOTE_SIZE;
+	(ehdr->e_phnum)++;
+	phdr++;
+
+	/* Prepare PT_LOAD type program header for kernel text region */
+	if (need_kernel_map) {
+		phdr->p_type = PT_LOAD;
+		phdr->p_flags = PF_R|PF_W|PF_X;
+		phdr->p_vaddr = (unsigned long) _text;
+		phdr->p_filesz = phdr->p_memsz = _end - _text;
+		phdr->p_offset = phdr->p_paddr = __pa_symbol(_text);
+		ehdr->e_phnum++;
+		phdr++;
+	}
+
+	/* Go through all the ranges in mem->ranges[] and prepare phdr */
+	for (i = 0; i < mem->nr_ranges; i++) {
+		mstart = mem->ranges[i].start;
+		mend = mem->ranges[i].end;
+
+		phdr->p_type = PT_LOAD;
+		phdr->p_flags = PF_R|PF_W|PF_X;
+		phdr->p_offset  = mstart;
+
+		phdr->p_paddr = mstart;
+		phdr->p_vaddr = (unsigned long) __va(mstart);
+		phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
+		phdr->p_align = 0;
+		ehdr->e_phnum++;
+		pr_debug("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
+			phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
+			ehdr->e_phnum, phdr->p_offset);
+		phdr++;
+	}
+
+	*addr = buf;
+	*sz = elf_sz;
+	return 0;
+}
+
+int crash_exclude_mem_range(struct crash_mem *mem,
+			    unsigned long long mstart, unsigned long long mend)
+{
+	int i, j;
+	unsigned long long start, end, p_start, p_end;
+	struct range temp_range = {0, 0};
+
+	for (i = 0; i < mem->nr_ranges; i++) {
+		start = mem->ranges[i].start;
+		end = mem->ranges[i].end;
+		p_start = mstart;
+		p_end = mend;
+
+		if (mstart > end || mend < start)
+			continue;
+
+		/* Truncate any area outside of range */
+		if (mstart < start)
+			p_start = start;
+		if (mend > end)
+			p_end = end;
+
+		/* Found completely overlapping range */
+		if (p_start == start && p_end == end) {
+			mem->ranges[i].start = 0;
+			mem->ranges[i].end = 0;
+			if (i < mem->nr_ranges - 1) {
+				/* Shift rest of the ranges to left */
+				for (j = i; j < mem->nr_ranges - 1; j++) {
+					mem->ranges[j].start =
+						mem->ranges[j+1].start;
+					mem->ranges[j].end =
+							mem->ranges[j+1].end;
+				}
+
+				/*
+				 * Continue to check if there are another overlapping ranges
+				 * from the current position because of shifting the above
+				 * mem ranges.
+				 */
+				i--;
+				mem->nr_ranges--;
+				continue;
+			}
+			mem->nr_ranges--;
+			return 0;
+		}
+
+		if (p_start > start && p_end < end) {
+			/* Split original range */
+			mem->ranges[i].end = p_start - 1;
+			temp_range.start = p_end + 1;
+			temp_range.end = end;
+		} else if (p_start != start)
+			mem->ranges[i].end = p_start - 1;
+		else
+			mem->ranges[i].start = p_end + 1;
+		break;
+	}
+
+	/* If a split happened, add the split to array */
+	if (!temp_range.end)
+		return 0;
+
+	/* Split happened */
+	if (i == mem->max_nr_ranges - 1)
+		return -ENOMEM;
+
+	/* Location where new range should go */
+	j = i + 1;
+	if (j < mem->nr_ranges) {
+		/* Move over all ranges one slot towards the end */
+		for (i = mem->nr_ranges - 1; i >= j; i--)
+			mem->ranges[i + 1] = mem->ranges[i];
+	}
+
+	mem->ranges[j].start = temp_range.start;
+	mem->ranges[j].end = temp_range.end;
+	mem->nr_ranges++;
+	return 0;
+}
+
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len)
 {
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f989f5f1933b..f8b1797b3ec9 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -1138,184 +1138,3 @@ int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name,
 	return 0;
 }
 #endif /* CONFIG_ARCH_HAS_KEXEC_PURGATORY */
-
-int crash_exclude_mem_range(struct crash_mem *mem,
-			    unsigned long long mstart, unsigned long long mend)
-{
-	int i, j;
-	unsigned long long start, end, p_start, p_end;
-	struct range temp_range = {0, 0};
-
-	for (i = 0; i < mem->nr_ranges; i++) {
-		start = mem->ranges[i].start;
-		end = mem->ranges[i].end;
-		p_start = mstart;
-		p_end = mend;
-
-		if (mstart > end || mend < start)
-			continue;
-
-		/* Truncate any area outside of range */
-		if (mstart < start)
-			p_start = start;
-		if (mend > end)
-			p_end = end;
-
-		/* Found completely overlapping range */
-		if (p_start == start && p_end == end) {
-			mem->ranges[i].start = 0;
-			mem->ranges[i].end = 0;
-			if (i < mem->nr_ranges - 1) {
-				/* Shift rest of the ranges to left */
-				for (j = i; j < mem->nr_ranges - 1; j++) {
-					mem->ranges[j].start =
-						mem->ranges[j+1].start;
-					mem->ranges[j].end =
-							mem->ranges[j+1].end;
-				}
-
-				/*
-				 * Continue to check if there are another overlapping ranges
-				 * from the current position because of shifting the above
-				 * mem ranges.
-				 */
-				i--;
-				mem->nr_ranges--;
-				continue;
-			}
-			mem->nr_ranges--;
-			return 0;
-		}
-
-		if (p_start > start && p_end < end) {
-			/* Split original range */
-			mem->ranges[i].end = p_start - 1;
-			temp_range.start = p_end + 1;
-			temp_range.end = end;
-		} else if (p_start != start)
-			mem->ranges[i].end = p_start - 1;
-		else
-			mem->ranges[i].start = p_end + 1;
-		break;
-	}
-
-	/* If a split happened, add the split to array */
-	if (!temp_range.end)
-		return 0;
-
-	/* Split happened */
-	if (i == mem->max_nr_ranges - 1)
-		return -ENOMEM;
-
-	/* Location where new range should go */
-	j = i + 1;
-	if (j < mem->nr_ranges) {
-		/* Move over all ranges one slot towards the end */
-		for (i = mem->nr_ranges - 1; i >= j; i--)
-			mem->ranges[i + 1] = mem->ranges[i];
-	}
-
-	mem->ranges[j].start = temp_range.start;
-	mem->ranges[j].end = temp_range.end;
-	mem->nr_ranges++;
-	return 0;
-}
-
-int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
-			  void **addr, unsigned long *sz)
-{
-	Elf64_Ehdr *ehdr;
-	Elf64_Phdr *phdr;
-	unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
-	unsigned char *buf;
-	unsigned int cpu, i;
-	unsigned long long notes_addr;
-	unsigned long mstart, mend;
-
-	/* extra phdr for vmcoreinfo ELF note */
-	nr_phdr = nr_cpus + 1;
-	nr_phdr += mem->nr_ranges;
-
-	/*
-	 * kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
-	 * area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
-	 * I think this is required by tools like gdb. So same physical
-	 * memory will be mapped in two ELF headers. One will contain kernel
-	 * text virtual addresses and other will have __va(physical) addresses.
-	 */
-
-	nr_phdr++;
-	elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
-	elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
-
-	buf = vzalloc(elf_sz);
-	if (!buf)
-		return -ENOMEM;
-
-	ehdr = (Elf64_Ehdr *)buf;
-	phdr = (Elf64_Phdr *)(ehdr + 1);
-	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
-	ehdr->e_ident[EI_CLASS] = ELFCLASS64;
-	ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
-	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
-	ehdr->e_ident[EI_OSABI] = ELF_OSABI;
-	memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
-	ehdr->e_type = ET_CORE;
-	ehdr->e_machine = ELF_ARCH;
-	ehdr->e_version = EV_CURRENT;
-	ehdr->e_phoff = sizeof(Elf64_Ehdr);
-	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
-	ehdr->e_phentsize = sizeof(Elf64_Phdr);
-
-	/* Prepare one phdr of type PT_NOTE for each present CPU */
-	for_each_present_cpu(cpu) {
-		phdr->p_type = PT_NOTE;
-		notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
-		phdr->p_offset = phdr->p_paddr = notes_addr;
-		phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
-		(ehdr->e_phnum)++;
-		phdr++;
-	}
-
-	/* Prepare one PT_NOTE header for vmcoreinfo */
-	phdr->p_type = PT_NOTE;
-	phdr->p_offset = phdr->p_paddr = paddr_vmcoreinfo_note();
-	phdr->p_filesz = phdr->p_memsz = VMCOREINFO_NOTE_SIZE;
-	(ehdr->e_phnum)++;
-	phdr++;
-
-	/* Prepare PT_LOAD type program header for kernel text region */
-	if (need_kernel_map) {
-		phdr->p_type = PT_LOAD;
-		phdr->p_flags = PF_R|PF_W|PF_X;
-		phdr->p_vaddr = (unsigned long) _text;
-		phdr->p_filesz = phdr->p_memsz = _end - _text;
-		phdr->p_offset = phdr->p_paddr = __pa_symbol(_text);
-		ehdr->e_phnum++;
-		phdr++;
-	}
-
-	/* Go through all the ranges in mem->ranges[] and prepare phdr */
-	for (i = 0; i < mem->nr_ranges; i++) {
-		mstart = mem->ranges[i].start;
-		mend = mem->ranges[i].end;
-
-		phdr->p_type = PT_LOAD;
-		phdr->p_flags = PF_R|PF_W|PF_X;
-		phdr->p_offset  = mstart;
-
-		phdr->p_paddr = mstart;
-		phdr->p_vaddr = (unsigned long) __va(mstart);
-		phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
-		phdr->p_align = 0;
-		ehdr->e_phnum++;
-		pr_debug("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
-			phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
-			ehdr->e_phnum, phdr->p_offset);
-		phdr++;
-	}
-
-	*addr = buf;
-	*sz = elf_sz;
-	return 0;
-}
-- 
2.31.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 1/8] crash: move a few code bits to setup support of crash hotplug
@ 2023-05-03 22:41   ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

The crash hotplug support leans on the work for the kexec_file_load()
syscall. To also support the kexec_load() syscall, a few bits of code
need to be move outside of CONFIG_KEXEC_FILE. As such, these bits are
moved out of kexec_file.c and into a common location crash_core.c.

No functionality change intended.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 include/linux/kexec.h |  30 +++----
 kernel/crash_core.c   | 182 ++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec_file.c   | 181 -----------------------------------------
 3 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 22b5cd24f581..811a90e09698 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -105,6 +105,21 @@ struct compat_kexec_segment {
 };
 #endif
 
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN   4096
+
+struct crash_mem {
+	unsigned int max_nr_ranges;
+	unsigned int nr_ranges;
+	struct range ranges[];
+};
+
+extern int crash_exclude_mem_range(struct crash_mem *mem,
+				   unsigned long long mstart,
+				   unsigned long long mend);
+extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+				       void **addr, unsigned long *sz);
+
 #ifdef CONFIG_KEXEC_FILE
 struct purgatory_info {
 	/*
@@ -230,21 +245,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }
 #endif
 
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN   4096
-
-struct crash_mem {
-	unsigned int max_nr_ranges;
-	unsigned int nr_ranges;
-	struct range ranges[];
-};
-
-extern int crash_exclude_mem_range(struct crash_mem *mem,
-				   unsigned long long mstart,
-				   unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
-				       void **addr, unsigned long *sz);
-
 #ifndef arch_kexec_apply_relocations_add
 /*
  * arch_kexec_apply_relocations_add - apply relocations of type RELA
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..b7c30b748a16 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -10,6 +10,7 @@
 #include <linux/utsname.h>
 #include <linux/vmalloc.h>
 #include <linux/sizes.h>
+#include <linux/kexec.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -314,6 +315,187 @@ static int __init parse_crashkernel_dummy(char *arg)
 }
 early_param("crashkernel", parse_crashkernel_dummy);
 
+int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+			  void **addr, unsigned long *sz)
+{
+	Elf64_Ehdr *ehdr;
+	Elf64_Phdr *phdr;
+	unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+	unsigned char *buf;
+	unsigned int cpu, i;
+	unsigned long long notes_addr;
+	unsigned long mstart, mend;
+
+	/* extra phdr for vmcoreinfo ELF note */
+	nr_phdr = nr_cpus + 1;
+	nr_phdr += mem->nr_ranges;
+
+	/*
+	 * kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+	 * area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
+	 * I think this is required by tools like gdb. So same physical
+	 * memory will be mapped in two ELF headers. One will contain kernel
+	 * text virtual addresses and other will have __va(physical) addresses.
+	 */
+
+	nr_phdr++;
+	elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+	elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+	buf = vzalloc(elf_sz);
+	if (!buf)
+		return -ENOMEM;
+
+	ehdr = (Elf64_Ehdr *)buf;
+	phdr = (Elf64_Phdr *)(ehdr + 1);
+	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+	ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+	ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
+	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+	ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+	memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
+	ehdr->e_type = ET_CORE;
+	ehdr->e_machine = ELF_ARCH;
+	ehdr->e_version = EV_CURRENT;
+	ehdr->e_phoff = sizeof(Elf64_Ehdr);
+	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+	ehdr->e_phentsize = sizeof(Elf64_Phdr);
+
+	/* Prepare one phdr of type PT_NOTE for each present CPU */
+	for_each_present_cpu(cpu) {
+		phdr->p_type = PT_NOTE;
+		notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
+		phdr->p_offset = phdr->p_paddr = notes_addr;
+		phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
+		(ehdr->e_phnum)++;
+		phdr++;
+	}
+
+	/* Prepare one PT_NOTE header for vmcoreinfo */
+	phdr->p_type = PT_NOTE;
+	phdr->p_offset = phdr->p_paddr = paddr_vmcoreinfo_note();
+	phdr->p_filesz = phdr->p_memsz = VMCOREINFO_NOTE_SIZE;
+	(ehdr->e_phnum)++;
+	phdr++;
+
+	/* Prepare PT_LOAD type program header for kernel text region */
+	if (need_kernel_map) {
+		phdr->p_type = PT_LOAD;
+		phdr->p_flags = PF_R|PF_W|PF_X;
+		phdr->p_vaddr = (unsigned long) _text;
+		phdr->p_filesz = phdr->p_memsz = _end - _text;
+		phdr->p_offset = phdr->p_paddr = __pa_symbol(_text);
+		ehdr->e_phnum++;
+		phdr++;
+	}
+
+	/* Go through all the ranges in mem->ranges[] and prepare phdr */
+	for (i = 0; i < mem->nr_ranges; i++) {
+		mstart = mem->ranges[i].start;
+		mend = mem->ranges[i].end;
+
+		phdr->p_type = PT_LOAD;
+		phdr->p_flags = PF_R|PF_W|PF_X;
+		phdr->p_offset  = mstart;
+
+		phdr->p_paddr = mstart;
+		phdr->p_vaddr = (unsigned long) __va(mstart);
+		phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
+		phdr->p_align = 0;
+		ehdr->e_phnum++;
+		pr_debug("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
+			phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
+			ehdr->e_phnum, phdr->p_offset);
+		phdr++;
+	}
+
+	*addr = buf;
+	*sz = elf_sz;
+	return 0;
+}
+
+int crash_exclude_mem_range(struct crash_mem *mem,
+			    unsigned long long mstart, unsigned long long mend)
+{
+	int i, j;
+	unsigned long long start, end, p_start, p_end;
+	struct range temp_range = {0, 0};
+
+	for (i = 0; i < mem->nr_ranges; i++) {
+		start = mem->ranges[i].start;
+		end = mem->ranges[i].end;
+		p_start = mstart;
+		p_end = mend;
+
+		if (mstart > end || mend < start)
+			continue;
+
+		/* Truncate any area outside of range */
+		if (mstart < start)
+			p_start = start;
+		if (mend > end)
+			p_end = end;
+
+		/* Found completely overlapping range */
+		if (p_start == start && p_end == end) {
+			mem->ranges[i].start = 0;
+			mem->ranges[i].end = 0;
+			if (i < mem->nr_ranges - 1) {
+				/* Shift rest of the ranges to left */
+				for (j = i; j < mem->nr_ranges - 1; j++) {
+					mem->ranges[j].start =
+						mem->ranges[j+1].start;
+					mem->ranges[j].end =
+							mem->ranges[j+1].end;
+				}
+
+				/*
+				 * Continue to check if there are another overlapping ranges
+				 * from the current position because of shifting the above
+				 * mem ranges.
+				 */
+				i--;
+				mem->nr_ranges--;
+				continue;
+			}
+			mem->nr_ranges--;
+			return 0;
+		}
+
+		if (p_start > start && p_end < end) {
+			/* Split original range */
+			mem->ranges[i].end = p_start - 1;
+			temp_range.start = p_end + 1;
+			temp_range.end = end;
+		} else if (p_start != start)
+			mem->ranges[i].end = p_start - 1;
+		else
+			mem->ranges[i].start = p_end + 1;
+		break;
+	}
+
+	/* If a split happened, add the split to array */
+	if (!temp_range.end)
+		return 0;
+
+	/* Split happened */
+	if (i == mem->max_nr_ranges - 1)
+		return -ENOMEM;
+
+	/* Location where new range should go */
+	j = i + 1;
+	if (j < mem->nr_ranges) {
+		/* Move over all ranges one slot towards the end */
+		for (i = mem->nr_ranges - 1; i >= j; i--)
+			mem->ranges[i + 1] = mem->ranges[i];
+	}
+
+	mem->ranges[j].start = temp_range.start;
+	mem->ranges[j].end = temp_range.end;
+	mem->nr_ranges++;
+	return 0;
+}
+
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len)
 {
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f989f5f1933b..f8b1797b3ec9 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -1138,184 +1138,3 @@ int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name,
 	return 0;
 }
 #endif /* CONFIG_ARCH_HAS_KEXEC_PURGATORY */
-
-int crash_exclude_mem_range(struct crash_mem *mem,
-			    unsigned long long mstart, unsigned long long mend)
-{
-	int i, j;
-	unsigned long long start, end, p_start, p_end;
-	struct range temp_range = {0, 0};
-
-	for (i = 0; i < mem->nr_ranges; i++) {
-		start = mem->ranges[i].start;
-		end = mem->ranges[i].end;
-		p_start = mstart;
-		p_end = mend;
-
-		if (mstart > end || mend < start)
-			continue;
-
-		/* Truncate any area outside of range */
-		if (mstart < start)
-			p_start = start;
-		if (mend > end)
-			p_end = end;
-
-		/* Found completely overlapping range */
-		if (p_start == start && p_end == end) {
-			mem->ranges[i].start = 0;
-			mem->ranges[i].end = 0;
-			if (i < mem->nr_ranges - 1) {
-				/* Shift rest of the ranges to left */
-				for (j = i; j < mem->nr_ranges - 1; j++) {
-					mem->ranges[j].start =
-						mem->ranges[j+1].start;
-					mem->ranges[j].end =
-							mem->ranges[j+1].end;
-				}
-
-				/*
-				 * Continue to check if there are another overlapping ranges
-				 * from the current position because of shifting the above
-				 * mem ranges.
-				 */
-				i--;
-				mem->nr_ranges--;
-				continue;
-			}
-			mem->nr_ranges--;
-			return 0;
-		}
-
-		if (p_start > start && p_end < end) {
-			/* Split original range */
-			mem->ranges[i].end = p_start - 1;
-			temp_range.start = p_end + 1;
-			temp_range.end = end;
-		} else if (p_start != start)
-			mem->ranges[i].end = p_start - 1;
-		else
-			mem->ranges[i].start = p_end + 1;
-		break;
-	}
-
-	/* If a split happened, add the split to array */
-	if (!temp_range.end)
-		return 0;
-
-	/* Split happened */
-	if (i == mem->max_nr_ranges - 1)
-		return -ENOMEM;
-
-	/* Location where new range should go */
-	j = i + 1;
-	if (j < mem->nr_ranges) {
-		/* Move over all ranges one slot towards the end */
-		for (i = mem->nr_ranges - 1; i >= j; i--)
-			mem->ranges[i + 1] = mem->ranges[i];
-	}
-
-	mem->ranges[j].start = temp_range.start;
-	mem->ranges[j].end = temp_range.end;
-	mem->nr_ranges++;
-	return 0;
-}
-
-int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
-			  void **addr, unsigned long *sz)
-{
-	Elf64_Ehdr *ehdr;
-	Elf64_Phdr *phdr;
-	unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
-	unsigned char *buf;
-	unsigned int cpu, i;
-	unsigned long long notes_addr;
-	unsigned long mstart, mend;
-
-	/* extra phdr for vmcoreinfo ELF note */
-	nr_phdr = nr_cpus + 1;
-	nr_phdr += mem->nr_ranges;
-
-	/*
-	 * kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
-	 * area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
-	 * I think this is required by tools like gdb. So same physical
-	 * memory will be mapped in two ELF headers. One will contain kernel
-	 * text virtual addresses and other will have __va(physical) addresses.
-	 */
-
-	nr_phdr++;
-	elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
-	elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
-
-	buf = vzalloc(elf_sz);
-	if (!buf)
-		return -ENOMEM;
-
-	ehdr = (Elf64_Ehdr *)buf;
-	phdr = (Elf64_Phdr *)(ehdr + 1);
-	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
-	ehdr->e_ident[EI_CLASS] = ELFCLASS64;
-	ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
-	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
-	ehdr->e_ident[EI_OSABI] = ELF_OSABI;
-	memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
-	ehdr->e_type = ET_CORE;
-	ehdr->e_machine = ELF_ARCH;
-	ehdr->e_version = EV_CURRENT;
-	ehdr->e_phoff = sizeof(Elf64_Ehdr);
-	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
-	ehdr->e_phentsize = sizeof(Elf64_Phdr);
-
-	/* Prepare one phdr of type PT_NOTE for each present CPU */
-	for_each_present_cpu(cpu) {
-		phdr->p_type = PT_NOTE;
-		notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
-		phdr->p_offset = phdr->p_paddr = notes_addr;
-		phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
-		(ehdr->e_phnum)++;
-		phdr++;
-	}
-
-	/* Prepare one PT_NOTE header for vmcoreinfo */
-	phdr->p_type = PT_NOTE;
-	phdr->p_offset = phdr->p_paddr = paddr_vmcoreinfo_note();
-	phdr->p_filesz = phdr->p_memsz = VMCOREINFO_NOTE_SIZE;
-	(ehdr->e_phnum)++;
-	phdr++;
-
-	/* Prepare PT_LOAD type program header for kernel text region */
-	if (need_kernel_map) {
-		phdr->p_type = PT_LOAD;
-		phdr->p_flags = PF_R|PF_W|PF_X;
-		phdr->p_vaddr = (unsigned long) _text;
-		phdr->p_filesz = phdr->p_memsz = _end - _text;
-		phdr->p_offset = phdr->p_paddr = __pa_symbol(_text);
-		ehdr->e_phnum++;
-		phdr++;
-	}
-
-	/* Go through all the ranges in mem->ranges[] and prepare phdr */
-	for (i = 0; i < mem->nr_ranges; i++) {
-		mstart = mem->ranges[i].start;
-		mend = mem->ranges[i].end;
-
-		phdr->p_type = PT_LOAD;
-		phdr->p_flags = PF_R|PF_W|PF_X;
-		phdr->p_offset  = mstart;
-
-		phdr->p_paddr = mstart;
-		phdr->p_vaddr = (unsigned long) __va(mstart);
-		phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
-		phdr->p_align = 0;
-		ehdr->e_phnum++;
-		pr_debug("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
-			phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
-			ehdr->e_phnum, phdr->p_offset);
-		phdr++;
-	}
-
-	*addr = buf;
-	*sz = elf_sz;
-	return 0;
-}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 2/8] crash: add generic infrastructure for crash hotplug support
  2023-05-03 22:41 ` Eric DeVolder
@ 2023-05-03 22:41   ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
onlining). The crash elfcorehdr describes the CPUs and memory to
be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), so meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, this is the last state in the PREPARE group, just
prior to the STARTING group, which is very close to the CPU
starting up in a plug/online situation, or stopping in a unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate. Note that for a CPU being unplugged
or offlined, the CPU will still be present in the list of CPUs
generated by crash_prepare_elf64_headers(). However, there is no
need to explicitly omit the CPU, see the discussion in the patch
'crash: change crash_prepare_elf64_headers() to
for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 include/linux/crash_core.h |   9 +++
 include/linux/kexec.h      |  11 +++
 kernel/crash_core.c        | 142 +++++++++++++++++++++++++++++++++++++
 kernel/kexec_core.c        |   6 ++
 4 files changed, 168 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..e14345cc7a22 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
 
+#define KEXEC_CRASH_HP_NONE			0
+#define KEXEC_CRASH_HP_ADD_CPU			1
+#define KEXEC_CRASH_HP_REMOVE_CPU		2
+#define KEXEC_CRASH_HP_ADD_MEMORY		3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY		4
+#define KEXEC_CRASH_HP_INVALID_CPU		-1U
+
+struct kimage;
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 811a90e09698..b9903dd48e24 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
 #include <linux/compat.h>
 #include <linux/ioport.h>
 #include <linux/module.h>
+#include <linux/highmem.h>
 #include <asm/kexec.h>
 
 /* Verify architecture specific macros are defined */
@@ -360,6 +361,12 @@ struct kimage {
 	struct purgatory_info purgatory_info;
 #endif
 
+#ifdef CONFIG_CRASH_HOTPLUG
+	int hp_action;
+	int elfcorehdr_index;
+	bool elfcorehdr_updated;
+#endif
+
 #ifdef CONFIG_IMA_KEXEC
 	/* Virtual address of IMA measurement buffer for kexec syscall */
 	void *ima_buffer;
@@ -490,6 +497,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g
 static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { }
 #endif
 
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b7c30b748a16..ef6e91daad56 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -11,6 +11,8 @@
 #include <linux/vmalloc.h>
 #include <linux/sizes.h>
 #include <linux/kexec.h>
+#include <linux/memory.h>
+#include <linux/cpuhotplug.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -18,6 +20,7 @@
 #include <crypto/sha1.h>
 
 #include "kallsyms_internal.h"
+#include "kexec_internal.h"
 
 /* vmcoreinfo stuff */
 unsigned char *vmcoreinfo_data;
@@ -697,3 +700,142 @@ static int __init crash_save_vmcoreinfo_init(void)
 }
 
 subsys_initcall(crash_save_vmcoreinfo_init);
+
+#ifdef CONFIG_CRASH_HOTPLUG
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+/*
+ * To accurately reflect hot un/plug changes of cpu and memory resources
+ * (including onling and offlining of those resources), the elfcorehdr
+ * (which is passed to the crash kernel via the elfcorehdr= parameter)
+ * must be updated with the new list of CPUs and memories.
+ *
+ * In order to make changes to elfcorehdr, two conditions are needed:
+ * First, the segment containing the elfcorehdr must be large enough
+ * to permit a growing number of resources; the elfcorehdr memory size
+ * is based on NR_CPUS_DEFAULT and CRASH_MAX_MEMORY_RANGES.
+ * Second, purgatory must explicitly exclude the elfcorehdr from the
+ * list of segments it checks (since the elfcorehdr changes and thus
+ * would require an update to purgatory itself to update the digest).
+ */
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+{
+	struct kimage *image;
+
+	/* Obtain lock while changing crash information */
+	if (!kexec_trylock()) {
+		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		return;
+	}
+
+	/* Check kdump is not loaded */
+	if (!kexec_crash_image)
+		goto out;
+
+	image = kexec_crash_image;
+
+	if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
+		hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
+		pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
+	else
+		pr_debug("hp_action %u\n", hp_action);
+
+	/*
+	 * When the struct kimage is allocated, the elfcorehdr_index
+	 * is set to -1. Find the segment containing the elfcorehdr,
+	 * if not already found. This works for both the kexec_load
+	 * and kexec_file_load paths.
+	 */
+	if (image->elfcorehdr_index < 0) {
+		unsigned long mem;
+		unsigned char *ptr;
+		unsigned int n;
+
+		for (n = 0; n < image->nr_segments; n++) {
+			mem = image->segment[n].mem;
+			ptr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
+			if (ptr) {
+				/* The segment containing elfcorehdr */
+				if (memcmp(ptr, ELFMAG, SELFMAG) == 0)
+					image->elfcorehdr_index = (int)n;
+				kunmap_local(ptr);
+			}
+		}
+	}
+
+	if (image->elfcorehdr_index < 0) {
+		pr_err("unable to locate elfcorehdr segment");
+		goto out;
+	}
+
+	/* Needed in order for the segments to be updated */
+	arch_kexec_unprotect_crashkres();
+
+	/* Differentiate between normal load and hotplug update */
+	image->hp_action = hp_action;
+
+	/* Now invoke arch-specific update handler */
+	arch_crash_handle_hotplug_event(image);
+
+	/* No longer handling a hotplug event */
+	image->hp_action = KEXEC_CRASH_HP_NONE;
+	image->elfcorehdr_updated = true;
+
+	/* Change back to read-only */
+	arch_kexec_protect_crashkres();
+
+out:
+	/* Release lock now that update complete */
+	kexec_unlock();
+}
+
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
+{
+	switch (val) {
+	case MEM_ONLINE:
+		crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
+			KEXEC_CRASH_HP_INVALID_CPU);
+		break;
+
+	case MEM_OFFLINE:
+		crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY,
+			KEXEC_CRASH_HP_INVALID_CPU);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block crash_memhp_nb = {
+	.notifier_call = crash_memhp_notifier,
+	.priority = 0
+};
+
+static int crash_cpuhp_online(unsigned int cpu)
+{
+	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
+	return 0;
+}
+
+static int crash_cpuhp_offline(unsigned int cpu)
+{
+	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
+	return 0;
+}
+
+static int __init crash_hotplug_init(void)
+{
+	int result = 0;
+
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		register_memory_notifier(&crash_memhp_nb);
+
+	if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
+		result = cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
+			"crash/cpuhp", crash_cpuhp_online, crash_cpuhp_offline);
+	}
+
+	return result;
+}
+
+subsys_initcall(crash_hotplug_init);
+#endif
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 3d578c6fefee..8296d019737c 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -277,6 +277,12 @@ struct kimage *do_kimage_alloc_init(void)
 	/* Initialize the list of unusable pages */
 	INIT_LIST_HEAD(&image->unusable_pages);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+	image->hp_action = KEXEC_CRASH_HP_NONE;
+	image->elfcorehdr_index = -1;
+	image->elfcorehdr_updated = false;
+#endif
+
 	return image;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 2/8] crash: add generic infrastructure for crash hotplug support
@ 2023-05-03 22:41   ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
onlining). The crash elfcorehdr describes the CPUs and memory to
be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), so meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, this is the last state in the PREPARE group, just
prior to the STARTING group, which is very close to the CPU
starting up in a plug/online situation, or stopping in a unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate. Note that for a CPU being unplugged
or offlined, the CPU will still be present in the list of CPUs
generated by crash_prepare_elf64_headers(). However, there is no
need to explicitly omit the CPU, see the discussion in the patch
'crash: change crash_prepare_elf64_headers() to
for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 include/linux/crash_core.h |   9 +++
 include/linux/kexec.h      |  11 +++
 kernel/crash_core.c        | 142 +++++++++++++++++++++++++++++++++++++
 kernel/kexec_core.c        |   6 ++
 4 files changed, 168 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..e14345cc7a22 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
 
+#define KEXEC_CRASH_HP_NONE			0
+#define KEXEC_CRASH_HP_ADD_CPU			1
+#define KEXEC_CRASH_HP_REMOVE_CPU		2
+#define KEXEC_CRASH_HP_ADD_MEMORY		3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY		4
+#define KEXEC_CRASH_HP_INVALID_CPU		-1U
+
+struct kimage;
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 811a90e09698..b9903dd48e24 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
 #include <linux/compat.h>
 #include <linux/ioport.h>
 #include <linux/module.h>
+#include <linux/highmem.h>
 #include <asm/kexec.h>
 
 /* Verify architecture specific macros are defined */
@@ -360,6 +361,12 @@ struct kimage {
 	struct purgatory_info purgatory_info;
 #endif
 
+#ifdef CONFIG_CRASH_HOTPLUG
+	int hp_action;
+	int elfcorehdr_index;
+	bool elfcorehdr_updated;
+#endif
+
 #ifdef CONFIG_IMA_KEXEC
 	/* Virtual address of IMA measurement buffer for kexec syscall */
 	void *ima_buffer;
@@ -490,6 +497,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g
 static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { }
 #endif
 
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b7c30b748a16..ef6e91daad56 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -11,6 +11,8 @@
 #include <linux/vmalloc.h>
 #include <linux/sizes.h>
 #include <linux/kexec.h>
+#include <linux/memory.h>
+#include <linux/cpuhotplug.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -18,6 +20,7 @@
 #include <crypto/sha1.h>
 
 #include "kallsyms_internal.h"
+#include "kexec_internal.h"
 
 /* vmcoreinfo stuff */
 unsigned char *vmcoreinfo_data;
@@ -697,3 +700,142 @@ static int __init crash_save_vmcoreinfo_init(void)
 }
 
 subsys_initcall(crash_save_vmcoreinfo_init);
+
+#ifdef CONFIG_CRASH_HOTPLUG
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+/*
+ * To accurately reflect hot un/plug changes of cpu and memory resources
+ * (including onling and offlining of those resources), the elfcorehdr
+ * (which is passed to the crash kernel via the elfcorehdr= parameter)
+ * must be updated with the new list of CPUs and memories.
+ *
+ * In order to make changes to elfcorehdr, two conditions are needed:
+ * First, the segment containing the elfcorehdr must be large enough
+ * to permit a growing number of resources; the elfcorehdr memory size
+ * is based on NR_CPUS_DEFAULT and CRASH_MAX_MEMORY_RANGES.
+ * Second, purgatory must explicitly exclude the elfcorehdr from the
+ * list of segments it checks (since the elfcorehdr changes and thus
+ * would require an update to purgatory itself to update the digest).
+ */
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+{
+	struct kimage *image;
+
+	/* Obtain lock while changing crash information */
+	if (!kexec_trylock()) {
+		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		return;
+	}
+
+	/* Check kdump is not loaded */
+	if (!kexec_crash_image)
+		goto out;
+
+	image = kexec_crash_image;
+
+	if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
+		hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
+		pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
+	else
+		pr_debug("hp_action %u\n", hp_action);
+
+	/*
+	 * When the struct kimage is allocated, the elfcorehdr_index
+	 * is set to -1. Find the segment containing the elfcorehdr,
+	 * if not already found. This works for both the kexec_load
+	 * and kexec_file_load paths.
+	 */
+	if (image->elfcorehdr_index < 0) {
+		unsigned long mem;
+		unsigned char *ptr;
+		unsigned int n;
+
+		for (n = 0; n < image->nr_segments; n++) {
+			mem = image->segment[n].mem;
+			ptr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
+			if (ptr) {
+				/* The segment containing elfcorehdr */
+				if (memcmp(ptr, ELFMAG, SELFMAG) == 0)
+					image->elfcorehdr_index = (int)n;
+				kunmap_local(ptr);
+			}
+		}
+	}
+
+	if (image->elfcorehdr_index < 0) {
+		pr_err("unable to locate elfcorehdr segment");
+		goto out;
+	}
+
+	/* Needed in order for the segments to be updated */
+	arch_kexec_unprotect_crashkres();
+
+	/* Differentiate between normal load and hotplug update */
+	image->hp_action = hp_action;
+
+	/* Now invoke arch-specific update handler */
+	arch_crash_handle_hotplug_event(image);
+
+	/* No longer handling a hotplug event */
+	image->hp_action = KEXEC_CRASH_HP_NONE;
+	image->elfcorehdr_updated = true;
+
+	/* Change back to read-only */
+	arch_kexec_protect_crashkres();
+
+out:
+	/* Release lock now that update complete */
+	kexec_unlock();
+}
+
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
+{
+	switch (val) {
+	case MEM_ONLINE:
+		crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
+			KEXEC_CRASH_HP_INVALID_CPU);
+		break;
+
+	case MEM_OFFLINE:
+		crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY,
+			KEXEC_CRASH_HP_INVALID_CPU);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block crash_memhp_nb = {
+	.notifier_call = crash_memhp_notifier,
+	.priority = 0
+};
+
+static int crash_cpuhp_online(unsigned int cpu)
+{
+	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
+	return 0;
+}
+
+static int crash_cpuhp_offline(unsigned int cpu)
+{
+	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
+	return 0;
+}
+
+static int __init crash_hotplug_init(void)
+{
+	int result = 0;
+
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		register_memory_notifier(&crash_memhp_nb);
+
+	if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
+		result = cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
+			"crash/cpuhp", crash_cpuhp_online, crash_cpuhp_offline);
+	}
+
+	return result;
+}
+
+subsys_initcall(crash_hotplug_init);
+#endif
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 3d578c6fefee..8296d019737c 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -277,6 +277,12 @@ struct kimage *do_kimage_alloc_init(void)
 	/* Initialize the list of unusable pages */
 	INIT_LIST_HEAD(&image->unusable_pages);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+	image->hp_action = KEXEC_CRASH_HP_NONE;
+	image->elfcorehdr_index = -1;
+	image->elfcorehdr_updated = false;
+#endif
+
 	return image;
 }
 
-- 
2.31.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 3/8] kexec: exclude elfcorehdr from the segment digest
  2023-05-03 22:41 ` Eric DeVolder
@ 2023-05-03 22:41   ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

When a crash kernel is loaded via the kexec_file_load() syscall, the
kernel places the various segments (ie crash kernel, crash initrd,
boot_params, elfcorehdr, purgatory, etc) in memory. For those
architectures that utilize purgatory, a hash digest of the segments
is calculated for integrity checking. This digest is embedded into
the purgatory image prior to placing purgatory in memory.

This patchset updates the elfcorehdr on CPU or memory changes.
However, changes to the elfcorehdr in turn cause purgatory
integrity checking to fail (at crash time, and no vmcore created).
Therefore, this patch explicitly excludes the elfcorehdr segment
from the list of segments used to create the digest. By doing so,
this permits updates to the elfcorehdr in response to CPU or memory
changes, and avoids the need to also recompute the hash digest and
reload purgatory.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 kernel/kexec_file.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f8b1797b3ec9..1d2cfc869a75 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -726,6 +726,12 @@ static int kexec_calculate_store_digests(struct kimage *image)
 	for (j = i = 0; i < image->nr_segments; i++) {
 		struct kexec_segment *ksegment;
 
+#ifdef CONFIG_CRASH_HOTPLUG
+		/* Exclude elfcorehdr segment to allow future changes via hotplug */
+		if (j == image->elfcorehdr_index)
+			continue;
+#endif
+
 		ksegment = &image->segment[i];
 		/*
 		 * Skip purgatory as it will be modified once we put digest
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 3/8] kexec: exclude elfcorehdr from the segment digest
@ 2023-05-03 22:41   ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

When a crash kernel is loaded via the kexec_file_load() syscall, the
kernel places the various segments (ie crash kernel, crash initrd,
boot_params, elfcorehdr, purgatory, etc) in memory. For those
architectures that utilize purgatory, a hash digest of the segments
is calculated for integrity checking. This digest is embedded into
the purgatory image prior to placing purgatory in memory.

This patchset updates the elfcorehdr on CPU or memory changes.
However, changes to the elfcorehdr in turn cause purgatory
integrity checking to fail (at crash time, and no vmcore created).
Therefore, this patch explicitly excludes the elfcorehdr segment
from the list of segments used to create the digest. By doing so,
this permits updates to the elfcorehdr in response to CPU or memory
changes, and avoids the need to also recompute the hash digest and
reload purgatory.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 kernel/kexec_file.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f8b1797b3ec9..1d2cfc869a75 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -726,6 +726,12 @@ static int kexec_calculate_store_digests(struct kimage *image)
 	for (j = i = 0; i < image->nr_segments; i++) {
 		struct kexec_segment *ksegment;
 
+#ifdef CONFIG_CRASH_HOTPLUG
+		/* Exclude elfcorehdr segment to allow future changes via hotplug */
+		if (j == image->elfcorehdr_index)
+			continue;
+#endif
+
 		ksegment = &image->segment[i];
 		/*
 		 * Skip purgatory as it will be modified once we put digest
-- 
2.31.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 4/8] crash: memory and CPU hotplug sysfs attributes
  2023-05-03 22:41 ` Eric DeVolder
@ 2023-05-03 22:41   ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

This introduces the crash_hotplug attribute for memory and CPUs
for use by userspace.  This change directly facilitates the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, this changeset introduces the crash_hotplug attribute
to the /sys/devices/system/memory directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/memory/memory81
  looking at device '/devices/system/memory/memory81':
    KERNEL=="memory81"
    SUBSYSTEM=="memory"
    DRIVER==""
    ATTR{online}=="1"
    ATTR{phys_device}=="0"
    ATTR{phys_index}=="00000051"
    ATTR{removable}=="1"
    ATTR{state}=="online"
    ATTR{valid_zones}=="Movable"

  looking at parent device '/devices/system/memory':
    KERNELS=="memory"
    SUBSYSTEMS==""
    DRIVERS==""
    ATTRS{auto_online_blocks}=="offline"
    ATTRS{block_size_bytes}=="8000000"
    ATTRS{crash_hotplug}=="1"

For CPUs, this changeset introduces the crash_hotplug attribute
to the /sys/devices/system/cpu directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
  looking at device '/devices/system/cpu/cpu0':
    KERNEL=="cpu0"
    SUBSYSTEM=="cpu"
    DRIVER=="processor"
    ATTR{crash_notes}=="277c38600"
    ATTR{crash_notes_size}=="368"
    ATTR{online}=="1"

  looking at parent device '/devices/system/cpu':
    KERNELS=="cpu"
    SUBSYSTEMS==""
    DRIVERS==""
    ATTRS{crash_hotplug}=="1"
    ATTRS{isolated}==""
    ATTRS{kernel_max}=="8191"
    ATTRS{nohz_full}=="  (null)"
    ATTRS{offline}=="4-7"
    ATTRS{online}=="0-3"
    ATTRS{possible}=="0-7"
    ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

 # The kernel updates the crash elfcorehdr for CPU and memory changes
 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
 SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above change
tests if crash_hotplug is set, and if so, it skips the userspace
initiated unload-then-reload of the crash kernel.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule will skip
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (ie the unload-then-reload of the kdump
capture kernel).

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 .../admin-guide/mm/memory-hotplug.rst          |  8 ++++++++
 Documentation/core-api/cpu_hotplug.rst         | 18 ++++++++++++++++++
 drivers/base/cpu.c                             | 14 ++++++++++++++
 drivers/base/memory.c                          | 13 +++++++++++++
 include/linux/kexec.h                          |  8 ++++++++
 5 files changed, 61 insertions(+)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..eb99d79223a3 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -291,6 +291,14 @@ The following files are currently defined:
 		       Availability depends on the CONFIG_ARCH_MEMORY_PROBE
 		       kernel configuration option.
 ``uevent``	       read-write: generic udev file for device subsystems.
+``crash_hotplug``      read-only: when changes to the system memory map
+		       occur due to hot un/plug of memory, this file contains
+		       '1' if the kernel updates the kdump capture kernel memory
+		       map itself (via elfcorehdr), or '0' if userspace must update
+		       the kdump capture kernel memory map.
+
+		       Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
+		       configuration option.
 ====================== =========================================================
 
 .. note::
diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
index f75778d37488..0c8dc3fe5f94 100644
--- a/Documentation/core-api/cpu_hotplug.rst
+++ b/Documentation/core-api/cpu_hotplug.rst
@@ -750,6 +750,24 @@ will receive all events. A script like::
 
 can process the event further.
 
+When changes to the CPUs in the system occur, the sysfs file
+/sys/devices/system/cpu/crash_hotplug contains '1' if the kernel
+updates the kdump capture kernel list of CPUs itself (via elfcorehdr),
+or '0' if userspace must update the kdump capture kernel list of CPUs.
+
+The availability depends on the CONFIG_HOTPLUG_CPU kernel configuration
+option.
+
+To skip userspace processing of CPU hot un/plug events for kdump
+(ie the unload-then-reload to obtain a current list of CPUs), this sysfs
+file can be used in a udev rule as follows:
+
+ SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
+
+For a cpu hot un/plug event, if the architecture supports kernel updates
+of the elfcorehdr (which contains the list of CPUs), then the rule skips
+the unload-then-reload of the kdump capture kernel.
+
 Kernel Inline Documentations Reference
 ======================================
 
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index c1815b9dae68..06a0c22b37b8 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -282,6 +282,17 @@ static ssize_t print_cpus_nohz_full(struct device *dev,
 static DEVICE_ATTR(nohz_full, 0444, print_cpus_nohz_full, NULL);
 #endif
 
+#ifdef CONFIG_HOTPLUG_CPU
+#include <linux/kexec.h>
+static ssize_t crash_hotplug_show(struct device *dev,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	return sprintf(buf, "%d\n", crash_hotplug_cpu_support());
+}
+static DEVICE_ATTR_ADMIN_RO(crash_hotplug);
+#endif
+
 static void cpu_device_release(struct device *dev)
 {
 	/*
@@ -469,6 +480,9 @@ static struct attribute *cpu_root_attrs[] = {
 #ifdef CONFIG_NO_HZ_FULL
 	&dev_attr_nohz_full.attr,
 #endif
+#ifdef CONFIG_HOTPLUG_CPU
+	&dev_attr_crash_hotplug.attr,
+#endif
 #ifdef CONFIG_GENERIC_CPU_AUTOPROBE
 	&dev_attr_modalias.attr,
 #endif
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..24b8ef4c830c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -490,6 +490,16 @@ static ssize_t auto_online_blocks_store(struct device *dev,
 
 static DEVICE_ATTR_RW(auto_online_blocks);
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+#include <linux/kexec.h>
+static ssize_t crash_hotplug_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", crash_hotplug_memory_support());
+}
+static DEVICE_ATTR_RO(crash_hotplug);
+#endif
+
 /*
  * Some architectures will have custom drivers to do this, and
  * will not need to do it from userspace.  The fake hot-add code
@@ -889,6 +899,9 @@ static struct attribute *memory_root_attrs[] = {
 
 	&dev_attr_block_size_bytes.attr,
 	&dev_attr_auto_online_blocks.attr,
+#ifdef CONFIG_MEMORY_HOTPLUG
+	&dev_attr_crash_hotplug.attr,
+#endif
 	NULL
 };
 
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index b9903dd48e24..6a8a724ac638 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -501,6 +501,14 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
 static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
 #endif
 
+#ifndef crash_hotplug_cpu_support
+static inline int crash_hotplug_cpu_support(void) { return 0; }
+#endif
+
+#ifndef crash_hotplug_memory_support
+static inline int crash_hotplug_memory_support(void) { return 0; }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
-- 
2.31.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 4/8] crash: memory and CPU hotplug sysfs attributes
@ 2023-05-03 22:41   ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

This introduces the crash_hotplug attribute for memory and CPUs
for use by userspace.  This change directly facilitates the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, this changeset introduces the crash_hotplug attribute
to the /sys/devices/system/memory directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/memory/memory81
  looking at device '/devices/system/memory/memory81':
    KERNEL=="memory81"
    SUBSYSTEM=="memory"
    DRIVER==""
    ATTR{online}=="1"
    ATTR{phys_device}=="0"
    ATTR{phys_index}=="00000051"
    ATTR{removable}=="1"
    ATTR{state}=="online"
    ATTR{valid_zones}=="Movable"

  looking at parent device '/devices/system/memory':
    KERNELS=="memory"
    SUBSYSTEMS==""
    DRIVERS==""
    ATTRS{auto_online_blocks}=="offline"
    ATTRS{block_size_bytes}=="8000000"
    ATTRS{crash_hotplug}=="1"

For CPUs, this changeset introduces the crash_hotplug attribute
to the /sys/devices/system/cpu directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
  looking at device '/devices/system/cpu/cpu0':
    KERNEL=="cpu0"
    SUBSYSTEM=="cpu"
    DRIVER=="processor"
    ATTR{crash_notes}=="277c38600"
    ATTR{crash_notes_size}=="368"
    ATTR{online}=="1"

  looking at parent device '/devices/system/cpu':
    KERNELS=="cpu"
    SUBSYSTEMS==""
    DRIVERS==""
    ATTRS{crash_hotplug}=="1"
    ATTRS{isolated}==""
    ATTRS{kernel_max}=="8191"
    ATTRS{nohz_full}=="  (null)"
    ATTRS{offline}=="4-7"
    ATTRS{online}=="0-3"
    ATTRS{possible}=="0-7"
    ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

 # The kernel updates the crash elfcorehdr for CPU and memory changes
 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
 SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above change
tests if crash_hotplug is set, and if so, it skips the userspace
initiated unload-then-reload of the crash kernel.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule will skip
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (ie the unload-then-reload of the kdump
capture kernel).

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 .../admin-guide/mm/memory-hotplug.rst          |  8 ++++++++
 Documentation/core-api/cpu_hotplug.rst         | 18 ++++++++++++++++++
 drivers/base/cpu.c                             | 14 ++++++++++++++
 drivers/base/memory.c                          | 13 +++++++++++++
 include/linux/kexec.h                          |  8 ++++++++
 5 files changed, 61 insertions(+)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..eb99d79223a3 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -291,6 +291,14 @@ The following files are currently defined:
 		       Availability depends on the CONFIG_ARCH_MEMORY_PROBE
 		       kernel configuration option.
 ``uevent``	       read-write: generic udev file for device subsystems.
+``crash_hotplug``      read-only: when changes to the system memory map
+		       occur due to hot un/plug of memory, this file contains
+		       '1' if the kernel updates the kdump capture kernel memory
+		       map itself (via elfcorehdr), or '0' if userspace must update
+		       the kdump capture kernel memory map.
+
+		       Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
+		       configuration option.
 ====================== =========================================================
 
 .. note::
diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
index f75778d37488..0c8dc3fe5f94 100644
--- a/Documentation/core-api/cpu_hotplug.rst
+++ b/Documentation/core-api/cpu_hotplug.rst
@@ -750,6 +750,24 @@ will receive all events. A script like::
 
 can process the event further.
 
+When changes to the CPUs in the system occur, the sysfs file
+/sys/devices/system/cpu/crash_hotplug contains '1' if the kernel
+updates the kdump capture kernel list of CPUs itself (via elfcorehdr),
+or '0' if userspace must update the kdump capture kernel list of CPUs.
+
+The availability depends on the CONFIG_HOTPLUG_CPU kernel configuration
+option.
+
+To skip userspace processing of CPU hot un/plug events for kdump
+(ie the unload-then-reload to obtain a current list of CPUs), this sysfs
+file can be used in a udev rule as follows:
+
+ SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
+
+For a cpu hot un/plug event, if the architecture supports kernel updates
+of the elfcorehdr (which contains the list of CPUs), then the rule skips
+the unload-then-reload of the kdump capture kernel.
+
 Kernel Inline Documentations Reference
 ======================================
 
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index c1815b9dae68..06a0c22b37b8 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -282,6 +282,17 @@ static ssize_t print_cpus_nohz_full(struct device *dev,
 static DEVICE_ATTR(nohz_full, 0444, print_cpus_nohz_full, NULL);
 #endif
 
+#ifdef CONFIG_HOTPLUG_CPU
+#include <linux/kexec.h>
+static ssize_t crash_hotplug_show(struct device *dev,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	return sprintf(buf, "%d\n", crash_hotplug_cpu_support());
+}
+static DEVICE_ATTR_ADMIN_RO(crash_hotplug);
+#endif
+
 static void cpu_device_release(struct device *dev)
 {
 	/*
@@ -469,6 +480,9 @@ static struct attribute *cpu_root_attrs[] = {
 #ifdef CONFIG_NO_HZ_FULL
 	&dev_attr_nohz_full.attr,
 #endif
+#ifdef CONFIG_HOTPLUG_CPU
+	&dev_attr_crash_hotplug.attr,
+#endif
 #ifdef CONFIG_GENERIC_CPU_AUTOPROBE
 	&dev_attr_modalias.attr,
 #endif
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..24b8ef4c830c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -490,6 +490,16 @@ static ssize_t auto_online_blocks_store(struct device *dev,
 
 static DEVICE_ATTR_RW(auto_online_blocks);
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+#include <linux/kexec.h>
+static ssize_t crash_hotplug_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", crash_hotplug_memory_support());
+}
+static DEVICE_ATTR_RO(crash_hotplug);
+#endif
+
 /*
  * Some architectures will have custom drivers to do this, and
  * will not need to do it from userspace.  The fake hot-add code
@@ -889,6 +899,9 @@ static struct attribute *memory_root_attrs[] = {
 
 	&dev_attr_block_size_bytes.attr,
 	&dev_attr_auto_online_blocks.attr,
+#ifdef CONFIG_MEMORY_HOTPLUG
+	&dev_attr_crash_hotplug.attr,
+#endif
 	NULL
 };
 
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index b9903dd48e24..6a8a724ac638 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -501,6 +501,14 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
 static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
 #endif
 
+#ifndef crash_hotplug_cpu_support
+static inline int crash_hotplug_cpu_support(void) { return 0; }
+#endif
+
+#ifndef crash_hotplug_memory_support
+static inline int crash_hotplug_memory_support(void) { return 0; }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 5/8] x86/crash: add x86 crash hotplug support
  2023-05-03 22:41 ` Eric DeVolder
@ 2023-05-03 22:41   ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

The segment containing the elfcorehdr is identified at run-time
in crash_core:crash_handle_hotplug_event(), which works for both
the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
is generated from the available CPUs and memory into a buffer,
and then installed over the top of the existing elfcorehdr.

In the patch 'kexec: exclude elfcorehdr from the segment digest'
the need to update purgatory due to the change in elfcorehdr was
eliminated.  As a result, no changes to purgatory or boot_params
(as the elfcorehdr= kernel command line parameter pointer
remains unchanged and correct) are needed, just elfcorehdr.

To accommodate a growing number of resources via hotplug, the
elfcorehdr segment must be sufficiently large enough to accommodate
changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
only on the kexec_file_load() syscall; for kexec_load() userspace
will need to size the segment similarly.

To accommodate kexec_load() syscall in the absence of
kexec_file_load() syscall support, and with CONFIG_CRASH_HOTPLUG
enabled, it is necessary to move prepare_elf_headers() and
dependents outside of CONFIG_KEXEC_FILE.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/Kconfig             |  13 ++++
 arch/x86/include/asm/kexec.h |  15 +++++
 arch/x86/kernel/crash.c      | 119 ++++++++++++++++++++++++++++++++---
 3 files changed, 140 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 53bab123a8ee..80538524c494 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2119,6 +2119,19 @@ config CRASH_DUMP
 	  (CONFIG_RELOCATABLE=y).
 	  For more details see Documentation/admin-guide/kdump/kdump.rst
 
+config CRASH_HOTPLUG
+	bool "Update the crash elfcorehdr on system configuration changes"
+	default y
+	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+	help
+	  Enable direct update to the crash elfcorehdr (which contains
+	  the list of CPUs and memory regions to be dumped upon a crash)
+	  in response to hot plug/unplug or online/offline of CPUs or
+	  memory. This is a much more advanced approach than userspace
+	  attempting that.
+
+	  If unsure, say Y.
+
 config KEXEC_JUMP
 	bool "kexec jump"
 	depends on KEXEC && HIBERNATION
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5b77bbc28f96..9143100ea3ea 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -209,6 +209,21 @@ typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..0c9d496cf7ce 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -41,6 +41,21 @@
 #include <asm/crash.h>
 #include <asm/cmdline.h>
 
+/*
+ * For the kexec_file_load() syscall path, specify the maximum number of
+ * memory regions that the elfcorehdr buffer/segment can accommodate.
+ * These regions are obtained via walk_system_ram_res(); eg. the
+ * 'System RAM' entries in /proc/iomem.
+ * This value is combined with NR_CPUS_DEFAULT and multiplied by
+ * sizeof(Elf64_Phdr) to determine the final elfcorehdr memory buffer/
+ * segment size.
+ * The value 8192, for example, covers a (sparsely populated) 1TiB system
+ * consisting of 128MiB memblocks, while resulting in an elfcorehdr
+ * memory buffer/segment size under 1MiB. This represents a sane choice
+ * to accommodate both baremetal and virtual machine configurations.
+ */
+#define CRASH_MAX_MEMORY_RANGES 8192
+
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
 	struct boot_params *params;
@@ -158,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	crash_save_cpu(regs, safe_smp_processor_id());
 }
 
-#ifdef CONFIG_KEXEC_FILE
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
 	unsigned int *nr_ranges = arg;
@@ -231,7 +244,7 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 
 /* Prepare elf headers. Return addr and size */
 static int prepare_elf_headers(struct kimage *image, void **addr,
-					unsigned long *sz)
+					unsigned long *sz, unsigned long *nr_mem_ranges)
 {
 	struct crash_mem *cmem;
 	int ret;
@@ -249,6 +262,9 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
 	if (ret)
 		goto out;
 
+	/* Return the computed number of memory ranges, for hotplug usage */
+	*nr_mem_ranges = cmem->nr_ranges;
+
 	/* By default prepare 64bit headers */
 	ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);
 
@@ -257,6 +273,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
 	return ret;
 }
 
+#ifdef CONFIG_KEXEC_FILE
 static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
 {
 	unsigned int nr_e820_entries;
@@ -371,18 +388,42 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 int crash_load_segments(struct kimage *image)
 {
 	int ret;
+	unsigned long pnum = 0;
 	struct kexec_buf kbuf = { .image = image, .buf_min = 0,
 				  .buf_max = ULONG_MAX, .top_down = false };
 
 	/* Prepare elf headers and add a segment */
-	ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
+	ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum);
 	if (ret)
 		return ret;
 
-	image->elf_headers = kbuf.buffer;
-	image->elf_headers_sz = kbuf.bufsz;
+	image->elf_headers	= kbuf.buffer;
+	image->elf_headers_sz	= kbuf.bufsz;
+	kbuf.memsz		= kbuf.bufsz;
+
+#ifdef CONFIG_CRASH_HOTPLUG
+	/*
+	 * Ensure the elfcorehdr segment large enough for hotplug changes.
+	 * Account for VMCOREINFO and kernel_map and maximum CPUs.
+	 */
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		pnum = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
+	else
+		pnum += 2 + CONFIG_NR_CPUS_DEFAULT;
+
+	if (pnum < (unsigned long)PN_XNUM) {
+		kbuf.memsz = pnum * sizeof(Elf64_Phdr);
+		kbuf.memsz += sizeof(Elf64_Ehdr);
+
+		image->elfcorehdr_index = image->nr_segments;
+
+		/* Mark as usable to crash kernel, else crash kernel fails on boot */
+		image->elf_headers_sz = kbuf.memsz;
+	} else {
+		pr_err("number of Phdrs %lu exceeds max\n", pnum);
+	}
+#endif
 
-	kbuf.memsz = kbuf.bufsz;
 	kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
 	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
 	ret = kexec_add_buffer(&kbuf);
@@ -395,3 +436,67 @@ int crash_load_segments(struct kimage *image)
 	return ret;
 }
 #endif /* CONFIG_KEXEC_FILE */
+
+#ifdef CONFIG_CRASH_HOTPLUG
+
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+
+/**
+ * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
+ * @image: the active struct kimage
+ *
+ * The new elfcorehdr is prepared in a kernel buffer, and then it is
+ * written on top of the existing/old elfcorehdr.
+ */
+void arch_crash_handle_hotplug_event(struct kimage *image)
+{
+	void *elfbuf = NULL, *old_elfcorehdr;
+	unsigned long nr_mem_ranges;
+	unsigned long mem, memsz;
+	unsigned long elfsz = 0;
+
+	/*
+	 * Create the new elfcorehdr reflecting the changes to CPU and/or
+	 * memory resources.
+	 */
+	if (prepare_elf_headers(image, &elfbuf, &elfsz, &nr_mem_ranges)) {
+		pr_err("unable to prepare elfcore headers");
+		goto out;
+	}
+
+	/*
+	 * Obtain address and size of the elfcorehdr segment, and
+	 * check it against the new elfcorehdr buffer.
+	 */
+	mem = image->segment[image->elfcorehdr_index].mem;
+	memsz = image->segment[image->elfcorehdr_index].memsz;
+	if (elfsz > memsz) {
+		pr_err("update elfcorehdr elfsz %lu > memsz %lu",
+			elfsz, memsz);
+		goto out;
+	}
+
+	/*
+	 * Copy new elfcorehdr over the old elfcorehdr at destination.
+	 */
+	old_elfcorehdr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
+	if (!old_elfcorehdr) {
+		pr_err("updating elfcorehdr failed\n");
+		goto out;
+	}
+
+	/*
+	 * Temporarily invalidate the crash image while the
+	 * elfcorehdr is updated.
+	 */
+	xchg(&kexec_crash_image, NULL);
+	memcpy_flushcache(old_elfcorehdr, elfbuf, elfsz);
+	xchg(&kexec_crash_image, image);
+	kunmap_local(old_elfcorehdr);
+	pr_debug("updated elfcorehdr\n");
+
+out:
+	vfree(elfbuf);
+}
+#endif
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 5/8] x86/crash: add x86 crash hotplug support
@ 2023-05-03 22:41   ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

The segment containing the elfcorehdr is identified at run-time
in crash_core:crash_handle_hotplug_event(), which works for both
the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
is generated from the available CPUs and memory into a buffer,
and then installed over the top of the existing elfcorehdr.

In the patch 'kexec: exclude elfcorehdr from the segment digest'
the need to update purgatory due to the change in elfcorehdr was
eliminated.  As a result, no changes to purgatory or boot_params
(as the elfcorehdr= kernel command line parameter pointer
remains unchanged and correct) are needed, just elfcorehdr.

To accommodate a growing number of resources via hotplug, the
elfcorehdr segment must be sufficiently large enough to accommodate
changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
only on the kexec_file_load() syscall; for kexec_load() userspace
will need to size the segment similarly.

To accommodate kexec_load() syscall in the absence of
kexec_file_load() syscall support, and with CONFIG_CRASH_HOTPLUG
enabled, it is necessary to move prepare_elf_headers() and
dependents outside of CONFIG_KEXEC_FILE.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/Kconfig             |  13 ++++
 arch/x86/include/asm/kexec.h |  15 +++++
 arch/x86/kernel/crash.c      | 119 ++++++++++++++++++++++++++++++++---
 3 files changed, 140 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 53bab123a8ee..80538524c494 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2119,6 +2119,19 @@ config CRASH_DUMP
 	  (CONFIG_RELOCATABLE=y).
 	  For more details see Documentation/admin-guide/kdump/kdump.rst
 
+config CRASH_HOTPLUG
+	bool "Update the crash elfcorehdr on system configuration changes"
+	default y
+	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+	help
+	  Enable direct update to the crash elfcorehdr (which contains
+	  the list of CPUs and memory regions to be dumped upon a crash)
+	  in response to hot plug/unplug or online/offline of CPUs or
+	  memory. This is a much more advanced approach than userspace
+	  attempting that.
+
+	  If unsure, say Y.
+
 config KEXEC_JUMP
 	bool "kexec jump"
 	depends on KEXEC && HIBERNATION
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5b77bbc28f96..9143100ea3ea 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -209,6 +209,21 @@ typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..0c9d496cf7ce 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -41,6 +41,21 @@
 #include <asm/crash.h>
 #include <asm/cmdline.h>
 
+/*
+ * For the kexec_file_load() syscall path, specify the maximum number of
+ * memory regions that the elfcorehdr buffer/segment can accommodate.
+ * These regions are obtained via walk_system_ram_res(); eg. the
+ * 'System RAM' entries in /proc/iomem.
+ * This value is combined with NR_CPUS_DEFAULT and multiplied by
+ * sizeof(Elf64_Phdr) to determine the final elfcorehdr memory buffer/
+ * segment size.
+ * The value 8192, for example, covers a (sparsely populated) 1TiB system
+ * consisting of 128MiB memblocks, while resulting in an elfcorehdr
+ * memory buffer/segment size under 1MiB. This represents a sane choice
+ * to accommodate both baremetal and virtual machine configurations.
+ */
+#define CRASH_MAX_MEMORY_RANGES 8192
+
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
 	struct boot_params *params;
@@ -158,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	crash_save_cpu(regs, safe_smp_processor_id());
 }
 
-#ifdef CONFIG_KEXEC_FILE
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
 	unsigned int *nr_ranges = arg;
@@ -231,7 +244,7 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 
 /* Prepare elf headers. Return addr and size */
 static int prepare_elf_headers(struct kimage *image, void **addr,
-					unsigned long *sz)
+					unsigned long *sz, unsigned long *nr_mem_ranges)
 {
 	struct crash_mem *cmem;
 	int ret;
@@ -249,6 +262,9 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
 	if (ret)
 		goto out;
 
+	/* Return the computed number of memory ranges, for hotplug usage */
+	*nr_mem_ranges = cmem->nr_ranges;
+
 	/* By default prepare 64bit headers */
 	ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);
 
@@ -257,6 +273,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
 	return ret;
 }
 
+#ifdef CONFIG_KEXEC_FILE
 static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
 {
 	unsigned int nr_e820_entries;
@@ -371,18 +388,42 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 int crash_load_segments(struct kimage *image)
 {
 	int ret;
+	unsigned long pnum = 0;
 	struct kexec_buf kbuf = { .image = image, .buf_min = 0,
 				  .buf_max = ULONG_MAX, .top_down = false };
 
 	/* Prepare elf headers and add a segment */
-	ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
+	ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum);
 	if (ret)
 		return ret;
 
-	image->elf_headers = kbuf.buffer;
-	image->elf_headers_sz = kbuf.bufsz;
+	image->elf_headers	= kbuf.buffer;
+	image->elf_headers_sz	= kbuf.bufsz;
+	kbuf.memsz		= kbuf.bufsz;
+
+#ifdef CONFIG_CRASH_HOTPLUG
+	/*
+	 * Ensure the elfcorehdr segment large enough for hotplug changes.
+	 * Account for VMCOREINFO and kernel_map and maximum CPUs.
+	 */
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		pnum = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
+	else
+		pnum += 2 + CONFIG_NR_CPUS_DEFAULT;
+
+	if (pnum < (unsigned long)PN_XNUM) {
+		kbuf.memsz = pnum * sizeof(Elf64_Phdr);
+		kbuf.memsz += sizeof(Elf64_Ehdr);
+
+		image->elfcorehdr_index = image->nr_segments;
+
+		/* Mark as usable to crash kernel, else crash kernel fails on boot */
+		image->elf_headers_sz = kbuf.memsz;
+	} else {
+		pr_err("number of Phdrs %lu exceeds max\n", pnum);
+	}
+#endif
 
-	kbuf.memsz = kbuf.bufsz;
 	kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
 	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
 	ret = kexec_add_buffer(&kbuf);
@@ -395,3 +436,67 @@ int crash_load_segments(struct kimage *image)
 	return ret;
 }
 #endif /* CONFIG_KEXEC_FILE */
+
+#ifdef CONFIG_CRASH_HOTPLUG
+
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+
+/**
+ * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
+ * @image: the active struct kimage
+ *
+ * The new elfcorehdr is prepared in a kernel buffer, and then it is
+ * written on top of the existing/old elfcorehdr.
+ */
+void arch_crash_handle_hotplug_event(struct kimage *image)
+{
+	void *elfbuf = NULL, *old_elfcorehdr;
+	unsigned long nr_mem_ranges;
+	unsigned long mem, memsz;
+	unsigned long elfsz = 0;
+
+	/*
+	 * Create the new elfcorehdr reflecting the changes to CPU and/or
+	 * memory resources.
+	 */
+	if (prepare_elf_headers(image, &elfbuf, &elfsz, &nr_mem_ranges)) {
+		pr_err("unable to prepare elfcore headers");
+		goto out;
+	}
+
+	/*
+	 * Obtain address and size of the elfcorehdr segment, and
+	 * check it against the new elfcorehdr buffer.
+	 */
+	mem = image->segment[image->elfcorehdr_index].mem;
+	memsz = image->segment[image->elfcorehdr_index].memsz;
+	if (elfsz > memsz) {
+		pr_err("update elfcorehdr elfsz %lu > memsz %lu",
+			elfsz, memsz);
+		goto out;
+	}
+
+	/*
+	 * Copy new elfcorehdr over the old elfcorehdr at destination.
+	 */
+	old_elfcorehdr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
+	if (!old_elfcorehdr) {
+		pr_err("updating elfcorehdr failed\n");
+		goto out;
+	}
+
+	/*
+	 * Temporarily invalidate the crash image while the
+	 * elfcorehdr is updated.
+	 */
+	xchg(&kexec_crash_image, NULL);
+	memcpy_flushcache(old_elfcorehdr, elfbuf, elfsz);
+	xchg(&kexec_crash_image, image);
+	kunmap_local(old_elfcorehdr);
+	pr_debug("updated elfcorehdr\n");
+
+out:
+	vfree(elfbuf);
+}
+#endif
-- 
2.31.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 6/8] crash: hotplug support for kexec_load()
  2023-05-03 22:41 ` Eric DeVolder
@ 2023-05-03 22:41   ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

The hotplug support for kexec_load() requires coordination with
userspace, and therefore a little extra help from the kernel to
facilitate the coordination.

In the absence of the solution contained within this particular
patch, if a kdump capture kernel is loaded via kexec_load() syscall,
then the crash hotplug logic would find the segment containing the
elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
generally speaking that is the desired behavior and outcome, a
problem arises from the fact that if the kdump image includes a
purgatory that performs a digest checksum, then that check would
fail (because the elfcorehdr was changed), and the capture kernel
would fail to boot and no kdump occur.

Therefore, what is needed is for the userspace kexec-tools to
indicate to the kernel whether or not the supplied kdump image/
elfcorehdr can be modified (because the kexec-tools excludes the
elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
appropriately).

To solve these problems, this patch introduces:
 - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
   safe for the kernel to modify the elfcorehdr (because kexec-tools
   has excluded the elfcorehdr from the digest).
 - the /sys/kernel/crash_elfcorehdr_size node to communicate to
   kexec-tools what the preferred size of the elfcorehdr memory buffer
   should be in order to accommodate hotplug changes.
 - The sysfs crash_hotplug nodes (ie.
   /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
   that they examine kexec_file_load() vs kexec_load(), and when
   kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
   This is critical so that the udev rule processing of crash_hotplug
   indicates correctly (ie. the userspace unload-then-load of the
   kdump of the kdump image can be skipped, or not).

With this patch in place, I believe the following statements to be true
(with local testing to verify):

 - For systems which have these kernel changes in place, but not the
   corresponding changes to the crash hot plug udev rules and
   kexec-tools, (ie "older" systems) those systems will continue to
   unload-then-load the kdump image, as has always been done. The
   kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
 - For systems which have these kernel changes in place and the proposed
   udev rule changes in place, but not the kexec-tools changes in place:
    - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
      so the unload-then-reload of kdump image will occur (the sysfs
      crash_hotplug nodes will show 0).
    - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
      to show 1, and the kernel will modify the elfcorehdr directly. And
      with the udev changes in place, the unload-then-load will not occur!
 - For systems which have these kernel changes as well as the udev and
   kexec-tools changes in place, then the user/admin has full authority
   over the enablement and support of crash hotplug support, whether via
   kexec_file_load() or kexec_load().

Said differently, as kexec_load() was/is widely in use, these changes
permit it to continue to be used as-is (retaining the current unload-then-
reload behavior) until such time as the udev and kexec-tools changes can
be rolled out as well.

I've intentionally kept the changes related to userspace coordination
for kexec_load() separate as this need was identified late; the
rest of this series has been generally reviewed and accepted. Once
this support has been vetted, I can refactor if needed.

Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 arch/x86/include/asm/kexec.h | 11 +++++++----
 arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
 include/linux/kexec.h        | 14 ++++++++++++--
 include/uapi/linux/kexec.h   |  1 +
 kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
 kernel/kexec.c               |  3 +++
 kernel/ksysfs.c              | 15 +++++++++++++++
 7 files changed, 96 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(void);
+#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static inline int crash_hotplug_memory_support(void) { return 1; }
-#define crash_hotplug_memory_support crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(void);
+#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
 #endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 0c9d496cf7ce..8064e65de6c0 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
+/* These functions provide the value for the sysfs crash_hotplug nodes */
+#ifdef CONFIG_HOTPLUG_CPU
+int arch_crash_hotplug_cpu_support(void)
+{
+	return crash_check_update_elfcorehdr();
+}
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int arch_crash_hotplug_memory_support(void)
+{
+	return crash_check_update_elfcorehdr();
+}
+#endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void)
+{
+	unsigned int sz;
+
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
+	else
+		sz += 2 + CONFIG_NR_CPUS_DEFAULT;
+	sz *= sizeof(Elf64_Phdr);
+	return sz;
+}
+
 /**
  * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
  * @image: the active struct kimage
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 6a8a724ac638..050e20066cdb 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -335,6 +335,10 @@ struct kimage {
 	unsigned int preserve_context : 1;
 	/* If set, we are using file mode kexec syscall */
 	unsigned int file_mode:1;
+#ifdef CONFIG_CRASH_HOTPLUG
+	/* If set, allow changes to elfcorehdr of kexec_load'd image */
+	unsigned int update_elfcorehdr:1;
+#endif
 
 #ifdef ARCH_HAS_KIMAGE_ARCH
 	struct kimage_arch arch;
@@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
 
 /* List of defined/legal kexec flags */
 #ifndef CONFIG_KEXEC_JUMP
-#define KEXEC_FLAGS    KEXEC_ON_CRASH
+#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
 #else
-#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
+#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
 #endif
 
 /* List of defined/legal kexec file flags */
@@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
 static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
 #endif
 
+int crash_check_update_elfcorehdr(void);
+
 #ifndef crash_hotplug_cpu_support
 static inline int crash_hotplug_cpu_support(void) { return 0; }
 #endif
@@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
 static inline int crash_hotplug_memory_support(void) { return 0; }
 #endif
 
+#ifndef crash_get_elfcorehdr_size
+static inline crash_get_elfcorehdr_size(void) { return 0; }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 981016e05cfa..01766dd839b0 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -12,6 +12,7 @@
 /* kexec flags for different usage scenarios */
 #define KEXEC_ON_CRASH		0x00000001
 #define KEXEC_PRESERVE_CONTEXT	0x00000002
+#define KEXEC_UPDATE_ELFCOREHDR	0x00000004
 #define KEXEC_ARCH_MASK		0xffff0000
 
 /*
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index ef6e91daad56..e05bfdb7eaed 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
 #ifdef CONFIG_CRASH_HOTPLUG
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
+
+/*
+ * This routine utilized when the crash_hotplug sysfs node is read.
+ * It reflects the kernel's ability/permission to update the crash
+ * elfcorehdr directly.
+ */
+int crash_check_update_elfcorehdr(void)
+{
+	int rc = 0;
+
+	/* Obtain lock while reading crash information */
+	if (!kexec_trylock()) {
+		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		return 0;
+	}
+	if (kexec_crash_image) {
+		if (kexec_crash_image->file_mode)
+			rc = 1;
+		else
+			rc = kexec_crash_image->update_elfcorehdr;
+	}
+	/* Release lock now that update complete */
+	kexec_unlock();
+
+	return rc;
+}
+
 /*
  * To accurately reflect hot un/plug changes of cpu and memory resources
  * (including onling and offlining of those resources), the elfcorehdr
@@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 
 	image = kexec_crash_image;
 
+	/* Check that updating elfcorehdr is permitted */
+	if (!(image->file_mode || image->update_elfcorehdr))
+		goto out;
+
 	if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
 		hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
 		pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 92d301f98776..60de64bd14b9 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 	if (flags & KEXEC_PRESERVE_CONTEXT)
 		image->preserve_context = 1;
 
+	if (flags & KEXEC_UPDATE_ELFCOREHDR)
+		image->update_elfcorehdr = 1;
+
 	ret = machine_kexec_prepare(image);
 	if (ret)
 		goto out;
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index aad7a3bfd846..1d4bc493b2f4 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
 }
 KERNEL_ATTR_RO(vmcoreinfo);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
+			       struct kobj_attribute *attr, char *buf)
+{
+	unsigned int sz = crash_get_elfcorehdr_size();
+
+	return sysfs_emit(buf, "%u\n", sz);
+}
+KERNEL_ATTR_RO(crash_elfcorehdr_size);
+
+#endif
+
 #endif /* CONFIG_CRASH_CORE */
 
 /* whether file capabilities are enabled */
@@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
 #endif
 #ifdef CONFIG_CRASH_CORE
 	&vmcoreinfo_attr.attr,
+#ifdef CONFIG_CRASH_HOTPLUG
+	&crash_elfcorehdr_size_attr.attr,
+#endif
 #endif
 #ifndef CONFIG_TINY_RCU
 	&rcu_expedited_attr.attr,
-- 
2.31.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 6/8] crash: hotplug support for kexec_load()
@ 2023-05-03 22:41   ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

The hotplug support for kexec_load() requires coordination with
userspace, and therefore a little extra help from the kernel to
facilitate the coordination.

In the absence of the solution contained within this particular
patch, if a kdump capture kernel is loaded via kexec_load() syscall,
then the crash hotplug logic would find the segment containing the
elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
generally speaking that is the desired behavior and outcome, a
problem arises from the fact that if the kdump image includes a
purgatory that performs a digest checksum, then that check would
fail (because the elfcorehdr was changed), and the capture kernel
would fail to boot and no kdump occur.

Therefore, what is needed is for the userspace kexec-tools to
indicate to the kernel whether or not the supplied kdump image/
elfcorehdr can be modified (because the kexec-tools excludes the
elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
appropriately).

To solve these problems, this patch introduces:
 - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
   safe for the kernel to modify the elfcorehdr (because kexec-tools
   has excluded the elfcorehdr from the digest).
 - the /sys/kernel/crash_elfcorehdr_size node to communicate to
   kexec-tools what the preferred size of the elfcorehdr memory buffer
   should be in order to accommodate hotplug changes.
 - The sysfs crash_hotplug nodes (ie.
   /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
   that they examine kexec_file_load() vs kexec_load(), and when
   kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
   This is critical so that the udev rule processing of crash_hotplug
   indicates correctly (ie. the userspace unload-then-load of the
   kdump of the kdump image can be skipped, or not).

With this patch in place, I believe the following statements to be true
(with local testing to verify):

 - For systems which have these kernel changes in place, but not the
   corresponding changes to the crash hot plug udev rules and
   kexec-tools, (ie "older" systems) those systems will continue to
   unload-then-load the kdump image, as has always been done. The
   kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
 - For systems which have these kernel changes in place and the proposed
   udev rule changes in place, but not the kexec-tools changes in place:
    - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
      so the unload-then-reload of kdump image will occur (the sysfs
      crash_hotplug nodes will show 0).
    - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
      to show 1, and the kernel will modify the elfcorehdr directly. And
      with the udev changes in place, the unload-then-load will not occur!
 - For systems which have these kernel changes as well as the udev and
   kexec-tools changes in place, then the user/admin has full authority
   over the enablement and support of crash hotplug support, whether via
   kexec_file_load() or kexec_load().

Said differently, as kexec_load() was/is widely in use, these changes
permit it to continue to be used as-is (retaining the current unload-then-
reload behavior) until such time as the udev and kexec-tools changes can
be rolled out as well.

I've intentionally kept the changes related to userspace coordination
for kexec_load() separate as this need was identified late; the
rest of this series has been generally reviewed and accepted. Once
this support has been vetted, I can refactor if needed.

Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 arch/x86/include/asm/kexec.h | 11 +++++++----
 arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
 include/linux/kexec.h        | 14 ++++++++++++--
 include/uapi/linux/kexec.h   |  1 +
 kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
 kernel/kexec.c               |  3 +++
 kernel/ksysfs.c              | 15 +++++++++++++++
 7 files changed, 96 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(void);
+#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static inline int crash_hotplug_memory_support(void) { return 1; }
-#define crash_hotplug_memory_support crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(void);
+#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
 #endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 0c9d496cf7ce..8064e65de6c0 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
+/* These functions provide the value for the sysfs crash_hotplug nodes */
+#ifdef CONFIG_HOTPLUG_CPU
+int arch_crash_hotplug_cpu_support(void)
+{
+	return crash_check_update_elfcorehdr();
+}
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int arch_crash_hotplug_memory_support(void)
+{
+	return crash_check_update_elfcorehdr();
+}
+#endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void)
+{
+	unsigned int sz;
+
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
+	else
+		sz += 2 + CONFIG_NR_CPUS_DEFAULT;
+	sz *= sizeof(Elf64_Phdr);
+	return sz;
+}
+
 /**
  * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
  * @image: the active struct kimage
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 6a8a724ac638..050e20066cdb 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -335,6 +335,10 @@ struct kimage {
 	unsigned int preserve_context : 1;
 	/* If set, we are using file mode kexec syscall */
 	unsigned int file_mode:1;
+#ifdef CONFIG_CRASH_HOTPLUG
+	/* If set, allow changes to elfcorehdr of kexec_load'd image */
+	unsigned int update_elfcorehdr:1;
+#endif
 
 #ifdef ARCH_HAS_KIMAGE_ARCH
 	struct kimage_arch arch;
@@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
 
 /* List of defined/legal kexec flags */
 #ifndef CONFIG_KEXEC_JUMP
-#define KEXEC_FLAGS    KEXEC_ON_CRASH
+#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
 #else
-#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
+#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
 #endif
 
 /* List of defined/legal kexec file flags */
@@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
 static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
 #endif
 
+int crash_check_update_elfcorehdr(void);
+
 #ifndef crash_hotplug_cpu_support
 static inline int crash_hotplug_cpu_support(void) { return 0; }
 #endif
@@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
 static inline int crash_hotplug_memory_support(void) { return 0; }
 #endif
 
+#ifndef crash_get_elfcorehdr_size
+static inline crash_get_elfcorehdr_size(void) { return 0; }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 981016e05cfa..01766dd839b0 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -12,6 +12,7 @@
 /* kexec flags for different usage scenarios */
 #define KEXEC_ON_CRASH		0x00000001
 #define KEXEC_PRESERVE_CONTEXT	0x00000002
+#define KEXEC_UPDATE_ELFCOREHDR	0x00000004
 #define KEXEC_ARCH_MASK		0xffff0000
 
 /*
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index ef6e91daad56..e05bfdb7eaed 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
 #ifdef CONFIG_CRASH_HOTPLUG
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
+
+/*
+ * This routine utilized when the crash_hotplug sysfs node is read.
+ * It reflects the kernel's ability/permission to update the crash
+ * elfcorehdr directly.
+ */
+int crash_check_update_elfcorehdr(void)
+{
+	int rc = 0;
+
+	/* Obtain lock while reading crash information */
+	if (!kexec_trylock()) {
+		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		return 0;
+	}
+	if (kexec_crash_image) {
+		if (kexec_crash_image->file_mode)
+			rc = 1;
+		else
+			rc = kexec_crash_image->update_elfcorehdr;
+	}
+	/* Release lock now that update complete */
+	kexec_unlock();
+
+	return rc;
+}
+
 /*
  * To accurately reflect hot un/plug changes of cpu and memory resources
  * (including onling and offlining of those resources), the elfcorehdr
@@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 
 	image = kexec_crash_image;
 
+	/* Check that updating elfcorehdr is permitted */
+	if (!(image->file_mode || image->update_elfcorehdr))
+		goto out;
+
 	if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
 		hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
 		pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 92d301f98776..60de64bd14b9 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 	if (flags & KEXEC_PRESERVE_CONTEXT)
 		image->preserve_context = 1;
 
+	if (flags & KEXEC_UPDATE_ELFCOREHDR)
+		image->update_elfcorehdr = 1;
+
 	ret = machine_kexec_prepare(image);
 	if (ret)
 		goto out;
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index aad7a3bfd846..1d4bc493b2f4 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
 }
 KERNEL_ATTR_RO(vmcoreinfo);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
+			       struct kobj_attribute *attr, char *buf)
+{
+	unsigned int sz = crash_get_elfcorehdr_size();
+
+	return sysfs_emit(buf, "%u\n", sz);
+}
+KERNEL_ATTR_RO(crash_elfcorehdr_size);
+
+#endif
+
 #endif /* CONFIG_CRASH_CORE */
 
 /* whether file capabilities are enabled */
@@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
 #endif
 #ifdef CONFIG_CRASH_CORE
 	&vmcoreinfo_attr.attr,
+#ifdef CONFIG_CRASH_HOTPLUG
+	&crash_elfcorehdr_size_attr.attr,
+#endif
 #endif
 #ifndef CONFIG_TINY_RCU
 	&rcu_expedited_attr.attr,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 7/8] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  2023-05-03 22:41 ` Eric DeVolder
@ 2023-05-03 22:41   ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

The function crash_prepare_elf64_headers() generates the elfcorehdr
which describes the CPUs and memory in the system for the crash kernel.
In particular, it writes out ELF PT_NOTEs for memory regions and the
CPUs in the system.

With respect to the CPUs, the current implementation utilizes
for_each_present_cpu() which means that as CPUs are added and removed,
the elfcorehdr must again be updated to reflect the new set of CPUs.

The reasoning behind the change to use for_each_possible_cpu(), is:

- At kernel boot time, all percpu crash_notes are allocated for all
  possible CPUs; that is, crash_notes are not allocated dynamically
  when CPUs are plugged/unplugged. Thus the crash_notes for each
  possible CPU are always available.

- The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU.
  Changing to for_each_possible_cpu() is valid as the crash_notes
  pointed to by each CPU PT_NOTE are present and always valid.

Furthermore, examining a common crash processing path of:

 kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
           elfcorehdr      /proc/vmcore     vmcore

reveals how the ELF CPU PT_NOTEs are utilized:

- Upon panic, each CPU is sent an IPI and shuts itself down, recording
 its state in its crash_notes. When all CPUs are shutdown, the
 crash kernel is launched with a pointer to the elfcorehdr.

- The crash kernel via linux/fs/proc/vmcore.c does not examine or
 use the contents of the PT_NOTEs, it exposes them via /proc/vmcore.

- The makedumpfile utility uses /proc/vmcore and reads the CPU
 PT_NOTEs to craft a nr_cpus variable, which is reported in a
 header but otherwise generally unused. Makedumpfile creates the
 vmcore.

- The 'crash' dump analyzer does not appear to reference the CPU
 PT_NOTEs. Instead it looks-up the cpu_[possible|present|onlin]_mask
 symbols and directly examines those structure contents from vmcore
 memory. From that information it is able to determine which CPUs
 are present and online, and locate the corresponding crash_notes.
 Said differently, it appears that 'crash' analyzer does not rely
 on the ELF PT_NOTEs for CPUs; rather it obtains the information
 directly via kernel symbols and the memory within the vmcore.

(There maybe other vmcore generating and analysis tools that do use
these PT_NOTEs, but 'makedumpfile' and 'crash' seems to be the most
common solution.)

This change results in the benefit of having all CPUs described in
the elfcorehdr, and therefore reducing the need to re-generate the
elfcorehdr on CPU changes, at the small expense of an additional
56 bytes per PT_NOTE for not-present-but-possible CPUs.

On systems where kexec_file_load() syscall is utilized, all the above
is valid. On systems where kexec_load() syscall is utilized, there
may be the need for the elfcorehdr to be regenerated once. The reason
being that some archs only populate the 'present' CPUs in the
/sys/devices/system/cpus entries, which the userspace 'kexec' utility
uses to generate the userspace-supplied elfcorehdr. In this situation,
one memory or CPU change will rewrite the elfcorehdr via the
crash_prepare_elf64_headers() function and now all possible CPUs will
be described, just as with kexec_file_load() syscall.

Suggested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 kernel/crash_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index e05bfdb7eaed..26262789baf6 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
 	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
 	ehdr->e_phentsize = sizeof(Elf64_Phdr);
 
-	/* Prepare one phdr of type PT_NOTE for each present CPU */
-	for_each_present_cpu(cpu) {
+	/* Prepare one phdr of type PT_NOTE for each possible CPU */
+	for_each_possible_cpu(cpu) {
 		phdr->p_type = PT_NOTE;
 		notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
 		phdr->p_offset = phdr->p_paddr = notes_addr;
-- 
2.31.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 7/8] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
@ 2023-05-03 22:41   ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

The function crash_prepare_elf64_headers() generates the elfcorehdr
which describes the CPUs and memory in the system for the crash kernel.
In particular, it writes out ELF PT_NOTEs for memory regions and the
CPUs in the system.

With respect to the CPUs, the current implementation utilizes
for_each_present_cpu() which means that as CPUs are added and removed,
the elfcorehdr must again be updated to reflect the new set of CPUs.

The reasoning behind the change to use for_each_possible_cpu(), is:

- At kernel boot time, all percpu crash_notes are allocated for all
  possible CPUs; that is, crash_notes are not allocated dynamically
  when CPUs are plugged/unplugged. Thus the crash_notes for each
  possible CPU are always available.

- The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU.
  Changing to for_each_possible_cpu() is valid as the crash_notes
  pointed to by each CPU PT_NOTE are present and always valid.

Furthermore, examining a common crash processing path of:

 kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
           elfcorehdr      /proc/vmcore     vmcore

reveals how the ELF CPU PT_NOTEs are utilized:

- Upon panic, each CPU is sent an IPI and shuts itself down, recording
 its state in its crash_notes. When all CPUs are shutdown, the
 crash kernel is launched with a pointer to the elfcorehdr.

- The crash kernel via linux/fs/proc/vmcore.c does not examine or
 use the contents of the PT_NOTEs, it exposes them via /proc/vmcore.

- The makedumpfile utility uses /proc/vmcore and reads the CPU
 PT_NOTEs to craft a nr_cpus variable, which is reported in a
 header but otherwise generally unused. Makedumpfile creates the
 vmcore.

- The 'crash' dump analyzer does not appear to reference the CPU
 PT_NOTEs. Instead it looks-up the cpu_[possible|present|onlin]_mask
 symbols and directly examines those structure contents from vmcore
 memory. From that information it is able to determine which CPUs
 are present and online, and locate the corresponding crash_notes.
 Said differently, it appears that 'crash' analyzer does not rely
 on the ELF PT_NOTEs for CPUs; rather it obtains the information
 directly via kernel symbols and the memory within the vmcore.

(There maybe other vmcore generating and analysis tools that do use
these PT_NOTEs, but 'makedumpfile' and 'crash' seems to be the most
common solution.)

This change results in the benefit of having all CPUs described in
the elfcorehdr, and therefore reducing the need to re-generate the
elfcorehdr on CPU changes, at the small expense of an additional
56 bytes per PT_NOTE for not-present-but-possible CPUs.

On systems where kexec_file_load() syscall is utilized, all the above
is valid. On systems where kexec_load() syscall is utilized, there
may be the need for the elfcorehdr to be regenerated once. The reason
being that some archs only populate the 'present' CPUs in the
/sys/devices/system/cpus entries, which the userspace 'kexec' utility
uses to generate the userspace-supplied elfcorehdr. In this situation,
one memory or CPU change will rewrite the elfcorehdr via the
crash_prepare_elf64_headers() function and now all possible CPUs will
be described, just as with kexec_file_load() syscall.

Suggested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 kernel/crash_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index e05bfdb7eaed..26262789baf6 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
 	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
 	ehdr->e_phentsize = sizeof(Elf64_Phdr);
 
-	/* Prepare one phdr of type PT_NOTE for each present CPU */
-	for_each_present_cpu(cpu) {
+	/* Prepare one phdr of type PT_NOTE for each possible CPU */
+	for_each_possible_cpu(cpu) {
 		phdr->p_type = PT_NOTE;
 		notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
 		phdr->p_offset = phdr->p_paddr = notes_addr;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 8/8] x86/crash: optimize CPU changes
  2023-05-03 22:41 ` Eric DeVolder
@ 2023-05-03 22:41   ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

This patch is dependent upon the patch 'crash: change
crash_prepare_elf64_headers() to for_each_possible_cpu()'. With that
patch, crash_prepare_elf64_headers() writes out an ELF CPU PT_NOTE
for all possible CPUs, thus further CPU changes to the elfcorehdr
are not needed.

This change works for kexec_file_load() and kexec_load() syscalls.
For kexec_file_load(), crash_prepare_elf64_headers() is utilized
directly and thus all ELF CPU PT_NOTEs are in the elfcorehdr already.
This is the kimage->file_mode term.
For kexec_load() syscall, one CPU or memory change will cause the
elfcorehdr to be updated via crash_prepare_elf64_headers() and at
that point all ELF CPU PT_NOTEs are in the elfcorehdr. This is the
kimage->elfcorehdr_updated term.

This code is intentionally *NOT* hoisted into
crash_handle_hotplug_event() as it would prevent the arch-specific
handler from running for CPU changes. This would break PPC, for
example, which needs to update other information besides the
elfcorehdr, on CPU changes.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/kernel/crash.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 8064e65de6c0..3157e6068747 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -483,6 +483,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
 	unsigned long mem, memsz;
 	unsigned long elfsz = 0;
 
+	/* As crash_prepare_elf64_headers() has already described all
+	 * possible CPUs, there is no need to update the elfcorehdr
+	 * for additional CPU changes. This works for both kexec_load()
+	 * and kexec_file_load() syscalls.
+	 */
+	if ((image->file_mode || image->elfcorehdr_updated) &&
+		((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+		(image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+		return;
+
 	/*
 	 * Create the new elfcorehdr reflecting the changes to CPU and/or
 	 * memory resources.
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v22 8/8] x86/crash: optimize CPU changes
@ 2023-05-03 22:41   ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-03 22:41 UTC (permalink / raw)
  To: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

This patch is dependent upon the patch 'crash: change
crash_prepare_elf64_headers() to for_each_possible_cpu()'. With that
patch, crash_prepare_elf64_headers() writes out an ELF CPU PT_NOTE
for all possible CPUs, thus further CPU changes to the elfcorehdr
are not needed.

This change works for kexec_file_load() and kexec_load() syscalls.
For kexec_file_load(), crash_prepare_elf64_headers() is utilized
directly and thus all ELF CPU PT_NOTEs are in the elfcorehdr already.
This is the kimage->file_mode term.
For kexec_load() syscall, one CPU or memory change will cause the
elfcorehdr to be updated via crash_prepare_elf64_headers() and at
that point all ELF CPU PT_NOTEs are in the elfcorehdr. This is the
kimage->elfcorehdr_updated term.

This code is intentionally *NOT* hoisted into
crash_handle_hotplug_event() as it would prevent the arch-specific
handler from running for CPU changes. This would break PPC, for
example, which needs to update other information besides the
elfcorehdr, on CPU changes.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/kernel/crash.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 8064e65de6c0..3157e6068747 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -483,6 +483,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
 	unsigned long mem, memsz;
 	unsigned long elfsz = 0;
 
+	/* As crash_prepare_elf64_headers() has already described all
+	 * possible CPUs, there is no need to update the elfcorehdr
+	 * for additional CPU changes. This works for both kexec_load()
+	 * and kexec_file_load() syscalls.
+	 */
+	if ((image->file_mode || image->elfcorehdr_updated) &&
+		((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+		(image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+		return;
+
 	/*
 	 * Create the new elfcorehdr reflecting the changes to CPU and/or
 	 * memory resources.
-- 
2.31.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 0/8] crash: Kernel handling of CPU and memory hot un/plug
  2023-05-03 22:41 ` Eric DeVolder
@ 2023-05-04  6:22   ` Hari Bathini
  -1 siblings, 0 replies; 38+ messages in thread
From: Hari Bathini @ 2023-05-04  6:22 UTC (permalink / raw)
  To: Eric DeVolder, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky



On 04/05/23 4:11 am, Eric DeVolder wrote:
> Once the kdump service is loaded, if changes to CPUs or memory occur,
> either by hot un/plug or off/onlining, the crash elfcorehdr must also
> be updated.
> 
> The elfcorehdr describes to kdump the CPUs and memory in the system,
> and any inaccuracies can result in a vmcore with missing CPU context
> or memory regions.
> 
> The current solution utilizes udev to initiate an unload-then-reload
> of the kdump image (eg. kernel, initrd, boot_params, purgatory and
> elfcorehdr) by the userspace kexec utility. In the original post I
> outlined the significant performance problems related to offloading
> this activity to userspace.
> 
> This patchset introduces a generic crash handler that registers with
> the CPU and memory notifiers. Upon CPU or memory changes, from either
> hot un/plug or off/onlining, this generic handler is invoked and
> performs important housekeeping, for example obtaining the appropriate
> lock, and then invokes an architecture specific handler to do the
> appropriate elfcorehdr update.
> 
> Note the description in patch 'crash: change crash_prepare_elf64_headers()
> to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
> enables further optimizations related to CPU plug/unplug/online/offline
> performance of elfcorehdr updates.
> 
> In the case of x86_64, the arch specific handler generates a new
> elfcorehdr, and overwrites the old one in memory; thus no involvement
> with userspace needed.
> 
> To realize the benefits/test this patchset, one must make a couple
> of minor changes to userspace:
> 
>   - Prevent udev from updating kdump crash kernel on hot un/plug changes.
>     Add the following as the first lines to the RHEL udev rule file
>     /usr/lib/udev/rules.d/98-kexec.rules:
> 
>     # The kernel updates the crash elfcorehdr for CPU and memory changes
>     SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>     SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
> 
>     With this changeset applied, the two rules evaluate to false for
>     CPU and memory change events and thus skip the userspace
>     unload-then-reload of kdump.
> 
>   - Change to the kexec_file_load for loading the kdump kernel:
>     Eg. on RHEL: in /usr/bin/kdumpctl, change to:
>      standard_kexec_args="-p -d -s"
>     which adds the -s to select kexec_file_load() syscall.
> 
> This kernel patchset also supports kexec_load() with a modified kexec
> userspace utility. A working changeset to the kexec userspace utility
> is posted to the kexec-tools mailing list here:
> 
>   http://lists.infradead.org/pipermail/kexec/2023-May/027049.html
> 
> To use the kexec-tools patch, apply, build and install kexec-tools,
> then change the kdumpctl's standard_kexec_args to replace the -s with
> --hotplug. The removal of -s reverts to the kexec_load syscall and
> the addition of --hotplug invokes the changes put forth in the
> kexec-tools patch.

The changes look good to me. For the series..

Acked-by: Hari Bathini <hbathini@linux.ibm.com>

> 
> Regards,
> eric
> ---
> v22: 3may2023
>   - Rebased onto 6.3.0
>   - Improved support for kexec_load(), per Hari Bathini. See
>     "crash: hotplug support for kexec_load()" which is the only
>     change to this series.
>   - Applied Baoquan He's Acked-by for all other patches.
> 
> v21: 4apr2023
>   https://lkml.org/lkml/2023/4/4/1136
>   https://lore.kernel.org/lkml/20230404180326.6890-1-eric.devolder@oracle.com/
>   - Rebased onto 6.3.0-rc5
>   - Additional simplification of indentation in crash_handle_hotplug_event(),
>     per Baoquan.
> 
> v20: 17mar2023
>   https://lkml.org/lkml/2023/3/17/1169
>   https://lore.kernel.org/lkml/20230317212128.21424-1-eric.devolder@oracle.com/
>   - Rebased onto 6.3.0-rc2
>   - Defaulting CRASH_HOTPLUG for x86 to Y, per Sourabh.
>   - Explicitly initializing image->hp_action, per Baoquan.
>   - Simplified kexec_trylock() in crash_handle_hotplug_event(),
>     per Baoquan.
>   - Applied Sourabh's Reviewed-by to the series.
> 
> v19: 6mar2023
>   https://lkml.org/lkml/2023/3/6/1358
>   https://lore.kernel.org/lkml/20230306162228.8277-1-eric.devolder@oracle.com/
>   - Rebased onto 6.2.0
>   - Did away with offlinecpu, per Thomas Gleixner.
>   - Changed to CPUHP_BP_PREPARE_DYN instead of CPUHP_AP_ONLINE_DYN.
>   - Did away with elfcorehdr_index_valid, per Sourabh.
>   - Convert to for_each_possible_cpu() in crash_prepare_elf64_headers()
>     per Sourabh.
>   - Small optimization for x86 cpu changes.
> 
> v18: 31jan2023
>   https://lkml.org/lkml/2023/1/31/1356
>   https://lore.kernel.org/lkml/20230131224236.122805-1-eric.devolder@oracle.com/
>   - Rebased onto 6.2.0-rc6
>   - Renamed struct kimage member hotplug_event to hp_action, and
>     re-enumerated the KEXEC_CRASH_HP_x items, adding _NONE at 0.
>   - Moved to cpuhp state CPUHP_BP_PREPARE_DYN instead of
>     CPUHP_AP_ONLINE_DYN in order to minimize window of time CPU
>     is not reflected in elfcorehdr.
>   - Reworked some of the comments and commit messages to offer
>     more of the why, than what, per Thomas Gleixner.
> 
> v17: 18jan2023
>   https://lkml.org/lkml/2023/1/18/1420
>   https://lore.kernel.org/lkml/20230118213544.2128-1-eric.devolder@oracle.com/
>   - Rebased onto 6.2.0-rc4
>   - Moved a bit of code around so that kexec_load()-only builds
>     work, per Sourabh.
>   - Corrected computation of number of memory region Phdrs needed
>     when x86 memory hotplug is not enabled, per Baoquan.
> 
> v16: 5jan2023
>   https://lkml.org/lkml/2023/1/5/673
>   https://lore.kernel.org/lkml/20230105151709.1845-1-eric.devolder@oracle.com/
>   - Rebased onto 6.2.0-rc2
>   - Corrected error identified by Baoquan.
> 
> v15: 9dec2022
>   https://lkml.org/lkml/2022/12/9/520
>   https://lore.kernel.org/lkml/20221209153656.3284-1-eric.devolder@oracle.com/
>   - Rebased onto 6.1.0-rc8
>   - Replaced arch_un/map_crash_pages() with direct use of
>     kun/map_local_pages(), per Boris.
>   - Some x86 changes, per Boris.
> 
> v14: 16nov2022
>   https://lkml.org/lkml/2022/11/16/1645
>   https://lore.kernel.org/lkml/20221116214643.6384-1-eric.devolder@oracle.com/
>   - Rebased onto 6.1.0-rc5
>   - Introduced CRASH_HOTPLUG Kconfig item to better fine tune
>     compilation of feature components, per Boris.
>   - Removed hp_action parameter to arch_crash_handle_hotplug_event()
>     as it is unused.
> 
> v13: 31oct2022
>   https://lkml.org/lkml/2022/10/31/854
>   https://lore.kernel.org/lkml/20221031193604.28779-1-eric.devolder@oracle.com/
>   - Rebased onto 6.1.0-rc3, which means converting to use the new
>     kexec_trylock() away from mutex_lock(kexec_mutex).
>   - Moved arch_un/map_crash_pages() into kexec.h and default
>     implementation using k/unmap_local_pages().
>   - Changed more #ifdef's into IS_ENABLED()
>   - Changed CRASH_MAX_MEMORY_RANGES to 8192 from 32768, and it moved
>     into x86 crash.c as #define rather Kconfig item, per Boris.
>   - Check number of Phdrs against PN_XNUM, max possible.
> 
> v12: 9sep2022
>   https://lkml.org/lkml/2022/9/9/1358
>   https://lore.kernel.org/lkml/20220909210509.6286-1-eric.devolder@oracle.com/
>   - Rebased onto 6.0-rc4
>   - Addressed some minor formatting items, per Baoquan
> 
> v11: 26aug2022
>   https://lkml.org/lkml/2022/8/26/963
>   https://lore.kernel.org/lkml/20220826173704.1895-1-eric.devolder@oracle.com/
>   - Rebased onto 6.0-rc2
>   - Redid the rework of __weak to use asm/kexec.h, per Baoquan
>   - Reworked some comments and minor items, per Baoquan
> 
> v10: 21jul2022
>   https://lkml.org/lkml/2022/7/21/1007
>   https://lore.kernel.org/lkml/20220721181747.1640-1-eric.devolder@oracle.com/
>   - Rebased to 5.19.0-rc7
>   - Per Sourabh, corrected build issue with arch_un/map_crash_pages()
>     for architectures not supporting this feature.
>   - Per David Hildebrand, removed the WARN_ONCE() altogether.
>   - Per David Hansen, converted to use of kmap_local_page().
>   - Per Baoquan He, replaced use of __weak with the kexec technique.
> 
> v9: 13jun2022
>   https://lkml.org/lkml/2022/6/13/3382
>   https://lore.kernel.org/lkml/20220613224240.79400-1-eric.devolder@oracle.com/
>   - Rebased to 5.18.0
>   - Per Sourabh, moved crash_prepare_elf64_headers() into common
>     crash_core.c to avoid compile issues with kexec_load only path.
>   - Per David Hildebrand, replaced mutex_trylock() with mutex_lock().
>   - Changed the __weak arch_crash_handle_hotplug_event() to utilize
>     WARN_ONCE() instead of WARN(). Fix some formatting issues.
>   - Per Sourabh, introduced sysfs attribute crash_hotplug for memory
>     and CPUs; for use by userspace (udev) to determine if the kernel
>     performs crash hot un/plug support.
>   - Per Sourabh, moved the code detecting the elfcorehdr segment from
>     arch/x86 into crash_core:handle_hotplug_event() so both kexec_load
>     and kexec_file_load can benefit.
>   - Updated userspace kexec-tools kexec utility to reflect change to
>     using CRASH_MAX_MEMORY_RANGES and get_nr_cpus().
>   - Updated the new proposed udev rules to reflect using the sysfs
>     attributes crash_hotplug.
> 
> v8: 5may2022
>   https://lkml.org/lkml/2022/5/5/1133
>   https://lore.kernel.org/lkml/20220505184603.1548-1-eric.devolder@oracle.com/
>   - Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
>     of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, ie a new define
>     is not needed. Also use of IS_ENABLED() rather than #ifdef's.
>     Renamed crash_hotplug_handler() to handle_hotplug_event().
>     And other corrections.
>   - Per Baoquan, minimized the parameters to the arch_crash_
>     handle_hotplug_event() to hp_action and cpu.
>   - Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
>   - Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
>     to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
>     by David Hildebrand. Folded this patch into the x86
>     kexec_file_load support patch.
> 
> v7: 13apr2022
>   https://lkml.org/lkml/2022/4/13/850
>   https://lore.kernel.org/lkml/20220413164237.20845-1-eric.devolder@oracle.com/
>   - Resolved parameter usage to crash_hotplug_handler(), per Baoquan.
> 
> v6: 1apr2022
>   https://lkml.org/lkml/2022/4/1/1203
>   https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devolder@oracle.com/
>   - Reword commit messages and some comment cleanup per Baoquan.
>   - Changed elf_index to elfcorehdr_index for clarity.
>   - Minor code changes per Baoquan.
> 
> v5: 3mar2022
>   https://lkml.org/lkml/2022/3/3/674
>   https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devolder@oracle.com/
>   - Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
>     David Hildenbrand.
>   - Refactored slightly a few patches per Baoquan recommendation.
> 
> v4: 9feb2022
>   https://lkml.org/lkml/2022/2/9/1406
>   https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devolder@oracle.com/
>   - Refactored patches per Baoquan suggestsions.
>   - A few corrections, per Baoquan.
> 
> v3: 10jan2022
>   https://lkml.org/lkml/2022/1/10/1212
>   https://lore.kernel.org/lkml/20220110195727.1682-1-eric.devolder@oracle.com/
>   - Rebasing per Baoquan He request.
>   - Changed memory notifier per David Hildenbrand.
>   - Providing example kexec userspace change in cover letter.
> 
> RFC v2: 7dec2021
>   https://lkml.org/lkml/2021/12/7/1088
>   https://lore.kernel.org/lkml/20211207195204.1582-1-eric.devolder@oracle.com/
>   - Acting upon Baoquan He suggestion of removing elfcorehdr from
>     the purgatory list of segments, removed purgatory code from
>     patchset, and it is signficiantly simpler now.
> 
> RFC v1: 18nov2021
>   https://lkml.org/lkml/2021/11/18/845
>   https://lore.kernel.org/lkml/20211118174948.37435-1-eric.devolder@oracle.com/
>   - working patchset demonstrating kernel handling of hotplug
>     updates to x86 elfcorehdr for kexec_file_load
> 
> RFC: 14dec2020
>   https://lkml.org/lkml/2020/12/14/532
>   https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d9096a@oracle.com/
>   - proposed concept of allowing kernel to handle hotplug update
>     of elfcorehdr
> ---
> 
> Eric DeVolder (8):
>    crash: move a few code bits to setup support of crash hotplug
>    crash: add generic infrastructure for crash hotplug support
>    kexec: exclude elfcorehdr from the segment digest
>    crash: memory and CPU hotplug sysfs attributes
>    x86/crash: add x86 crash hotplug support
>    crash: hotplug support for kexec_load()
>    crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
>    x86/crash: optimize CPU changes
> 
>   .../admin-guide/mm/memory-hotplug.rst         |   8 +
>   Documentation/core-api/cpu_hotplug.rst        |  18 +
>   arch/x86/Kconfig                              |  13 +
>   arch/x86/include/asm/kexec.h                  |  18 +
>   arch/x86/kernel/crash.c                       | 156 +++++++-
>   drivers/base/cpu.c                            |  14 +
>   drivers/base/memory.c                         |  13 +
>   include/linux/crash_core.h                    |   9 +
>   include/linux/kexec.h                         |  63 +++-
>   include/uapi/linux/kexec.h                    |   1 +
>   kernel/crash_core.c                           | 355 ++++++++++++++++++
>   kernel/kexec.c                                |   3 +
>   kernel/kexec_core.c                           |   6 +
>   kernel/kexec_file.c                           | 187 +--------
>   kernel/ksysfs.c                               |  15 +
>   15 files changed, 674 insertions(+), 205 deletions(-)
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 0/8] crash: Kernel handling of CPU and memory hot un/plug
@ 2023-05-04  6:22   ` Hari Bathini
  0 siblings, 0 replies; 38+ messages in thread
From: Hari Bathini @ 2023-05-04  6:22 UTC (permalink / raw)
  To: Eric DeVolder, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky



On 04/05/23 4:11 am, Eric DeVolder wrote:
> Once the kdump service is loaded, if changes to CPUs or memory occur,
> either by hot un/plug or off/onlining, the crash elfcorehdr must also
> be updated.
> 
> The elfcorehdr describes to kdump the CPUs and memory in the system,
> and any inaccuracies can result in a vmcore with missing CPU context
> or memory regions.
> 
> The current solution utilizes udev to initiate an unload-then-reload
> of the kdump image (eg. kernel, initrd, boot_params, purgatory and
> elfcorehdr) by the userspace kexec utility. In the original post I
> outlined the significant performance problems related to offloading
> this activity to userspace.
> 
> This patchset introduces a generic crash handler that registers with
> the CPU and memory notifiers. Upon CPU or memory changes, from either
> hot un/plug or off/onlining, this generic handler is invoked and
> performs important housekeeping, for example obtaining the appropriate
> lock, and then invokes an architecture specific handler to do the
> appropriate elfcorehdr update.
> 
> Note the description in patch 'crash: change crash_prepare_elf64_headers()
> to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
> enables further optimizations related to CPU plug/unplug/online/offline
> performance of elfcorehdr updates.
> 
> In the case of x86_64, the arch specific handler generates a new
> elfcorehdr, and overwrites the old one in memory; thus no involvement
> with userspace needed.
> 
> To realize the benefits/test this patchset, one must make a couple
> of minor changes to userspace:
> 
>   - Prevent udev from updating kdump crash kernel on hot un/plug changes.
>     Add the following as the first lines to the RHEL udev rule file
>     /usr/lib/udev/rules.d/98-kexec.rules:
> 
>     # The kernel updates the crash elfcorehdr for CPU and memory changes
>     SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>     SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
> 
>     With this changeset applied, the two rules evaluate to false for
>     CPU and memory change events and thus skip the userspace
>     unload-then-reload of kdump.
> 
>   - Change to the kexec_file_load for loading the kdump kernel:
>     Eg. on RHEL: in /usr/bin/kdumpctl, change to:
>      standard_kexec_args="-p -d -s"
>     which adds the -s to select kexec_file_load() syscall.
> 
> This kernel patchset also supports kexec_load() with a modified kexec
> userspace utility. A working changeset to the kexec userspace utility
> is posted to the kexec-tools mailing list here:
> 
>   http://lists.infradead.org/pipermail/kexec/2023-May/027049.html
> 
> To use the kexec-tools patch, apply, build and install kexec-tools,
> then change the kdumpctl's standard_kexec_args to replace the -s with
> --hotplug. The removal of -s reverts to the kexec_load syscall and
> the addition of --hotplug invokes the changes put forth in the
> kexec-tools patch.

The changes look good to me. For the series..

Acked-by: Hari Bathini <hbathini@linux.ibm.com>

> 
> Regards,
> eric
> ---
> v22: 3may2023
>   - Rebased onto 6.3.0
>   - Improved support for kexec_load(), per Hari Bathini. See
>     "crash: hotplug support for kexec_load()" which is the only
>     change to this series.
>   - Applied Baoquan He's Acked-by for all other patches.
> 
> v21: 4apr2023
>   https://lkml.org/lkml/2023/4/4/1136
>   https://lore.kernel.org/lkml/20230404180326.6890-1-eric.devolder@oracle.com/
>   - Rebased onto 6.3.0-rc5
>   - Additional simplification of indentation in crash_handle_hotplug_event(),
>     per Baoquan.
> 
> v20: 17mar2023
>   https://lkml.org/lkml/2023/3/17/1169
>   https://lore.kernel.org/lkml/20230317212128.21424-1-eric.devolder@oracle.com/
>   - Rebased onto 6.3.0-rc2
>   - Defaulting CRASH_HOTPLUG for x86 to Y, per Sourabh.
>   - Explicitly initializing image->hp_action, per Baoquan.
>   - Simplified kexec_trylock() in crash_handle_hotplug_event(),
>     per Baoquan.
>   - Applied Sourabh's Reviewed-by to the series.
> 
> v19: 6mar2023
>   https://lkml.org/lkml/2023/3/6/1358
>   https://lore.kernel.org/lkml/20230306162228.8277-1-eric.devolder@oracle.com/
>   - Rebased onto 6.2.0
>   - Did away with offlinecpu, per Thomas Gleixner.
>   - Changed to CPUHP_BP_PREPARE_DYN instead of CPUHP_AP_ONLINE_DYN.
>   - Did away with elfcorehdr_index_valid, per Sourabh.
>   - Convert to for_each_possible_cpu() in crash_prepare_elf64_headers()
>     per Sourabh.
>   - Small optimization for x86 cpu changes.
> 
> v18: 31jan2023
>   https://lkml.org/lkml/2023/1/31/1356
>   https://lore.kernel.org/lkml/20230131224236.122805-1-eric.devolder@oracle.com/
>   - Rebased onto 6.2.0-rc6
>   - Renamed struct kimage member hotplug_event to hp_action, and
>     re-enumerated the KEXEC_CRASH_HP_x items, adding _NONE at 0.
>   - Moved to cpuhp state CPUHP_BP_PREPARE_DYN instead of
>     CPUHP_AP_ONLINE_DYN in order to minimize window of time CPU
>     is not reflected in elfcorehdr.
>   - Reworked some of the comments and commit messages to offer
>     more of the why, than what, per Thomas Gleixner.
> 
> v17: 18jan2023
>   https://lkml.org/lkml/2023/1/18/1420
>   https://lore.kernel.org/lkml/20230118213544.2128-1-eric.devolder@oracle.com/
>   - Rebased onto 6.2.0-rc4
>   - Moved a bit of code around so that kexec_load()-only builds
>     work, per Sourabh.
>   - Corrected computation of number of memory region Phdrs needed
>     when x86 memory hotplug is not enabled, per Baoquan.
> 
> v16: 5jan2023
>   https://lkml.org/lkml/2023/1/5/673
>   https://lore.kernel.org/lkml/20230105151709.1845-1-eric.devolder@oracle.com/
>   - Rebased onto 6.2.0-rc2
>   - Corrected error identified by Baoquan.
> 
> v15: 9dec2022
>   https://lkml.org/lkml/2022/12/9/520
>   https://lore.kernel.org/lkml/20221209153656.3284-1-eric.devolder@oracle.com/
>   - Rebased onto 6.1.0-rc8
>   - Replaced arch_un/map_crash_pages() with direct use of
>     kun/map_local_pages(), per Boris.
>   - Some x86 changes, per Boris.
> 
> v14: 16nov2022
>   https://lkml.org/lkml/2022/11/16/1645
>   https://lore.kernel.org/lkml/20221116214643.6384-1-eric.devolder@oracle.com/
>   - Rebased onto 6.1.0-rc5
>   - Introduced CRASH_HOTPLUG Kconfig item to better fine tune
>     compilation of feature components, per Boris.
>   - Removed hp_action parameter to arch_crash_handle_hotplug_event()
>     as it is unused.
> 
> v13: 31oct2022
>   https://lkml.org/lkml/2022/10/31/854
>   https://lore.kernel.org/lkml/20221031193604.28779-1-eric.devolder@oracle.com/
>   - Rebased onto 6.1.0-rc3, which means converting to use the new
>     kexec_trylock() away from mutex_lock(kexec_mutex).
>   - Moved arch_un/map_crash_pages() into kexec.h and default
>     implementation using k/unmap_local_pages().
>   - Changed more #ifdef's into IS_ENABLED()
>   - Changed CRASH_MAX_MEMORY_RANGES to 8192 from 32768, and it moved
>     into x86 crash.c as #define rather Kconfig item, per Boris.
>   - Check number of Phdrs against PN_XNUM, max possible.
> 
> v12: 9sep2022
>   https://lkml.org/lkml/2022/9/9/1358
>   https://lore.kernel.org/lkml/20220909210509.6286-1-eric.devolder@oracle.com/
>   - Rebased onto 6.0-rc4
>   - Addressed some minor formatting items, per Baoquan
> 
> v11: 26aug2022
>   https://lkml.org/lkml/2022/8/26/963
>   https://lore.kernel.org/lkml/20220826173704.1895-1-eric.devolder@oracle.com/
>   - Rebased onto 6.0-rc2
>   - Redid the rework of __weak to use asm/kexec.h, per Baoquan
>   - Reworked some comments and minor items, per Baoquan
> 
> v10: 21jul2022
>   https://lkml.org/lkml/2022/7/21/1007
>   https://lore.kernel.org/lkml/20220721181747.1640-1-eric.devolder@oracle.com/
>   - Rebased to 5.19.0-rc7
>   - Per Sourabh, corrected build issue with arch_un/map_crash_pages()
>     for architectures not supporting this feature.
>   - Per David Hildebrand, removed the WARN_ONCE() altogether.
>   - Per David Hansen, converted to use of kmap_local_page().
>   - Per Baoquan He, replaced use of __weak with the kexec technique.
> 
> v9: 13jun2022
>   https://lkml.org/lkml/2022/6/13/3382
>   https://lore.kernel.org/lkml/20220613224240.79400-1-eric.devolder@oracle.com/
>   - Rebased to 5.18.0
>   - Per Sourabh, moved crash_prepare_elf64_headers() into common
>     crash_core.c to avoid compile issues with kexec_load only path.
>   - Per David Hildebrand, replaced mutex_trylock() with mutex_lock().
>   - Changed the __weak arch_crash_handle_hotplug_event() to utilize
>     WARN_ONCE() instead of WARN(). Fix some formatting issues.
>   - Per Sourabh, introduced sysfs attribute crash_hotplug for memory
>     and CPUs; for use by userspace (udev) to determine if the kernel
>     performs crash hot un/plug support.
>   - Per Sourabh, moved the code detecting the elfcorehdr segment from
>     arch/x86 into crash_core:handle_hotplug_event() so both kexec_load
>     and kexec_file_load can benefit.
>   - Updated userspace kexec-tools kexec utility to reflect change to
>     using CRASH_MAX_MEMORY_RANGES and get_nr_cpus().
>   - Updated the new proposed udev rules to reflect using the sysfs
>     attributes crash_hotplug.
> 
> v8: 5may2022
>   https://lkml.org/lkml/2022/5/5/1133
>   https://lore.kernel.org/lkml/20220505184603.1548-1-eric.devolder@oracle.com/
>   - Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
>     of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, ie a new define
>     is not needed. Also use of IS_ENABLED() rather than #ifdef's.
>     Renamed crash_hotplug_handler() to handle_hotplug_event().
>     And other corrections.
>   - Per Baoquan, minimized the parameters to the arch_crash_
>     handle_hotplug_event() to hp_action and cpu.
>   - Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
>   - Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
>     to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
>     by David Hildebrand. Folded this patch into the x86
>     kexec_file_load support patch.
> 
> v7: 13apr2022
>   https://lkml.org/lkml/2022/4/13/850
>   https://lore.kernel.org/lkml/20220413164237.20845-1-eric.devolder@oracle.com/
>   - Resolved parameter usage to crash_hotplug_handler(), per Baoquan.
> 
> v6: 1apr2022
>   https://lkml.org/lkml/2022/4/1/1203
>   https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devolder@oracle.com/
>   - Reword commit messages and some comment cleanup per Baoquan.
>   - Changed elf_index to elfcorehdr_index for clarity.
>   - Minor code changes per Baoquan.
> 
> v5: 3mar2022
>   https://lkml.org/lkml/2022/3/3/674
>   https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devolder@oracle.com/
>   - Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
>     David Hildenbrand.
>   - Refactored slightly a few patches per Baoquan recommendation.
> 
> v4: 9feb2022
>   https://lkml.org/lkml/2022/2/9/1406
>   https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devolder@oracle.com/
>   - Refactored patches per Baoquan suggestsions.
>   - A few corrections, per Baoquan.
> 
> v3: 10jan2022
>   https://lkml.org/lkml/2022/1/10/1212
>   https://lore.kernel.org/lkml/20220110195727.1682-1-eric.devolder@oracle.com/
>   - Rebasing per Baoquan He request.
>   - Changed memory notifier per David Hildenbrand.
>   - Providing example kexec userspace change in cover letter.
> 
> RFC v2: 7dec2021
>   https://lkml.org/lkml/2021/12/7/1088
>   https://lore.kernel.org/lkml/20211207195204.1582-1-eric.devolder@oracle.com/
>   - Acting upon Baoquan He suggestion of removing elfcorehdr from
>     the purgatory list of segments, removed purgatory code from
>     patchset, and it is signficiantly simpler now.
> 
> RFC v1: 18nov2021
>   https://lkml.org/lkml/2021/11/18/845
>   https://lore.kernel.org/lkml/20211118174948.37435-1-eric.devolder@oracle.com/
>   - working patchset demonstrating kernel handling of hotplug
>     updates to x86 elfcorehdr for kexec_file_load
> 
> RFC: 14dec2020
>   https://lkml.org/lkml/2020/12/14/532
>   https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d9096a@oracle.com/
>   - proposed concept of allowing kernel to handle hotplug update
>     of elfcorehdr
> ---
> 
> Eric DeVolder (8):
>    crash: move a few code bits to setup support of crash hotplug
>    crash: add generic infrastructure for crash hotplug support
>    kexec: exclude elfcorehdr from the segment digest
>    crash: memory and CPU hotplug sysfs attributes
>    x86/crash: add x86 crash hotplug support
>    crash: hotplug support for kexec_load()
>    crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
>    x86/crash: optimize CPU changes
> 
>   .../admin-guide/mm/memory-hotplug.rst         |   8 +
>   Documentation/core-api/cpu_hotplug.rst        |  18 +
>   arch/x86/Kconfig                              |  13 +
>   arch/x86/include/asm/kexec.h                  |  18 +
>   arch/x86/kernel/crash.c                       | 156 +++++++-
>   drivers/base/cpu.c                            |  14 +
>   drivers/base/memory.c                         |  13 +
>   include/linux/crash_core.h                    |   9 +
>   include/linux/kexec.h                         |  63 +++-
>   include/uapi/linux/kexec.h                    |   1 +
>   kernel/crash_core.c                           | 355 ++++++++++++++++++
>   kernel/kexec.c                                |   3 +
>   kernel/kexec_core.c                           |   6 +
>   kernel/kexec_file.c                           | 187 +--------
>   kernel/ksysfs.c                               |  15 +
>   15 files changed, 674 insertions(+), 205 deletions(-)
> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
  2023-05-03 22:41   ` Eric DeVolder
@ 2023-05-08  5:13     ` Baoquan He
  -1 siblings, 0 replies; 38+ messages in thread
From: Baoquan He @ 2023-05-08  5:13 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo,
	bp, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault,
	rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky

On 05/03/23 at 06:41pm, Eric DeVolder wrote:
> The hotplug support for kexec_load() requires coordination with
> userspace, and therefore a little extra help from the kernel to
> facilitate the coordination.
> 
> In the absence of the solution contained within this particular
> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
> then the crash hotplug logic would find the segment containing the
> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
> generally speaking that is the desired behavior and outcome, a
> problem arises from the fact that if the kdump image includes a
> purgatory that performs a digest checksum, then that check would
> fail (because the elfcorehdr was changed), and the capture kernel
> would fail to boot and no kdump occur.
> 
> Therefore, what is needed is for the userspace kexec-tools to
> indicate to the kernel whether or not the supplied kdump image/
> elfcorehdr can be modified (because the kexec-tools excludes the
> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
> appropriately).
> 
> To solve these problems, this patch introduces:
>  - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
>    safe for the kernel to modify the elfcorehdr (because kexec-tools
>    has excluded the elfcorehdr from the digest).
>  - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>    kexec-tools what the preferred size of the elfcorehdr memory buffer
>    should be in order to accommodate hotplug changes.
>  - The sysfs crash_hotplug nodes (ie.
>    /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>    that they examine kexec_file_load() vs kexec_load(), and when
>    kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>    This is critical so that the udev rule processing of crash_hotplug
>    indicates correctly (ie. the userspace unload-then-load of the
>    kdump of the kdump image can be skipped, or not).
> 
> With this patch in place, I believe the following statements to be true
> (with local testing to verify):
> 
>  - For systems which have these kernel changes in place, but not the
>    corresponding changes to the crash hot plug udev rules and
>    kexec-tools, (ie "older" systems) those systems will continue to
>    unload-then-load the kdump image, as has always been done. The
>    kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>  - For systems which have these kernel changes in place and the proposed
>    udev rule changes in place, but not the kexec-tools changes in place:
>     - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>       so the unload-then-reload of kdump image will occur (the sysfs
>       crash_hotplug nodes will show 0).
>     - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>       to show 1, and the kernel will modify the elfcorehdr directly. And
>       with the udev changes in place, the unload-then-load will not occur!
>  - For systems which have these kernel changes as well as the udev and
>    kexec-tools changes in place, then the user/admin has full authority
>    over the enablement and support of crash hotplug support, whether via
>    kexec_file_load() or kexec_load().
> 
> Said differently, as kexec_load() was/is widely in use, these changes
> permit it to continue to be used as-is (retaining the current unload-then-
> reload behavior) until such time as the udev and kexec-tools changes can
> be rolled out as well.
> 
> I've intentionally kept the changes related to userspace coordination
> for kexec_load() separate as this need was identified late; the
> rest of this series has been generally reviewed and accepted. Once
> this support has been vetted, I can refactor if needed.
> 
> Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>

LGTM,

Acked-by: Baoquan He <bhe@redhat.com>


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
@ 2023-05-08  5:13     ` Baoquan He
  0 siblings, 0 replies; 38+ messages in thread
From: Baoquan He @ 2023-05-08  5:13 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo,
	bp, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault,
	rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky

On 05/03/23 at 06:41pm, Eric DeVolder wrote:
> The hotplug support for kexec_load() requires coordination with
> userspace, and therefore a little extra help from the kernel to
> facilitate the coordination.
> 
> In the absence of the solution contained within this particular
> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
> then the crash hotplug logic would find the segment containing the
> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
> generally speaking that is the desired behavior and outcome, a
> problem arises from the fact that if the kdump image includes a
> purgatory that performs a digest checksum, then that check would
> fail (because the elfcorehdr was changed), and the capture kernel
> would fail to boot and no kdump occur.
> 
> Therefore, what is needed is for the userspace kexec-tools to
> indicate to the kernel whether or not the supplied kdump image/
> elfcorehdr can be modified (because the kexec-tools excludes the
> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
> appropriately).
> 
> To solve these problems, this patch introduces:
>  - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
>    safe for the kernel to modify the elfcorehdr (because kexec-tools
>    has excluded the elfcorehdr from the digest).
>  - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>    kexec-tools what the preferred size of the elfcorehdr memory buffer
>    should be in order to accommodate hotplug changes.
>  - The sysfs crash_hotplug nodes (ie.
>    /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>    that they examine kexec_file_load() vs kexec_load(), and when
>    kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>    This is critical so that the udev rule processing of crash_hotplug
>    indicates correctly (ie. the userspace unload-then-load of the
>    kdump of the kdump image can be skipped, or not).
> 
> With this patch in place, I believe the following statements to be true
> (with local testing to verify):
> 
>  - For systems which have these kernel changes in place, but not the
>    corresponding changes to the crash hot plug udev rules and
>    kexec-tools, (ie "older" systems) those systems will continue to
>    unload-then-load the kdump image, as has always been done. The
>    kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>  - For systems which have these kernel changes in place and the proposed
>    udev rule changes in place, but not the kexec-tools changes in place:
>     - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>       so the unload-then-reload of kdump image will occur (the sysfs
>       crash_hotplug nodes will show 0).
>     - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>       to show 1, and the kernel will modify the elfcorehdr directly. And
>       with the udev changes in place, the unload-then-load will not occur!
>  - For systems which have these kernel changes as well as the udev and
>    kexec-tools changes in place, then the user/admin has full authority
>    over the enablement and support of crash hotplug support, whether via
>    kexec_file_load() or kexec_load().
> 
> Said differently, as kexec_load() was/is widely in use, these changes
> permit it to continue to be used as-is (retaining the current unload-then-
> reload behavior) until such time as the udev and kexec-tools changes can
> be rolled out as well.
> 
> I've intentionally kept the changes related to userspace coordination
> for kexec_load() separate as this need was identified late; the
> rest of this series has been generally reviewed and accepted. Once
> this support has been vetted, I can refactor if needed.
> 
> Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>

LGTM,

Acked-by: Baoquan He <bhe@redhat.com>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
  2023-05-03 22:41   ` Eric DeVolder
@ 2023-05-09  6:15     ` Sourabh Jain
  -1 siblings, 0 replies; 38+ messages in thread
From: Sourabh Jain @ 2023-05-09  6:15 UTC (permalink / raw)
  To: Eric DeVolder, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, konrad.wilk, boris.ostrovsky


On 04/05/23 04:11, Eric DeVolder wrote:
> The hotplug support for kexec_load() requires coordination with
> userspace, and therefore a little extra help from the kernel to
> facilitate the coordination.
>
> In the absence of the solution contained within this particular
> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
> then the crash hotplug logic would find the segment containing the
> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
> generally speaking that is the desired behavior and outcome, a
> problem arises from the fact that if the kdump image includes a
> purgatory that performs a digest checksum, then that check would
> fail (because the elfcorehdr was changed), and the capture kernel
> would fail to boot and no kdump occur.
>
> Therefore, what is needed is for the userspace kexec-tools to
> indicate to the kernel whether or not the supplied kdump image/
> elfcorehdr can be modified (because the kexec-tools excludes the
> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
> appropriately).
>
> To solve these problems, this patch introduces:
>   - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is

Architectures may need to update kexec segment other then elfcorehdr.
How about changing the flag name to KEXEC_UPDATE_SEGMENTS?

- Sourabh

>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>     has excluded the elfcorehdr from the digest).
>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>     should be in order to accommodate hotplug changes.
>   - The sysfs crash_hotplug nodes (ie.
>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>     that they examine kexec_file_load() vs kexec_load(), and when
>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>     This is critical so that the udev rule processing of crash_hotplug
>     indicates correctly (ie. the userspace unload-then-load of the
>     kdump of the kdump image can be skipped, or not).
>
> With this patch in place, I believe the following statements to be true
> (with local testing to verify):
>
>   - For systems which have these kernel changes in place, but not the
>     corresponding changes to the crash hot plug udev rules and
>     kexec-tools, (ie "older" systems) those systems will continue to
>     unload-then-load the kdump image, as has always been done. The
>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>   - For systems which have these kernel changes in place and the proposed
>     udev rule changes in place, but not the kexec-tools changes in place:
>      - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>        so the unload-then-reload of kdump image will occur (the sysfs
>        crash_hotplug nodes will show 0).
>      - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>        to show 1, and the kernel will modify the elfcorehdr directly. And
>        with the udev changes in place, the unload-then-load will not occur!
>   - For systems which have these kernel changes as well as the udev and
>     kexec-tools changes in place, then the user/admin has full authority
>     over the enablement and support of crash hotplug support, whether via
>     kexec_file_load() or kexec_load().
>
> Said differently, as kexec_load() was/is widely in use, these changes
> permit it to continue to be used as-is (retaining the current unload-then-
> reload behavior) until such time as the udev and kexec-tools changes can
> be rolled out as well.
>
> I've intentionally kept the changes related to userspace coordination
> for kexec_load() separate as this need was identified late; the
> rest of this series has been generally reviewed and accepted. Once
> this support has been vetted, I can refactor if needed.
>
> Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> ---
>   arch/x86/include/asm/kexec.h | 11 +++++++----
>   arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
>   include/linux/kexec.h        | 14 ++++++++++++--
>   include/uapi/linux/kexec.h   |  1 +
>   kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
>   kernel/kexec.c               |  3 +++
>   kernel/ksysfs.c              | 15 +++++++++++++++
>   7 files changed, 96 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 9143100ea3ea..3be6a98751f0 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>   
>   #ifdef CONFIG_HOTPLUG_CPU
> -static inline int crash_hotplug_cpu_support(void) { return 1; }
> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
> +int arch_crash_hotplug_cpu_support(void);
> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>   #endif
>   
>   #ifdef CONFIG_MEMORY_HOTPLUG
> -static inline int crash_hotplug_memory_support(void) { return 1; }
> -#define crash_hotplug_memory_support crash_hotplug_memory_support
> +int arch_crash_hotplug_memory_support(void);
> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>   #endif
> +
> +unsigned int arch_crash_get_elfcorehdr_size(void);
> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>   #endif
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 0c9d496cf7ce..8064e65de6c0 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
>   
> +/* These functions provide the value for the sysfs crash_hotplug nodes */
> +#ifdef CONFIG_HOTPLUG_CPU
> +int arch_crash_hotplug_cpu_support(void)
> +{
> +	return crash_check_update_elfcorehdr();
> +}
> +#endif
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int arch_crash_hotplug_memory_support(void)
> +{
> +	return crash_check_update_elfcorehdr();
> +}
> +#endif
> +
> +unsigned int arch_crash_get_elfcorehdr_size(void)
> +{
> +	unsigned int sz;
> +
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +		sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
> +	else
> +		sz += 2 + CONFIG_NR_CPUS_DEFAULT;
> +	sz *= sizeof(Elf64_Phdr);
> +	return sz;
> +}
> +
>   /**
>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>    * @image: the active struct kimage
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 6a8a724ac638..050e20066cdb 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -335,6 +335,10 @@ struct kimage {
>   	unsigned int preserve_context : 1;
>   	/* If set, we are using file mode kexec syscall */
>   	unsigned int file_mode:1;
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	/* If set, allow changes to elfcorehdr of kexec_load'd image */
> +	unsigned int update_elfcorehdr:1;
> +#endif
>   
>   #ifdef ARCH_HAS_KIMAGE_ARCH
>   	struct kimage_arch arch;
> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>   
>   /* List of defined/legal kexec flags */
>   #ifndef CONFIG_KEXEC_JUMP
> -#define KEXEC_FLAGS    KEXEC_ON_CRASH
> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>   #else
> -#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>   #endif
>   
>   /* List of defined/legal kexec file flags */
> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>   static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>   #endif
>   
> +int crash_check_update_elfcorehdr(void);
> +
>   #ifndef crash_hotplug_cpu_support
>   static inline int crash_hotplug_cpu_support(void) { return 0; }
>   #endif
> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>   static inline int crash_hotplug_memory_support(void) { return 0; }
>   #endif
>   
> +#ifndef crash_get_elfcorehdr_size
> +static inline crash_get_elfcorehdr_size(void) { return 0; }
> +#endif
> +
>   #else /* !CONFIG_KEXEC_CORE */
>   struct pt_regs;
>   struct task_struct;
> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
> index 981016e05cfa..01766dd839b0 100644
> --- a/include/uapi/linux/kexec.h
> +++ b/include/uapi/linux/kexec.h
> @@ -12,6 +12,7 @@
>   /* kexec flags for different usage scenarios */
>   #define KEXEC_ON_CRASH		0x00000001
>   #define KEXEC_PRESERVE_CONTEXT	0x00000002
> +#define KEXEC_UPDATE_ELFCOREHDR	0x00000004
>   #define KEXEC_ARCH_MASK		0xffff0000
>   
>   /*
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index ef6e91daad56..e05bfdb7eaed 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>   #ifdef CONFIG_CRASH_HOTPLUG
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
> +
> +/*
> + * This routine utilized when the crash_hotplug sysfs node is read.
> + * It reflects the kernel's ability/permission to update the crash
> + * elfcorehdr directly.
> + */
> +int crash_check_update_elfcorehdr(void)
> +{
> +	int rc = 0;
> +
> +	/* Obtain lock while reading crash information */
> +	if (!kexec_trylock()) {
> +		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> +		return 0;
> +	}
> +	if (kexec_crash_image) {
> +		if (kexec_crash_image->file_mode)
> +			rc = 1;
> +		else
> +			rc = kexec_crash_image->update_elfcorehdr;
> +	}
> +	/* Release lock now that update complete */
> +	kexec_unlock();
> +
> +	return rc;
> +}
> +
>   /*
>    * To accurately reflect hot un/plug changes of cpu and memory resources
>    * (including onling and offlining of those resources), the elfcorehdr
> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>   
>   	image = kexec_crash_image;
>   
> +	/* Check that updating elfcorehdr is permitted */
> +	if (!(image->file_mode || image->update_elfcorehdr))
> +		goto out;
> +
>   	if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>   		hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>   		pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 92d301f98776..60de64bd14b9 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>   	if (flags & KEXEC_PRESERVE_CONTEXT)
>   		image->preserve_context = 1;
>   
> +	if (flags & KEXEC_UPDATE_ELFCOREHDR)
> +		image->update_elfcorehdr = 1;
> +
>   	ret = machine_kexec_prepare(image);
>   	if (ret)
>   		goto out;
> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
> index aad7a3bfd846..1d4bc493b2f4 100644
> --- a/kernel/ksysfs.c
> +++ b/kernel/ksysfs.c
> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>   }
>   KERNEL_ATTR_RO(vmcoreinfo);
>   
> +#ifdef CONFIG_CRASH_HOTPLUG
> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
> +			       struct kobj_attribute *attr, char *buf)
> +{
> +	unsigned int sz = crash_get_elfcorehdr_size();
> +
> +	return sysfs_emit(buf, "%u\n", sz);
> +}
> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
> +
> +#endif
> +
>   #endif /* CONFIG_CRASH_CORE */
>   
>   /* whether file capabilities are enabled */
> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>   #endif
>   #ifdef CONFIG_CRASH_CORE
>   	&vmcoreinfo_attr.attr,
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	&crash_elfcorehdr_size_attr.attr,
> +#endif
>   #endif
>   #ifndef CONFIG_TINY_RCU
>   	&rcu_expedited_attr.attr,

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
@ 2023-05-09  6:15     ` Sourabh Jain
  0 siblings, 0 replies; 38+ messages in thread
From: Sourabh Jain @ 2023-05-09  6:15 UTC (permalink / raw)
  To: Eric DeVolder, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, konrad.wilk, boris.ostrovsky


On 04/05/23 04:11, Eric DeVolder wrote:
> The hotplug support for kexec_load() requires coordination with
> userspace, and therefore a little extra help from the kernel to
> facilitate the coordination.
>
> In the absence of the solution contained within this particular
> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
> then the crash hotplug logic would find the segment containing the
> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
> generally speaking that is the desired behavior and outcome, a
> problem arises from the fact that if the kdump image includes a
> purgatory that performs a digest checksum, then that check would
> fail (because the elfcorehdr was changed), and the capture kernel
> would fail to boot and no kdump occur.
>
> Therefore, what is needed is for the userspace kexec-tools to
> indicate to the kernel whether or not the supplied kdump image/
> elfcorehdr can be modified (because the kexec-tools excludes the
> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
> appropriately).
>
> To solve these problems, this patch introduces:
>   - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is

Architectures may need to update kexec segment other then elfcorehdr.
How about changing the flag name to KEXEC_UPDATE_SEGMENTS?

- Sourabh

>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>     has excluded the elfcorehdr from the digest).
>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>     should be in order to accommodate hotplug changes.
>   - The sysfs crash_hotplug nodes (ie.
>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>     that they examine kexec_file_load() vs kexec_load(), and when
>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>     This is critical so that the udev rule processing of crash_hotplug
>     indicates correctly (ie. the userspace unload-then-load of the
>     kdump of the kdump image can be skipped, or not).
>
> With this patch in place, I believe the following statements to be true
> (with local testing to verify):
>
>   - For systems which have these kernel changes in place, but not the
>     corresponding changes to the crash hot plug udev rules and
>     kexec-tools, (ie "older" systems) those systems will continue to
>     unload-then-load the kdump image, as has always been done. The
>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>   - For systems which have these kernel changes in place and the proposed
>     udev rule changes in place, but not the kexec-tools changes in place:
>      - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>        so the unload-then-reload of kdump image will occur (the sysfs
>        crash_hotplug nodes will show 0).
>      - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>        to show 1, and the kernel will modify the elfcorehdr directly. And
>        with the udev changes in place, the unload-then-load will not occur!
>   - For systems which have these kernel changes as well as the udev and
>     kexec-tools changes in place, then the user/admin has full authority
>     over the enablement and support of crash hotplug support, whether via
>     kexec_file_load() or kexec_load().
>
> Said differently, as kexec_load() was/is widely in use, these changes
> permit it to continue to be used as-is (retaining the current unload-then-
> reload behavior) until such time as the udev and kexec-tools changes can
> be rolled out as well.
>
> I've intentionally kept the changes related to userspace coordination
> for kexec_load() separate as this need was identified late; the
> rest of this series has been generally reviewed and accepted. Once
> this support has been vetted, I can refactor if needed.
>
> Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> ---
>   arch/x86/include/asm/kexec.h | 11 +++++++----
>   arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
>   include/linux/kexec.h        | 14 ++++++++++++--
>   include/uapi/linux/kexec.h   |  1 +
>   kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
>   kernel/kexec.c               |  3 +++
>   kernel/ksysfs.c              | 15 +++++++++++++++
>   7 files changed, 96 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 9143100ea3ea..3be6a98751f0 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>   
>   #ifdef CONFIG_HOTPLUG_CPU
> -static inline int crash_hotplug_cpu_support(void) { return 1; }
> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
> +int arch_crash_hotplug_cpu_support(void);
> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>   #endif
>   
>   #ifdef CONFIG_MEMORY_HOTPLUG
> -static inline int crash_hotplug_memory_support(void) { return 1; }
> -#define crash_hotplug_memory_support crash_hotplug_memory_support
> +int arch_crash_hotplug_memory_support(void);
> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>   #endif
> +
> +unsigned int arch_crash_get_elfcorehdr_size(void);
> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>   #endif
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 0c9d496cf7ce..8064e65de6c0 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
>   
> +/* These functions provide the value for the sysfs crash_hotplug nodes */
> +#ifdef CONFIG_HOTPLUG_CPU
> +int arch_crash_hotplug_cpu_support(void)
> +{
> +	return crash_check_update_elfcorehdr();
> +}
> +#endif
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int arch_crash_hotplug_memory_support(void)
> +{
> +	return crash_check_update_elfcorehdr();
> +}
> +#endif
> +
> +unsigned int arch_crash_get_elfcorehdr_size(void)
> +{
> +	unsigned int sz;
> +
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +		sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
> +	else
> +		sz += 2 + CONFIG_NR_CPUS_DEFAULT;
> +	sz *= sizeof(Elf64_Phdr);
> +	return sz;
> +}
> +
>   /**
>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>    * @image: the active struct kimage
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 6a8a724ac638..050e20066cdb 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -335,6 +335,10 @@ struct kimage {
>   	unsigned int preserve_context : 1;
>   	/* If set, we are using file mode kexec syscall */
>   	unsigned int file_mode:1;
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	/* If set, allow changes to elfcorehdr of kexec_load'd image */
> +	unsigned int update_elfcorehdr:1;
> +#endif
>   
>   #ifdef ARCH_HAS_KIMAGE_ARCH
>   	struct kimage_arch arch;
> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>   
>   /* List of defined/legal kexec flags */
>   #ifndef CONFIG_KEXEC_JUMP
> -#define KEXEC_FLAGS    KEXEC_ON_CRASH
> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>   #else
> -#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>   #endif
>   
>   /* List of defined/legal kexec file flags */
> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>   static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>   #endif
>   
> +int crash_check_update_elfcorehdr(void);
> +
>   #ifndef crash_hotplug_cpu_support
>   static inline int crash_hotplug_cpu_support(void) { return 0; }
>   #endif
> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>   static inline int crash_hotplug_memory_support(void) { return 0; }
>   #endif
>   
> +#ifndef crash_get_elfcorehdr_size
> +static inline crash_get_elfcorehdr_size(void) { return 0; }
> +#endif
> +
>   #else /* !CONFIG_KEXEC_CORE */
>   struct pt_regs;
>   struct task_struct;
> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
> index 981016e05cfa..01766dd839b0 100644
> --- a/include/uapi/linux/kexec.h
> +++ b/include/uapi/linux/kexec.h
> @@ -12,6 +12,7 @@
>   /* kexec flags for different usage scenarios */
>   #define KEXEC_ON_CRASH		0x00000001
>   #define KEXEC_PRESERVE_CONTEXT	0x00000002
> +#define KEXEC_UPDATE_ELFCOREHDR	0x00000004
>   #define KEXEC_ARCH_MASK		0xffff0000
>   
>   /*
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index ef6e91daad56..e05bfdb7eaed 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>   #ifdef CONFIG_CRASH_HOTPLUG
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
> +
> +/*
> + * This routine utilized when the crash_hotplug sysfs node is read.
> + * It reflects the kernel's ability/permission to update the crash
> + * elfcorehdr directly.
> + */
> +int crash_check_update_elfcorehdr(void)
> +{
> +	int rc = 0;
> +
> +	/* Obtain lock while reading crash information */
> +	if (!kexec_trylock()) {
> +		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> +		return 0;
> +	}
> +	if (kexec_crash_image) {
> +		if (kexec_crash_image->file_mode)
> +			rc = 1;
> +		else
> +			rc = kexec_crash_image->update_elfcorehdr;
> +	}
> +	/* Release lock now that update complete */
> +	kexec_unlock();
> +
> +	return rc;
> +}
> +
>   /*
>    * To accurately reflect hot un/plug changes of cpu and memory resources
>    * (including onling and offlining of those resources), the elfcorehdr
> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>   
>   	image = kexec_crash_image;
>   
> +	/* Check that updating elfcorehdr is permitted */
> +	if (!(image->file_mode || image->update_elfcorehdr))
> +		goto out;
> +
>   	if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>   		hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>   		pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 92d301f98776..60de64bd14b9 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>   	if (flags & KEXEC_PRESERVE_CONTEXT)
>   		image->preserve_context = 1;
>   
> +	if (flags & KEXEC_UPDATE_ELFCOREHDR)
> +		image->update_elfcorehdr = 1;
> +
>   	ret = machine_kexec_prepare(image);
>   	if (ret)
>   		goto out;
> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
> index aad7a3bfd846..1d4bc493b2f4 100644
> --- a/kernel/ksysfs.c
> +++ b/kernel/ksysfs.c
> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>   }
>   KERNEL_ATTR_RO(vmcoreinfo);
>   
> +#ifdef CONFIG_CRASH_HOTPLUG
> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
> +			       struct kobj_attribute *attr, char *buf)
> +{
> +	unsigned int sz = crash_get_elfcorehdr_size();
> +
> +	return sysfs_emit(buf, "%u\n", sz);
> +}
> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
> +
> +#endif
> +
>   #endif /* CONFIG_CRASH_CORE */
>   
>   /* whether file capabilities are enabled */
> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>   #endif
>   #ifdef CONFIG_CRASH_CORE
>   	&vmcoreinfo_attr.attr,
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	&crash_elfcorehdr_size_attr.attr,
> +#endif
>   #endif
>   #ifndef CONFIG_TINY_RCU
>   	&rcu_expedited_attr.attr,

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
  2023-05-03 22:41   ` Eric DeVolder
@ 2023-05-09  6:56     ` Sourabh Jain
  -1 siblings, 0 replies; 38+ messages in thread
From: Sourabh Jain @ 2023-05-09  6:56 UTC (permalink / raw)
  To: Eric DeVolder, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, konrad.wilk, boris.ostrovsky


On 04/05/23 04:11, Eric DeVolder wrote:
> The hotplug support for kexec_load() requires coordination with
> userspace, and therefore a little extra help from the kernel to
> facilitate the coordination.
>
> In the absence of the solution contained within this particular
> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
> then the crash hotplug logic would find the segment containing the
> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
> generally speaking that is the desired behavior and outcome, a
> problem arises from the fact that if the kdump image includes a
> purgatory that performs a digest checksum, then that check would
> fail (because the elfcorehdr was changed), and the capture kernel
> would fail to boot and no kdump occur.
>
> Therefore, what is needed is for the userspace kexec-tools to
> indicate to the kernel whether or not the supplied kdump image/
> elfcorehdr can be modified (because the kexec-tools excludes the
> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
> appropriately).
>
> To solve these problems, this patch introduces:
>   - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>     has excluded the elfcorehdr from the digest).
>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>     should be in order to accommodate hotplug changes.
>   - The sysfs crash_hotplug nodes (ie.
>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>     that they examine kexec_file_load() vs kexec_load(), and when
>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>     This is critical so that the udev rule processing of crash_hotplug
>     indicates correctly (ie. the userspace unload-then-load of the
>     kdump of the kdump image can be skipped, or not).
>
> With this patch in place, I believe the following statements to be true
> (with local testing to verify):
>
>   - For systems which have these kernel changes in place, but not the
>     corresponding changes to the crash hot plug udev rules and
>     kexec-tools, (ie "older" systems) those systems will continue to
>     unload-then-load the kdump image, as has always been done. The
>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>   - For systems which have these kernel changes in place and the proposed
>     udev rule changes in place, but not the kexec-tools changes in place:
>      - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>        so the unload-then-reload of kdump image will occur (the sysfs
>        crash_hotplug nodes will show 0).
>      - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>        to show 1, and the kernel will modify the elfcorehdr directly. And
>        with the udev changes in place, the unload-then-load will not occur!
>   - For systems which have these kernel changes as well as the udev and
>     kexec-tools changes in place, then the user/admin has full authority
>     over the enablement and support of crash hotplug support, whether via
>     kexec_file_load() or kexec_load().
>
> Said differently, as kexec_load() was/is widely in use, these changes
> permit it to continue to be used as-is (retaining the current unload-then-
> reload behavior) until such time as the udev and kexec-tools changes can
> be rolled out as well.
>
> I've intentionally kept the changes related to userspace coordination
> for kexec_load() separate as this need was identified late; the
> rest of this series has been generally reviewed and accepted. Once
> this support has been vetted, I can refactor if needed.
>
> Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> ---
>   arch/x86/include/asm/kexec.h | 11 +++++++----
>   arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
>   include/linux/kexec.h        | 14 ++++++++++++--
>   include/uapi/linux/kexec.h   |  1 +
>   kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
>   kernel/kexec.c               |  3 +++
>   kernel/ksysfs.c              | 15 +++++++++++++++
>   7 files changed, 96 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 9143100ea3ea..3be6a98751f0 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>   
>   #ifdef CONFIG_HOTPLUG_CPU
> -static inline int crash_hotplug_cpu_support(void) { return 1; }
> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
> +int arch_crash_hotplug_cpu_support(void);
> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>   #endif
>   
>   #ifdef CONFIG_MEMORY_HOTPLUG
> -static inline int crash_hotplug_memory_support(void) { return 1; }
> -#define crash_hotplug_memory_support crash_hotplug_memory_support
> +int arch_crash_hotplug_memory_support(void);
> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>   #endif
> +
> +unsigned int arch_crash_get_elfcorehdr_size(void);
> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>   #endif
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 0c9d496cf7ce..8064e65de6c0 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
>   
> +/* These functions provide the value for the sysfs crash_hotplug nodes */
> +#ifdef CONFIG_HOTPLUG_CPU
> +int arch_crash_hotplug_cpu_support(void)
> +{
> +	return crash_check_update_elfcorehdr();
> +}
> +#endif
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int arch_crash_hotplug_memory_support(void)
> +{
> +	return crash_check_update_elfcorehdr();
> +}
> +#endif
> +
> +unsigned int arch_crash_get_elfcorehdr_size(void)
> +{
> +	unsigned int sz;
> +
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +		sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
> +	else
> +		sz += 2 + CONFIG_NR_CPUS_DEFAULT;

If the sz holds the garbage value we have issues in else part. Or if you 
expecting
sz to be 0 then += is not needed in the else part.

How to doing this way?

unsigned int sz;

sz = 2 + CONFIG_NR_CPUS_DEFAULT;

if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
     sz += CRASH_MAX_MEMORY_RANGES


Thanks,
Sourabh Jain

> +	sz *= sizeof(Elf64_Phdr);
> +	return sz;
> +}
> +
>   /**
>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>    * @image: the active struct kimage
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 6a8a724ac638..050e20066cdb 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -335,6 +335,10 @@ struct kimage {
>   	unsigned int preserve_context : 1;
>   	/* If set, we are using file mode kexec syscall */
>   	unsigned int file_mode:1;
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	/* If set, allow changes to elfcorehdr of kexec_load'd image */
> +	unsigned int update_elfcorehdr:1;
> +#endif
>   
>   #ifdef ARCH_HAS_KIMAGE_ARCH
>   	struct kimage_arch arch;
> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>   
>   /* List of defined/legal kexec flags */
>   #ifndef CONFIG_KEXEC_JUMP
> -#define KEXEC_FLAGS    KEXEC_ON_CRASH
> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>   #else
> -#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>   #endif
>   
>   /* List of defined/legal kexec file flags */
> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>   static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>   #endif
>   
> +int crash_check_update_elfcorehdr(void);
> +
>   #ifndef crash_hotplug_cpu_support
>   static inline int crash_hotplug_cpu_support(void) { return 0; }
>   #endif
> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>   static inline int crash_hotplug_memory_support(void) { return 0; }
>   #endif
>   
> +#ifndef crash_get_elfcorehdr_size
> +static inline crash_get_elfcorehdr_size(void) { return 0; }
> +#endif
> +
>   #else /* !CONFIG_KEXEC_CORE */
>   struct pt_regs;
>   struct task_struct;
> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
> index 981016e05cfa..01766dd839b0 100644
> --- a/include/uapi/linux/kexec.h
> +++ b/include/uapi/linux/kexec.h
> @@ -12,6 +12,7 @@
>   /* kexec flags for different usage scenarios */
>   #define KEXEC_ON_CRASH		0x00000001
>   #define KEXEC_PRESERVE_CONTEXT	0x00000002
> +#define KEXEC_UPDATE_ELFCOREHDR	0x00000004
>   #define KEXEC_ARCH_MASK		0xffff0000
>   
>   /*
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index ef6e91daad56..e05bfdb7eaed 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>   #ifdef CONFIG_CRASH_HOTPLUG
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
> +
> +/*
> + * This routine utilized when the crash_hotplug sysfs node is read.
> + * It reflects the kernel's ability/permission to update the crash
> + * elfcorehdr directly.
> + */
> +int crash_check_update_elfcorehdr(void)
> +{
> +	int rc = 0;
> +
> +	/* Obtain lock while reading crash information */
> +	if (!kexec_trylock()) {
> +		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> +		return 0;
> +	}
> +	if (kexec_crash_image) {
> +		if (kexec_crash_image->file_mode)
> +			rc = 1;
> +		else
> +			rc = kexec_crash_image->update_elfcorehdr;
> +	}
> +	/* Release lock now that update complete */
> +	kexec_unlock();
> +
> +	return rc;
> +}
> +
>   /*
>    * To accurately reflect hot un/plug changes of cpu and memory resources
>    * (including onling and offlining of those resources), the elfcorehdr
> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>   
>   	image = kexec_crash_image;
>   
> +	/* Check that updating elfcorehdr is permitted */
> +	if (!(image->file_mode || image->update_elfcorehdr))
> +		goto out;
> +
>   	if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>   		hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>   		pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 92d301f98776..60de64bd14b9 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>   	if (flags & KEXEC_PRESERVE_CONTEXT)
>   		image->preserve_context = 1;
>   
> +	if (flags & KEXEC_UPDATE_ELFCOREHDR)
> +		image->update_elfcorehdr = 1;
> +
>   	ret = machine_kexec_prepare(image);
>   	if (ret)
>   		goto out;
> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
> index aad7a3bfd846..1d4bc493b2f4 100644
> --- a/kernel/ksysfs.c
> +++ b/kernel/ksysfs.c
> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>   }
>   KERNEL_ATTR_RO(vmcoreinfo);
>   
> +#ifdef CONFIG_CRASH_HOTPLUG
> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
> +			       struct kobj_attribute *attr, char *buf)
> +{
> +	unsigned int sz = crash_get_elfcorehdr_size();
> +
> +	return sysfs_emit(buf, "%u\n", sz);
> +}
> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
> +
> +#endif
> +
>   #endif /* CONFIG_CRASH_CORE */
>   
>   /* whether file capabilities are enabled */
> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>   #endif
>   #ifdef CONFIG_CRASH_CORE
>   	&vmcoreinfo_attr.attr,
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	&crash_elfcorehdr_size_attr.attr,
> +#endif
>   #endif
>   #ifndef CONFIG_TINY_RCU
>   	&rcu_expedited_attr.attr,

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
@ 2023-05-09  6:56     ` Sourabh Jain
  0 siblings, 0 replies; 38+ messages in thread
From: Sourabh Jain @ 2023-05-09  6:56 UTC (permalink / raw)
  To: Eric DeVolder, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, konrad.wilk, boris.ostrovsky


On 04/05/23 04:11, Eric DeVolder wrote:
> The hotplug support for kexec_load() requires coordination with
> userspace, and therefore a little extra help from the kernel to
> facilitate the coordination.
>
> In the absence of the solution contained within this particular
> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
> then the crash hotplug logic would find the segment containing the
> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
> generally speaking that is the desired behavior and outcome, a
> problem arises from the fact that if the kdump image includes a
> purgatory that performs a digest checksum, then that check would
> fail (because the elfcorehdr was changed), and the capture kernel
> would fail to boot and no kdump occur.
>
> Therefore, what is needed is for the userspace kexec-tools to
> indicate to the kernel whether or not the supplied kdump image/
> elfcorehdr can be modified (because the kexec-tools excludes the
> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
> appropriately).
>
> To solve these problems, this patch introduces:
>   - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>     has excluded the elfcorehdr from the digest).
>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>     should be in order to accommodate hotplug changes.
>   - The sysfs crash_hotplug nodes (ie.
>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>     that they examine kexec_file_load() vs kexec_load(), and when
>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>     This is critical so that the udev rule processing of crash_hotplug
>     indicates correctly (ie. the userspace unload-then-load of the
>     kdump of the kdump image can be skipped, or not).
>
> With this patch in place, I believe the following statements to be true
> (with local testing to verify):
>
>   - For systems which have these kernel changes in place, but not the
>     corresponding changes to the crash hot plug udev rules and
>     kexec-tools, (ie "older" systems) those systems will continue to
>     unload-then-load the kdump image, as has always been done. The
>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>   - For systems which have these kernel changes in place and the proposed
>     udev rule changes in place, but not the kexec-tools changes in place:
>      - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>        so the unload-then-reload of kdump image will occur (the sysfs
>        crash_hotplug nodes will show 0).
>      - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>        to show 1, and the kernel will modify the elfcorehdr directly. And
>        with the udev changes in place, the unload-then-load will not occur!
>   - For systems which have these kernel changes as well as the udev and
>     kexec-tools changes in place, then the user/admin has full authority
>     over the enablement and support of crash hotplug support, whether via
>     kexec_file_load() or kexec_load().
>
> Said differently, as kexec_load() was/is widely in use, these changes
> permit it to continue to be used as-is (retaining the current unload-then-
> reload behavior) until such time as the udev and kexec-tools changes can
> be rolled out as well.
>
> I've intentionally kept the changes related to userspace coordination
> for kexec_load() separate as this need was identified late; the
> rest of this series has been generally reviewed and accepted. Once
> this support has been vetted, I can refactor if needed.
>
> Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> ---
>   arch/x86/include/asm/kexec.h | 11 +++++++----
>   arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
>   include/linux/kexec.h        | 14 ++++++++++++--
>   include/uapi/linux/kexec.h   |  1 +
>   kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
>   kernel/kexec.c               |  3 +++
>   kernel/ksysfs.c              | 15 +++++++++++++++
>   7 files changed, 96 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 9143100ea3ea..3be6a98751f0 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>   
>   #ifdef CONFIG_HOTPLUG_CPU
> -static inline int crash_hotplug_cpu_support(void) { return 1; }
> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
> +int arch_crash_hotplug_cpu_support(void);
> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>   #endif
>   
>   #ifdef CONFIG_MEMORY_HOTPLUG
> -static inline int crash_hotplug_memory_support(void) { return 1; }
> -#define crash_hotplug_memory_support crash_hotplug_memory_support
> +int arch_crash_hotplug_memory_support(void);
> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>   #endif
> +
> +unsigned int arch_crash_get_elfcorehdr_size(void);
> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>   #endif
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 0c9d496cf7ce..8064e65de6c0 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
>   
> +/* These functions provide the value for the sysfs crash_hotplug nodes */
> +#ifdef CONFIG_HOTPLUG_CPU
> +int arch_crash_hotplug_cpu_support(void)
> +{
> +	return crash_check_update_elfcorehdr();
> +}
> +#endif
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int arch_crash_hotplug_memory_support(void)
> +{
> +	return crash_check_update_elfcorehdr();
> +}
> +#endif
> +
> +unsigned int arch_crash_get_elfcorehdr_size(void)
> +{
> +	unsigned int sz;
> +
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +		sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
> +	else
> +		sz += 2 + CONFIG_NR_CPUS_DEFAULT;

If the sz holds the garbage value we have issues in else part. Or if you 
expecting
sz to be 0 then += is not needed in the else part.

How to doing this way?

unsigned int sz;

sz = 2 + CONFIG_NR_CPUS_DEFAULT;

if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
     sz += CRASH_MAX_MEMORY_RANGES


Thanks,
Sourabh Jain

> +	sz *= sizeof(Elf64_Phdr);
> +	return sz;
> +}
> +
>   /**
>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>    * @image: the active struct kimage
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 6a8a724ac638..050e20066cdb 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -335,6 +335,10 @@ struct kimage {
>   	unsigned int preserve_context : 1;
>   	/* If set, we are using file mode kexec syscall */
>   	unsigned int file_mode:1;
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	/* If set, allow changes to elfcorehdr of kexec_load'd image */
> +	unsigned int update_elfcorehdr:1;
> +#endif
>   
>   #ifdef ARCH_HAS_KIMAGE_ARCH
>   	struct kimage_arch arch;
> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>   
>   /* List of defined/legal kexec flags */
>   #ifndef CONFIG_KEXEC_JUMP
> -#define KEXEC_FLAGS    KEXEC_ON_CRASH
> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>   #else
> -#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>   #endif
>   
>   /* List of defined/legal kexec file flags */
> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>   static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>   #endif
>   
> +int crash_check_update_elfcorehdr(void);
> +
>   #ifndef crash_hotplug_cpu_support
>   static inline int crash_hotplug_cpu_support(void) { return 0; }
>   #endif
> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>   static inline int crash_hotplug_memory_support(void) { return 0; }
>   #endif
>   
> +#ifndef crash_get_elfcorehdr_size
> +static inline crash_get_elfcorehdr_size(void) { return 0; }
> +#endif
> +
>   #else /* !CONFIG_KEXEC_CORE */
>   struct pt_regs;
>   struct task_struct;
> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
> index 981016e05cfa..01766dd839b0 100644
> --- a/include/uapi/linux/kexec.h
> +++ b/include/uapi/linux/kexec.h
> @@ -12,6 +12,7 @@
>   /* kexec flags for different usage scenarios */
>   #define KEXEC_ON_CRASH		0x00000001
>   #define KEXEC_PRESERVE_CONTEXT	0x00000002
> +#define KEXEC_UPDATE_ELFCOREHDR	0x00000004
>   #define KEXEC_ARCH_MASK		0xffff0000
>   
>   /*
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index ef6e91daad56..e05bfdb7eaed 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>   #ifdef CONFIG_CRASH_HOTPLUG
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
> +
> +/*
> + * This routine utilized when the crash_hotplug sysfs node is read.
> + * It reflects the kernel's ability/permission to update the crash
> + * elfcorehdr directly.
> + */
> +int crash_check_update_elfcorehdr(void)
> +{
> +	int rc = 0;
> +
> +	/* Obtain lock while reading crash information */
> +	if (!kexec_trylock()) {
> +		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> +		return 0;
> +	}
> +	if (kexec_crash_image) {
> +		if (kexec_crash_image->file_mode)
> +			rc = 1;
> +		else
> +			rc = kexec_crash_image->update_elfcorehdr;
> +	}
> +	/* Release lock now that update complete */
> +	kexec_unlock();
> +
> +	return rc;
> +}
> +
>   /*
>    * To accurately reflect hot un/plug changes of cpu and memory resources
>    * (including onling and offlining of those resources), the elfcorehdr
> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>   
>   	image = kexec_crash_image;
>   
> +	/* Check that updating elfcorehdr is permitted */
> +	if (!(image->file_mode || image->update_elfcorehdr))
> +		goto out;
> +
>   	if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>   		hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>   		pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 92d301f98776..60de64bd14b9 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>   	if (flags & KEXEC_PRESERVE_CONTEXT)
>   		image->preserve_context = 1;
>   
> +	if (flags & KEXEC_UPDATE_ELFCOREHDR)
> +		image->update_elfcorehdr = 1;
> +
>   	ret = machine_kexec_prepare(image);
>   	if (ret)
>   		goto out;
> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
> index aad7a3bfd846..1d4bc493b2f4 100644
> --- a/kernel/ksysfs.c
> +++ b/kernel/ksysfs.c
> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>   }
>   KERNEL_ATTR_RO(vmcoreinfo);
>   
> +#ifdef CONFIG_CRASH_HOTPLUG
> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
> +			       struct kobj_attribute *attr, char *buf)
> +{
> +	unsigned int sz = crash_get_elfcorehdr_size();
> +
> +	return sysfs_emit(buf, "%u\n", sz);
> +}
> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
> +
> +#endif
> +
>   #endif /* CONFIG_CRASH_CORE */
>   
>   /* whether file capabilities are enabled */
> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>   #endif
>   #ifdef CONFIG_CRASH_CORE
>   	&vmcoreinfo_attr.attr,
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	&crash_elfcorehdr_size_attr.attr,
> +#endif
>   #endif
>   #ifndef CONFIG_TINY_RCU
>   	&rcu_expedited_attr.attr,

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
  2023-05-09  6:56     ` Sourabh Jain
@ 2023-05-09 20:35       ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-09 20:35 UTC (permalink / raw)
  To: Sourabh Jain, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, Konrad Wilk, Boris Ostrovsky



On 5/9/23 01:56, Sourabh Jain wrote:
> 
> On 04/05/23 04:11, Eric DeVolder wrote:
>> The hotplug support for kexec_load() requires coordination with
>> userspace, and therefore a little extra help from the kernel to
>> facilitate the coordination.
>>
>> In the absence of the solution contained within this particular
>> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
>> then the crash hotplug logic would find the segment containing the
>> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
>> generally speaking that is the desired behavior and outcome, a
>> problem arises from the fact that if the kdump image includes a
>> purgatory that performs a digest checksum, then that check would
>> fail (because the elfcorehdr was changed), and the capture kernel
>> would fail to boot and no kdump occur.
>>
>> Therefore, what is needed is for the userspace kexec-tools to
>> indicate to the kernel whether or not the supplied kdump image/
>> elfcorehdr can be modified (because the kexec-tools excludes the
>> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
>> appropriately).
>>
>> To solve these problems, this patch introduces:
>>   - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
>>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>>     has excluded the elfcorehdr from the digest).
>>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>>     should be in order to accommodate hotplug changes.
>>   - The sysfs crash_hotplug nodes (ie.
>>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>>     that they examine kexec_file_load() vs kexec_load(), and when
>>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>>     This is critical so that the udev rule processing of crash_hotplug
>>     indicates correctly (ie. the userspace unload-then-load of the
>>     kdump of the kdump image can be skipped, or not).
>>
>> With this patch in place, I believe the following statements to be true
>> (with local testing to verify):
>>
>>   - For systems which have these kernel changes in place, but not the
>>     corresponding changes to the crash hot plug udev rules and
>>     kexec-tools, (ie "older" systems) those systems will continue to
>>     unload-then-load the kdump image, as has always been done. The
>>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>>   - For systems which have these kernel changes in place and the proposed
>>     udev rule changes in place, but not the kexec-tools changes in place:
>>      - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>>        so the unload-then-reload of kdump image will occur (the sysfs
>>        crash_hotplug nodes will show 0).
>>      - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>>        to show 1, and the kernel will modify the elfcorehdr directly. And
>>        with the udev changes in place, the unload-then-load will not occur!
>>   - For systems which have these kernel changes as well as the udev and
>>     kexec-tools changes in place, then the user/admin has full authority
>>     over the enablement and support of crash hotplug support, whether via
>>     kexec_file_load() or kexec_load().
>>
>> Said differently, as kexec_load() was/is widely in use, these changes
>> permit it to continue to be used as-is (retaining the current unload-then-
>> reload behavior) until such time as the udev and kexec-tools changes can
>> be rolled out as well.
>>
>> I've intentionally kept the changes related to userspace coordination
>> for kexec_load() separate as this need was identified late; the
>> rest of this series has been generally reviewed and accepted. Once
>> this support has been vetted, I can refactor if needed.
>>
>> Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>> ---
>>   arch/x86/include/asm/kexec.h | 11 +++++++----
>>   arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
>>   include/linux/kexec.h        | 14 ++++++++++++--
>>   include/uapi/linux/kexec.h   |  1 +
>>   kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
>>   kernel/kexec.c               |  3 +++
>>   kernel/ksysfs.c              | 15 +++++++++++++++
>>   7 files changed, 96 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 9143100ea3ea..3be6a98751f0 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>>   #ifdef CONFIG_HOTPLUG_CPU
>> -static inline int crash_hotplug_cpu_support(void) { return 1; }
>> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
>> +int arch_crash_hotplug_cpu_support(void);
>> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>>   #endif
>>   #ifdef CONFIG_MEMORY_HOTPLUG
>> -static inline int crash_hotplug_memory_support(void) { return 1; }
>> -#define crash_hotplug_memory_support crash_hotplug_memory_support
>> +int arch_crash_hotplug_memory_support(void);
>> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>>   #endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void);
>> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>>   #endif
>>   #endif /* __ASSEMBLY__ */
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 0c9d496cf7ce..8064e65de6c0 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>> +/* These functions provide the value for the sysfs crash_hotplug nodes */
>> +#ifdef CONFIG_HOTPLUG_CPU
>> +int arch_crash_hotplug_cpu_support(void)
>> +{
>> +    return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +int arch_crash_hotplug_memory_support(void)
>> +{
>> +    return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void)
>> +{
>> +    unsigned int sz;
>> +
>> +    if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
>> +        sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
>> +    else
>> +        sz += 2 + CONFIG_NR_CPUS_DEFAULT;
> 
> If the sz holds the garbage value we have issues in else part. Or if you expecting
> sz to be 0 then += is not needed in the else part.
> 
> How to doing this way?
> 
> unsigned int sz;
> 
> sz = 2 + CONFIG_NR_CPUS_DEFAULT;
> 
> if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
>      sz += CRASH_MAX_MEMORY_RANGES
> 
> 
> Thanks,
> Sourabh Jain
> 

Thanks for catching this mistake, sz is to be initialized to zero.
eric

>> +    sz *= sizeof(Elf64_Phdr);
>> +    return sz;
>> +}
>> +
>>   /**
>>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>>    * @image: the active struct kimage
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 6a8a724ac638..050e20066cdb 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -335,6 +335,10 @@ struct kimage {
>>       unsigned int preserve_context : 1;
>>       /* If set, we are using file mode kexec syscall */
>>       unsigned int file_mode:1;
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    /* If set, allow changes to elfcorehdr of kexec_load'd image */
>> +    unsigned int update_elfcorehdr:1;
>> +#endif
>>   #ifdef ARCH_HAS_KIMAGE_ARCH
>>       struct kimage_arch arch;
>> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>>   /* List of defined/legal kexec flags */
>>   #ifndef CONFIG_KEXEC_JUMP
>> -#define KEXEC_FLAGS    KEXEC_ON_CRASH
>> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>>   #else
>> -#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
>> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>>   #endif
>>   /* List of defined/legal kexec file flags */
>> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>>   static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>>   #endif
>> +int crash_check_update_elfcorehdr(void);
>> +
>>   #ifndef crash_hotplug_cpu_support
>>   static inline int crash_hotplug_cpu_support(void) { return 0; }
>>   #endif
>> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>>   static inline int crash_hotplug_memory_support(void) { return 0; }
>>   #endif
>> +#ifndef crash_get_elfcorehdr_size
>> +static inline crash_get_elfcorehdr_size(void) { return 0; }
>> +#endif
>> +
>>   #else /* !CONFIG_KEXEC_CORE */
>>   struct pt_regs;
>>   struct task_struct;
>> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
>> index 981016e05cfa..01766dd839b0 100644
>> --- a/include/uapi/linux/kexec.h
>> +++ b/include/uapi/linux/kexec.h
>> @@ -12,6 +12,7 @@
>>   /* kexec flags for different usage scenarios */
>>   #define KEXEC_ON_CRASH        0x00000001
>>   #define KEXEC_PRESERVE_CONTEXT    0x00000002
>> +#define KEXEC_UPDATE_ELFCOREHDR    0x00000004
>>   #define KEXEC_ARCH_MASK        0xffff0000
>>   /*
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index ef6e91daad56..e05bfdb7eaed 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>>   #ifdef CONFIG_CRASH_HOTPLUG
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>> +
>> +/*
>> + * This routine utilized when the crash_hotplug sysfs node is read.
>> + * It reflects the kernel's ability/permission to update the crash
>> + * elfcorehdr directly.
>> + */
>> +int crash_check_update_elfcorehdr(void)
>> +{
>> +    int rc = 0;
>> +
>> +    /* Obtain lock while reading crash information */
>> +    if (!kexec_trylock()) {
>> +        pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
>> +        return 0;
>> +    }
>> +    if (kexec_crash_image) {
>> +        if (kexec_crash_image->file_mode)
>> +            rc = 1;
>> +        else
>> +            rc = kexec_crash_image->update_elfcorehdr;
>> +    }
>> +    /* Release lock now that update complete */
>> +    kexec_unlock();
>> +
>> +    return rc;
>> +}
>> +
>>   /*
>>    * To accurately reflect hot un/plug changes of cpu and memory resources
>>    * (including onling and offlining of those resources), the elfcorehdr
>> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>>       image = kexec_crash_image;
>> +    /* Check that updating elfcorehdr is permitted */
>> +    if (!(image->file_mode || image->update_elfcorehdr))
>> +        goto out;
>> +
>>       if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>>           hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>>           pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
>> diff --git a/kernel/kexec.c b/kernel/kexec.c
>> index 92d301f98776..60de64bd14b9 100644
>> --- a/kernel/kexec.c
>> +++ b/kernel/kexec.c
>> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>>       if (flags & KEXEC_PRESERVE_CONTEXT)
>>           image->preserve_context = 1;
>> +    if (flags & KEXEC_UPDATE_ELFCOREHDR)
>> +        image->update_elfcorehdr = 1;
>> +
>>       ret = machine_kexec_prepare(image);
>>       if (ret)
>>           goto out;
>> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
>> index aad7a3bfd846..1d4bc493b2f4 100644
>> --- a/kernel/ksysfs.c
>> +++ b/kernel/ksysfs.c
>> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>>   }
>>   KERNEL_ATTR_RO(vmcoreinfo);
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
>> +                   struct kobj_attribute *attr, char *buf)
>> +{
>> +    unsigned int sz = crash_get_elfcorehdr_size();
>> +
>> +    return sysfs_emit(buf, "%u\n", sz);
>> +}
>> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
>> +
>> +#endif
>> +
>>   #endif /* CONFIG_CRASH_CORE */
>>   /* whether file capabilities are enabled */
>> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>>   #endif
>>   #ifdef CONFIG_CRASH_CORE
>>       &vmcoreinfo_attr.attr,
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    &crash_elfcorehdr_size_attr.attr,
>> +#endif
>>   #endif
>>   #ifndef CONFIG_TINY_RCU
>>       &rcu_expedited_attr.attr,

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
@ 2023-05-09 20:35       ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-09 20:35 UTC (permalink / raw)
  To: Sourabh Jain, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, Konrad Wilk, Boris Ostrovsky



On 5/9/23 01:56, Sourabh Jain wrote:
> 
> On 04/05/23 04:11, Eric DeVolder wrote:
>> The hotplug support for kexec_load() requires coordination with
>> userspace, and therefore a little extra help from the kernel to
>> facilitate the coordination.
>>
>> In the absence of the solution contained within this particular
>> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
>> then the crash hotplug logic would find the segment containing the
>> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
>> generally speaking that is the desired behavior and outcome, a
>> problem arises from the fact that if the kdump image includes a
>> purgatory that performs a digest checksum, then that check would
>> fail (because the elfcorehdr was changed), and the capture kernel
>> would fail to boot and no kdump occur.
>>
>> Therefore, what is needed is for the userspace kexec-tools to
>> indicate to the kernel whether or not the supplied kdump image/
>> elfcorehdr can be modified (because the kexec-tools excludes the
>> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
>> appropriately).
>>
>> To solve these problems, this patch introduces:
>>   - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
>>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>>     has excluded the elfcorehdr from the digest).
>>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>>     should be in order to accommodate hotplug changes.
>>   - The sysfs crash_hotplug nodes (ie.
>>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>>     that they examine kexec_file_load() vs kexec_load(), and when
>>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>>     This is critical so that the udev rule processing of crash_hotplug
>>     indicates correctly (ie. the userspace unload-then-load of the
>>     kdump of the kdump image can be skipped, or not).
>>
>> With this patch in place, I believe the following statements to be true
>> (with local testing to verify):
>>
>>   - For systems which have these kernel changes in place, but not the
>>     corresponding changes to the crash hot plug udev rules and
>>     kexec-tools, (ie "older" systems) those systems will continue to
>>     unload-then-load the kdump image, as has always been done. The
>>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>>   - For systems which have these kernel changes in place and the proposed
>>     udev rule changes in place, but not the kexec-tools changes in place:
>>      - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>>        so the unload-then-reload of kdump image will occur (the sysfs
>>        crash_hotplug nodes will show 0).
>>      - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>>        to show 1, and the kernel will modify the elfcorehdr directly. And
>>        with the udev changes in place, the unload-then-load will not occur!
>>   - For systems which have these kernel changes as well as the udev and
>>     kexec-tools changes in place, then the user/admin has full authority
>>     over the enablement and support of crash hotplug support, whether via
>>     kexec_file_load() or kexec_load().
>>
>> Said differently, as kexec_load() was/is widely in use, these changes
>> permit it to continue to be used as-is (retaining the current unload-then-
>> reload behavior) until such time as the udev and kexec-tools changes can
>> be rolled out as well.
>>
>> I've intentionally kept the changes related to userspace coordination
>> for kexec_load() separate as this need was identified late; the
>> rest of this series has been generally reviewed and accepted. Once
>> this support has been vetted, I can refactor if needed.
>>
>> Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>> ---
>>   arch/x86/include/asm/kexec.h | 11 +++++++----
>>   arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
>>   include/linux/kexec.h        | 14 ++++++++++++--
>>   include/uapi/linux/kexec.h   |  1 +
>>   kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
>>   kernel/kexec.c               |  3 +++
>>   kernel/ksysfs.c              | 15 +++++++++++++++
>>   7 files changed, 96 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 9143100ea3ea..3be6a98751f0 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>>   #ifdef CONFIG_HOTPLUG_CPU
>> -static inline int crash_hotplug_cpu_support(void) { return 1; }
>> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
>> +int arch_crash_hotplug_cpu_support(void);
>> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>>   #endif
>>   #ifdef CONFIG_MEMORY_HOTPLUG
>> -static inline int crash_hotplug_memory_support(void) { return 1; }
>> -#define crash_hotplug_memory_support crash_hotplug_memory_support
>> +int arch_crash_hotplug_memory_support(void);
>> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>>   #endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void);
>> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>>   #endif
>>   #endif /* __ASSEMBLY__ */
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 0c9d496cf7ce..8064e65de6c0 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>> +/* These functions provide the value for the sysfs crash_hotplug nodes */
>> +#ifdef CONFIG_HOTPLUG_CPU
>> +int arch_crash_hotplug_cpu_support(void)
>> +{
>> +    return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +int arch_crash_hotplug_memory_support(void)
>> +{
>> +    return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void)
>> +{
>> +    unsigned int sz;
>> +
>> +    if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
>> +        sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
>> +    else
>> +        sz += 2 + CONFIG_NR_CPUS_DEFAULT;
> 
> If the sz holds the garbage value we have issues in else part. Or if you expecting
> sz to be 0 then += is not needed in the else part.
> 
> How to doing this way?
> 
> unsigned int sz;
> 
> sz = 2 + CONFIG_NR_CPUS_DEFAULT;
> 
> if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
>      sz += CRASH_MAX_MEMORY_RANGES
> 
> 
> Thanks,
> Sourabh Jain
> 

Thanks for catching this mistake, sz is to be initialized to zero.
eric

>> +    sz *= sizeof(Elf64_Phdr);
>> +    return sz;
>> +}
>> +
>>   /**
>>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>>    * @image: the active struct kimage
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 6a8a724ac638..050e20066cdb 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -335,6 +335,10 @@ struct kimage {
>>       unsigned int preserve_context : 1;
>>       /* If set, we are using file mode kexec syscall */
>>       unsigned int file_mode:1;
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    /* If set, allow changes to elfcorehdr of kexec_load'd image */
>> +    unsigned int update_elfcorehdr:1;
>> +#endif
>>   #ifdef ARCH_HAS_KIMAGE_ARCH
>>       struct kimage_arch arch;
>> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>>   /* List of defined/legal kexec flags */
>>   #ifndef CONFIG_KEXEC_JUMP
>> -#define KEXEC_FLAGS    KEXEC_ON_CRASH
>> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>>   #else
>> -#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
>> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>>   #endif
>>   /* List of defined/legal kexec file flags */
>> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>>   static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>>   #endif
>> +int crash_check_update_elfcorehdr(void);
>> +
>>   #ifndef crash_hotplug_cpu_support
>>   static inline int crash_hotplug_cpu_support(void) { return 0; }
>>   #endif
>> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>>   static inline int crash_hotplug_memory_support(void) { return 0; }
>>   #endif
>> +#ifndef crash_get_elfcorehdr_size
>> +static inline crash_get_elfcorehdr_size(void) { return 0; }
>> +#endif
>> +
>>   #else /* !CONFIG_KEXEC_CORE */
>>   struct pt_regs;
>>   struct task_struct;
>> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
>> index 981016e05cfa..01766dd839b0 100644
>> --- a/include/uapi/linux/kexec.h
>> +++ b/include/uapi/linux/kexec.h
>> @@ -12,6 +12,7 @@
>>   /* kexec flags for different usage scenarios */
>>   #define KEXEC_ON_CRASH        0x00000001
>>   #define KEXEC_PRESERVE_CONTEXT    0x00000002
>> +#define KEXEC_UPDATE_ELFCOREHDR    0x00000004
>>   #define KEXEC_ARCH_MASK        0xffff0000
>>   /*
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index ef6e91daad56..e05bfdb7eaed 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>>   #ifdef CONFIG_CRASH_HOTPLUG
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>> +
>> +/*
>> + * This routine utilized when the crash_hotplug sysfs node is read.
>> + * It reflects the kernel's ability/permission to update the crash
>> + * elfcorehdr directly.
>> + */
>> +int crash_check_update_elfcorehdr(void)
>> +{
>> +    int rc = 0;
>> +
>> +    /* Obtain lock while reading crash information */
>> +    if (!kexec_trylock()) {
>> +        pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
>> +        return 0;
>> +    }
>> +    if (kexec_crash_image) {
>> +        if (kexec_crash_image->file_mode)
>> +            rc = 1;
>> +        else
>> +            rc = kexec_crash_image->update_elfcorehdr;
>> +    }
>> +    /* Release lock now that update complete */
>> +    kexec_unlock();
>> +
>> +    return rc;
>> +}
>> +
>>   /*
>>    * To accurately reflect hot un/plug changes of cpu and memory resources
>>    * (including onling and offlining of those resources), the elfcorehdr
>> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>>       image = kexec_crash_image;
>> +    /* Check that updating elfcorehdr is permitted */
>> +    if (!(image->file_mode || image->update_elfcorehdr))
>> +        goto out;
>> +
>>       if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>>           hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>>           pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
>> diff --git a/kernel/kexec.c b/kernel/kexec.c
>> index 92d301f98776..60de64bd14b9 100644
>> --- a/kernel/kexec.c
>> +++ b/kernel/kexec.c
>> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>>       if (flags & KEXEC_PRESERVE_CONTEXT)
>>           image->preserve_context = 1;
>> +    if (flags & KEXEC_UPDATE_ELFCOREHDR)
>> +        image->update_elfcorehdr = 1;
>> +
>>       ret = machine_kexec_prepare(image);
>>       if (ret)
>>           goto out;
>> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
>> index aad7a3bfd846..1d4bc493b2f4 100644
>> --- a/kernel/ksysfs.c
>> +++ b/kernel/ksysfs.c
>> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>>   }
>>   KERNEL_ATTR_RO(vmcoreinfo);
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
>> +                   struct kobj_attribute *attr, char *buf)
>> +{
>> +    unsigned int sz = crash_get_elfcorehdr_size();
>> +
>> +    return sysfs_emit(buf, "%u\n", sz);
>> +}
>> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
>> +
>> +#endif
>> +
>>   #endif /* CONFIG_CRASH_CORE */
>>   /* whether file capabilities are enabled */
>> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>>   #endif
>>   #ifdef CONFIG_CRASH_CORE
>>       &vmcoreinfo_attr.attr,
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    &crash_elfcorehdr_size_attr.attr,
>> +#endif
>>   #endif
>>   #ifndef CONFIG_TINY_RCU
>>       &rcu_expedited_attr.attr,

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 8/8] x86/crash: optimize CPU changes
  2023-05-03 22:41   ` Eric DeVolder
@ 2023-05-09 22:39     ` Thomas Gleixner
  -1 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2023-05-09 22:39 UTC (permalink / raw)
  To: Eric DeVolder, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

On Wed, May 03 2023 at 18:41, Eric DeVolder wrote:
> This patch is dependent upon the patch 'crash: change

Seriously? You send a patch series which is ordered in itself and then
tell in the changelog of patch 8/8 that it depends on patch 7/8?

This information is complete garbage once the patches are applied and
ends up in the git logs and even for the submission it's useless
information.

Patch series are usually ordered by dependecy, no?

Aside of that please do:

# git grep 'This patch' Documentation/process/

> crash_prepare_elf64_headers() to for_each_possible_cpu()'. With that
> patch, crash_prepare_elf64_headers() writes out an ELF CPU PT_NOTE
> for all possible CPUs, thus further CPU changes to the elfcorehdr
> are not needed.

I'm having a hard time to decode this word salad.

  crash_prepare_elf64_headers() is writing out an ELF CPU PT_NOTE for
  all possible CPUs, thus further changes to the ELF core header are
  not required.

Makes some sense to me.

> This change works for kexec_file_load() and kexec_load() syscalls.
> For kexec_file_load(), crash_prepare_elf64_headers() is utilized
> directly and thus all ELF CPU PT_NOTEs are in the elfcorehdr already.
> This is the kimage->file_mode term.
> For kexec_load() syscall, one CPU or memory change will cause the
> elfcorehdr to be updated via crash_prepare_elf64_headers() and at
> that point all ELF CPU PT_NOTEs are in the elfcorehdr. This is the
> kimage->elfcorehdr_updated term.

Sorry. I tried hard, but this is completely incomprehensible.

> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 8064e65de6c0..3157e6068747 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -483,6 +483,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
>  	unsigned long mem, memsz;
>  	unsigned long elfsz = 0;
>  
> +	/* As crash_prepare_elf64_headers() has already described all

This is not a proper multiline comment. Please read and follow the tip
tree documentation along with all other things which are documented
there:

  https://www.kernel.org/doc/html/latest/process/maintainer-tip.html

This documentation is not there for entertainment value or exists just
because we are bored to death.

> +	 * possible CPUs, there is no need to update the elfcorehdr
> +	 * for additional CPU changes. This works for both kexec_load()
> +	 * and kexec_file_load() syscalls.

And it does not work for what?

You cannot expect that anyone who reads this code is an kexec/crash*
wizard who might be able to deduce the meaning of this.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 8/8] x86/crash: optimize CPU changes
@ 2023-05-09 22:39     ` Thomas Gleixner
  0 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2023-05-09 22:39 UTC (permalink / raw)
  To: Eric DeVolder, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

On Wed, May 03 2023 at 18:41, Eric DeVolder wrote:
> This patch is dependent upon the patch 'crash: change

Seriously? You send a patch series which is ordered in itself and then
tell in the changelog of patch 8/8 that it depends on patch 7/8?

This information is complete garbage once the patches are applied and
ends up in the git logs and even for the submission it's useless
information.

Patch series are usually ordered by dependecy, no?

Aside of that please do:

# git grep 'This patch' Documentation/process/

> crash_prepare_elf64_headers() to for_each_possible_cpu()'. With that
> patch, crash_prepare_elf64_headers() writes out an ELF CPU PT_NOTE
> for all possible CPUs, thus further CPU changes to the elfcorehdr
> are not needed.

I'm having a hard time to decode this word salad.

  crash_prepare_elf64_headers() is writing out an ELF CPU PT_NOTE for
  all possible CPUs, thus further changes to the ELF core header are
  not required.

Makes some sense to me.

> This change works for kexec_file_load() and kexec_load() syscalls.
> For kexec_file_load(), crash_prepare_elf64_headers() is utilized
> directly and thus all ELF CPU PT_NOTEs are in the elfcorehdr already.
> This is the kimage->file_mode term.
> For kexec_load() syscall, one CPU or memory change will cause the
> elfcorehdr to be updated via crash_prepare_elf64_headers() and at
> that point all ELF CPU PT_NOTEs are in the elfcorehdr. This is the
> kimage->elfcorehdr_updated term.

Sorry. I tried hard, but this is completely incomprehensible.

> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 8064e65de6c0..3157e6068747 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -483,6 +483,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
>  	unsigned long mem, memsz;
>  	unsigned long elfsz = 0;
>  
> +	/* As crash_prepare_elf64_headers() has already described all

This is not a proper multiline comment. Please read and follow the tip
tree documentation along with all other things which are documented
there:

  https://www.kernel.org/doc/html/latest/process/maintainer-tip.html

This documentation is not there for entertainment value or exists just
because we are bored to death.

> +	 * possible CPUs, there is no need to update the elfcorehdr
> +	 * for additional CPU changes. This works for both kexec_load()
> +	 * and kexec_file_load() syscalls.

And it does not work for what?

You cannot expect that anyone who reads this code is an kexec/crash*
wizard who might be able to deduce the meaning of this.

Thanks,

        tglx

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 5/8] x86/crash: add x86 crash hotplug support
  2023-05-03 22:41   ` Eric DeVolder
@ 2023-05-09 22:52     ` Thomas Gleixner
  -1 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2023-05-09 22:52 UTC (permalink / raw)
  To: Eric DeVolder, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

On Wed, May 03 2023 at 18:41, Eric DeVolder wrote:
> In the patch 'kexec: exclude elfcorehdr from the segment digest'

See reply to 8/8
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 53bab123a8ee..80538524c494 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2119,6 +2119,19 @@ config CRASH_DUMP
>  	  (CONFIG_RELOCATABLE=y).
>  	  For more details see Documentation/admin-guide/kdump/kdump.rst
>  
> +config CRASH_HOTPLUG
> +	bool "Update the crash elfcorehdr on system configuration changes"
> +	default y
> +	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
> +	help
> +	  Enable direct update to the crash elfcorehdr (which contains
> +	  the list of CPUs and memory regions to be dumped upon a crash)
> +	  in response to hot plug/unplug or online/offline of CPUs or
> +	  memory. This is a much more advanced approach than userspace
> +	  attempting that.
> +
> +	  If unsure, say Y.

Why is this config an X86 specific thing?

Neither CRASH_DUMP nor HOTPLUG_CPU nor MEMORY_HOTPLUG are in any way X86
specific at all. So why can't you stick that into a place where it can
be reused by other architectures?

It's not rocket science to do 

+	depends on WANTS_CRASH_HOTPLUG && CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)

or something like that. It's so tiring to have x86 Kconfig be the dump
ground for the initial implementation, then having the sh*t copied to
every other architecture and the cleanup is left to the maintainers.

It's not rocket science to differentiate between a real architecture
specific option and a generally useful option in the first place, right?


> +#ifdef CONFIG_CRASH_HOTPLUG
> +	/*
> +	 * Ensure the elfcorehdr segment large enough for hotplug changes.
> +	 * Account for VMCOREINFO and kernel_map and maximum CPUs.

Neither the first line nor the second one qualifies as parseable sentences.

> +/**
> + * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
> + * @image: the active struct kimage

What is an active struct kimage?

> + *
> + * The new elfcorehdr is prepared in a kernel buffer, and then it is
> + * written on top of the existing/old elfcorehdr.

-ENOPARSE

> + */
> +void arch_crash_handle_hotplug_event(struct kimage *image)
> +{
> +	void *elfbuf = NULL, *old_elfcorehdr;
> +	unsigned long nr_mem_ranges;
> +	unsigned long mem, memsz;
> +	unsigned long elfsz = 0;
> +
> +	/*
> +	 * Create the new elfcorehdr reflecting the changes to CPU and/or
> +	 * memory resources.
> +	 */
> +	if (prepare_elf_headers(image, &elfbuf, &elfsz, &nr_mem_ranges)) {
> +		pr_err("unable to prepare elfcore headers");
> +		goto out;

So this can fail. Why is there just a pr_err() and no return value which
tells the caller that this failed?

> +	/*
> +	 * Copy new elfcorehdr over the old elfcorehdr at destination.
> +	 */
> +	old_elfcorehdr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
> +	if (!old_elfcorehdr) {
> +		pr_err("updating elfcorehdr failed\n");

How hard is it to write an error message which is clearly describing the
problem?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 5/8] x86/crash: add x86 crash hotplug support
@ 2023-05-09 22:52     ` Thomas Gleixner
  0 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2023-05-09 22:52 UTC (permalink / raw)
  To: Eric DeVolder, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky,
	eric.devolder

On Wed, May 03 2023 at 18:41, Eric DeVolder wrote:
> In the patch 'kexec: exclude elfcorehdr from the segment digest'

See reply to 8/8
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 53bab123a8ee..80538524c494 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2119,6 +2119,19 @@ config CRASH_DUMP
>  	  (CONFIG_RELOCATABLE=y).
>  	  For more details see Documentation/admin-guide/kdump/kdump.rst
>  
> +config CRASH_HOTPLUG
> +	bool "Update the crash elfcorehdr on system configuration changes"
> +	default y
> +	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
> +	help
> +	  Enable direct update to the crash elfcorehdr (which contains
> +	  the list of CPUs and memory regions to be dumped upon a crash)
> +	  in response to hot plug/unplug or online/offline of CPUs or
> +	  memory. This is a much more advanced approach than userspace
> +	  attempting that.
> +
> +	  If unsure, say Y.

Why is this config an X86 specific thing?

Neither CRASH_DUMP nor HOTPLUG_CPU nor MEMORY_HOTPLUG are in any way X86
specific at all. So why can't you stick that into a place where it can
be reused by other architectures?

It's not rocket science to do 

+	depends on WANTS_CRASH_HOTPLUG && CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)

or something like that. It's so tiring to have x86 Kconfig be the dump
ground for the initial implementation, then having the sh*t copied to
every other architecture and the cleanup is left to the maintainers.

It's not rocket science to differentiate between a real architecture
specific option and a generally useful option in the first place, right?


> +#ifdef CONFIG_CRASH_HOTPLUG
> +	/*
> +	 * Ensure the elfcorehdr segment large enough for hotplug changes.
> +	 * Account for VMCOREINFO and kernel_map and maximum CPUs.

Neither the first line nor the second one qualifies as parseable sentences.

> +/**
> + * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
> + * @image: the active struct kimage

What is an active struct kimage?

> + *
> + * The new elfcorehdr is prepared in a kernel buffer, and then it is
> + * written on top of the existing/old elfcorehdr.

-ENOPARSE

> + */
> +void arch_crash_handle_hotplug_event(struct kimage *image)
> +{
> +	void *elfbuf = NULL, *old_elfcorehdr;
> +	unsigned long nr_mem_ranges;
> +	unsigned long mem, memsz;
> +	unsigned long elfsz = 0;
> +
> +	/*
> +	 * Create the new elfcorehdr reflecting the changes to CPU and/or
> +	 * memory resources.
> +	 */
> +	if (prepare_elf_headers(image, &elfbuf, &elfsz, &nr_mem_ranges)) {
> +		pr_err("unable to prepare elfcore headers");
> +		goto out;

So this can fail. Why is there just a pr_err() and no return value which
tells the caller that this failed?

> +	/*
> +	 * Copy new elfcorehdr over the old elfcorehdr at destination.
> +	 */
> +	old_elfcorehdr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
> +	if (!old_elfcorehdr) {
> +		pr_err("updating elfcorehdr failed\n");

How hard is it to write an error message which is clearly describing the
problem?

Thanks,

        tglx

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 8/8] x86/crash: optimize CPU changes
  2023-05-09 22:39     ` Thomas Gleixner
@ 2023-05-10 22:49       ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-10 22:49 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky



On 5/9/23 17:39, Thomas Gleixner wrote:
> On Wed, May 03 2023 at 18:41, Eric DeVolder wrote:
>> This patch is dependent upon the patch 'crash: change
> 
> Seriously? You send a patch series which is ordered in itself and then
> tell in the changelog of patch 8/8 that it depends on patch 7/8?
> 
> This information is complete garbage once the patches are applied and
> ends up in the git logs and even for the submission it's useless
> information.
> 
> Patch series are usually ordered by dependecy, no?
> 
> Aside of that please do:
> 
> # git grep 'This patch' Documentation/process/
> 
I'll remove, and re-examine the messages to use imperative tone.

>> crash_prepare_elf64_headers() to for_each_possible_cpu()'. With that
>> patch, crash_prepare_elf64_headers() writes out an ELF CPU PT_NOTE
>> for all possible CPUs, thus further CPU changes to the elfcorehdr
>> are not needed.
> 
> I'm having a hard time to decode this word salad.
> 
>    crash_prepare_elf64_headers() is writing out an ELF CPU PT_NOTE for
>    all possible CPUs, thus further changes to the ELF core header are
>    not required.
> 
> Makes some sense to me.

How about this?

crash_prepare_elf64_headers() writes into the elfcorehdr an ELF
PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs
(ie. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.

> 
>> This change works for kexec_file_load() and kexec_load() syscalls.
>> For kexec_file_load(), crash_prepare_elf64_headers() is utilized
>> directly and thus all ELF CPU PT_NOTEs are in the elfcorehdr already.
>> This is the kimage->file_mode term.
>> For kexec_load() syscall, one CPU or memory change will cause the
>> elfcorehdr to be updated via crash_prepare_elf64_headers() and at
>> that point all ELF CPU PT_NOTEs are in the elfcorehdr. This is the
>> kimage->elfcorehdr_updated term.
> 
> Sorry. I tried hard, but this is completely incomprehensible.
> 
How about this?

The kimage->file_mode term covers kdump images loaded via the
kexec_file_load() syscall. Since crash_prepare_elf64_headers()
wrote the initial elfcorehdr, no update to the elfcorehdr is
needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via
the kexec_load() syscall. At least one memory or CPU change must occur
to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr.
Afterwards, no update to the elfcorehdr is needed for CPU changes.

>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 8064e65de6c0..3157e6068747 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -483,6 +483,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
>>   	unsigned long mem, memsz;
>>   	unsigned long elfsz = 0;
>>   
>> +	/* As crash_prepare_elf64_headers() has already described all
> 
> This is not a proper multiline comment. Please read and follow the tip
> tree documentation along with all other things which are documented
> there:
> 
>    https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
> 
> This documentation is not there for entertainment value or exists just
> because we are bored to death.
> 
I'll fix it; unintentional. Should checkpatch.pl catch this (it did not)?

>> +	 * possible CPUs, there is no need to update the elfcorehdr
>> +	 * for additional CPU changes. This works for both kexec_load()
>> +	 * and kexec_file_load() syscalls.
> 
> And it does not work for what?
> 
I'll remove this.

I keep using phrases like this since kexec_file_load() is wholly controlled by the kernel code, 
where as kexec_load() has userspace dependencies. In this case,the sentence isn't warranted; it
will work; no exceptional cases.

> You cannot expect that anyone who reads this code is an kexec/crash*
> wizard who might be able to deduce the meaning of this.
> 
> Thanks,
> 
>          tglx

Yes, thanks for the fresh eyes!
eric

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 8/8] x86/crash: optimize CPU changes
@ 2023-05-10 22:49       ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-10 22:49 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky



On 5/9/23 17:39, Thomas Gleixner wrote:
> On Wed, May 03 2023 at 18:41, Eric DeVolder wrote:
>> This patch is dependent upon the patch 'crash: change
> 
> Seriously? You send a patch series which is ordered in itself and then
> tell in the changelog of patch 8/8 that it depends on patch 7/8?
> 
> This information is complete garbage once the patches are applied and
> ends up in the git logs and even for the submission it's useless
> information.
> 
> Patch series are usually ordered by dependecy, no?
> 
> Aside of that please do:
> 
> # git grep 'This patch' Documentation/process/
> 
I'll remove, and re-examine the messages to use imperative tone.

>> crash_prepare_elf64_headers() to for_each_possible_cpu()'. With that
>> patch, crash_prepare_elf64_headers() writes out an ELF CPU PT_NOTE
>> for all possible CPUs, thus further CPU changes to the elfcorehdr
>> are not needed.
> 
> I'm having a hard time to decode this word salad.
> 
>    crash_prepare_elf64_headers() is writing out an ELF CPU PT_NOTE for
>    all possible CPUs, thus further changes to the ELF core header are
>    not required.
> 
> Makes some sense to me.

How about this?

crash_prepare_elf64_headers() writes into the elfcorehdr an ELF
PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs
(ie. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.

> 
>> This change works for kexec_file_load() and kexec_load() syscalls.
>> For kexec_file_load(), crash_prepare_elf64_headers() is utilized
>> directly and thus all ELF CPU PT_NOTEs are in the elfcorehdr already.
>> This is the kimage->file_mode term.
>> For kexec_load() syscall, one CPU or memory change will cause the
>> elfcorehdr to be updated via crash_prepare_elf64_headers() and at
>> that point all ELF CPU PT_NOTEs are in the elfcorehdr. This is the
>> kimage->elfcorehdr_updated term.
> 
> Sorry. I tried hard, but this is completely incomprehensible.
> 
How about this?

The kimage->file_mode term covers kdump images loaded via the
kexec_file_load() syscall. Since crash_prepare_elf64_headers()
wrote the initial elfcorehdr, no update to the elfcorehdr is
needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via
the kexec_load() syscall. At least one memory or CPU change must occur
to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr.
Afterwards, no update to the elfcorehdr is needed for CPU changes.

>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 8064e65de6c0..3157e6068747 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -483,6 +483,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
>>   	unsigned long mem, memsz;
>>   	unsigned long elfsz = 0;
>>   
>> +	/* As crash_prepare_elf64_headers() has already described all
> 
> This is not a proper multiline comment. Please read and follow the tip
> tree documentation along with all other things which are documented
> there:
> 
>    https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
> 
> This documentation is not there for entertainment value or exists just
> because we are bored to death.
> 
I'll fix it; unintentional. Should checkpatch.pl catch this (it did not)?

>> +	 * possible CPUs, there is no need to update the elfcorehdr
>> +	 * for additional CPU changes. This works for both kexec_load()
>> +	 * and kexec_file_load() syscalls.
> 
> And it does not work for what?
> 
I'll remove this.

I keep using phrases like this since kexec_file_load() is wholly controlled by the kernel code, 
where as kexec_load() has userspace dependencies. In this case,the sentence isn't warranted; it
will work; no exceptional cases.

> You cannot expect that anyone who reads this code is an kexec/crash*
> wizard who might be able to deduce the meaning of this.
> 
> Thanks,
> 
>          tglx

Yes, thanks for the fresh eyes!
eric

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 5/8] x86/crash: add x86 crash hotplug support
  2023-05-09 22:52     ` Thomas Gleixner
@ 2023-05-10 22:49       ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-10 22:49 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky



On 5/9/23 17:52, Thomas Gleixner wrote:
> On Wed, May 03 2023 at 18:41, Eric DeVolder wrote:
>> In the patch 'kexec: exclude elfcorehdr from the segment digest'
> 
> See reply to 8/8
yep

>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 53bab123a8ee..80538524c494 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -2119,6 +2119,19 @@ config CRASH_DUMP
>>   	  (CONFIG_RELOCATABLE=y).
>>   	  For more details see Documentation/admin-guide/kdump/kdump.rst
>>   
>> +config CRASH_HOTPLUG
>> +	bool "Update the crash elfcorehdr on system configuration changes"
>> +	default y
>> +	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
>> +	help
>> +	  Enable direct update to the crash elfcorehdr (which contains
>> +	  the list of CPUs and memory regions to be dumped upon a crash)
>> +	  in response to hot plug/unplug or online/offline of CPUs or
>> +	  memory. This is a much more advanced approach than userspace
>> +	  attempting that.
>> +
>> +	  If unsure, say Y.
> 
> Why is this config an X86 specific thing?
> 
> Neither CRASH_DUMP nor HOTPLUG_CPU nor MEMORY_HOTPLUG are in any way X86
> specific at all. So why can't you stick that into a place where it can
> be reused by other architectures?
> 
> It's not rocket science to do
> 
> +	depends on WANTS_CRASH_HOTPLUG && CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
> 
> or something like that. It's so tiring to have x86 Kconfig be the dump
> ground for the initial implementation, then having the sh*t copied to
> every other architecture and the cleanup is left to the maintainers.
> 
> It's not rocket science to differentiate between a real architecture
> specific option and a generally useful option in the first place, right?

Right. To your point, CRASH_DUMP has been copied in all the archs:
arch/arm/Kconfig:config CRASH_DUMP
arch/arm64/Kconfig:config CRASH_DUMP
arch/ia64/Kconfig:config CRASH_DUMP
arch/mips/Kconfig:config CRASH_DUMP
arch/powerpc/Kconfig:config CRASH_DUMP
arch/riscv/Kconfig:config CRASH_DUMP
arch/s390/Kconfig:config CRASH_DUMP
arch/sh/Kconfig:config CRASH_DUMP
arch/x86/Kconfig:config CRASH_DUMP
arch/loongarch/Kconfig:config CRASH_DUMP

Likewise for KEXEC and KEXEC_FILE.

I've looked into this in the past, and looking again today, I don't see a natural
place to put the option. Perhaps starting a kernel/Kconfig.kexec?

> 
> 
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +	/*
>> +	 * Ensure the elfcorehdr segment large enough for hotplug changes.
>> +	 * Account for VMCOREINFO and kernel_map and maximum CPUs.
> 
> Neither the first line nor the second one qualifies as parseable sentences.
> 

What about:
Ensure the elfcorehdr segment is large enough for hotplug changes.
The segment size accounts for VMCOREINFO, kernel_map, maximum CPUs
and maximum memory ranges.


>> +/**
>> + * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>> + * @image: the active struct kimage
> 
> What is an active struct kimage?
> 
How about this:
@image: a pointer to kexec_crash_image

>> + *
>> + * The new elfcorehdr is prepared in a kernel buffer, and then it is
>> + * written on top of the existing/old elfcorehdr.
> 
> -ENOPARSE
> 
How about:
Prepare the new elfcorehdr and replace the existing elfcorehdr.

>> + */
>> +void arch_crash_handle_hotplug_event(struct kimage *image)
>> +{
>> +	void *elfbuf = NULL, *old_elfcorehdr;
>> +	unsigned long nr_mem_ranges;
>> +	unsigned long mem, memsz;
>> +	unsigned long elfsz = 0;
>> +
>> +	/*
>> +	 * Create the new elfcorehdr reflecting the changes to CPU and/or
>> +	 * memory resources.
>> +	 */
>> +	if (prepare_elf_headers(image, &elfbuf, &elfsz, &nr_mem_ranges)) {
>> +		pr_err("unable to prepare elfcore headers");
>> +		goto out;
> 
> So this can fail. Why is there just a pr_err() and no return value which
> tells the caller that this failed?
An error in the crash elfcorehdr infrastructure introduced in this series
is not a reason to rollback state. The cpuhp and memory notifier callbacks
always return an OK.
The primary errors that might occur are failure to obtain the kexec_lock,
and failure to obtain a temporary kernel buffer to stage the new elfcorehdr.
How about:
pr_err("prepare_elf_headers() failed");

> 
>> +	/*
>> +	 * Copy new elfcorehdr over the old elfcorehdr at destination.
>> +	 */
>> +	old_elfcorehdr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
>> +	if (!old_elfcorehdr) {
>> +		pr_err("updating elfcorehdr failed\n");
> 
> How hard is it to write an error message which is clearly describing the
> problem?
> 
How about:
pr_err("mapping elfcorehdr segment failed");

> Thanks,
> 
>          tglx
Again, thanks for the fresh eyes!
eric

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 5/8] x86/crash: add x86 crash hotplug support
@ 2023-05-10 22:49       ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-10 22:49 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, sourabhjain, konrad.wilk, boris.ostrovsky



On 5/9/23 17:52, Thomas Gleixner wrote:
> On Wed, May 03 2023 at 18:41, Eric DeVolder wrote:
>> In the patch 'kexec: exclude elfcorehdr from the segment digest'
> 
> See reply to 8/8
yep

>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 53bab123a8ee..80538524c494 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -2119,6 +2119,19 @@ config CRASH_DUMP
>>   	  (CONFIG_RELOCATABLE=y).
>>   	  For more details see Documentation/admin-guide/kdump/kdump.rst
>>   
>> +config CRASH_HOTPLUG
>> +	bool "Update the crash elfcorehdr on system configuration changes"
>> +	default y
>> +	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
>> +	help
>> +	  Enable direct update to the crash elfcorehdr (which contains
>> +	  the list of CPUs and memory regions to be dumped upon a crash)
>> +	  in response to hot plug/unplug or online/offline of CPUs or
>> +	  memory. This is a much more advanced approach than userspace
>> +	  attempting that.
>> +
>> +	  If unsure, say Y.
> 
> Why is this config an X86 specific thing?
> 
> Neither CRASH_DUMP nor HOTPLUG_CPU nor MEMORY_HOTPLUG are in any way X86
> specific at all. So why can't you stick that into a place where it can
> be reused by other architectures?
> 
> It's not rocket science to do
> 
> +	depends on WANTS_CRASH_HOTPLUG && CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
> 
> or something like that. It's so tiring to have x86 Kconfig be the dump
> ground for the initial implementation, then having the sh*t copied to
> every other architecture and the cleanup is left to the maintainers.
> 
> It's not rocket science to differentiate between a real architecture
> specific option and a generally useful option in the first place, right?

Right. To your point, CRASH_DUMP has been copied in all the archs:
arch/arm/Kconfig:config CRASH_DUMP
arch/arm64/Kconfig:config CRASH_DUMP
arch/ia64/Kconfig:config CRASH_DUMP
arch/mips/Kconfig:config CRASH_DUMP
arch/powerpc/Kconfig:config CRASH_DUMP
arch/riscv/Kconfig:config CRASH_DUMP
arch/s390/Kconfig:config CRASH_DUMP
arch/sh/Kconfig:config CRASH_DUMP
arch/x86/Kconfig:config CRASH_DUMP
arch/loongarch/Kconfig:config CRASH_DUMP

Likewise for KEXEC and KEXEC_FILE.

I've looked into this in the past, and looking again today, I don't see a natural
place to put the option. Perhaps starting a kernel/Kconfig.kexec?

> 
> 
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +	/*
>> +	 * Ensure the elfcorehdr segment large enough for hotplug changes.
>> +	 * Account for VMCOREINFO and kernel_map and maximum CPUs.
> 
> Neither the first line nor the second one qualifies as parseable sentences.
> 

What about:
Ensure the elfcorehdr segment is large enough for hotplug changes.
The segment size accounts for VMCOREINFO, kernel_map, maximum CPUs
and maximum memory ranges.


>> +/**
>> + * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>> + * @image: the active struct kimage
> 
> What is an active struct kimage?
> 
How about this:
@image: a pointer to kexec_crash_image

>> + *
>> + * The new elfcorehdr is prepared in a kernel buffer, and then it is
>> + * written on top of the existing/old elfcorehdr.
> 
> -ENOPARSE
> 
How about:
Prepare the new elfcorehdr and replace the existing elfcorehdr.

>> + */
>> +void arch_crash_handle_hotplug_event(struct kimage *image)
>> +{
>> +	void *elfbuf = NULL, *old_elfcorehdr;
>> +	unsigned long nr_mem_ranges;
>> +	unsigned long mem, memsz;
>> +	unsigned long elfsz = 0;
>> +
>> +	/*
>> +	 * Create the new elfcorehdr reflecting the changes to CPU and/or
>> +	 * memory resources.
>> +	 */
>> +	if (prepare_elf_headers(image, &elfbuf, &elfsz, &nr_mem_ranges)) {
>> +		pr_err("unable to prepare elfcore headers");
>> +		goto out;
> 
> So this can fail. Why is there just a pr_err() and no return value which
> tells the caller that this failed?
An error in the crash elfcorehdr infrastructure introduced in this series
is not a reason to rollback state. The cpuhp and memory notifier callbacks
always return an OK.
The primary errors that might occur are failure to obtain the kexec_lock,
and failure to obtain a temporary kernel buffer to stage the new elfcorehdr.
How about:
pr_err("prepare_elf_headers() failed");

> 
>> +	/*
>> +	 * Copy new elfcorehdr over the old elfcorehdr at destination.
>> +	 */
>> +	old_elfcorehdr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
>> +	if (!old_elfcorehdr) {
>> +		pr_err("updating elfcorehdr failed\n");
> 
> How hard is it to write an error message which is clearly describing the
> problem?
> 
How about:
pr_err("mapping elfcorehdr segment failed");

> Thanks,
> 
>          tglx
Again, thanks for the fresh eyes!
eric

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
  2023-05-09  6:15     ` Sourabh Jain
@ 2023-05-10 22:52       ` Eric DeVolder
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-10 22:52 UTC (permalink / raw)
  To: Sourabh Jain, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, konrad.wilk, boris.ostrovsky



On 5/9/23 01:15, Sourabh Jain wrote:
> 
> On 04/05/23 04:11, Eric DeVolder wrote:
>> The hotplug support for kexec_load() requires coordination with
>> userspace, and therefore a little extra help from the kernel to
>> facilitate the coordination.
>>
>> In the absence of the solution contained within this particular
>> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
>> then the crash hotplug logic would find the segment containing the
>> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
>> generally speaking that is the desired behavior and outcome, a
>> problem arises from the fact that if the kdump image includes a
>> purgatory that performs a digest checksum, then that check would
>> fail (because the elfcorehdr was changed), and the capture kernel
>> would fail to boot and no kdump occur.
>>
>> Therefore, what is needed is for the userspace kexec-tools to
>> indicate to the kernel whether or not the supplied kdump image/
>> elfcorehdr can be modified (because the kexec-tools excludes the
>> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
>> appropriately).
>>
>> To solve these problems, this patch introduces:
>>   - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
> 
> Architectures may need to update kexec segment other then elfcorehdr.
> How about changing the flag name to KEXEC_UPDATE_SEGMENTS?
> 
> - Sourabh
> 
These seems almost too generic and vague. I get that for PPC this flag
will drive updating elfcorehdr as well as FDT, so the flag is over-loaded
in a sense.

Another idea for the name?
eric


>>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>>     has excluded the elfcorehdr from the digest).
>>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>>     should be in order to accommodate hotplug changes.
>>   - The sysfs crash_hotplug nodes (ie.
>>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>>     that they examine kexec_file_load() vs kexec_load(), and when
>>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>>     This is critical so that the udev rule processing of crash_hotplug
>>     indicates correctly (ie. the userspace unload-then-load of the
>>     kdump of the kdump image can be skipped, or not).
>>
>> With this patch in place, I believe the following statements to be true
>> (with local testing to verify):
>>
>>   - For systems which have these kernel changes in place, but not the
>>     corresponding changes to the crash hot plug udev rules and
>>     kexec-tools, (ie "older" systems) those systems will continue to
>>     unload-then-load the kdump image, as has always been done. The
>>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>>   - For systems which have these kernel changes in place and the proposed
>>     udev rule changes in place, but not the kexec-tools changes in place:
>>      - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>>        so the unload-then-reload of kdump image will occur (the sysfs
>>        crash_hotplug nodes will show 0).
>>      - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>>        to show 1, and the kernel will modify the elfcorehdr directly. And
>>        with the udev changes in place, the unload-then-load will not occur!
>>   - For systems which have these kernel changes as well as the udev and
>>     kexec-tools changes in place, then the user/admin has full authority
>>     over the enablement and support of crash hotplug support, whether via
>>     kexec_file_load() or kexec_load().
>>
>> Said differently, as kexec_load() was/is widely in use, these changes
>> permit it to continue to be used as-is (retaining the current unload-then-
>> reload behavior) until such time as the udev and kexec-tools changes can
>> be rolled out as well.
>>
>> I've intentionally kept the changes related to userspace coordination
>> for kexec_load() separate as this need was identified late; the
>> rest of this series has been generally reviewed and accepted. Once
>> this support has been vetted, I can refactor if needed.
>>
>> Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>> ---
>>   arch/x86/include/asm/kexec.h | 11 +++++++----
>>   arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
>>   include/linux/kexec.h        | 14 ++++++++++++--
>>   include/uapi/linux/kexec.h   |  1 +
>>   kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
>>   kernel/kexec.c               |  3 +++
>>   kernel/ksysfs.c              | 15 +++++++++++++++
>>   7 files changed, 96 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 9143100ea3ea..3be6a98751f0 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>>   #ifdef CONFIG_HOTPLUG_CPU
>> -static inline int crash_hotplug_cpu_support(void) { return 1; }
>> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
>> +int arch_crash_hotplug_cpu_support(void);
>> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>>   #endif
>>   #ifdef CONFIG_MEMORY_HOTPLUG
>> -static inline int crash_hotplug_memory_support(void) { return 1; }
>> -#define crash_hotplug_memory_support crash_hotplug_memory_support
>> +int arch_crash_hotplug_memory_support(void);
>> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>>   #endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void);
>> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>>   #endif
>>   #endif /* __ASSEMBLY__ */
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 0c9d496cf7ce..8064e65de6c0 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>> +/* These functions provide the value for the sysfs crash_hotplug nodes */
>> +#ifdef CONFIG_HOTPLUG_CPU
>> +int arch_crash_hotplug_cpu_support(void)
>> +{
>> +    return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +int arch_crash_hotplug_memory_support(void)
>> +{
>> +    return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void)
>> +{
>> +    unsigned int sz;
>> +
>> +    if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
>> +        sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
>> +    else
>> +        sz += 2 + CONFIG_NR_CPUS_DEFAULT;
>> +    sz *= sizeof(Elf64_Phdr);
>> +    return sz;
>> +}
>> +
>>   /**
>>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>>    * @image: the active struct kimage
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 6a8a724ac638..050e20066cdb 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -335,6 +335,10 @@ struct kimage {
>>       unsigned int preserve_context : 1;
>>       /* If set, we are using file mode kexec syscall */
>>       unsigned int file_mode:1;
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    /* If set, allow changes to elfcorehdr of kexec_load'd image */
>> +    unsigned int update_elfcorehdr:1;
>> +#endif
>>   #ifdef ARCH_HAS_KIMAGE_ARCH
>>       struct kimage_arch arch;
>> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>>   /* List of defined/legal kexec flags */
>>   #ifndef CONFIG_KEXEC_JUMP
>> -#define KEXEC_FLAGS    KEXEC_ON_CRASH
>> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>>   #else
>> -#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
>> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>>   #endif
>>   /* List of defined/legal kexec file flags */
>> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>>   static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>>   #endif
>> +int crash_check_update_elfcorehdr(void);
>> +
>>   #ifndef crash_hotplug_cpu_support
>>   static inline int crash_hotplug_cpu_support(void) { return 0; }
>>   #endif
>> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>>   static inline int crash_hotplug_memory_support(void) { return 0; }
>>   #endif
>> +#ifndef crash_get_elfcorehdr_size
>> +static inline crash_get_elfcorehdr_size(void) { return 0; }
>> +#endif
>> +
>>   #else /* !CONFIG_KEXEC_CORE */
>>   struct pt_regs;
>>   struct task_struct;
>> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
>> index 981016e05cfa..01766dd839b0 100644
>> --- a/include/uapi/linux/kexec.h
>> +++ b/include/uapi/linux/kexec.h
>> @@ -12,6 +12,7 @@
>>   /* kexec flags for different usage scenarios */
>>   #define KEXEC_ON_CRASH        0x00000001
>>   #define KEXEC_PRESERVE_CONTEXT    0x00000002
>> +#define KEXEC_UPDATE_ELFCOREHDR    0x00000004
>>   #define KEXEC_ARCH_MASK        0xffff0000
>>   /*
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index ef6e91daad56..e05bfdb7eaed 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>>   #ifdef CONFIG_CRASH_HOTPLUG
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>> +
>> +/*
>> + * This routine utilized when the crash_hotplug sysfs node is read.
>> + * It reflects the kernel's ability/permission to update the crash
>> + * elfcorehdr directly.
>> + */
>> +int crash_check_update_elfcorehdr(void)
>> +{
>> +    int rc = 0;
>> +
>> +    /* Obtain lock while reading crash information */
>> +    if (!kexec_trylock()) {
>> +        pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
>> +        return 0;
>> +    }
>> +    if (kexec_crash_image) {
>> +        if (kexec_crash_image->file_mode)
>> +            rc = 1;
>> +        else
>> +            rc = kexec_crash_image->update_elfcorehdr;
>> +    }
>> +    /* Release lock now that update complete */
>> +    kexec_unlock();
>> +
>> +    return rc;
>> +}
>> +
>>   /*
>>    * To accurately reflect hot un/plug changes of cpu and memory resources
>>    * (including onling and offlining of those resources), the elfcorehdr
>> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>>       image = kexec_crash_image;
>> +    /* Check that updating elfcorehdr is permitted */
>> +    if (!(image->file_mode || image->update_elfcorehdr))
>> +        goto out;
>> +
>>       if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>>           hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>>           pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
>> diff --git a/kernel/kexec.c b/kernel/kexec.c
>> index 92d301f98776..60de64bd14b9 100644
>> --- a/kernel/kexec.c
>> +++ b/kernel/kexec.c
>> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>>       if (flags & KEXEC_PRESERVE_CONTEXT)
>>           image->preserve_context = 1;
>> +    if (flags & KEXEC_UPDATE_ELFCOREHDR)
>> +        image->update_elfcorehdr = 1;
>> +
>>       ret = machine_kexec_prepare(image);
>>       if (ret)
>>           goto out;
>> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
>> index aad7a3bfd846..1d4bc493b2f4 100644
>> --- a/kernel/ksysfs.c
>> +++ b/kernel/ksysfs.c
>> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>>   }
>>   KERNEL_ATTR_RO(vmcoreinfo);
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
>> +                   struct kobj_attribute *attr, char *buf)
>> +{
>> +    unsigned int sz = crash_get_elfcorehdr_size();
>> +
>> +    return sysfs_emit(buf, "%u\n", sz);
>> +}
>> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
>> +
>> +#endif
>> +
>>   #endif /* CONFIG_CRASH_CORE */
>>   /* whether file capabilities are enabled */
>> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>>   #endif
>>   #ifdef CONFIG_CRASH_CORE
>>       &vmcoreinfo_attr.attr,
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    &crash_elfcorehdr_size_attr.attr,
>> +#endif
>>   #endif
>>   #ifndef CONFIG_TINY_RCU
>>       &rcu_expedited_attr.attr,

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
@ 2023-05-10 22:52       ` Eric DeVolder
  0 siblings, 0 replies; 38+ messages in thread
From: Eric DeVolder @ 2023-05-10 22:52 UTC (permalink / raw)
  To: Sourabh Jain, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal
  Cc: tglx, mingo, bp, dave.hansen, hpa, nramas, thomas.lendacky, robh,
	efault, rppt, david, konrad.wilk, boris.ostrovsky



On 5/9/23 01:15, Sourabh Jain wrote:
> 
> On 04/05/23 04:11, Eric DeVolder wrote:
>> The hotplug support for kexec_load() requires coordination with
>> userspace, and therefore a little extra help from the kernel to
>> facilitate the coordination.
>>
>> In the absence of the solution contained within this particular
>> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
>> then the crash hotplug logic would find the segment containing the
>> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
>> generally speaking that is the desired behavior and outcome, a
>> problem arises from the fact that if the kdump image includes a
>> purgatory that performs a digest checksum, then that check would
>> fail (because the elfcorehdr was changed), and the capture kernel
>> would fail to boot and no kdump occur.
>>
>> Therefore, what is needed is for the userspace kexec-tools to
>> indicate to the kernel whether or not the supplied kdump image/
>> elfcorehdr can be modified (because the kexec-tools excludes the
>> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
>> appropriately).
>>
>> To solve these problems, this patch introduces:
>>   - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
> 
> Architectures may need to update kexec segment other then elfcorehdr.
> How about changing the flag name to KEXEC_UPDATE_SEGMENTS?
> 
> - Sourabh
> 
These seems almost too generic and vague. I get that for PPC this flag
will drive updating elfcorehdr as well as FDT, so the flag is over-loaded
in a sense.

Another idea for the name?
eric


>>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>>     has excluded the elfcorehdr from the digest).
>>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>>     should be in order to accommodate hotplug changes.
>>   - The sysfs crash_hotplug nodes (ie.
>>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>>     that they examine kexec_file_load() vs kexec_load(), and when
>>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>>     This is critical so that the udev rule processing of crash_hotplug
>>     indicates correctly (ie. the userspace unload-then-load of the
>>     kdump of the kdump image can be skipped, or not).
>>
>> With this patch in place, I believe the following statements to be true
>> (with local testing to verify):
>>
>>   - For systems which have these kernel changes in place, but not the
>>     corresponding changes to the crash hot plug udev rules and
>>     kexec-tools, (ie "older" systems) those systems will continue to
>>     unload-then-load the kdump image, as has always been done. The
>>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>>   - For systems which have these kernel changes in place and the proposed
>>     udev rule changes in place, but not the kexec-tools changes in place:
>>      - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>>        so the unload-then-reload of kdump image will occur (the sysfs
>>        crash_hotplug nodes will show 0).
>>      - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>>        to show 1, and the kernel will modify the elfcorehdr directly. And
>>        with the udev changes in place, the unload-then-load will not occur!
>>   - For systems which have these kernel changes as well as the udev and
>>     kexec-tools changes in place, then the user/admin has full authority
>>     over the enablement and support of crash hotplug support, whether via
>>     kexec_file_load() or kexec_load().
>>
>> Said differently, as kexec_load() was/is widely in use, these changes
>> permit it to continue to be used as-is (retaining the current unload-then-
>> reload behavior) until such time as the udev and kexec-tools changes can
>> be rolled out as well.
>>
>> I've intentionally kept the changes related to userspace coordination
>> for kexec_load() separate as this need was identified late; the
>> rest of this series has been generally reviewed and accepted. Once
>> this support has been vetted, I can refactor if needed.
>>
>> Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>> ---
>>   arch/x86/include/asm/kexec.h | 11 +++++++----
>>   arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
>>   include/linux/kexec.h        | 14 ++++++++++++--
>>   include/uapi/linux/kexec.h   |  1 +
>>   kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
>>   kernel/kexec.c               |  3 +++
>>   kernel/ksysfs.c              | 15 +++++++++++++++
>>   7 files changed, 96 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 9143100ea3ea..3be6a98751f0 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>>   #ifdef CONFIG_HOTPLUG_CPU
>> -static inline int crash_hotplug_cpu_support(void) { return 1; }
>> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
>> +int arch_crash_hotplug_cpu_support(void);
>> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>>   #endif
>>   #ifdef CONFIG_MEMORY_HOTPLUG
>> -static inline int crash_hotplug_memory_support(void) { return 1; }
>> -#define crash_hotplug_memory_support crash_hotplug_memory_support
>> +int arch_crash_hotplug_memory_support(void);
>> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>>   #endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void);
>> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>>   #endif
>>   #endif /* __ASSEMBLY__ */
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 0c9d496cf7ce..8064e65de6c0 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>> +/* These functions provide the value for the sysfs crash_hotplug nodes */
>> +#ifdef CONFIG_HOTPLUG_CPU
>> +int arch_crash_hotplug_cpu_support(void)
>> +{
>> +    return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +int arch_crash_hotplug_memory_support(void)
>> +{
>> +    return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void)
>> +{
>> +    unsigned int sz;
>> +
>> +    if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
>> +        sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
>> +    else
>> +        sz += 2 + CONFIG_NR_CPUS_DEFAULT;
>> +    sz *= sizeof(Elf64_Phdr);
>> +    return sz;
>> +}
>> +
>>   /**
>>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>>    * @image: the active struct kimage
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 6a8a724ac638..050e20066cdb 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -335,6 +335,10 @@ struct kimage {
>>       unsigned int preserve_context : 1;
>>       /* If set, we are using file mode kexec syscall */
>>       unsigned int file_mode:1;
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    /* If set, allow changes to elfcorehdr of kexec_load'd image */
>> +    unsigned int update_elfcorehdr:1;
>> +#endif
>>   #ifdef ARCH_HAS_KIMAGE_ARCH
>>       struct kimage_arch arch;
>> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>>   /* List of defined/legal kexec flags */
>>   #ifndef CONFIG_KEXEC_JUMP
>> -#define KEXEC_FLAGS    KEXEC_ON_CRASH
>> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>>   #else
>> -#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
>> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>>   #endif
>>   /* List of defined/legal kexec file flags */
>> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>>   static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>>   #endif
>> +int crash_check_update_elfcorehdr(void);
>> +
>>   #ifndef crash_hotplug_cpu_support
>>   static inline int crash_hotplug_cpu_support(void) { return 0; }
>>   #endif
>> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>>   static inline int crash_hotplug_memory_support(void) { return 0; }
>>   #endif
>> +#ifndef crash_get_elfcorehdr_size
>> +static inline crash_get_elfcorehdr_size(void) { return 0; }
>> +#endif
>> +
>>   #else /* !CONFIG_KEXEC_CORE */
>>   struct pt_regs;
>>   struct task_struct;
>> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
>> index 981016e05cfa..01766dd839b0 100644
>> --- a/include/uapi/linux/kexec.h
>> +++ b/include/uapi/linux/kexec.h
>> @@ -12,6 +12,7 @@
>>   /* kexec flags for different usage scenarios */
>>   #define KEXEC_ON_CRASH        0x00000001
>>   #define KEXEC_PRESERVE_CONTEXT    0x00000002
>> +#define KEXEC_UPDATE_ELFCOREHDR    0x00000004
>>   #define KEXEC_ARCH_MASK        0xffff0000
>>   /*
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index ef6e91daad56..e05bfdb7eaed 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>>   #ifdef CONFIG_CRASH_HOTPLUG
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>> +
>> +/*
>> + * This routine utilized when the crash_hotplug sysfs node is read.
>> + * It reflects the kernel's ability/permission to update the crash
>> + * elfcorehdr directly.
>> + */
>> +int crash_check_update_elfcorehdr(void)
>> +{
>> +    int rc = 0;
>> +
>> +    /* Obtain lock while reading crash information */
>> +    if (!kexec_trylock()) {
>> +        pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
>> +        return 0;
>> +    }
>> +    if (kexec_crash_image) {
>> +        if (kexec_crash_image->file_mode)
>> +            rc = 1;
>> +        else
>> +            rc = kexec_crash_image->update_elfcorehdr;
>> +    }
>> +    /* Release lock now that update complete */
>> +    kexec_unlock();
>> +
>> +    return rc;
>> +}
>> +
>>   /*
>>    * To accurately reflect hot un/plug changes of cpu and memory resources
>>    * (including onling and offlining of those resources), the elfcorehdr
>> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>>       image = kexec_crash_image;
>> +    /* Check that updating elfcorehdr is permitted */
>> +    if (!(image->file_mode || image->update_elfcorehdr))
>> +        goto out;
>> +
>>       if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>>           hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>>           pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
>> diff --git a/kernel/kexec.c b/kernel/kexec.c
>> index 92d301f98776..60de64bd14b9 100644
>> --- a/kernel/kexec.c
>> +++ b/kernel/kexec.c
>> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>>       if (flags & KEXEC_PRESERVE_CONTEXT)
>>           image->preserve_context = 1;
>> +    if (flags & KEXEC_UPDATE_ELFCOREHDR)
>> +        image->update_elfcorehdr = 1;
>> +
>>       ret = machine_kexec_prepare(image);
>>       if (ret)
>>           goto out;
>> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
>> index aad7a3bfd846..1d4bc493b2f4 100644
>> --- a/kernel/ksysfs.c
>> +++ b/kernel/ksysfs.c
>> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>>   }
>>   KERNEL_ATTR_RO(vmcoreinfo);
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
>> +                   struct kobj_attribute *attr, char *buf)
>> +{
>> +    unsigned int sz = crash_get_elfcorehdr_size();
>> +
>> +    return sysfs_emit(buf, "%u\n", sz);
>> +}
>> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
>> +
>> +#endif
>> +
>>   #endif /* CONFIG_CRASH_CORE */
>>   /* whether file capabilities are enabled */
>> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>>   #endif
>>   #ifdef CONFIG_CRASH_CORE
>>       &vmcoreinfo_attr.attr,
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    &crash_elfcorehdr_size_attr.attr,
>> +#endif
>>   #endif
>>   #ifndef CONFIG_TINY_RCU
>>       &rcu_expedited_attr.attr,

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2023-05-10 22:53 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-03 22:41 [PATCH v22 0/8] crash: Kernel handling of CPU and memory hot un/plug Eric DeVolder
2023-05-03 22:41 ` Eric DeVolder
2023-05-03 22:41 ` [PATCH v22 1/8] crash: move a few code bits to setup support of crash hotplug Eric DeVolder
2023-05-03 22:41   ` Eric DeVolder
2023-05-03 22:41 ` [PATCH v22 2/8] crash: add generic infrastructure for crash hotplug support Eric DeVolder
2023-05-03 22:41   ` Eric DeVolder
2023-05-03 22:41 ` [PATCH v22 3/8] kexec: exclude elfcorehdr from the segment digest Eric DeVolder
2023-05-03 22:41   ` Eric DeVolder
2023-05-03 22:41 ` [PATCH v22 4/8] crash: memory and CPU hotplug sysfs attributes Eric DeVolder
2023-05-03 22:41   ` Eric DeVolder
2023-05-03 22:41 ` [PATCH v22 5/8] x86/crash: add x86 crash hotplug support Eric DeVolder
2023-05-03 22:41   ` Eric DeVolder
2023-05-09 22:52   ` Thomas Gleixner
2023-05-09 22:52     ` Thomas Gleixner
2023-05-10 22:49     ` Eric DeVolder
2023-05-10 22:49       ` Eric DeVolder
2023-05-03 22:41 ` [PATCH v22 6/8] crash: hotplug support for kexec_load() Eric DeVolder
2023-05-03 22:41   ` Eric DeVolder
2023-05-08  5:13   ` Baoquan He
2023-05-08  5:13     ` Baoquan He
2023-05-09  6:15   ` Sourabh Jain
2023-05-09  6:15     ` Sourabh Jain
2023-05-10 22:52     ` Eric DeVolder
2023-05-10 22:52       ` Eric DeVolder
2023-05-09  6:56   ` Sourabh Jain
2023-05-09  6:56     ` Sourabh Jain
2023-05-09 20:35     ` Eric DeVolder
2023-05-09 20:35       ` Eric DeVolder
2023-05-03 22:41 ` [PATCH v22 7/8] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu() Eric DeVolder
2023-05-03 22:41   ` Eric DeVolder
2023-05-03 22:41 ` [PATCH v22 8/8] x86/crash: optimize CPU changes Eric DeVolder
2023-05-03 22:41   ` Eric DeVolder
2023-05-09 22:39   ` Thomas Gleixner
2023-05-09 22:39     ` Thomas Gleixner
2023-05-10 22:49     ` Eric DeVolder
2023-05-10 22:49       ` Eric DeVolder
2023-05-04  6:22 ` [PATCH v22 0/8] crash: Kernel handling of CPU and memory hot un/plug Hari Bathini
2023-05-04  6:22   ` Hari Bathini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.