* [PATCH v3 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
@ 2021-12-07 10:37 Hari Bathini
From: Hari Bathini @ 2021-12-07 10:37 UTC (permalink / raw)
  To: mpe, linuxppc-dev, npiggin
  Cc: Hari Bathini, mahesh, sourabhjain, kernel test robot

Kdump can be triggered after the panic notifiers since commit
f06e5153f4ae2 ("kernel/panic.c: add "crash_kexec_post_notifiers"
option for kdump after panic_notifers") introduced the
crash_kexec_post_notifiers option. But using this option means
smp_send_stop(), which marks all other CPUs as offline, gets called
before kdump is triggered. As a result, kdump routines fail to save
other CPUs' registers. To fix this, a kdump friendly
crash_smp_send_stop() function was introduced with kernel commit
0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly
version in panic path"). Override this kdump friendly weak function
to handle the crash_kexec_post_notifiers option appropriately on
powerpc.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
[Fixed signature of crash_stop_this_cpu() - reported by lkp@intel.com]
Reported-by: kernel test robot <lkp@intel.com>
---
 arch/powerpc/kernel/smp.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c23ee842c4c3..2d33c167b438 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -620,6 +620,36 @@ void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 }
 #endif
 
+#ifdef CONFIG_NMI_IPI
+static void crash_stop_this_cpu(struct pt_regs *regs)
+#else
+static void crash_stop_this_cpu(void *dummy)
+#endif
+{
+	/*
+	 * Just busy wait here and avoid marking CPU as offline to ensure
+	 * register data is captured appropriately.
+	 */
+	while (1)
+		cpu_relax();
+}
+
+void crash_smp_send_stop(void)
+{
+	static bool stopped = false;
+
+	if (stopped)
+		return;
+
+	stopped = true;
+
+#ifdef CONFIG_NMI_IPI
+	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 1000000);
+#else
+	smp_call_function(crash_stop_this_cpu, NULL, 0);
+#endif /* CONFIG_NMI_IPI */
+}
+
 #ifdef CONFIG_NMI_IPI
 static void nmi_stop_this_cpu(struct pt_regs *regs)
 {
-- 
2.33.1
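
For readers following along, here is a rough userspace model of the
problem patch 1/2 describes. It is only a sketch, not kernel code: the
model_* names, the 8-CPU mask and the "save only online CPUs" rule are
assumptions made for illustration. It shows why a stop routine that
leaves the online mask alone lets the dump code see every CPU, and how
the 'stopped' guard makes repeated calls harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical CPU count for this model */

/* Toy online mask: bit n set means CPU n is considered online. */
static uint32_t online_mask = (1u << NR_CPUS) - 1;

/* Models smp_send_stop(): every CPU except 'self' is marked offline. */
static void model_smp_send_stop(int self)
{
	assert(self >= 0 && self < NR_CPUS);
	online_mask = 1u << self;
}

/*
 * Models crash_smp_send_stop(): the other CPUs would be parked in a
 * busy loop, but the online mask is deliberately left untouched.
 * The static flag mirrors the run-once guard in the patch.
 */
static void model_crash_smp_send_stop(int self)
{
	static bool stopped = false;

	assert(self >= 0 && self < NR_CPUS);
	if (stopped)
		return;
	stopped = true;
}

/* Models the dump code: registers are saved only for online CPUs. */
static int model_count_saved_cpus(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (online_mask & (1u << cpu))
			n++;
	return n;
}
```

In this model, calling model_crash_smp_send_stop(1) leaves all 8 CPUs
visible to model_count_saved_cpus(), whereas model_smp_send_stop(1)
leaves only the calling CPU.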



* [PATCH v3 2/2] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic
@ 2021-12-07 10:37 ` Hari Bathini
From: Hari Bathini @ 2021-12-07 10:37 UTC (permalink / raw)
  To: mpe, linuxppc-dev, npiggin; +Cc: Hari Bathini, mahesh, sourabhjain

In the panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, a vmcore captured with fadump via the
panic path has register data processed only for the panic'ing CPU.
Sample output of crash-utility with such a vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
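
To make the two masks above easier to read: a small, hedged C sketch
that decodes cpumask-style words the way a reader would by hand. The
helper names are hypothetical and this is not how crash-utility does
it; it only shows that bits[0] = 0x2 means a single online CPU (CPU 1),
while bits[0] = 0xff means CPUs 0-7 are still marked active.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the CPUs set across all words of a cpumask-style bitmap. */
static int mask_weight(const uint64_t *bits, size_t nwords)
{
	int n = 0;

	assert(bits != NULL);
	for (size_t i = 0; i < nwords; i++)
		for (uint64_t w = bits[i]; w; w &= w - 1)	/* clear lowest set bit */
			n++;
	return n;
}

/* Return the lowest CPU number set in the mask, or -1 if it is empty. */
static int mask_first(const uint64_t *bits, size_t nwords)
{
	assert(bits != NULL);
	for (size_t i = 0; i < nwords; i++)
		for (int b = 0; b < 64; b++)
			if (bits[i] & ((uint64_t)1 << b))
				return (int)(i * 64 + b);
	return -1;
}
```

Fed the values shown above, the online mask has weight 1 with CPU 1 as
its only member, matching "CPUS: 1" and "CPU: 1" in the crash banner,
and the active mask has weight 8.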

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crashes
    due to an exception.

  - The above did change since commit 8341f2f222d7 ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    dereferencing a NULL pointer to force a kernel crash. But then
    commit de6e5d38417e ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline until kernel
    commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure post-processing of register data for all other CPUs happens
as intended, let the panic() function take the crash friendly path
(read crash_smp_send_stop()) with the help of the
crash_kexec_post_notifiers option. Also, as register data for all CPUs
is captured by f/w, skip IPI callbacks here for fadump, to avoid any
complications in finding the right backtraces.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump.c |  8 ++++++++
 arch/powerpc/kernel/smp.c    | 10 ++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b7ceb041743c..60f5fc14aa23 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1641,6 +1641,14 @@ int __init setup_fadump(void)
 	else if (fw_dump.reserve_dump_area_size)
 		fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 
+	/*
+	 * In case of panic, fadump is triggered via ppc_panic_event()
+	 * panic notifier. Setting crash_kexec_post_notifiers to 'true'
+	 * lets panic() function take crash friendly path before panic
+	 * notifiers are invoked.
+	 */
+	crash_kexec_post_notifiers = true;
+
 	return 1;
 }
 subsys_initcall(setup_fadump);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 2d33c167b438..10fb01837e6b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -61,6 +61,7 @@
 #include <asm/cpu_has_feature.h>
 #include <asm/ftrace.h>
 #include <asm/kup.h>
+#include <asm/fadump.h>
 
 #ifdef DEBUG
 #include <asm/udbg.h>
@@ -638,6 +639,15 @@ void crash_smp_send_stop(void)
 {
 	static bool stopped = false;
 
+	/*
+	 * In case of fadump, register data for all CPUs is captured by f/w
+	 * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
+	 * this rtas call to avoid tricky post processing of those CPUs'
+	 * backtraces.
+	 */
+	if (should_fadump_crash())
+		return;
+
 	if (stopped)
 		return;
 
-- 
2.33.1
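
As a reading aid for patch 2/2, the following sketch models the order
of steps panic() takes depending on crash_kexec_post_notifiers. It is a
simplified assumption based on my reading of kernel/panic.c around
v5.15, not a copy of it; the enum and function names are hypothetical.
The point is that setting the option swaps smp_send_stop() for
crash_smp_send_stop() and moves the crash kexec attempt after the
panic notifiers, which is where fadump is triggered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the steps of interest in the panic path. */
enum step {
	STEP_CRASH_KEXEC,		/* attempt to boot the crash kernel */
	STEP_SMP_SEND_STOP,		/* stop others, marking them offline */
	STEP_CRASH_SMP_SEND_STOP,	/* stop others, kdump friendly */
	STEP_PANIC_NOTIFIERS,		/* run the panic notifier chain */
};

/*
 * Fill 'out' with the modelled step order and return the step count.
 * 'out' must have room for at least 3 entries.
 */
static size_t model_panic_steps(bool post_notifiers, enum step *out)
{
	size_t n = 0;

	assert(out != NULL);
	if (!post_notifiers) {
		/* Does not return in the real kernel if a crash kernel is loaded. */
		out[n++] = STEP_CRASH_KEXEC;
		out[n++] = STEP_SMP_SEND_STOP;
	} else {
		out[n++] = STEP_CRASH_SMP_SEND_STOP;
	}
	out[n++] = STEP_PANIC_NOTIFIERS;
	if (post_notifiers)
		out[n++] = STEP_CRASH_KEXEC;
	return n;
}
```

With the option set, the modelled order is crash_smp_send_stop(), panic
notifiers, then the crash kexec attempt. On the fadump path the patch
additionally makes crash_smp_send_stop() return early, so no IPIs are
sent before the notifier hands control to firmware.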



* Re: [PATCH v3 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
@ 2021-12-15  0:24 ` Michael Ellerman
From: Michael Ellerman @ 2021-12-15  0:24 UTC (permalink / raw)
  To: mpe, linuxppc-dev, Hari Bathini, npiggin
  Cc: mahesh, sourabhjain, kernel test robot

On Tue, 7 Dec 2021 16:07:18 +0530, Hari Bathini wrote:
> Kdump can be triggered after panic_notifers since commit f06e5153f4ae2
> ("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump
> after panic_notifers") introduced crash_kexec_post_notifiers option.
> But using this option would mean smp_send_stop(), that marks all other
> CPUs as offline, gets called before kdump is triggered. As a result,
> kdump routines fail to save other CPUs' registers. To fix this, kdump
> friendly crash_smp_send_stop() function was introduced with kernel
> commit 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump
> friendly version in panic path"). Override this kdump friendly weak
> function to handle crash_kexec_post_notifiers option appropriately
> on powerpc.
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
      https://git.kernel.org/powerpc/c/219572d2fc4135b5ce65c735d881787d48b10e71
[2/2] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic
      https://git.kernel.org/powerpc/c/06e629c25daa519be620a8c17359ae8fc7a2e903

cheers

