sparclinux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI
@ 2023-04-19 22:55 Douglas Anderson
  2023-04-19 22:55 ` [PATCH v8 04/10] nmi: backtrace: Allow runtime arch specific override Douglas Anderson
  2023-05-10 15:28 ` [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI Doug Anderson
  0 siblings, 2 replies; 7+ messages in thread
From: Douglas Anderson @ 2023-04-19 22:55 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Sumit Garg, Daniel Thompson,
	Marc Zyngier, Mark Rutland
  Cc: ito-yuichi, kgdb-bugreport, Chen-Yu Tsai, Masayoshi Mizuma,
	Peter Zijlstra, Ard Biesheuvel, Rafael J . Wysocki,
	linux-arm-kernel, Stephen Boyd, Lecopzer Chen, Thomas Gleixner,
	linux-perf-users, Douglas Anderson, Alexandru Elisei,
	Andrey Konovalov, Ben Dooks, Borislav Petkov, Christophe Leroy,
	Darrick J. Wong, Dave Hansen, David S. Miller, Eric W. Biederman,
	Frederic Weisbecker, Gaosheng Cui, Gautham R. Shenoy,
	Greg Kroah-Hartman, Guilherme G. Piccoli, Guo Ren,
	H. Peter Anvin, Huacai Chen, Ingo Molnar, Ingo Molnar,
	Jason A. Donenfeld, Jason Wessel, Jianmin Lv, Jiaxun Yang,
	Jinyang He, Joey Gouly, Kees Cook, Laurent Dufour,
	Masahiro Yamada, Masayoshi Mizuma, Michael Ellerman,
	Nicholas Piggin, Paul E. McKenney, Philippe Mathieu-Daudé,
	Pierre Gondois, Qing Zhang, Russell King (Oracle),
	Russell King, Thomas Bogendoerfer, Ulf Hansson, WANG Xuerui,
	linux-kernel, linux-mips, linuxppc-dev, loongarch, sparclinux,
	x86

This is an attempt to resurrect Sumit's old patch series [1] that
allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
also to round up CPUs in kdb/kgdb. The last post from Sumit that I
could find was v7, so I called this series v8. I haven't copied all of
his old changelongs here, but you can find them from the link.

Since v7, I have:
* Addressed the small amount of feedback that was there for v7.
* Rebased.
* Added a new patch that prevents us from spamming the logs with idle
  tasks.
* Added an extra patch to gracefully fall back to regular IPIs if
  pseudo-NMIs aren't there.

Since there appear to be a few different patches series related to
being able to use NMIs to get stack traces of crashed systems, let me
try to organize them to the best of my understanding:

a) This series. On its own, a) will (among other things) enable stack
   traces of all running processes with the soft lockup detector if
   you've enabled the sysctl "kernel.softlockup_all_cpu_backtrace". On
   its own, a) doesn't give a hard lockup detector.

b) A different recently-posted series [2] that adds a hard lockup
   detector based on perf. On its own, b) gives a stack crawl of the
   locked up CPU but no stack crawls of other CPUs (even if they're
   locked too). Together with a) + b) we get everything (full lockup
   detect, full ability to get stack crawls).

c) The old Android "buddy" hard lockup detector [3] that I'm
   considering trying to upstream. If b) lands then I believe c) would
   be redundant (at least for arm64). c) on its own is really only
   useful on arm64 for platforms that can print CPU_DBGPCSR somehow
   (see [4]). a) + c) is roughly as good as a) + b).

[1] https://lore.kernel.org/linux-arm-kernel/1604317487-14543-1-git-send-email-sumit.garg@linaro.org/
[2] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/
[3] https://issuetracker.google.com/172213097
[4] https://issuetracker.google.com/172213129

Changes in v8:
- dynamic_ipi_setup() and dynamic_ipi_teardown() no longer take cpu param
- dynamic_ipi_setup() and dynamic_ipi_teardown() no longer take cpu param
- Add loongarch support, too
- Removed "#ifdef CONFIG_SMP" since arm64 is always SMP
- "Tag the arm64 idle functions as __cpuidle" new for v8
- "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
- "Fallback to a regular IPI if NMI isn't enabled" new for v8

Douglas Anderson (3):
  arm64: idle: Tag the arm64 idle functions as __cpuidle
  kgdb: Provide a stub kgdb_nmicallback() if !CONFIG_KGDB
  arm64: ipi_nmi: Fallback to a regular IPI if NMI isn't enabled

Sumit Garg (7):
  arm64: Add framework to turn IPI as NMI
  irqchip/gic-v3: Enable support for SGIs to act as NMIs
  arm64: smp: Assign and setup an IPI as NMI
  nmi: backtrace: Allow runtime arch specific override
  arm64: ipi_nmi: Add support for NMI backtrace
  kgdb: Expose default CPUs roundup fallback mechanism
  arm64: kgdb: Roundup cpus using IPI as NMI

 arch/arm/include/asm/irq.h       |   2 +-
 arch/arm/kernel/smp.c            |   3 +-
 arch/arm64/include/asm/irq.h     |   4 ++
 arch/arm64/include/asm/nmi.h     |  17 +++++
 arch/arm64/kernel/Makefile       |   2 +-
 arch/arm64/kernel/idle.c         |   4 +-
 arch/arm64/kernel/ipi_nmi.c      | 103 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/kgdb.c         |  18 ++++++
 arch/arm64/kernel/smp.c          |   8 +++
 arch/loongarch/include/asm/irq.h |   2 +-
 arch/loongarch/kernel/process.c  |   3 +-
 arch/mips/include/asm/irq.h      |   2 +-
 arch/mips/kernel/process.c       |   3 +-
 arch/powerpc/include/asm/nmi.h   |   2 +-
 arch/powerpc/kernel/stacktrace.c |   3 +-
 arch/sparc/include/asm/irq_64.h  |   2 +-
 arch/sparc/kernel/process_64.c   |   4 +-
 arch/x86/include/asm/irq.h       |   2 +-
 arch/x86/kernel/apic/hw_nmi.c    |   3 +-
 drivers/irqchip/irq-gic-v3.c     |  29 ++++++---
 include/linux/kgdb.h             |  13 ++++
 include/linux/nmi.h              |  12 ++--
 kernel/debug/debug_core.c        |   8 ++-
 23 files changed, 217 insertions(+), 32 deletions(-)
 create mode 100644 arch/arm64/include/asm/nmi.h
 create mode 100644 arch/arm64/kernel/ipi_nmi.c

-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v8 04/10] nmi: backtrace: Allow runtime arch specific override
  2023-04-19 22:55 [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI Douglas Anderson
@ 2023-04-19 22:55 ` Douglas Anderson
  2023-05-10 15:28 ` [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI Doug Anderson
  1 sibling, 0 replies; 7+ messages in thread
From: Douglas Anderson @ 2023-04-19 22:55 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Sumit Garg, Daniel Thompson,
	Marc Zyngier, Mark Rutland
  Cc: ito-yuichi, kgdb-bugreport, Chen-Yu Tsai, Masayoshi Mizuma,
	Peter Zijlstra, Ard Biesheuvel, Rafael J . Wysocki,
	linux-arm-kernel, Stephen Boyd, Lecopzer Chen, Thomas Gleixner,
	linux-perf-users, Douglas Anderson, Ben Dooks, Borislav Petkov,
	Christophe Leroy, Darrick J. Wong, Dave Hansen, David S. Miller,
	Eric W. Biederman, Gaosheng Cui, Gautham R. Shenoy,
	Greg Kroah-Hartman, Guilherme G. Piccoli, Guo Ren,
	H. Peter Anvin, Huacai Chen, Ingo Molnar, Jason A. Donenfeld,
	Jianmin Lv, Jiaxun Yang, Jinyang He, Kees Cook, Laurent Dufour,
	Michael Ellerman, Nicholas Piggin, Paul E. McKenney,
	Philippe Mathieu-Daudé, Qing Zhang, Russell King (Oracle),
	Russell King, Thomas Bogendoerfer, Ulf Hansson, WANG Xuerui,
	linux-kernel, linux-mips, linuxppc-dev, loongarch, sparclinux,
	x86

From: Sumit Garg <sumit.garg@linaro.org>

Add a boolean return to arch_trigger_cpumask_backtrace() to support a
use-case where a particular architecture detects at runtime if it supports
NMI backtrace or it would like to fallback to default implementation using
SMP cross-calls.

Currently such an architecture example is arm64 supporting pseudo NMIs
feature which is only available on platforms which have support for GICv3
or later version.

Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Tested-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v8:
- Add loongarch support, too

 arch/arm/include/asm/irq.h       |  2 +-
 arch/arm/kernel/smp.c            |  3 ++-
 arch/loongarch/include/asm/irq.h |  2 +-
 arch/loongarch/kernel/process.c  |  3 ++-
 arch/mips/include/asm/irq.h      |  2 +-
 arch/mips/kernel/process.c       |  3 ++-
 arch/powerpc/include/asm/nmi.h   |  2 +-
 arch/powerpc/kernel/stacktrace.c |  3 ++-
 arch/sparc/include/asm/irq_64.h  |  2 +-
 arch/sparc/kernel/process_64.c   |  4 +++-
 arch/x86/include/asm/irq.h       |  2 +-
 arch/x86/kernel/apic/hw_nmi.c    |  3 ++-
 include/linux/nmi.h              | 12 ++++--------
 13 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index a7c2337b0c7d..e6b62c7d6f0e 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -32,7 +32,7 @@ void init_IRQ(void);
 #ifdef CONFIG_SMP
 #include <linux/cpumask.h>
 
-extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask,
+extern bool arch_trigger_cpumask_backtrace(const cpumask_t *mask,
 					   bool exclude_self);
 #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 #endif
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 0b8c25763adc..acb97d9219b1 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -849,7 +849,8 @@ static void raise_nmi(cpumask_t *mask)
 	__ipi_send_mask(ipi_desc[IPI_CPU_BACKTRACE], mask);
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
 	nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_nmi);
+	return true;
 }
diff --git a/arch/loongarch/include/asm/irq.h b/arch/loongarch/include/asm/irq.h
index a115e8999c69..c7a152d6bf0c 100644
--- a/arch/loongarch/include/asm/irq.h
+++ b/arch/loongarch/include/asm/irq.h
@@ -40,7 +40,7 @@ void spurious_interrupt(void);
 #define NR_IRQS_LEGACY 16
 
 #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
-void arch_trigger_cpumask_backtrace(const struct cpumask *mask, bool exclude_self);
+bool arch_trigger_cpumask_backtrace(const struct cpumask *mask, bool exclude_self);
 
 #define MAX_IO_PICS 2
 #define NR_IRQS	(64 + (256 * MAX_IO_PICS))
diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
index fa2443c7afb2..8f7f818f5c4e 100644
--- a/arch/loongarch/kernel/process.c
+++ b/arch/loongarch/kernel/process.c
@@ -339,9 +339,10 @@ static void raise_backtrace(cpumask_t *mask)
 	}
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
 	nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace);
+	return true;
 }
 
 #ifdef CONFIG_64BIT
diff --git a/arch/mips/include/asm/irq.h b/arch/mips/include/asm/irq.h
index 44f9824c1d8c..daf16173486a 100644
--- a/arch/mips/include/asm/irq.h
+++ b/arch/mips/include/asm/irq.h
@@ -77,7 +77,7 @@ extern int cp0_fdc_irq;
 
 extern int get_c0_fdc_int(void);
 
-void arch_trigger_cpumask_backtrace(const struct cpumask *mask,
+bool arch_trigger_cpumask_backtrace(const struct cpumask *mask,
 				    bool exclude_self);
 #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 093dbbd6b843..7d538571830a 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -750,9 +750,10 @@ static void raise_backtrace(cpumask_t *mask)
 	}
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
 	nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace);
+	return true;
 }
 
 int mips_get_process_fp_mode(struct task_struct *task)
diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index c3c7adef74de..135f65adcf63 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -12,7 +12,7 @@ static inline void watchdog_nmi_set_timeout_pct(u64 pct) {}
 #endif
 
 #ifdef CONFIG_NMI_IPI
-extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask,
+extern bool arch_trigger_cpumask_backtrace(const cpumask_t *mask,
 					   bool exclude_self);
 #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 #endif
diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 5de8597eaab8..0fee4bded7ba 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -221,8 +221,9 @@ static void raise_backtrace_ipi(cpumask_t *mask)
 	}
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
 	nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi);
+	return true;
 }
 #endif /* defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_NMI_IPI) */
diff --git a/arch/sparc/include/asm/irq_64.h b/arch/sparc/include/asm/irq_64.h
index 154df2cf19f4..00a0051a9da0 100644
--- a/arch/sparc/include/asm/irq_64.h
+++ b/arch/sparc/include/asm/irq_64.h
@@ -87,7 +87,7 @@ static inline unsigned long get_softint(void)
 	return retval;
 }
 
-void arch_trigger_cpumask_backtrace(const struct cpumask *mask,
+bool arch_trigger_cpumask_backtrace(const struct cpumask *mask,
 				    bool exclude_self);
 #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index 91c2b8124527..f9aea1df3adf 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -236,7 +236,7 @@ static void __global_reg_poll(struct global_reg_snapshot *gp)
 	}
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
 	struct thread_info *tp = current_thread_info();
 	struct pt_regs *regs = get_irq_regs();
@@ -291,6 +291,8 @@ void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 	memset(global_cpu_snapshot, 0, sizeof(global_cpu_snapshot));
 
 	spin_unlock_irqrestore(&global_cpu_snapshot_lock, flags);
+
+	return true;
 }
 
 #ifdef CONFIG_MAGIC_SYSRQ
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index 768aa234cbb4..f731638cc38e 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -43,7 +43,7 @@ extern void init_ISA_irqs(void);
 extern void __init init_IRQ(void);
 
 #ifdef CONFIG_X86_LOCAL_APIC
-void arch_trigger_cpumask_backtrace(const struct cpumask *mask,
+bool arch_trigger_cpumask_backtrace(const struct cpumask *mask,
 				    bool exclude_self);
 
 #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 34a992e275ef..e7dcd28bc824 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -34,10 +34,11 @@ static void nmi_raise_cpu_backtrace(cpumask_t *mask)
 	apic->send_IPI_mask(mask, NMI_VECTOR);
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
 	nmi_trigger_cpumask_backtrace(mask, exclude_self,
 				      nmi_raise_cpu_backtrace);
+	return true;
 }
 
 static int nmi_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 048c0b9aa623..7d8a77cd1e03 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -145,26 +145,22 @@ static inline void touch_nmi_watchdog(void)
 #ifdef arch_trigger_cpumask_backtrace
 static inline bool trigger_all_cpu_backtrace(void)
 {
-	arch_trigger_cpumask_backtrace(cpu_online_mask, false);
-	return true;
+	return arch_trigger_cpumask_backtrace(cpu_online_mask, false);
 }
 
 static inline bool trigger_allbutself_cpu_backtrace(void)
 {
-	arch_trigger_cpumask_backtrace(cpu_online_mask, true);
-	return true;
+	return arch_trigger_cpumask_backtrace(cpu_online_mask, true);
 }
 
 static inline bool trigger_cpumask_backtrace(struct cpumask *mask)
 {
-	arch_trigger_cpumask_backtrace(mask, false);
-	return true;
+	return arch_trigger_cpumask_backtrace(mask, false);
 }
 
 static inline bool trigger_single_cpu_backtrace(int cpu)
 {
-	arch_trigger_cpumask_backtrace(cpumask_of(cpu), false);
-	return true;
+	return arch_trigger_cpumask_backtrace(cpumask_of(cpu), false);
 }
 
 /* generic implementation */
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI
  2023-04-19 22:55 [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI Douglas Anderson
  2023-04-19 22:55 ` [PATCH v8 04/10] nmi: backtrace: Allow runtime arch specific override Douglas Anderson
@ 2023-05-10 15:28 ` Doug Anderson
  2023-05-10 16:30   ` Mark Rutland
  1 sibling, 1 reply; 7+ messages in thread
From: Doug Anderson @ 2023-05-10 15:28 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Sumit Garg, Daniel Thompson,
	Marc Zyngier, Mark Rutland
  Cc: ito-yuichi, kgdb-bugreport, Chen-Yu Tsai, Masayoshi Mizuma,
	Peter Zijlstra, Ard Biesheuvel, Rafael J . Wysocki,
	linux-arm-kernel, Stephen Boyd, Lecopzer Chen, Thomas Gleixner,
	linux-perf-users, Alexandru Elisei, Andrey Konovalov, Ben Dooks,
	Borislav Petkov, Christophe Leroy, Darrick J. Wong, Dave Hansen,
	David S. Miller, Eric W. Biederman, Frederic Weisbecker,
	Gaosheng Cui, Gautham R. Shenoy, Greg Kroah-Hartman,
	Guilherme G. Piccoli, Guo Ren, H. Peter Anvin, Huacai Chen,
	Ingo Molnar, Ingo Molnar, Jason A. Donenfeld, Jason Wessel,
	Jianmin Lv, Jiaxun Yang, Jinyang He, Joey Gouly, Kees Cook,
	Laurent Dufour, Masahiro Yamada, Masayoshi Mizuma,
	Michael Ellerman, Nicholas Piggin, Paul E. McKenney,
	Philippe Mathieu-Daudé,
	Pierre Gondois, Qing Zhang, Russell King (Oracle),
	Russell King, Thomas Bogendoerfer, Ulf Hansson, WANG Xuerui,
	linux-kernel, linux-mips, linuxppc-dev, loongarch, sparclinux,
	x86

Hi,

On Wed, Apr 19, 2023 at 3:57 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> This is an attempt to resurrect Sumit's old patch series [1] that
> allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> could find was v7, so I called this series v8. I haven't copied all of
> his old changelongs here, but you can find them from the link.
>
> Since v7, I have:
> * Addressed the small amount of feedback that was there for v7.
> * Rebased.
> * Added a new patch that prevents us from spamming the logs with idle
>   tasks.
> * Added an extra patch to gracefully fall back to regular IPIs if
>   pseudo-NMIs aren't there.
>
> Since there appear to be a few different patches series related to
> being able to use NMIs to get stack traces of crashed systems, let me
> try to organize them to the best of my understanding:
>
> a) This series. On its own, a) will (among other things) enable stack
>    traces of all running processes with the soft lockup detector if
>    you've enabled the sysctl "kernel.softlockup_all_cpu_backtrace". On
>    its own, a) doesn't give a hard lockup detector.
>
> b) A different recently-posted series [2] that adds a hard lockup
>    detector based on perf. On its own, b) gives a stack crawl of the
>    locked up CPU but no stack crawls of other CPUs (even if they're
>    locked too). Together with a) + b) we get everything (full lockup
>    detect, full ability to get stack crawls).
>
> c) The old Android "buddy" hard lockup detector [3] that I'm
>    considering trying to upstream. If b) lands then I believe c) would
>    be redundant (at least for arm64). c) on its own is really only
>    useful on arm64 for platforms that can print CPU_DBGPCSR somehow
>    (see [4]). a) + c) is roughly as good as a) + b).
>
> [1] https://lore.kernel.org/linux-arm-kernel/1604317487-14543-1-git-send-email-sumit.garg@linaro.org/
> [2] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/
> [3] https://issuetracker.google.com/172213097
> [4] https://issuetracker.google.com/172213129
>
> Changes in v8:
> - dynamic_ipi_setup() and dynamic_ipi_teardown() no longer take cpu param
> - dynamic_ipi_setup() and dynamic_ipi_teardown() no longer take cpu param
> - Add loongarch support, too
> - Removed "#ifdef CONFIG_SMP" since arm64 is always SMP
> - "Tag the arm64 idle functions as __cpuidle" new for v8
> - "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
> - "Fallback to a regular IPI if NMI isn't enabled" new for v8
>
> Douglas Anderson (3):
>   arm64: idle: Tag the arm64 idle functions as __cpuidle
>   kgdb: Provide a stub kgdb_nmicallback() if !CONFIG_KGDB
>   arm64: ipi_nmi: Fallback to a regular IPI if NMI isn't enabled
>
> Sumit Garg (7):
>   arm64: Add framework to turn IPI as NMI
>   irqchip/gic-v3: Enable support for SGIs to act as NMIs
>   arm64: smp: Assign and setup an IPI as NMI
>   nmi: backtrace: Allow runtime arch specific override
>   arm64: ipi_nmi: Add support for NMI backtrace
>   kgdb: Expose default CPUs roundup fallback mechanism
>   arm64: kgdb: Roundup cpus using IPI as NMI
>
>  arch/arm/include/asm/irq.h       |   2 +-
>  arch/arm/kernel/smp.c            |   3 +-
>  arch/arm64/include/asm/irq.h     |   4 ++
>  arch/arm64/include/asm/nmi.h     |  17 +++++
>  arch/arm64/kernel/Makefile       |   2 +-
>  arch/arm64/kernel/idle.c         |   4 +-
>  arch/arm64/kernel/ipi_nmi.c      | 103 +++++++++++++++++++++++++++++++
>  arch/arm64/kernel/kgdb.c         |  18 ++++++
>  arch/arm64/kernel/smp.c          |   8 +++
>  arch/loongarch/include/asm/irq.h |   2 +-
>  arch/loongarch/kernel/process.c  |   3 +-
>  arch/mips/include/asm/irq.h      |   2 +-
>  arch/mips/kernel/process.c       |   3 +-
>  arch/powerpc/include/asm/nmi.h   |   2 +-
>  arch/powerpc/kernel/stacktrace.c |   3 +-
>  arch/sparc/include/asm/irq_64.h  |   2 +-
>  arch/sparc/kernel/process_64.c   |   4 +-
>  arch/x86/include/asm/irq.h       |   2 +-
>  arch/x86/kernel/apic/hw_nmi.c    |   3 +-
>  drivers/irqchip/irq-gic-v3.c     |  29 ++++++---
>  include/linux/kgdb.h             |  13 ++++
>  include/linux/nmi.h              |  12 ++--
>  kernel/debug/debug_core.c        |   8 ++-
>  23 files changed, 217 insertions(+), 32 deletions(-)

It's been 3 weeks and I haven't heard a peep on this series. That
means nobody has any objections and it's all good to land, right?
Right? :-P

Seriously, though, I really think we should figure out how to get this
landed. There's obviously interest from several different parties and
I'm chomping at the bit waiting to spin this series according to your
wishes. I also don't think there's anything super scary/ugly here. IMO
the ideal situation is that folks are OK with the idea presented in
the last patch in the series ("arm64: ipi_nmi: Fallback to a regular
IPI if NMI isn't enabled") and then I can re-spin this series to be
_much_ simpler. That being said, I'm also OK with dropping that patch
and starting the discussion for how to land the rest of the patches in
the series.

Please let me know!

-Doug

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI
  2023-05-10 15:28 ` [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI Doug Anderson
@ 2023-05-10 16:30   ` Mark Rutland
  2023-05-10 16:42     ` Doug Anderson
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Rutland @ 2023-05-10 16:30 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Catalin Marinas, Will Deacon, Sumit Garg, Daniel Thompson,
	Marc Zyngier, ito-yuichi, kgdb-bugreport, Chen-Yu Tsai,
	Masayoshi Mizuma, Peter Zijlstra, Ard Biesheuvel,
	Rafael J . Wysocki, linux-arm-kernel, Stephen Boyd,
	Lecopzer Chen, Thomas Gleixner, linux-perf-users,
	Alexandru Elisei, Andrey Konovalov, Ben Dooks, Borislav Petkov,
	Christophe Leroy, Darrick J. Wong, Dave Hansen, David S. Miller,
	Eric W. Biederman, Frederic Weisbecker, Gaosheng Cui,
	Gautham R. Shenoy, Greg Kroah-Hartman, Guilherme G. Piccoli,
	Guo Ren, H. Peter Anvin, Huacai Chen, Ingo Molnar, Ingo Molnar,
	Jason A. Donenfeld, Jason Wessel, Jianmin Lv, Jiaxun Yang,
	Jinyang He, Joey Gouly, Kees Cook, Laurent Dufour,
	Masahiro Yamada, Masayoshi Mizuma, Michael Ellerman,
	Nicholas Piggin, Paul E. McKenney, Philippe Mathieu-Daudé,
	Pierre Gondois, Qing Zhang, Russell King (Oracle),
	Russell King, Thomas Bogendoerfer, Ulf Hansson, WANG Xuerui,
	linux-kernel, linux-mips, linuxppc-dev, loongarch, sparclinux,
	x86

On Wed, May 10, 2023 at 08:28:17AM -0700, Doug Anderson wrote:
> Hi,

Hi Doug,

> On Wed, Apr 19, 2023 at 3:57 PM Douglas Anderson <dianders@chromium.org> wrote:
> > This is an attempt to resurrect Sumit's old patch series [1] that
> > allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> > also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> > could find was v7, so I called this series v8. I haven't copied all of
> > his old changelongs here, but you can find them from the link.
> >
> > Since v7, I have:
> > * Addressed the small amount of feedback that was there for v7.
> > * Rebased.
> > * Added a new patch that prevents us from spamming the logs with idle
> >   tasks.
> > * Added an extra patch to gracefully fall back to regular IPIs if
> >   pseudo-NMIs aren't there.
> >
> > Since there appear to be a few different patches series related to
> > being able to use NMIs to get stack traces of crashed systems, let me
> > try to organize them to the best of my understanding:
> >
> > a) This series. On its own, a) will (among other things) enable stack
> >    traces of all running processes with the soft lockup detector if
> >    you've enabled the sysctl "kernel.softlockup_all_cpu_backtrace". On
> >    its own, a) doesn't give a hard lockup detector.
> >
> > b) A different recently-posted series [2] that adds a hard lockup
> >    detector based on perf. On its own, b) gives a stack crawl of the
> >    locked up CPU but no stack crawls of other CPUs (even if they're
> >    locked too). Together with a) + b) we get everything (full lockup
> >    detect, full ability to get stack crawls).
> >
> > c) The old Android "buddy" hard lockup detector [3] that I'm
> >    considering trying to upstream. If b) lands then I believe c) would
> >    be redundant (at least for arm64). c) on its own is really only
> >    useful on arm64 for platforms that can print CPU_DBGPCSR somehow
> >    (see [4]). a) + c) is roughly as good as a) + b).

> It's been 3 weeks and I haven't heard a peep on this series. That
> means nobody has any objections and it's all good to land, right?
> Right? :-P

FWIW, there are still longstanding soundness issues in the arm64 pseudo-NMI
support (and fixing that requires an overhaul of our DAIF / IRQ flag
management, which I've been chipping away at for a number of releases), so I
hadn't looked at this in detail yet because the foundations are still somewhat
dodgy.

I appreciate that this has been around for a while, and it's on my queue to
look at.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI
  2023-05-10 16:30   ` Mark Rutland
@ 2023-05-10 16:42     ` Doug Anderson
  2023-05-16 10:09       ` Sumit Garg
  2023-06-01 21:46       ` Doug Anderson
  0 siblings, 2 replies; 7+ messages in thread
From: Doug Anderson @ 2023-05-10 16:42 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Will Deacon, Sumit Garg, Daniel Thompson,
	Marc Zyngier, ito-yuichi, kgdb-bugreport, Chen-Yu Tsai,
	Masayoshi Mizuma, Peter Zijlstra, Ard Biesheuvel,
	Rafael J . Wysocki, linux-arm-kernel, Stephen Boyd,
	Lecopzer Chen, Thomas Gleixner, linux-perf-users,
	Alexandru Elisei, Andrey Konovalov, Ben Dooks, Borislav Petkov,
	Christophe Leroy, Darrick J. Wong, Dave Hansen, David S. Miller,
	Eric W. Biederman, Frederic Weisbecker, Gaosheng Cui,
	Gautham R. Shenoy, Greg Kroah-Hartman, Guilherme G. Piccoli,
	Guo Ren, H. Peter Anvin, Huacai Chen, Ingo Molnar, Ingo Molnar,
	Jason A. Donenfeld, Jason Wessel, Jianmin Lv, Jiaxun Yang,
	Jinyang He, Joey Gouly, Kees Cook, Laurent Dufour,
	Masahiro Yamada, Masayoshi Mizuma, Michael Ellerman,
	Nicholas Piggin, Paul E. McKenney, Philippe Mathieu-Daudé,
	Pierre Gondois, Qing Zhang, Russell King (Oracle),
	Russell King, Thomas Bogendoerfer, Ulf Hansson, WANG Xuerui,
	linux-kernel, linux-mips, linuxppc-dev, loongarch, sparclinux,
	x86

Hi,

On Wed, May 10, 2023 at 9:30 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Wed, May 10, 2023 at 08:28:17AM -0700, Doug Anderson wrote:
> > Hi,
>
> Hi Doug,
>
> > On Wed, Apr 19, 2023 at 3:57 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > This is an attempt to resurrect Sumit's old patch series [1] that
> > > allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> > > also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> > > could find was v7, so I called this series v8. I haven't copied all of
> > > his old changelongs here, but you can find them from the link.
> > >
> > > Since v7, I have:
> > > * Addressed the small amount of feedback that was there for v7.
> > > * Rebased.
> > > * Added a new patch that prevents us from spamming the logs with idle
> > >   tasks.
> > > * Added an extra patch to gracefully fall back to regular IPIs if
> > >   pseudo-NMIs aren't there.
> > >
> > > Since there appear to be a few different patches series related to
> > > being able to use NMIs to get stack traces of crashed systems, let me
> > > try to organize them to the best of my understanding:
> > >
> > > a) This series. On its own, a) will (among other things) enable stack
> > >    traces of all running processes with the soft lockup detector if
> > >    you've enabled the sysctl "kernel.softlockup_all_cpu_backtrace". On
> > >    its own, a) doesn't give a hard lockup detector.
> > >
> > > b) A different recently-posted series [2] that adds a hard lockup
> > >    detector based on perf. On its own, b) gives a stack crawl of the
> > >    locked up CPU but no stack crawls of other CPUs (even if they're
> > >    locked too). Together with a) + b) we get everything (full lockup
> > >    detect, full ability to get stack crawls).
> > >
> > > c) The old Android "buddy" hard lockup detector [3] that I'm
> > >    considering trying to upstream. If b) lands then I believe c) would
> > >    be redundant (at least for arm64). c) on its own is really only
> > >    useful on arm64 for platforms that can print CPU_DBGPCSR somehow
> > >    (see [4]). a) + c) is roughly as good as a) + b).
>
> > It's been 3 weeks and I haven't heard a peep on this series. That
> > means nobody has any objections and it's all good to land, right?
> > Right? :-P
>
> FWIW, there are still longstanding soundness issues in the arm64 pseudo-NMI
> support (and fixing that requires an overhaul of our DAIF / IRQ flag
> management, which I've been chipping away at for a number of releases), so I
> hadn't looked at this in detail yet because the foundations are still somewhat
> dodgy.
>
> I appreciate that this has been around for a while, and it's on my queue to
> look at.

Ah, thanks for the heads up! We've been thinking about turning this on
in production in ChromeOS because it will help us track down a whole
class of field-generated crash reports that are otherwise opaque to
us. It sounds as if maybe that's not a good idea quite yet? Do you
have any idea of how much farther along this needs to go? ...of
course, we've also run into issues with Mediatek devices because they
don't save/restore GICR registers properly [1]. In theory, we might be
able to work around that in the kernel.

In any case, even if there are bugs that would prevent turning this on
for production, it still seems like we could still land this series.
It simply wouldn't do anything until someone turned on pseudo NMIs,
which wouldn't happen till the kinks are worked out.

...actually, I guess I should say that if all the patches of the
current series do land then it actually _would_ still do something,
even without pseudo-NMI. Assuming the last patch looks OK, it would at
least start falling back to using regular IPIs to do backtraces. That
wouldn't get backtraces on hard locked up CPUs but it would be better
than what we have today where we don't get any backtraces. This would
get arm64 on par with arm32...

[1] https://issuetracker.google.com/281831288

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI
  2023-05-10 16:42     ` Doug Anderson
@ 2023-05-16 10:09       ` Sumit Garg
  2023-06-01 21:46       ` Doug Anderson
  1 sibling, 0 replies; 7+ messages in thread
From: Sumit Garg @ 2023-05-16 10:09 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Daniel Thompson,
	Marc Zyngier, ito-yuichi, kgdb-bugreport, Chen-Yu Tsai,
	Masayoshi Mizuma, Peter Zijlstra, Ard Biesheuvel,
	Rafael J . Wysocki, linux-arm-kernel, Stephen Boyd,
	Lecopzer Chen, Thomas Gleixner, linux-perf-users,
	Alexandru Elisei, Andrey Konovalov, Ben Dooks, Borislav Petkov,
	Christophe Leroy, Darrick J. Wong, Dave Hansen, David S. Miller,
	Eric W. Biederman, Frederic Weisbecker, Gaosheng Cui,
	Gautham R. Shenoy, Greg Kroah-Hartman, Guilherme G. Piccoli,
	Guo Ren, H. Peter Anvin, Huacai Chen, Ingo Molnar, Ingo Molnar,
	Jason A. Donenfeld, Jason Wessel, Jianmin Lv, Jiaxun Yang,
	Jinyang He, Joey Gouly, Kees Cook, Laurent Dufour,
	Masahiro Yamada, Masayoshi Mizuma, Michael Ellerman,
	Nicholas Piggin, Paul E. McKenney, Philippe Mathieu-Daudé,
	Pierre Gondois, Qing Zhang, Russell King (Oracle),
	Russell King, Thomas Bogendoerfer, Ulf Hansson, WANG Xuerui,
	linux-kernel, linux-mips, linuxppc-dev, loongarch, sparclinux,
	x86

On Wed, 10 May 2023 at 22:20, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Wed, May 10, 2023 at 9:30 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Wed, May 10, 2023 at 08:28:17AM -0700, Doug Anderson wrote:
> > > Hi,
> >
> > Hi Doug,
> >
> > > On Wed, Apr 19, 2023 at 3:57 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > > This is an attempt to resurrect Sumit's old patch series [1] that
> > > > allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> > > > also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> > > > could find was v7, so I called this series v8. I haven't copied all of
> > > > his old changelongs here, but you can find them from the link.
> > > >

Thanks Doug for picking up this work and for all your additions/improvements.

> > > > Since v7, I have:
> > > > * Addressed the small amount of feedback that was there for v7.
> > > > * Rebased.
> > > > * Added a new patch that prevents us from spamming the logs with idle
> > > >   tasks.
> > > > * Added an extra patch to gracefully fall back to regular IPIs if
> > > >   pseudo-NMIs aren't there.
> > > >
> > > > Since there appear to be a few different patches series related to
> > > > being able to use NMIs to get stack traces of crashed systems, let me
> > > > try to organize them to the best of my understanding:
> > > >
> > > > a) This series. On its own, a) will (among other things) enable stack
> > > >    traces of all running processes with the soft lockup detector if
> > > >    you've enabled the sysctl "kernel.softlockup_all_cpu_backtrace". On
> > > >    its own, a) doesn't give a hard lockup detector.
> > > >
> > > > b) A different recently-posted series [2] that adds a hard lockup
> > > >    detector based on perf. On its own, b) gives a stack crawl of the
> > > >    locked up CPU but no stack crawls of other CPUs (even if they're
> > > >    locked too). Together with a) + b) we get everything (full lockup
> > > >    detect, full ability to get stack crawls).
> > > >
> > > > c) The old Android "buddy" hard lockup detector [3] that I'm
> > > >    considering trying to upstream. If b) lands then I believe c) would
> > > >    be redundant (at least for arm64). c) on its own is really only
> > > >    useful on arm64 for platforms that can print CPU_DBGPCSR somehow
> > > >    (see [4]). a) + c) is roughly as good as a) + b).
> >
> > > It's been 3 weeks and I haven't heard a peep on this series. That
> > > means nobody has any objections and it's all good to land, right?
> > > Right? :-P

For me it was months waiting without any feedback. So I think you are
lucky :) or atleast better than me at poking arm64 maintainers.

> >
> > FWIW, there are still longstanding soundness issues in the arm64 pseudo-NMI
> > support (and fixing that requires an overhaul of our DAIF / IRQ flag
> > management, which I've been chipping away at for a number of releases), so I
> > hadn't looked at this in detail yet because the foundations are still somewhat
> > dodgy.
> >
> > I appreciate that this has been around for a while, and it's on my queue to
> > look at.
>
> Ah, thanks for the heads up! We've been thinking about turning this on
> in production in ChromeOS because it will help us track down a whole
> class of field-generated crash reports that are otherwise opaque to
> us. It sounds as if maybe that's not a good idea quite yet? Do you
> have any idea of how much farther along this needs to go? ...of
> course, we've also run into issues with Mediatek devices because they
> don't save/restore GICR registers properly [1]. In theory, we might be
> able to work around that in the kernel.
>
> In any case, even if there are bugs that would prevent turning this on
> for production, it still seems like we could still land this series.
> It simply wouldn't do anything until someone turned on pseudo NMIs,
> which wouldn't happen till the kinks are worked out.

I agree here. We should be able to make the foundations robust later
on. IMHO, until we turn on features surrounding pseudo NMIs, I am not
sure how we can have true confidence in the underlying robustness.

-Sumit

>
> ...actually, I guess I should say that if all the patches of the
> current series do land then it actually _would_ still do something,
> even without pseudo-NMI. Assuming the last patch looks OK, it would at
> least start falling back to using regular IPIs to do backtraces. That
> wouldn't get backtraces on hard locked up CPUs but it would be better
> than what we have today where we don't get any backtraces. This would
> get arm64 on par with arm32...
>
> [1] https://issuetracker.google.com/281831288

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI
  2023-05-10 16:42     ` Doug Anderson
  2023-05-16 10:09       ` Sumit Garg
@ 2023-06-01 21:46       ` Doug Anderson
  1 sibling, 0 replies; 7+ messages in thread
From: Doug Anderson @ 2023-06-01 21:46 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Will Deacon, Sumit Garg, Daniel Thompson,
	Marc Zyngier, ito-yuichi, kgdb-bugreport, Chen-Yu Tsai,
	Masayoshi Mizuma, Peter Zijlstra, Ard Biesheuvel,
	Rafael J . Wysocki, linux-arm-kernel, Stephen Boyd,
	Lecopzer Chen, Thomas Gleixner, linux-perf-users,
	Alexandru Elisei, Andrey Konovalov, Ben Dooks, Borislav Petkov,
	Christophe Leroy, Darrick J. Wong, Dave Hansen, David S. Miller,
	Eric W. Biederman, Frederic Weisbecker, Gaosheng Cui,
	Gautham R. Shenoy, Greg Kroah-Hartman, Guilherme G. Piccoli,
	Guo Ren, H. Peter Anvin, Huacai Chen, Ingo Molnar, Ingo Molnar,
	Jason A. Donenfeld, Jason Wessel, Jianmin Lv, Jiaxun Yang,
	Jinyang He, Joey Gouly, Kees Cook, Laurent Dufour,
	Masahiro Yamada, Masayoshi Mizuma, Michael Ellerman,
	Nicholas Piggin, Paul E. McKenney, Philippe Mathieu-Daudé,
	Pierre Gondois, Qing Zhang, Russell King (Oracle),
	Russell King, Thomas Bogendoerfer, Ulf Hansson, WANG Xuerui,
	linux-kernel, linux-mips, linuxppc-dev, loongarch, sparclinux,
	x86

Hi,

On Wed, May 10, 2023 at 9:42 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Wed, May 10, 2023 at 9:30 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Wed, May 10, 2023 at 08:28:17AM -0700, Doug Anderson wrote:
> > > Hi,
> >
> > Hi Doug,
> >
> > > On Wed, Apr 19, 2023 at 3:57 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > > This is an attempt to resurrect Sumit's old patch series [1] that
> > > > allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> > > > also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> > > > could find was v7, so I called this series v8. I haven't copied all of
> > > > his old changelongs here, but you can find them from the link.
> > > >
> > > > Since v7, I have:
> > > > * Addressed the small amount of feedback that was there for v7.
> > > > * Rebased.
> > > > * Added a new patch that prevents us from spamming the logs with idle
> > > >   tasks.
> > > > * Added an extra patch to gracefully fall back to regular IPIs if
> > > >   pseudo-NMIs aren't there.
> > > >
> > > > Since there appear to be a few different patches series related to
> > > > being able to use NMIs to get stack traces of crashed systems, let me
> > > > try to organize them to the best of my understanding:
> > > >
> > > > a) This series. On its own, a) will (among other things) enable stack
> > > >    traces of all running processes with the soft lockup detector if
> > > >    you've enabled the sysctl "kernel.softlockup_all_cpu_backtrace". On
> > > >    its own, a) doesn't give a hard lockup detector.
> > > >
> > > > b) A different recently-posted series [2] that adds a hard lockup
> > > >    detector based on perf. On its own, b) gives a stack crawl of the
> > > >    locked up CPU but no stack crawls of other CPUs (even if they're
> > > >    locked too). Together with a) + b) we get everything (full lockup
> > > >    detect, full ability to get stack crawls).
> > > >
> > > > c) The old Android "buddy" hard lockup detector [3] that I'm
> > > >    considering trying to upstream. If b) lands then I believe c) would
> > > >    be redundant (at least for arm64). c) on its own is really only
> > > >    useful on arm64 for platforms that can print CPU_DBGPCSR somehow
> > > >    (see [4]). a) + c) is roughly as good as a) + b).
> >
> > > It's been 3 weeks and I haven't heard a peep on this series. That
> > > means nobody has any objections and it's all good to land, right?
> > > Right? :-P
> >
> > FWIW, there are still longstanding soundness issues in the arm64 pseudo-NMI
> > support (and fixing that requires an overhaul of our DAIF / IRQ flag
> > management, which I've been chipping away at for a number of releases), so I
> > hadn't looked at this in detail yet because the foundations are still somewhat
> > dodgy.
> >
> > I appreciate that this has been around for a while, and it's on my queue to
> > look at.
>
> Ah, thanks for the heads up! We've been thinking about turning this on
> in production in ChromeOS because it will help us track down a whole
> class of field-generated crash reports that are otherwise opaque to
> us. It sounds as if maybe that's not a good idea quite yet? Do you
> have any idea of how much farther along this needs to go?

I'm still very interested in this topic. Do you have anything concrete
that is broken? Your reply gives me the vibe that there have been a
bunch of bugs found recently and, while there are no known issues,
you're worried that there might be something lingering still. Is that
correct, or do you have something concrete that's broken? Is this
anything others could help with? I could make an attempt, or we might
be able to rope some of the others interested in this topic and more
GIC-knowledgeable to help?

I still have a goal for turning this on for production but obviously
don't want to make the device crashier because of it.

Also: if this really has known bugs, should we put a big WARN_ON splat
if anyone tries to turn on pseudo NMIs? Without that, it feels like
it's a bit of a footgun.


> ...of
> course, we've also run into issues with Mediatek devices because they
> don't save/restore GICR registers properly [1]. In theory, we might be
> able to work around that in the kernel.

To followup with this, we've formulated a plan for dealing with
Mediatek Chromebooks. I doubt anyone is truly interested, but if
anyone is the details are in the public Google bug tracker [1]. None
of that should block lading the NMI backtrace series.


> In any case, even if there are bugs that would prevent turning this on
> for production, it still seems like we could still land this series.
> It simply wouldn't do anything until someone turned on pseudo NMIs,
> which wouldn't happen till the kinks are worked out.
>
> ...actually, I guess I should say that if all the patches of the
> current series do land then it actually _would_ still do something,
> even without pseudo-NMI. Assuming the last patch looks OK, it would at
> least start falling back to using regular IPIs to do backtraces. That
> wouldn't get backtraces on hard locked up CPUs but it would be better
> than what we have today where we don't get any backtraces. This would
> get arm64 on par with arm32...

The more I thought about it, the more I liked the idea of going full
hog on the idea in patch #10. I've posted v9 [2] where I've really
embraced the idea of falling back to a regular IPI if NMI isn't
available. Hopefully it looks keen.

[1] https://issuetracker.google.com/281831288
[2] https://lore.kernel.org/r/20230601213440.2488667-1-dianders@chromium.org

-Doug

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-06-01 21:52 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-19 22:55 [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI Douglas Anderson
2023-04-19 22:55 ` [PATCH v8 04/10] nmi: backtrace: Allow runtime arch specific override Douglas Anderson
2023-05-10 15:28 ` [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI Doug Anderson
2023-05-10 16:30   ` Mark Rutland
2023-05-10 16:42     ` Doug Anderson
2023-05-16 10:09       ` Sumit Garg
2023-06-01 21:46       ` Doug Anderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).