linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/30] The panic notifiers refactor
@ 2022-04-27 22:48 Guilherme G. Piccoli
  2022-04-27 22:48 ` [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart Guilherme G. Piccoli
                   ` (29 more replies)
  0 siblings, 30 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:48 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alex Elder, Alexander Gordeev, Anton Ivanov, Ard Biesheuvel,
	Arjan van de Ven, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Brian Norris, Catalin Marinas, Chris Zankel,
	Christian Borntraeger, Christophe JAILLET, Cong Wang,
	Corey Minyard, David Gow, David S. Miller, Dexuan Cui,
	Doug Berger, Evan Green, Florian Fainelli, Frederic Weisbecker,
	H. Peter Anvin, Haiyang Zhang, Hari Bathini, Heiko Carstens,
	Helge Deller, Ivan Kokshaysky, James E.J. Bottomley, James Morse,
	Joel Fernandes, Johannes Berg, Jonathan Hunter, Josh Triplett,
	Julius Werner, Justin Chen, K. Y. Srinivasan, Lai Jiangshan,
	Lee Jones, Leo Yan, Marc Zyngier, Markus Mayer,
	Mathieu Desnoyers, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Mihai Carabas, Mike Leach, Mikko Perttunen, Neeraj Upadhyay,
	Nicholas Piggin, Paul Mackerras, Pavel Machek, Richard Henderson,
	Richard Weinberger, Robert Richter, Russell King, Scott Branden,
	Sebastian Andrzej Siewior, Sebastian Reichel, Shile Zhang,
	Stefano Stabellini, Stephen Brennan, Stephen Hemminger,
	Suzuki K Poulose, Sven Schnelle, Thierry Reding,
	Thomas Bogendoerfer, Tianyu Lan, Tony Luck, Valentin Schneider,
	Vasily Gorbik, Wang ShaoBo, Wei Liu, Xiaoming Ni, zhenwei pi

Hey folks, this is an attempt to improve/refactor the dated panic notifiers
infrastructure. This is strongly based in a suggestion made by Pter Mladek [0]
some time ago, and it's finally ready. Below I'll detail the patch ordering,
testing made, etc.
First, a bit about the reason behind this.

The panic notifiers list is an infrastructure that allows callbacks to execute
during panic time. Happens that anybody can add functions there, no ordering
is enforced (by default) and the decision to execute or not such notifiers
before kdump may lead to high risk of failure in crash scenarios - default is
not to execute any of them. There is a parameter acting as a switch for that.
But some architectures require some notifiers, so..it's messy.

The suggestion from Petr came after a patch submission to add a notifiers
filter, allowing the notifiers selection by function name, which was welcomed
by some people, but not by Petr, which claimed the code should indeed have a
refactor - and it made a lot of sense, his suggestion makes code more clear
and reliable.

So, this series might be split in 3 portions:

Part 1: the first 18 patches are mostly fixes (one or two might be considered
improvements), mostly replacing spinlocks/mutexes with safer alternatives for
atomic contexts, like spin_trylock, etc. We also focused on commenting
everything that is possible and clean-up code.

Part 2, the core: patches 19-25 are the main refactor, which splits the panic
notifiers list in three, introduce the concept of panic notifier level and
clean-up and highly comment the code, effectively leading to a more reliable
and clear, yet highly customizable panic path.

Part 3: The remaining 5 patches are fixes that _require the main refactor_
patches, they don't make sense without the core changes - but again, these are
small fixes and not part of the main goal of refactoring the panic code.

I've tried my best to make the patches the more "bisectable" as possible, so
they tend to be self-contained and easy to backport (specially patches from
part 1). Notice that the series is *based on 5.18-rc4* - usually a refactor
like this would be based on linux-next, but since we have many fixes in the
series, I kept it based on mainline tree. Of course I could change that in a
subsequent iteration, if desired.

Since this touches multiple architectures and drivers, it's very difficult to
test it really (by executing all touched code). So, my tests split in two
approaches: build tests and real tests, that involves panic triggering with
and without kdump, changing panic notifiers level, etc.

Build tests (using cross-compilers): alpha, arm, arm64, mips (sgi 22 and 32),
parisc, s390, sparc, um, x86_64 (couldn't get a functional xtensa cross
compiler).

Real/full tests: x86_64 (Hyper-V and QEMU guests) + PowerPC (pseries guest).

Here is the link with the .config files used: https://people.igalia.com/gpiccoli/panic_notifiers_configs/
(tried my best to build all the affected code).

Finally, a bit about my CCing strategy: I've included everybody present in the
original thread [0] plus some maintainers and other interested parties as CC
in the full series. But the patches have individual CC lists, for people that
are definitely related to them but might not care much for the whole series;
nevertheless, _everybody_ mentioned at least once in some patch is CCed in this
cover-letter. Hopefully I didn't forget to include anybody - all the mailing
lists were CCed in the whole series. Apologies in advance if (a) you received
emails you didn't want to or, (b) I forgot to include you but it was something
considered interesting by you.

Thanks in advance for reviews / comments / suggestions!
Cheers,

Guilherme

[0] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/

Guilherme G. Piccoli (30):
  x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart
  ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
  notifier: Add panic notifiers info and purge trailing whitespaces
  firmware: google: Convert regular spinlock into trylock on panic path
  misc/pvpanic: Convert regular spinlock into trylock on panic path
  soc: bcm: brcmstb: Document panic notifier action and remove useless header
  mips: ip22: Reword PANICED to PANICKED and remove useless header
  powerpc/setup: Refactor/untangle panic notifiers
  coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
  alpha: Clean-up the panic notifier code
  um: Improve panic notifiers consistency and ordering
  parisc: Replace regular spinlock with spin_trylock on panic path
  s390/consoles: Improve panic notifiers reliability
  panic: Properly identify the panic event to the notifiers' callbacks
  bus: brcmstb_gisb: Clean-up panic/die notifiers
  drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers
  tracing: Improve panic/die notifiers
  notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
  panic: Add the panic hypervisor notifier list
  panic: Add the panic informational notifier list
  panic: Introduce the panic pre-reboot notifier list
  panic: Introduce the panic post-reboot notifier list
  printk: kmsg_dump: Introduce helper to inform number of dumpers
  panic: Refactor the panic path
  panic, printk: Add console flush parameter and convert panic_print to a notifier
  Drivers: hv: Do not force all panic notifiers to execute before kdump
  powerpc: Do not force all panic notifiers to execute before kdump
  panic: Unexport crash_kexec_post_notifiers
  powerpc: ps3, pseries: Avoid duplicate call to kmsg_dump() on panic
  um: Avoid duplicate call to kmsg_dump()

 .../admin-guide/kernel-parameters.txt         |  54 ++-
 Documentation/admin-guide/sysctl/kernel.rst   |   5 +-
 arch/alpha/kernel/setup.c                     |  40 +--
 arch/arm/kernel/machine_kexec.c               |   3 +
 arch/arm64/kernel/setup.c                     |   2 +-
 arch/mips/kernel/relocate.c                   |   2 +-
 arch/mips/sgi-ip22/ip22-reset.c               |  13 +-
 arch/mips/sgi-ip32/ip32-reset.c               |   3 +-
 arch/parisc/include/asm/pdc.h                 |   1 +
 arch/parisc/kernel/firmware.c                 |  27 +-
 arch/parisc/kernel/pdc_chassis.c              |   3 +-
 arch/powerpc/include/asm/bug.h                |   2 +-
 arch/powerpc/kernel/fadump.c                  |   8 -
 arch/powerpc/kernel/setup-common.c            |  76 ++--
 arch/powerpc/kernel/traps.c                   |   6 +-
 arch/powerpc/platforms/powernv/opal.c         |   2 +-
 arch/powerpc/platforms/ps3/setup.c            |   2 +-
 arch/powerpc/platforms/pseries/setup.c        |   2 +-
 arch/s390/kernel/ipl.c                        |   4 +-
 arch/s390/kernel/setup.c                      |  19 +-
 arch/sparc/kernel/setup_32.c                  |  27 +-
 arch/sparc/kernel/setup_64.c                  |  29 +-
 arch/sparc/kernel/sstate.c                    |   3 +-
 arch/um/drivers/mconsole_kern.c               |  10 +-
 arch/um/kernel/um_arch.c                      |  11 +-
 arch/x86/include/asm/cpu.h                    |   1 +
 arch/x86/kernel/crash.c                       |   8 +-
 arch/x86/kernel/reboot.c                      |  14 +-
 arch/x86/kernel/setup.c                       |   2 +-
 arch/x86/xen/enlighten.c                      |   2 +-
 arch/xtensa/platforms/iss/setup.c             |   4 +-
 drivers/bus/brcmstb_gisb.c                    |  28 +-
 drivers/char/ipmi/ipmi_msghandler.c           |  12 +-
 drivers/edac/altera_edac.c                    |   3 +-
 drivers/firmware/google/gsmi.c                |  10 +-
 drivers/hv/hv_common.c                        |  12 -
 drivers/hv/vmbus_drv.c                        | 113 +++---
 .../hwtracing/coresight/coresight-cpu-debug.c |  11 +-
 drivers/leds/trigger/ledtrig-activity.c       |   4 +-
 drivers/leds/trigger/ledtrig-heartbeat.c      |   4 +-
 drivers/leds/trigger/ledtrig-panic.c          |   3 +-
 drivers/misc/bcm-vk/bcm_vk_dev.c              |   6 +-
 drivers/misc/ibmasm/heartbeat.c               |  16 +-
 drivers/misc/pvpanic/pvpanic.c                |  14 +-
 drivers/net/ipa/ipa_smp2p.c                   |   5 +-
 drivers/parisc/power.c                        |  21 +-
 drivers/power/reset/ltc2952-poweroff.c        |   4 +-
 drivers/remoteproc/remoteproc_core.c          |   6 +-
 drivers/s390/char/con3215.c                   |  38 +-
 drivers/s390/char/con3270.c                   |  36 +-
 drivers/s390/char/raw3270.c                   |  18 +
 drivers/s390/char/raw3270.h                   |   1 +
 drivers/s390/char/sclp_con.c                  |  30 +-
 drivers/s390/char/sclp_vt220.c                |  44 +--
 drivers/s390/char/zcore.c                     |   5 +-
 drivers/soc/bcm/brcmstb/pm/pm-arm.c           |  18 +-
 drivers/soc/tegra/ari-tegra186.c              |   3 +-
 drivers/staging/olpc_dcon/olpc_dcon.c         |   6 +-
 drivers/video/fbdev/hyperv_fb.c               |  12 +-
 include/linux/console.h                       |   2 +
 include/linux/kmsg_dump.h                     |   7 +
 include/linux/notifier.h                      |   8 +-
 include/linux/panic.h                         |   3 -
 include/linux/panic_notifier.h                |  12 +-
 include/linux/printk.h                        |   1 +
 kernel/hung_task.c                            |   3 +-
 kernel/kexec_core.c                           |   8 +-
 kernel/notifier.c                             |  48 ++-
 kernel/panic.c                                | 335 +++++++++++-------
 kernel/printk/printk.c                        |  76 ++++
 kernel/rcu/tree.c                             |   1 -
 kernel/rcu/tree_stall.h                       |   3 +-
 kernel/trace/trace.c                          |  59 +--
 .../selftests/pstore/pstore_crash_test        |   5 +-
 74 files changed, 953 insertions(+), 486 deletions(-)

-- 
2.36.0


^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
@ 2022-04-27 22:48 ` Guilherme G. Piccoli
  2022-05-09 12:32   ` Guilherme G. Piccoli
  2022-05-09 15:52   ` Sean Christopherson
  2022-04-27 22:48 ` [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path Guilherme G. Piccoli
                   ` (28 subsequent siblings)
  29 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:48 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	David P . Reed, Paolo Bonzini, Sean Christopherson

In the panic path we have a list of functions to be called, the panic
notifiers - such callbacks perform various actions in the machine's
last breath, and sometimes users want them to run before kdump. We
have the parameter "crash_kexec_post_notifiers" for that. When such
parameter is used, the function "crash_smp_send_stop()" is executed
to poweroff all secondary CPUs through the NMI-shootdown mechanism;
part of this process involves disabling virtualization features in
all CPUs (except the main one).

Now, in the emergency restart procedure we have also a way of
disabling VMX in all CPUs, using the same NMI-shootdown mechanism;
what happens though is that in case we already NMI-disabled all CPUs,
the emergency restart fails due to a second addition of the same items
in the NMI list, as per the following log output:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
[...]
Rebooting in 2 seconds..
list_add double add: new=<addr1>, prev=<addr2>, next=<addr1>.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP PTI

In order to reproduce the problem, users just need to set the kernel
parameter "crash_kexec_post_notifiers" *without* kdump set in any
system with the VMX feature present.

Since there is no benefit in re-disabling VMX in all CPUs in case
it was already done, this patch prevents that by guarding the restart
routine against doubly issuing NMIs unnecessarily. Notice we still
need to disable VMX locally in the emergency restart.

Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported)
Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path")
Cc: David P. Reed <dpreed@deepplum.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/x86/include/asm/cpu.h |  1 +
 arch/x86/kernel/crash.c    |  8 ++++----
 arch/x86/kernel/reboot.c   | 14 ++++++++++++--
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 86e5e4e26fcb..b6a9062d387f 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -36,6 +36,7 @@ extern int _debug_hotplug_cpu(int cpu, int action);
 #endif
 #endif
 
+extern bool crash_cpus_stopped;
 int mwait_usable(const struct cpuinfo_x86 *);
 
 unsigned int x86_family(unsigned int sig);
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e8326a8d1c5d..71dd1a990e8d 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -42,6 +42,8 @@
 #include <asm/crash.h>
 #include <asm/cmdline.h>
 
+bool crash_cpus_stopped;
+
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
 	struct boot_params *params;
@@ -108,9 +110,7 @@ void kdump_nmi_shootdown_cpus(void)
 /* Override the weak function in kernel/panic.c */
 void crash_smp_send_stop(void)
 {
-	static int cpus_stopped;
-
-	if (cpus_stopped)
+	if (crash_cpus_stopped)
 		return;
 
 	if (smp_ops.crash_stop_other_cpus)
@@ -118,7 +118,7 @@ void crash_smp_send_stop(void)
 	else
 		smp_send_stop();
 
-	cpus_stopped = 1;
+	crash_cpus_stopped = true;
 }
 
 #else
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index fa700b46588e..2fc42b8402ac 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -589,8 +589,18 @@ static void native_machine_emergency_restart(void)
 	int orig_reboot_type = reboot_type;
 	unsigned short mode;
 
-	if (reboot_emergency)
-		emergency_vmx_disable_all();
+	/*
+	 * We can reach this point in the end of panic path, having
+	 * NMI-disabled all secondary CPUs. This process involves
+	 * disabling the CPU virtualization technologies, so if that
+	 * is the case, we only miss disabling the local CPU VMX...
+	 */
+	if (reboot_emergency) {
+		if (!crash_cpus_stopped)
+			emergency_vmx_disable_all();
+		else
+			cpu_emergency_vmxoff();
+	}
 
 	tboot_shutdown(TB_SHUTDOWN_REBOOT);
 
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
  2022-04-27 22:48 ` [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart Guilherme G. Piccoli
@ 2022-04-27 22:48 ` Guilherme G. Piccoli
  2022-04-29 16:26   ` Michael Kelley (LINUX)
  2022-04-29 18:20   ` Marc Zyngier
  2022-04-27 22:48 ` [PATCH 03/30] notifier: Add panic notifiers info and purge trailing whitespaces Guilherme G. Piccoli
                   ` (27 subsequent siblings)
  29 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:48 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Marc Zyngier, Russell King

Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
is responsible for that. This makes sense, since we're turning off
such CPUs, putting them in an endless busy-wait loop.

Problem is that there is an alternative path for disabling CPUs,
in the form of function crash_smp_send_stop(), used for kexec/panic
paths. This functions relies in a SMP call that also triggers a
busy-wait loop [at machine_crash_nonpanic_core()], but *without*
disabling interrupts. This might lead to odd scenarios, like early
interrupts in the boot of kexec'd kernel or even interrupts in
other CPUs while the main one still works in the panic path and
assumes all secondary CPUs are (really!) off.

This patch mimics the ipi_cpu_stop() interrupt disable mechanism
in the crash CPU shutdown path, hence disabling IRQs/FIQs in all
secondary CPUs in the kexec/panic path as well.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/arm/kernel/machine_kexec.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f567032a09c0..ef788ee00519 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused)
 	set_cpu_online(smp_processor_id(), false);
 	atomic_dec(&waiting_for_crash_ipi);
 
+	local_fiq_disable();
+	local_irq_disable();
+
 	while (1) {
 		cpu_relax();
 		wfe();
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 03/30] notifier: Add panic notifiers info and purge trailing whitespaces
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
  2022-04-27 22:48 ` [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart Guilherme G. Piccoli
  2022-04-27 22:48 ` [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path Guilherme G. Piccoli
@ 2022-04-27 22:48 ` Guilherme G. Piccoli
  2022-04-27 22:48 ` [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path Guilherme G. Piccoli
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:48 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Arjan van de Ven, Cong Wang, Sebastian Andrzej Siewior,
	Valentin Schneider, Xiaoming Ni

Although many notifiers are mentioned in the comments, the panic
notifiers infrastructure is not. Also, the file contains some
trailing whitespaces. This commit fix both issues.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 include/linux/notifier.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 87069b8459af..0589896fc7bd 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -201,12 +201,12 @@ static inline int notifier_to_errno(int ret)
 
 /*
  *	Declared notifiers so far. I can imagine quite a few more chains
- *	over time (eg laptop power reset chains, reboot chain (to clean 
+ *	over time (eg laptop power reset chains, reboot chain (to clean
  *	device units up), device [un]mount chain, module load/unload chain,
- *	low memory chain, screenblank chain (for plug in modular screenblankers) 
+ *	low memory chain, screenblank chain (for plug in modular screenblankers)
  *	VC switch chains (for loadable kernel svgalib VC switch helpers) etc...
  */
- 
+
 /* CPU notfiers are defined in include/linux/cpu.h. */
 
 /* netdevice notifiers are defined in include/linux/netdevice.h */
@@ -217,6 +217,8 @@ static inline int notifier_to_errno(int ret)
 
 /* Virtual Terminal events are defined in include/linux/vt.h. */
 
+/* Panic notifiers are defined in include/linux/panic_notifier.h. */
+
 #define NETLINK_URELEASE	0x0001	/* Unicast netlink socket released */
 
 /* Console keyboard events.
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (2 preceding siblings ...)
  2022-04-27 22:48 ` [PATCH 03/30] notifier: Add panic notifiers info and purge trailing whitespaces Guilherme G. Piccoli
@ 2022-04-27 22:48 ` Guilherme G. Piccoli
  2022-05-03 18:03   ` Evan Green
  2022-04-27 22:48 ` [PATCH 05/30] misc/pvpanic: " Guilherme G. Piccoli
                   ` (25 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:48 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Ard Biesheuvel, David Gow, Evan Green, Julius Werner

Currently the gsmi driver registers a panic notifier as well as
reboot and die notifiers. The callbacks registered are called in
atomic and very limited context - for instance, panic disables
preemption, local IRQs and all other CPUs that aren't running the
current panic function.

With that said, taking a spinlock in this scenario is a
dangerous invitation for a deadlock scenario. So, we fix
that in this commit by changing the regular spinlock with
a trylock, which is a safer approach.

Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: Evan Green <evgreen@chromium.org>
Cc: Julius Werner <jwerner@chromium.org>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 drivers/firmware/google/gsmi.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
index adaa492c3d2d..b01ed02e4a87 100644
--- a/drivers/firmware/google/gsmi.c
+++ b/drivers/firmware/google/gsmi.c
@@ -629,7 +629,10 @@ static int gsmi_shutdown_reason(int reason)
 	if (saved_reason & (1 << reason))
 		return 0;
 
-	spin_lock_irqsave(&gsmi_dev.lock, flags);
+	if (!spin_trylock_irqsave(&gsmi_dev.lock, flags)) {
+		rc = -EBUSY;
+		goto out;
+	}
 
 	saved_reason |= (1 << reason);
 
@@ -646,6 +649,7 @@ static int gsmi_shutdown_reason(int reason)
 
 	spin_unlock_irqrestore(&gsmi_dev.lock, flags);
 
+out:
 	if (rc < 0)
 		printk(KERN_ERR "gsmi: Log Shutdown Reason failed\n");
 	else
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (3 preceding siblings ...)
  2022-04-27 22:48 ` [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path Guilherme G. Piccoli
@ 2022-04-27 22:48 ` Guilherme G. Piccoli
  2022-05-10 12:14   ` Petr Mladek
  2022-04-27 22:49 ` [PATCH 06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header Guilherme G. Piccoli
                   ` (24 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:48 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Christophe JAILLET, Mihai Carabas, Shile Zhang, Wang ShaoBo,
	zhenwei pi

The pvpanic driver relies on panic notifiers to execute a callback
on panic event. Such function is executed in atomic context - the
panic function disables local IRQs, preemption and all other CPUs
that aren't running the panic code.

With that said, it's dangerous to use regular spinlocks in such path,
as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple instances").
This patch fixes that by replacing regular spinlocks with the trylock
safer approach.

It also fixes an old comment (about a long gone framebuffer code) and
the notifier priority - we should execute hypervisor notifiers early,
deferring this way the panic action to the hypervisor, as expected by
the users that are setting up pvpanic.

Fixes: b3c0f8774668 ("misc/pvpanic: probe multiple instances")
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Mihai Carabas <mihai.carabas@oracle.com>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Cc: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 drivers/misc/pvpanic/pvpanic.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/pvpanic/pvpanic.c b/drivers/misc/pvpanic/pvpanic.c
index 4b8f1c7d726d..049a12006348 100644
--- a/drivers/misc/pvpanic/pvpanic.c
+++ b/drivers/misc/pvpanic/pvpanic.c
@@ -34,7 +34,9 @@ pvpanic_send_event(unsigned int event)
 {
 	struct pvpanic_instance *pi_cur;
 
-	spin_lock(&pvpanic_lock);
+	if (!spin_trylock(&pvpanic_lock))
+		return;
+
 	list_for_each_entry(pi_cur, &pvpanic_list, list) {
 		if (event & pi_cur->capability & pi_cur->events)
 			iowrite8(event, pi_cur->base);
@@ -55,9 +57,13 @@ pvpanic_panic_notify(struct notifier_block *nb, unsigned long code, void *unused
 	return NOTIFY_DONE;
 }
 
+/*
+ * Call our notifier very early on panic, deferring the
+ * action taken to the hypervisor.
+ */
 static struct notifier_block pvpanic_panic_nb = {
 	.notifier_call = pvpanic_panic_notify,
-	.priority = 1, /* let this called before broken drm_fb_helper() */
+	.priority = INT_MAX,
 };
 
 static void pvpanic_remove(void *param)
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (4 preceding siblings ...)
  2022-04-27 22:48 ` [PATCH 05/30] misc/pvpanic: " Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-05-02 15:38   ` Florian Fainelli
  2022-04-27 22:49 ` [PATCH 07/30] mips: ip22: Reword PANICED to PANICKED " Guilherme G. Piccoli
                   ` (23 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Brian Norris, Doug Berger, Florian Fainelli, Justin Chen,
	Lee Jones, Markus Mayer

The panic notifier of this driver is very simple code-wise, just a memory
write to a special position with some numeric code. But this is not clear
from the semantic point-of-view, and there is no public documentation
about that either.

After discussing this in the mailing-lists [0] and having Florian explained
it very well, this patch just document that in the code for the future
generations asking the same questions. Also, it removes a useless header.

[0] https://lore.kernel.org/lkml/781cafb0-8d06-8b56-907a-5175c2da196a@gmail.com

Fixes: 0b741b8234c8 ("soc: bcm: brcmstb: Add support for S2/S3/S5 suspend states (ARM)")
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Justin Chen <justinpopo6@gmail.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Markus Mayer <mmayer@broadcom.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 drivers/soc/bcm/brcmstb/pm/pm-arm.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/bcm/brcmstb/pm/pm-arm.c b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
index 3cbb165d6e30..870686ae042b 100644
--- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
+++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
@@ -25,7 +25,6 @@
 #include <linux/kernel.h>
 #include <linux/memblock.h>
 #include <linux/module.h>
-#include <linux/notifier.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/panic_notifier.h>
@@ -664,7 +663,20 @@ static void __iomem *brcmstb_ioremap_match(const struct of_device_id *matches,
 
 	return of_io_request_and_map(dn, index, dn->full_name);
 }
-
+/*
+ * The AON is a small domain in the SoC that can retain its state across
+ * various system wide sleep states and specific reset conditions; the
+ * AON DATA RAM is a small RAM of a few words (< 1KB) which can store
+ * persistent information across such events.
+ *
+ * The purpose of the below panic notifier is to help with notifying
+ * the bootloader that a panic occurred and so that it should try its
+ * best to preserve the DRAM contents holding that buffer for recovery
+ * by the kernel as opposed to wiping out DRAM clean again.
+ *
+ * Reference: comment from Florian Fainelli, at
+ * https://lore.kernel.org/lkml/781cafb0-8d06-8b56-907a-5175c2da196a@gmail.com
+ */
 static int brcmstb_pm_panic_notify(struct notifier_block *nb,
 		unsigned long action, void *data)
 {
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 07/30] mips: ip22: Reword PANICED to PANICKED and remove useless header
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (5 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-05-04 20:32   ` Thomas Bogendoerfer
  2022-04-27 22:49 ` [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers Guilherme G. Piccoli
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Thomas Bogendoerfer

Many other place in the kernel prefer the latter, so let's keep
it consistent in MIPS code as well. Also, removes a useless header.

Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/mips/sgi-ip22/ip22-reset.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/mips/sgi-ip22/ip22-reset.c b/arch/mips/sgi-ip22/ip22-reset.c
index 9028dbbb45dd..8f0861c58080 100644
--- a/arch/mips/sgi-ip22/ip22-reset.c
+++ b/arch/mips/sgi-ip22/ip22-reset.c
@@ -11,7 +11,6 @@
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
 #include <linux/sched/signal.h>
-#include <linux/notifier.h>
 #include <linux/panic_notifier.h>
 #include <linux/pm.h>
 #include <linux/timer.h>
@@ -41,7 +40,7 @@
 static struct timer_list power_timer, blink_timer, debounce_timer;
 static unsigned long blink_timer_timeout;
 
-#define MACHINE_PANICED		1
+#define MACHINE_PANICKED		1
 #define MACHINE_SHUTTING_DOWN	2
 
 static int machine_state;
@@ -112,7 +111,7 @@ static void debounce(struct timer_list *unused)
 		return;
 	}
 
-	if (machine_state & MACHINE_PANICED)
+	if (machine_state & MACHINE_PANICKED)
 		sgimc->cpuctrl0 |= SGIMC_CCTRL0_SYSINIT;
 
 	enable_irq(SGI_PANEL_IRQ);
@@ -120,7 +119,7 @@ static void debounce(struct timer_list *unused)
 
 static inline void power_button(void)
 {
-	if (machine_state & MACHINE_PANICED)
+	if (machine_state & MACHINE_PANICKED)
 		return;
 
 	if ((machine_state & MACHINE_SHUTTING_DOWN) ||
@@ -167,9 +166,9 @@ static irqreturn_t panel_int(int irq, void *dev_id)
 static int panic_event(struct notifier_block *this, unsigned long event,
 		      void *ptr)
 {
-	if (machine_state & MACHINE_PANICED)
+	if (machine_state & MACHINE_PANICKED)
 		return NOTIFY_DONE;
-	machine_state |= MACHINE_PANICED;
+	machine_state |= MACHINE_PANICKED;
 
 	blink_timer_timeout = PANIC_FREQ;
 	blink_timeout(&blink_timer);
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (6 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 07/30] mips: ip22: Reword PANICED to PANICKED " Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-05-05 18:55   ` Hari Bathini
  2022-04-27 22:49 ` [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier Guilherme G. Piccoli
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Benjamin Herrenschmidt, Hari Bathini, Michael Ellerman,
	Nicholas Piggin, Paul Mackerras

The panic notifiers infrastructure is a bit limited in the scope of
the callbacks - basically every kind of functionality is dropped
in a list that runs in the same point during the kernel panic path.
This is not really on par with the complexities and particularities
of architecture / hypervisors' needs, and a refactor is ongoing.

As part of this refactor, it was observed that powerpc has 2 notifiers,
with mixed goals: one is just a KASLR offset dumper, whereas the other
aims to hard-disable IRQs (necessary on panic path), warn firmware of
the panic event (fadump) and run low-level platform-specific machinery
that might stop kernel execution and never come back.

Clearly, the 2nd notifier has opposed goals: disable IRQs / fadump
should run earlier while low-level platform actions should
run late since it might not even return. Hence, this patch decouples
the notifiers splitting them in three:

- First one is responsible for hard-disable IRQs and fadump,
should run early;

- The kernel KASLR offset dumper is really an informative notifier,
harmless and may run at any moment in the panic path;

- The last notifier should run last, since it aims to perform
low-level actions for specific platforms, and might never return.
It is also only registered for 2 platforms, pseries and ps3.

The patch better documents the notifiers and clears the code too,
also removing a useless header.

Currently no functionality change should be observed, but after
the planned panic refactor we should expect more panic reliability
with this patch.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---

We'd like to thanks specially the MiniCloud infrastructure [0] maintainers,
that allow us to test PowerPC code in a very complete, functional and FREE
environment (there's no need even for adding a credit card, like many "free"
clouds require ¬¬ ).

[0] https://openpower.ic.unicamp.br/minicloud

 arch/powerpc/kernel/setup-common.c | 74 ++++++++++++++++++++++--------
 1 file changed, 54 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 518ae5aa9410..52f96b209a96 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -23,7 +23,6 @@
 #include <linux/console.h>
 #include <linux/screen_info.h>
 #include <linux/root_dev.h>
-#include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/unistd.h>
 #include <linux/serial.h>
@@ -680,8 +679,25 @@ int check_legacy_ioport(unsigned long base_port)
 }
 EXPORT_SYMBOL(check_legacy_ioport);
 
-static int ppc_panic_event(struct notifier_block *this,
-                             unsigned long event, void *ptr)
+/*
+ * Panic notifiers setup
+ *
+ * We have 3 notifiers for powerpc, each one from a different "nature":
+ *
+ * - ppc_panic_fadump_handler() is a hypervisor notifier, which hard-disables
+ *   IRQs and deal with the Firmware-Assisted dump, when it is configured;
+ *   should run early in the panic path.
+ *
+ * - dump_kernel_offset() is an informative notifier, just showing the KASLR
+ *   offset if we have RANDOMIZE_BASE set.
+ *
+ * - ppc_panic_platform_handler() is a low-level handler that's registered
+ *   only if the platform wishes to perform final actions in the panic path,
+ *   hence it should run late and might not even return. Currently, only
+ *   pseries and ps3 platforms register callbacks.
+ */
+static int ppc_panic_fadump_handler(struct notifier_block *this,
+				    unsigned long event, void *ptr)
 {
 	/*
 	 * panic does a local_irq_disable, but we really
@@ -691,45 +707,63 @@ static int ppc_panic_event(struct notifier_block *this,
 
 	/*
 	 * If firmware-assisted dump has been registered then trigger
-	 * firmware-assisted dump and let firmware handle everything else.
+	 * its callback and let the firmware handles everything else.
 	 */
 	crash_fadump(NULL, ptr);
-	if (ppc_md.panic)
-		ppc_md.panic(ptr);  /* May not return */
+
 	return NOTIFY_DONE;
 }
 
-static struct notifier_block ppc_panic_block = {
-	.notifier_call = ppc_panic_event,
-	.priority = INT_MIN /* may not return; must be done last */
-};
-
-/*
- * Dump out kernel offset information on panic.
- */
 static int dump_kernel_offset(struct notifier_block *self, unsigned long v,
 			      void *p)
 {
 	pr_emerg("Kernel Offset: 0x%lx from 0x%lx\n",
 		 kaslr_offset(), KERNELBASE);
 
-	return 0;
+	return NOTIFY_DONE;
 }
 
+static int ppc_panic_platform_handler(struct notifier_block *this,
+				      unsigned long event, void *ptr)
+{
+	/*
+	 * This handler is only registered if we have a panic callback
+	 * on ppc_md, hence NULL check is not needed.
+	 * Also, it may not return, so it runs really late on panic path.
+	 */
+	ppc_md.panic(ptr);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block ppc_fadump_block = {
+	.notifier_call = ppc_panic_fadump_handler,
+	.priority = INT_MAX, /* run early, to notify the firmware ASAP */
+};
+
 static struct notifier_block kernel_offset_notifier = {
-	.notifier_call = dump_kernel_offset
+	.notifier_call = dump_kernel_offset,
+};
+
+static struct notifier_block ppc_panic_block = {
+	.notifier_call = ppc_panic_platform_handler,
+	.priority = INT_MIN, /* may not return; must be done last */
 };
 
 void __init setup_panic(void)
 {
+	/* Hard-disables IRQs + deal with FW-assisted dump (fadump) */
+	atomic_notifier_chain_register(&panic_notifier_list,
+				       &ppc_fadump_block);
+
 	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
 		atomic_notifier_chain_register(&panic_notifier_list,
 					       &kernel_offset_notifier);
 
-	/* PPC64 always does a hard irq disable in its panic handler */
-	if (!IS_ENABLED(CONFIG_PPC64) && !ppc_md.panic)
-		return;
-	atomic_notifier_chain_register(&panic_notifier_list, &ppc_panic_block);
+	/* Low-level platform-specific routines that should run on panic */
+	if (ppc_md.panic)
+		atomic_notifier_chain_register(&panic_notifier_list,
+					       &ppc_panic_block);
 }
 
 #ifdef CONFIG_CHECK_CACHE_COHERENCY
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (7 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-28  8:11   ` Suzuki K Poulose
  2022-04-27 22:49 ` [PATCH 10/30] alpha: Clean-up the panic notifier code Guilherme G. Piccoli
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Leo Yan, Mathieu Poirier, Mike Leach, Suzuki K Poulose

The panic notifier infrastructure executes registered callbacks when
a panic event happens - such callbacks are executed in atomic context,
with interrupts and preemption disabled in the running CPU and all other
CPUs disabled. That said, mutexes in such context are not a good idea.

This patch replaces a regular mutex with a mutex_trylock safer approach;
given the nature of the mutex used in the driver, it should be pretty
uncommon being unable to acquire such mutex in the panic path, hence
no functional change should be observed (and if it is, that would be
likely a deadlock with the regular mutex).

Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 drivers/hwtracing/coresight/coresight-cpu-debug.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 8845ec4b4402..1874df7c6a73 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -380,9 +380,10 @@ static int debug_notifier_call(struct notifier_block *self,
 	int cpu;
 	struct debug_drvdata *drvdata;
 
-	mutex_lock(&debug_lock);
+	/* Bail out if we can't acquire the mutex or the functionality is off */
+	if (!mutex_trylock(&debug_lock))
+		return NOTIFY_DONE;
 
-	/* Bail out if the functionality is disabled */
 	if (!debug_enable)
 		goto skip_dump;
 
@@ -401,7 +402,7 @@ static int debug_notifier_call(struct notifier_block *self,
 
 skip_dump:
 	mutex_unlock(&debug_lock);
-	return 0;
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block debug_notifier = {
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 10/30] alpha: Clean-up the panic notifier code
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (8 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-05-09 14:13   ` Guilherme G. Piccoli
  2022-04-27 22:49 ` [PATCH 11/30] um: Improve panic notifiers consistency and ordering Guilherme G. Piccoli
                   ` (19 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Ivan Kokshaysky, Matt Turner, Richard Henderson

The alpha panic notifier has some code issues, not following
the conventions of other notifiers. Also, it might halt the
machine but still it is set to run as early as possible, which
doesn't seem to be a good idea.

This patch cleans the code, and set the notifier to run as the
latest, following the same approach other architectures are doing.
Also, we remove the unnecessary include of a header already
included indirectly.

Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/alpha/kernel/setup.c | 36 +++++++++++++++---------------------
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index b4fbbba30aa2..d88bdf852753 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -41,19 +41,11 @@
 #include <linux/sysrq.h>
 #include <linux/reboot.h>
 #endif
-#include <linux/notifier.h>
 #include <asm/setup.h>
 #include <asm/io.h>
 #include <linux/log2.h>
 #include <linux/export.h>
 
-static int alpha_panic_event(struct notifier_block *, unsigned long, void *);
-static struct notifier_block alpha_panic_block = {
-	alpha_panic_event,
-        NULL,
-        INT_MAX /* try to do it first */
-};
-
 #include <linux/uaccess.h>
 #include <asm/hwrpb.h>
 #include <asm/dma.h>
@@ -435,6 +427,21 @@ static const struct sysrq_key_op srm_sysrq_reboot_op = {
 };
 #endif
 
+static int alpha_panic_event(struct notifier_block *this,
+			     unsigned long event, void *ptr)
+{
+	/* If we are using SRM and serial console, just hard halt here. */
+	if (alpha_using_srm && srmcons_output)
+		__halt();
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block alpha_panic_block = {
+	.notifier_call = alpha_panic_event,
+	.priority = INT_MIN, /* may not return, do it last */
+};
+
 void __init
 setup_arch(char **cmdline_p)
 {
@@ -1427,19 +1434,6 @@ const struct seq_operations cpuinfo_op = {
 	.show	= show_cpuinfo,
 };
 
-
-static int
-alpha_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
-{
-#if 1
-	/* FIXME FIXME FIXME */
-	/* If we are using SRM and serial console, just hard halt here. */
-	if (alpha_using_srm && srmcons_output)
-		__halt();
-#endif
-        return NOTIFY_DONE;
-}
-
 static __init int add_pcspkr(void)
 {
 	struct platform_device *pd;
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 11/30] um: Improve panic notifiers consistency and ordering
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (9 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 10/30] alpha: Clean-up the panic notifier code Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-28  8:30   ` Johannes Berg
  2022-05-10 14:28   ` Petr Mladek
  2022-04-27 22:49 ` [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path Guilherme G. Piccoli
                   ` (18 subsequent siblings)
  29 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Anton Ivanov, Johannes Berg, Richard Weinberger

Currently the panic notifiers from user mode linux don't follow
the convention for most of the other notifiers present in the
kernel (indentation, priority setting, numeric return).
More important, the priorities could be improved, since it's a
special case (userspace), hence we could run the notifiers earlier;
user mode linux shouldn't care much with other panic notifiers but
the ordering among the mconsole and arch notifier is important,
given that the arch one effectively triggers a core dump.

This patch fixes that by running the mconsole notifier as the first
panic notifier, followed by the architecture one (that coredumps).
Also, we remove a useless header inclusion.

Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/um/drivers/mconsole_kern.c | 8 +++-----
 arch/um/kernel/um_arch.c        | 8 ++++----
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 8ca67a692683..2ea0421bcc3f 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -11,7 +11,6 @@
 #include <linux/list.h>
 #include <linux/mm.h>
 #include <linux/module.h>
-#include <linux/notifier.h>
 #include <linux/panic_notifier.h>
 #include <linux/reboot.h>
 #include <linux/sched/debug.h>
@@ -846,13 +845,12 @@ static int notify_panic(struct notifier_block *self, unsigned long unused1,
 
 	mconsole_notify(notify_socket, MCONSOLE_PANIC, message,
 			strlen(message) + 1);
-	return 0;
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block panic_exit_notifier = {
-	.notifier_call 		= notify_panic,
-	.next 			= NULL,
-	.priority 		= 1
+	.notifier_call	= notify_panic,
+	.priority	= INT_MAX, /* run as soon as possible */
 };
 
 static int add_notifier(void)
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 0760e24f2eba..4485b1a7c8e4 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -246,13 +246,13 @@ static int panic_exit(struct notifier_block *self, unsigned long unused1,
 	bust_spinlocks(0);
 	uml_exitcode = 1;
 	os_dump_core();
-	return 0;
+
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block panic_exit_notifier = {
-	.notifier_call 		= panic_exit,
-	.next 			= NULL,
-	.priority 		= 0
+	.notifier_call	= panic_exit,
+	.priority	= INT_MAX - 1, /* run as 2nd notifier, won't return */
 };
 
 void uml_finishsetup(void)
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (10 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 11/30] um: Improve panic notifiers consistency and ordering Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-28 16:55   ` Helge Deller
  2022-04-27 22:49 ` [PATCH 13/30] s390/consoles: Improve panic notifiers reliability Guilherme G. Piccoli
                   ` (17 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Helge Deller, James E.J. Bottomley

The panic notifiers' callbacks execute in an atomic context, with
interrupts/preemption disabled, and all CPUs not running the panic
function are off, so it's very dangerous to wait on a regular
spinlock, there's a risk of deadlock.

This patch refactors the panic notifier of parisc/power driver
to make use of spin_trylock - for that, we've added a second
version of the soft-power function. Also, some comments were
reorganized and trailing white spaces, useless header inclusion
and blank lines were removed.

Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/parisc/include/asm/pdc.h |  1 +
 arch/parisc/kernel/firmware.c | 27 +++++++++++++++++++++++----
 drivers/parisc/power.c        | 17 ++++++++++-------
 3 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/parisc/include/asm/pdc.h b/arch/parisc/include/asm/pdc.h
index b643092d4b98..7a106008e258 100644
--- a/arch/parisc/include/asm/pdc.h
+++ b/arch/parisc/include/asm/pdc.h
@@ -83,6 +83,7 @@ int pdc_do_firm_test_reset(unsigned long ftc_bitmap);
 int pdc_do_reset(void);
 int pdc_soft_power_info(unsigned long *power_reg);
 int pdc_soft_power_button(int sw_control);
+int pdc_soft_power_button_panic(int sw_control);
 void pdc_io_reset(void);
 void pdc_io_reset_devices(void);
 int pdc_iodc_getc(void);
diff --git a/arch/parisc/kernel/firmware.c b/arch/parisc/kernel/firmware.c
index 6a7e315bcc2e..0e2f70b592f4 100644
--- a/arch/parisc/kernel/firmware.c
+++ b/arch/parisc/kernel/firmware.c
@@ -1232,15 +1232,18 @@ int __init pdc_soft_power_info(unsigned long *power_reg)
 }
 
 /*
- * pdc_soft_power_button - Control the soft power button behaviour
- * @sw_control: 0 for hardware control, 1 for software control 
+ * pdc_soft_power_button{_panic} - Control the soft power button behaviour
+ * @sw_control: 0 for hardware control, 1 for software control
  *
  *
  * This PDC function places the soft power button under software or
  * hardware control.
- * Under software control the OS may control to when to allow to shut 
- * down the system. Under hardware control pressing the power button 
+ * Under software control the OS may control to when to allow to shut
+ * down the system. Under hardware control pressing the power button
  * powers off the system immediately.
+ *
+ * The _panic version relies in spin_trylock to prevent deadlock
+ * on panic path.
  */
 int pdc_soft_power_button(int sw_control)
 {
@@ -1254,6 +1257,22 @@ int pdc_soft_power_button(int sw_control)
 	return retval;
 }
 
+int pdc_soft_power_button_panic(int sw_control)
+{
+	int retval;
+	unsigned long flags;
+
+	if (!spin_trylock_irqsave(&pdc_lock, flags)) {
+		pr_emerg("Couldn't enable soft power button\n");
+		return -EBUSY; /* ignored by the panic notifier */
+	}
+
+	retval = mem_pdc_call(PDC_SOFT_POWER, PDC_SOFT_POWER_ENABLE, __pa(pdc_result), sw_control);
+	spin_unlock_irqrestore(&pdc_lock, flags);
+
+	return retval;
+}
+
 /*
  * pdc_io_reset - Hack to avoid overlapping range registers of Bridges devices.
  * Primarily a problem on T600 (which parisc-linux doesn't support) but
diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
index 456776bd8ee6..8512884de2cf 100644
--- a/drivers/parisc/power.c
+++ b/drivers/parisc/power.c
@@ -37,7 +37,6 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
-#include <linux/notifier.h>
 #include <linux/panic_notifier.h>
 #include <linux/reboot.h>
 #include <linux/sched/signal.h>
@@ -175,16 +174,21 @@ static void powerfail_interrupt(int code, void *x)
 
 
 
-/* parisc_panic_event() is called by the panic handler.
- * As soon as a panic occurs, our tasklets above will not be
- * executed any longer. This function then re-enables the 
- * soft-power switch and allows the user to switch off the system
+/*
+ * parisc_panic_event() is called by the panic handler.
+ *
+ * As soon as a panic occurs, our tasklets above will not
+ * be executed any longer. This function then re-enables
+ * the soft-power switch and allows the user to switch off
+ * the system. We rely in pdc_soft_power_button_panic()
+ * since this version spin_trylocks (instead of regular
+ * spinlock), preventing deadlocks on panic path.
  */
 static int parisc_panic_event(struct notifier_block *this,
 		unsigned long event, void *ptr)
 {
 	/* re-enable the soft-power switch */
-	pdc_soft_power_button(0);
+	pdc_soft_power_button_panic(0);
 	return NOTIFY_DONE;
 }
 
@@ -193,7 +197,6 @@ static struct notifier_block parisc_panic_block = {
 	.priority	= INT_MAX,
 };
 
-
 static int __init power_init(void)
 {
 	unsigned long ret;
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 13/30] s390/consoles: Improve panic notifiers reliability
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (11 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-29 18:46   ` Heiko Carstens
  2022-04-27 22:49 ` [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks Guilherme G. Piccoli
                   ` (16 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Christian Borntraeger, Heiko Carstens,
	Sven Schnelle, Vasily Gorbik

Currently many console drivers for s390 rely on panic/reboot notifiers
to invoke callbacks on these events. The panic() function disables local
IRQs, secondary CPUs and preemption, so callbacks invoked on panic are
effectively running in atomic context.

Happens that most of these console callbacks from s390 doesn't take the
proper care with regards to atomic context, like taking spinlocks that
might be taken in other function/CPU and hence will cause a lockup
situation.

The goal for this patch is to improve the notifiers reliability, acting
on 4 console drivers, as detailed below:

(1) con3215: changed a regular spinlock to the trylock alternative.

(2) con3270: also changed a regular spinlock to its trylock counterpart,
but here we also have another problem: raw3270_activate_view() takes a
different spinlock. So, we worked a helper to validate if this other lock
is safe to acquire, and if so, raw3270_activate_view() should be safe.

Notice though that there is a functional change here: it's now possible
to continue the notifier code [reaching con3270_wait_write() and
con3270_rebuild_update()] without executing raw3270_activate_view().

(3) sclp: a global lock is used heavily in the functions called from
the notifier, so we added a check here - if the lock is taken already,
we just bail-out, preventing the lockup.

(4) sclp_vt220: same as (3), a lock validation was added to prevent the
potential lockup problem.

Besides (1)-(4), we also removed useless void functions, adding the
code called from the notifier inside its own body, and changed the
priority of such notifiers to execute late, since they are "heavyweight"
for the panic environment, so we aim to reduce risks here.
Changed return values to NOTIFY_DONE as well, the standard one.

Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---

As a design choice, the option used here to verify a given spinlock is taken
was the function "spin_is_locked()" - but we noticed that it is not often used.
An alternative would to take the lock with a spin_trylock() and if it succeeds,
just release the spinlock and continue the code. But that seemed weird...

Also, we'd like to ask a good validation of case (2) potential functionality
change from the s390 console experts - far from expert here, and in our naive
code observation, that seems fine, but that analysis might be missing some
corner case.

Thanks in advance!

 drivers/s390/char/con3215.c    | 36 +++++++++++++++--------------
 drivers/s390/char/con3270.c    | 34 +++++++++++++++------------
 drivers/s390/char/raw3270.c    | 18 +++++++++++++++
 drivers/s390/char/raw3270.h    |  1 +
 drivers/s390/char/sclp_con.c   | 28 +++++++++++++----------
 drivers/s390/char/sclp_vt220.c | 42 +++++++++++++++++++---------------
 6 files changed, 96 insertions(+), 63 deletions(-)

diff --git a/drivers/s390/char/con3215.c b/drivers/s390/char/con3215.c
index f356607835d8..192198bd3dc4 100644
--- a/drivers/s390/char/con3215.c
+++ b/drivers/s390/char/con3215.c
@@ -771,35 +771,37 @@ static struct tty_driver *con3215_device(struct console *c, int *index)
 }
 
 /*
- * panic() calls con3215_flush through a panic_notifier
- * before the system enters a disabled, endless loop.
+ * The below function is called as a panic/reboot notifier before the
+ * system enters a disabled, endless loop.
+ *
+ * Notice we must use the spin_trylock() alternative, to prevent lockups
+ * in atomic context (panic routine runs with secondary CPUs, local IRQs
+ * and preemption disabled).
  */
-static void con3215_flush(void)
-{
-	struct raw3215_info *raw;
-	unsigned long flags;
-
-	raw = raw3215[0];  /* console 3215 is the first one */
-	spin_lock_irqsave(get_ccwdev_lock(raw->cdev), flags);
-	raw3215_make_room(raw, RAW3215_BUFFER_SIZE);
-	spin_unlock_irqrestore(get_ccwdev_lock(raw->cdev), flags);
-}
-
 static int con3215_notify(struct notifier_block *self,
 			  unsigned long event, void *data)
 {
-	con3215_flush();
-	return NOTIFY_OK;
+	struct raw3215_info *raw;
+	unsigned long flags;
+
+	raw = raw3215[0];  /* console 3215 is the first one */
+	if (!spin_trylock_irqsave(get_ccwdev_lock(raw->cdev), flags))
+		return NOTIFY_DONE;
+
+	raw3215_make_room(raw, RAW3215_BUFFER_SIZE);
+	spin_unlock_irqrestore(get_ccwdev_lock(raw->cdev), flags);
+
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block on_panic_nb = {
 	.notifier_call = con3215_notify,
-	.priority = 0,
+	.priority = INT_MIN + 1, /* run the callback late */
 };
 
 static struct notifier_block on_reboot_nb = {
 	.notifier_call = con3215_notify,
-	.priority = 0,
+	.priority = INT_MIN + 1, /* run the callback late */
 };
 
 /*
diff --git a/drivers/s390/char/con3270.c b/drivers/s390/char/con3270.c
index e4592890f20a..476202f3d8a0 100644
--- a/drivers/s390/char/con3270.c
+++ b/drivers/s390/char/con3270.c
@@ -535,20 +535,29 @@ con3270_wait_write(struct con3270 *cp)
 }
 
 /*
- * panic() calls con3270_flush through a panic_notifier
- * before the system enters a disabled, endless loop.
+ * The below function is called as a panic/reboot notifier before the
+ * system enters a disabled, endless loop.
+ *
+ * Notice we must use the spin_trylock() alternative, to prevent lockups
+ * in atomic context (panic routine runs with secondary CPUs, local IRQs
+ * and preemption disabled).
  */
-static void
-con3270_flush(void)
+static int con3270_notify(struct notifier_block *self,
+			  unsigned long event, void *data)
 {
 	struct con3270 *cp;
 	unsigned long flags;
 
 	cp = condev;
 	if (!cp->view.dev)
-		return;
-	raw3270_activate_view(&cp->view);
-	spin_lock_irqsave(&cp->view.lock, flags);
+		return NOTIFY_DONE;
+
+	if (!raw3270_view_lock_unavailable(&cp->view))
+		raw3270_activate_view(&cp->view);
+
+	if (!spin_trylock_irqsave(&cp->view.lock, flags))
+		return NOTIFY_DONE;
+
 	con3270_wait_write(cp);
 	cp->nr_up = 0;
 	con3270_rebuild_update(cp);
@@ -560,23 +569,18 @@ con3270_flush(void)
 		con3270_wait_write(cp);
 	}
 	spin_unlock_irqrestore(&cp->view.lock, flags);
-}
 
-static int con3270_notify(struct notifier_block *self,
-			  unsigned long event, void *data)
-{
-	con3270_flush();
-	return NOTIFY_OK;
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block on_panic_nb = {
 	.notifier_call = con3270_notify,
-	.priority = 0,
+	.priority = INT_MIN + 1, /* run the callback late */
 };
 
 static struct notifier_block on_reboot_nb = {
 	.notifier_call = con3270_notify,
-	.priority = 0,
+	.priority = INT_MIN + 1, /* run the callback late */
 };
 
 /*
diff --git a/drivers/s390/char/raw3270.c b/drivers/s390/char/raw3270.c
index dfde0d941c3c..7b87f9451508 100644
--- a/drivers/s390/char/raw3270.c
+++ b/drivers/s390/char/raw3270.c
@@ -830,6 +830,24 @@ raw3270_create_device(struct ccw_device *cdev)
 	return rp;
 }
 
+/*
+ * This helper just validates that it is safe to activate a
+ * view in the panic() context, due to locking restrictions.
+ */
+int
+raw3270_view_lock_unavailable(struct raw3270_view *view)
+{
+	struct raw3270 *rp = view->dev;
+
+	if (!rp)
+		return -ENODEV;
+
+	if (spin_is_locked(get_ccwdev_lock(rp->cdev)))
+		return -EBUSY;
+
+	return 0;
+}
+
 /*
  * Activate a view.
  */
diff --git a/drivers/s390/char/raw3270.h b/drivers/s390/char/raw3270.h
index c6645167cd2b..4cb6b5ee44ca 100644
--- a/drivers/s390/char/raw3270.h
+++ b/drivers/s390/char/raw3270.h
@@ -160,6 +160,7 @@ struct raw3270_view {
 };
 
 int raw3270_add_view(struct raw3270_view *, struct raw3270_fn *, int, int);
+int raw3270_view_lock_unavailable(struct raw3270_view *view);
 int raw3270_activate_view(struct raw3270_view *);
 void raw3270_del_view(struct raw3270_view *);
 void raw3270_deactivate_view(struct raw3270_view *);
diff --git a/drivers/s390/char/sclp_con.c b/drivers/s390/char/sclp_con.c
index fe5ee2646fcf..e5d947c763ea 100644
--- a/drivers/s390/char/sclp_con.c
+++ b/drivers/s390/char/sclp_con.c
@@ -220,30 +220,34 @@ sclp_console_device(struct console *c, int *index)
 }
 
 /*
- * Make sure that all buffers will be flushed to the SCLP.
+ * This panic/reboot notifier makes sure that all buffers
+ * will be flushed to the SCLP.
  */
-static void
-sclp_console_flush(void)
-{
-	sclp_conbuf_emit();
-	sclp_console_sync_queue();
-}
-
 static int sclp_console_notify(struct notifier_block *self,
 			       unsigned long event, void *data)
 {
-	sclp_console_flush();
-	return NOTIFY_OK;
+	/*
+	 * Perform the lock check before effectively getting the
+	 * lock on sclp_conbuf_emit() / sclp_console_sync_queue()
+	 * to prevent potential lockups in atomic context.
+	 */
+	if (spin_is_locked(&sclp_con_lock))
+		return NOTIFY_DONE;
+
+	sclp_conbuf_emit();
+	sclp_console_sync_queue();
+
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block on_panic_nb = {
 	.notifier_call = sclp_console_notify,
-	.priority = 1,
+	.priority = INT_MIN + 1, /* run the callback late */
 };
 
 static struct notifier_block on_reboot_nb = {
 	.notifier_call = sclp_console_notify,
-	.priority = 1,
+	.priority = INT_MIN + 1, /* run the callback late */
 };
 
 /*
diff --git a/drivers/s390/char/sclp_vt220.c b/drivers/s390/char/sclp_vt220.c
index 3b4e7e5d9b71..a32f34a1c6d2 100644
--- a/drivers/s390/char/sclp_vt220.c
+++ b/drivers/s390/char/sclp_vt220.c
@@ -769,21 +769,6 @@ __initcall(sclp_vt220_tty_init);
 
 #ifdef CONFIG_SCLP_VT220_CONSOLE
 
-static void __sclp_vt220_flush_buffer(void)
-{
-	unsigned long flags;
-
-	sclp_vt220_emit_current();
-	spin_lock_irqsave(&sclp_vt220_lock, flags);
-	del_timer(&sclp_vt220_timer);
-	while (sclp_vt220_queue_running) {
-		spin_unlock_irqrestore(&sclp_vt220_lock, flags);
-		sclp_sync_wait();
-		spin_lock_irqsave(&sclp_vt220_lock, flags);
-	}
-	spin_unlock_irqrestore(&sclp_vt220_lock, flags);
-}
-
 static void
 sclp_vt220_con_write(struct console *con, const char *buf, unsigned int count)
 {
@@ -797,22 +782,41 @@ sclp_vt220_con_device(struct console *c, int *index)
 	return sclp_vt220_driver;
 }
 
+/*
+ * This panic/reboot notifier runs in atomic context, so
+ * locking restrictions apply to prevent potential lockups.
+ */
 static int
 sclp_vt220_notify(struct notifier_block *self,
 			  unsigned long event, void *data)
 {
-	__sclp_vt220_flush_buffer();
-	return NOTIFY_OK;
+	unsigned long flags;
+
+	if (spin_is_locked(&sclp_vt220_lock))
+		return NOTIFY_DONE;
+
+	sclp_vt220_emit_current();
+
+	spin_lock_irqsave(&sclp_vt220_lock, flags);
+	del_timer(&sclp_vt220_timer);
+	while (sclp_vt220_queue_running) {
+		spin_unlock_irqrestore(&sclp_vt220_lock, flags);
+		sclp_sync_wait();
+		spin_lock_irqsave(&sclp_vt220_lock, flags);
+	}
+	spin_unlock_irqrestore(&sclp_vt220_lock, flags);
+
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block on_panic_nb = {
 	.notifier_call = sclp_vt220_notify,
-	.priority = 1,
+	.priority = INT_MIN + 1, /* run the callback late */
 };
 
 static struct notifier_block on_reboot_nb = {
 	.notifier_call = sclp_vt220_notify,
-	.priority = 1,
+	.priority = INT_MIN + 1, /* run the callback late */
 };
 
 /* Structure needed to register with printk */
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (12 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 13/30] s390/consoles: Improve panic notifiers reliability Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-05-10 15:16   ` Petr Mladek
  2022-04-27 22:49 ` [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers Guilherme G. Piccoli
                   ` (15 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

The notifiers infrastructure provides a way to pass an "id" to the
callbacks to determine what kind of event happened, i.e., what is
the reason behind they getting called.

The panic notifier currently pass 0, but this is soon to be
used in a multi-targeted notifier, so let's pass a meaningful
"id" over there.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 include/linux/panic_notifier.h | 5 +++++
 kernel/panic.c                 | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 41e32483d7a7..07dced83a783 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -9,4 +9,9 @@ extern struct atomic_notifier_head panic_notifier_list;
 
 extern bool crash_kexec_post_notifiers;
 
+enum panic_notifier_val {
+	PANIC_UNUSED,
+	PANIC_NOTIFIER = 0xDEAD,
+};
+
 #endif	/* _LINUX_PANIC_NOTIFIERS_H */
diff --git a/kernel/panic.c b/kernel/panic.c
index eb4dfb932c85..523bc9ccd0e9 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -287,7 +287,7 @@ void panic(const char *fmt, ...)
 	 * Run any panic handlers, including those that might need to
 	 * add information to the kmsg dump output.
 	 */
-	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
+	atomic_notifier_call_chain(&panic_notifier_list, PANIC_NOTIFIER, buf);
 
 	panic_print_sys_info(false);
 
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (13 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-05-02 15:38   ` Florian Fainelli
  2022-05-10 15:28   ` Petr Mladek
  2022-04-27 22:49 ` [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers Guilherme G. Piccoli
                   ` (14 subsequent siblings)
  29 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Brian Norris, Florian Fainelli

This patch improves the panic/die notifiers in this driver by
making use of a passed "id" instead of comparing pointer
address; also, it removes an useless prototype declaration
and unnecessary header inclusion.

This is part of a panic notifiers refactor - this notifier in
the future will be moved to a new list, that encompass the
information notifiers only.

Fixes: 9eb60880d9a9 ("bus: brcmstb_gisb: add notifier handling")
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 drivers/bus/brcmstb_gisb.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 183d5cc37d42..1ea7b015e225 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -19,7 +19,6 @@
 #include <linux/pm.h>
 #include <linux/kernel.h>
 #include <linux/kdebug.h>
-#include <linux/notifier.h>
 
 #ifdef CONFIG_MIPS
 #include <asm/traps.h>
@@ -347,25 +346,14 @@ static irqreturn_t brcmstb_gisb_bp_handler(int irq, void *dev_id)
 /*
  * Dump out gisb errors on die or panic.
  */
-static int dump_gisb_error(struct notifier_block *self, unsigned long v,
-			   void *p);
-
-static struct notifier_block gisb_die_notifier = {
-	.notifier_call = dump_gisb_error,
-};
-
-static struct notifier_block gisb_panic_notifier = {
-	.notifier_call = dump_gisb_error,
-};
-
 static int dump_gisb_error(struct notifier_block *self, unsigned long v,
 			   void *p)
 {
 	struct brcmstb_gisb_arb_device *gdev;
-	const char *reason = "panic";
+	const char *reason = "die";
 
-	if (self == &gisb_die_notifier)
-		reason = "die";
+	if (v == PANIC_NOTIFIER)
+		reason = "panic";
 
 	/* iterate over each GISB arb registered handlers */
 	list_for_each_entry(gdev, &brcmstb_gisb_arb_device_list, next)
@@ -374,6 +362,14 @@ static int dump_gisb_error(struct notifier_block *self, unsigned long v,
 	return NOTIFY_DONE;
 }
 
+static struct notifier_block gisb_die_notifier = {
+	.notifier_call = dump_gisb_error,
+};
+
+static struct notifier_block gisb_panic_notifier = {
+	.notifier_call = dump_gisb_error,
+};
+
 static DEVICE_ATTR(gisb_arb_timeout, S_IWUSR | S_IRUGO,
 		gisb_arb_get_timeout, gisb_arb_set_timeout);
 
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (14 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-29 17:16   ` Michael Kelley (LINUX)
  2022-04-27 22:49 ` [PATCH 17/30] tracing: Improve panic/die notifiers Guilherme G. Piccoli
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Andrea Parri, Dexuan Cui, Haiyang Zhang, K. Y. Srinivasan,
	Stephen Hemminger, Tianyu Lan, Wei Liu

Currently Hyper-V guests are among the most relevant users of the panic
infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
both in cleaning-up procedures (closing a hypervisor <-> guest connection,
disabling a paravirtualized timer) as well as to data collection (sending
panic information to the hypervisor) and framebuffer management.

The thing is: some notifiers are related to others, ordering matters, some
functionalities are duplicated and there are lots of conditionals behind
sending panic information to the hypervisor. This patch, as part of an
effort to clean-up the panic notifiers mechanism and better document
things, address some of the issues/complexities of Hyper-V panic handling
through the following changes:

(a) We have die and panic notifiers on vmbus_drv.c and both have goals of
sending panic information to the hypervisor, though the panic notifier is
also responsible for a cleaning-up procedure.

This commit clears the code by splitting the panic notifier in two, one
for closing the vmbus connection whereas the other is only for sending
panic info to hypervisor. With that, it was possible to merge the die and
panic notifiers in a single/well-documented function, and clear some
conditional complexities on sending such information to the hypervisor.

(b) The new panic notifier created after (a) is only doing a single thing:
cleaning the vmbus connection. This procedure might cause a delay (due to
hypervisor I/O completion), so we postpone that to run late. But more
relevant: this *same* vmbus unloading happens in the crash_shutdown()
handler, so if kdump is set, we can safely skip this panic notifier and
defer such clean-up to the kexec crash handler.

(c) There is also a Hyper-V framebuffer panic notifier, which relies in
doing a vmbus operation that demands a valid connection. So, we must
order this notifier with the panic notifier from vmbus_drv.c, in order to
guarantee that the framebuffer code executes before the vmbus connection
is unloaded.

Also, this commit removes a useless header.

Although there is code rework and re-ordering, we expect that this change
has no functional regressions but instead optimize the path and increase
panic reliability on Hyper-V. This was tested on Hyper-V with success.

Fixes: 792f232d57ff ("Drivers: hv: vmbus: Fix potential crash on module unload")
Fixes: 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback")
Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Tested-by: Fabio A M Martins <fabiomirmar@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---

Special thanks to Michael Kelley for the good information about the Hyper-V
panic path in email threads some months ago, and to Fabio for the testing
performed.

Michael and all Microsoft folks: a careful analysis to double-check our changes
and assumptions here is really appreciated, this code is complex and intricate,
it is possible some corner case might have been overlooked.

Thanks in advance!

 drivers/hv/vmbus_drv.c          | 109 ++++++++++++++++++++------------
 drivers/video/fbdev/hyperv_fb.c |   8 +++
 2 files changed, 76 insertions(+), 41 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 14de17087864..f37f12d48001 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -24,11 +24,11 @@
 #include <linux/sched/task_stack.h>
 
 #include <linux/delay.h>
-#include <linux/notifier.h>
 #include <linux/panic_notifier.h>
 #include <linux/ptrace.h>
 #include <linux/screen_info.h>
 #include <linux/kdebug.h>
+#include <linux/kexec.h>
 #include <linux/efi.h>
 #include <linux/random.h>
 #include <linux/kernel.h>
@@ -68,51 +68,75 @@ static int hyperv_report_reg(void)
 	return !sysctl_record_panic_msg || !hv_panic_page;
 }
 
-static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
+/*
+ * The panic notifier below is responsible solely for unloading the
+ * vmbus connection, which is necessary in a panic event. But notice
+ * that this same unloading procedure is executed in the Hyper-V
+ * crash_shutdown() handler [see hv_crash_handler()], which basically
+ * means that we can postpone its execution if we have kdump set,
+ * since it will run the crash_shutdown() handler anyway. Even more
+ * intrincated is the relation of this notifier with Hyper-V framebuffer
+ * panic notifier - we need vmbus connection alive there in order to
+ * succeed, so we need to order both with each other [for reference see
+ * hvfb_on_panic()] - this is done using notifiers' priorities.
+ */
+static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long val,
 			      void *args)
+{
+	if (!kexec_crash_loaded())
+		vmbus_initiate_unload(true);
+
+	return NOTIFY_DONE;
+}
+static struct notifier_block hyperv_panic_vmbus_unload_block = {
+	.notifier_call	= hv_panic_vmbus_unload,
+	.priority	= INT_MIN + 1, /* almost the latest one to execute */
+};
+
+/*
+ * The following callback works both as die and panic notifier; its
+ * goal is to provide panic information to the hypervisor unless the
+ * kmsg dumper is gonna be used [see hv_kmsg_dump()], which provides
+ * more information but is not always available.
+ *
+ * Notice that both the panic/die report notifiers are registered only
+ * if we have the capability HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE set.
+ */
+static int hv_die_panic_notify_crash(struct notifier_block *nb,
+				     unsigned long val, void *args)
 {
 	struct pt_regs *regs;
+	bool is_die;
 
-	vmbus_initiate_unload(true);
-
-	/*
-	 * Hyper-V should be notified only once about a panic.  If we will be
-	 * doing hv_kmsg_dump() with kmsg data later, don't do the notification
-	 * here.
-	 */
-	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE
-	    && hyperv_report_reg()) {
+	/* Don't notify Hyper-V unless we have a die oops event or panic. */
+	switch (val) {
+	case DIE_OOPS:
+		is_die = true;
+		regs = ((struct die_args *)args)->regs;
+		break;
+	case PANIC_NOTIFIER:
+		is_die = false;
 		regs = current_pt_regs();
-		hyperv_report_panic(regs, val, false);
-	}
-	return NOTIFY_DONE;
-}
-
-static int hyperv_die_event(struct notifier_block *nb, unsigned long val,
-			    void *args)
-{
-	struct die_args *die = args;
-	struct pt_regs *regs = die->regs;
-
-	/* Don't notify Hyper-V if the die event is other than oops */
-	if (val != DIE_OOPS)
+		break;
+	default:
 		return NOTIFY_DONE;
+	}
 
 	/*
-	 * Hyper-V should be notified only once about a panic.  If we will be
-	 * doing hv_kmsg_dump() with kmsg data later, don't do the notification
-	 * here.
+	 * Hyper-V should be notified only once about a panic/die. If we will
+	 * be calling hv_kmsg_dump() later with kmsg data, don't do the
+	 * notification here.
 	 */
 	if (hyperv_report_reg())
-		hyperv_report_panic(regs, val, true);
+		hyperv_report_panic(regs, val, is_die);
+
 	return NOTIFY_DONE;
 }
-
-static struct notifier_block hyperv_die_block = {
-	.notifier_call = hyperv_die_event,
+static struct notifier_block hyperv_die_report_block = {
+	.notifier_call = hv_die_panic_notify_crash,
 };
-static struct notifier_block hyperv_panic_block = {
-	.notifier_call = hyperv_panic_event,
+static struct notifier_block hyperv_panic_report_block = {
+	.notifier_call = hv_die_panic_notify_crash,
 };
 
 static const char *fb_mmio_name = "fb_range";
@@ -1589,16 +1613,17 @@ static int vmbus_bus_init(void)
 		if (hyperv_crash_ctl & HV_CRASH_CTL_CRASH_NOTIFY_MSG)
 			hv_kmsg_dump_register();
 
-		register_die_notifier(&hyperv_die_block);
+		register_die_notifier(&hyperv_die_report_block);
+		atomic_notifier_chain_register(&panic_notifier_list,
+						&hyperv_panic_report_block);
 	}
 
 	/*
-	 * Always register the panic notifier because we need to unload
-	 * the VMbus channel connection to prevent any VMbus
-	 * activity after the VM panics.
+	 * Always register the vmbus unload panic notifier because we
+	 * need to shut the VMbus channel connection on panic.
 	 */
 	atomic_notifier_chain_register(&panic_notifier_list,
-			       &hyperv_panic_block);
+			       &hyperv_panic_vmbus_unload_block);
 
 	vmbus_request_offers();
 
@@ -2817,15 +2842,17 @@ static void __exit vmbus_exit(void)
 
 	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
 		kmsg_dump_unregister(&hv_kmsg_dumper);
-		unregister_die_notifier(&hyperv_die_block);
+		unregister_die_notifier(&hyperv_die_report_block);
+		atomic_notifier_chain_unregister(&panic_notifier_list,
+						&hyperv_panic_report_block);
 	}
 
 	/*
-	 * The panic notifier is always registered, hence we should
+	 * The vmbus panic notifier is always registered, hence we should
 	 * also unconditionally unregister it here as well.
 	 */
 	atomic_notifier_chain_unregister(&panic_notifier_list,
-					 &hyperv_panic_block);
+					&hyperv_panic_vmbus_unload_block);
 
 	free_page((unsigned long)hv_panic_page);
 	unregister_sysctl_table(hv_ctl_table_hdr);
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index c8e0ea27caf1..f3494b868a64 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -1244,7 +1244,15 @@ static int hvfb_probe(struct hv_device *hdev,
 	par->fb_ready = true;
 
 	par->synchronous_fb = false;
+
+	/*
+	 * We need to be sure this panic notifier runs _before_ the
+	 * vmbus disconnect, so order it by priority. It must execute
+	 * before the function hv_panic_vmbus_unload() [drivers/hv/vmbus_drv.c],
+	 * which is almost at the end of list, with priority = INT_MIN + 1.
+	 */
 	par->hvfb_panic_nb.notifier_call = hvfb_on_panic;
+	par->hvfb_panic_nb.priority = INT_MIN + 10,
 	atomic_notifier_chain_register(&panic_notifier_list,
 				       &par->hvfb_panic_nb);
 
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 17/30] tracing: Improve panic/die notifiers
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (15 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-29  9:22   ` Sergei Shtylyov
  2022-05-11 11:45   ` Petr Mladek
  2022-04-27 22:49 ` [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set Guilherme G. Piccoli
                   ` (12 subsequent siblings)
  29 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

Currently the tracing dump_on_oops feature is implemented
through separate notifiers, one for die/oops and the other
for panic. With the addition of panic notifier "id", this
patch makes use of such "id" to unify both functions.

It also comments the function and changes the priority of the
notifier blocks, in order they run early compared to other
notifiers, to prevent useless trace data (like the callback
names for the other notifiers). Finally, we also removed an
unnecessary header inclusion.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 kernel/trace/trace.c | 57 +++++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f4de111fa18f..c1d8a3622ccc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -19,7 +19,6 @@
 #include <linux/kallsyms.h>
 #include <linux/security.h>
 #include <linux/seq_file.h>
-#include <linux/notifier.h>
 #include <linux/irqflags.h>
 #include <linux/debugfs.h>
 #include <linux/tracefs.h>
@@ -9767,38 +9766,46 @@ static __init int tracer_init_tracefs(void)
 
 fs_initcall(tracer_init_tracefs);
 
-static int trace_panic_handler(struct notifier_block *this,
-			       unsigned long event, void *unused)
+/*
+ * The idea is to execute the following die/panic callback early, in order
+ * to avoid showing irrelevant information in the trace (like other panic
+ * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
+ * warnings get disabled (to prevent potential log flooding).
+ */
+static int trace_die_panic_handler(struct notifier_block *self,
+				unsigned long ev, void *unused)
 {
-	if (ftrace_dump_on_oops)
+	int do_dump;
+
+	if (!ftrace_dump_on_oops)
+		return NOTIFY_DONE;
+
+	switch (ev) {
+	case DIE_OOPS:
+		do_dump = 1;
+		break;
+	case PANIC_NOTIFIER:
+		do_dump = 1;
+		break;
+	default:
+		do_dump = 0;
+		break;
+	}
+
+	if (do_dump)
 		ftrace_dump(ftrace_dump_on_oops);
-	return NOTIFY_OK;
+
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block trace_panic_notifier = {
-	.notifier_call  = trace_panic_handler,
-	.next           = NULL,
-	.priority       = 150   /* priority: INT_MAX >= x >= 0 */
+	.notifier_call = trace_die_panic_handler,
+	.priority = INT_MAX - 1,
 };
 
-static int trace_die_handler(struct notifier_block *self,
-			     unsigned long val,
-			     void *data)
-{
-	switch (val) {
-	case DIE_OOPS:
-		if (ftrace_dump_on_oops)
-			ftrace_dump(ftrace_dump_on_oops);
-		break;
-	default:
-		break;
-	}
-	return NOTIFY_OK;
-}
-
 static struct notifier_block trace_die_notifier = {
-	.notifier_call = trace_die_handler,
-	.priority = 200
+	.notifier_call = trace_die_panic_handler,
+	.priority = INT_MAX - 1,
 };
 
 /*
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (16 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 17/30] tracing: Improve panic/die notifiers Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-28  1:01   ` Xiaoming Ni
  2022-04-29 16:27   ` Michael Kelley (LINUX)
  2022-04-27 22:49 ` [PATCH 19/30] panic: Add the panic hypervisor notifier list Guilherme G. Piccoli
                   ` (11 subsequent siblings)
  29 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Arjan van de Ven, Cong Wang, Sebastian Andrzej Siewior,
	Valentin Schneider, Xiaoming Ni

Currently we have a debug infrastructure in the notifiers file, but
it's very simple/limited. This patch extends it by:

(a) Showing all registered/unregistered notifiers' callback names;

(b) Adding a dynamic debug tuning to allow showing called notifiers'
function names. Notice that this should be guarded as a tunable since
it can flood the kernel log buffer.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---

We have some design decisions that worth discussing here:

(a) First of call, using C99 helps a lot to write clear and concise code, but
due to commit 4d94f910e79a ("Kbuild: use -Wdeclaration-after-statement") we
have a warning if mixing variable declarations with code. For this patch though,
doing that makes the code way clear, so decision was to add the debug code
inside brackets whenever this warning pops up. We can change that, but that'll
cause more ifdefs in the same function.

(b) In the symbol lookup helper function, we modify the parameter passed but
even more, we return it as well! This is unusual and seems unnecessary, but was
the strategy taken to allow embedding such function in the pr_debug() call.

Not doing that would likely requiring 3 symbol_name variables to avoid
concurrency (registering notifier A while calling notifier B) - we rely in
local variables as a serialization mechanism.

We're open for suggestions in case this design is not appropriate;
thanks in advance!

 kernel/notifier.c | 48 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index ba005ebf4730..21032ebcde57 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -7,6 +7,22 @@
 #include <linux/vmalloc.h>
 #include <linux/reboot.h>
 
+#ifdef CONFIG_DEBUG_NOTIFIERS
+#include <linux/kallsyms.h>
+
+/*
+ *	Helper to get symbol names in case DEBUG_NOTIFIERS is set.
+ *	Return the modified parameter is a strategy used to achieve
+ *	the pr_debug() functionality - with this, function is only
+ *	executed if the dynamic debug tuning is effectively set.
+ */
+static inline char *notifier_name(struct notifier_block *nb, char *sym_name)
+{
+	lookup_symbol_name((unsigned long)(nb->notifier_call), sym_name);
+	return sym_name;
+}
+#endif
+
 /*
  *	Notifier list for kernel code which wants to be called
  *	at shutdown. This is used to stop any idling DMA operations
@@ -34,20 +50,41 @@ static int notifier_chain_register(struct notifier_block **nl,
 	}
 	n->next = *nl;
 	rcu_assign_pointer(*nl, n);
+
+#ifdef CONFIG_DEBUG_NOTIFIERS
+	{
+		char sym_name[KSYM_NAME_LEN];
+
+		pr_info("notifiers: registered %s()\n",
+			notifier_name(n, sym_name));
+	}
+#endif
 	return 0;
 }
 
 static int notifier_chain_unregister(struct notifier_block **nl,
 		struct notifier_block *n)
 {
+	int ret = -ENOENT;
+
 	while ((*nl) != NULL) {
 		if ((*nl) == n) {
 			rcu_assign_pointer(*nl, n->next);
-			return 0;
+			ret = 0;
+			break;
 		}
 		nl = &((*nl)->next);
 	}
-	return -ENOENT;
+
+#ifdef CONFIG_DEBUG_NOTIFIERS
+	if (!ret) {
+		char sym_name[KSYM_NAME_LEN];
+
+		pr_info("notifiers: unregistered %s()\n",
+			notifier_name(n, sym_name));
+	}
+#endif
+	return ret;
 }
 
 /**
@@ -80,6 +117,13 @@ static int notifier_call_chain(struct notifier_block **nl,
 			nb = next_nb;
 			continue;
 		}
+
+		{
+			char sym_name[KSYM_NAME_LEN];
+
+			pr_debug("notifiers: calling %s()\n",
+				 notifier_name(nb, sym_name));
+		}
 #endif
 		ret = nb->notifier_call(nb, val, v);
 
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (17 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-29 17:30   ` Michael Kelley (LINUX)
  2022-05-16 14:01   ` Petr Mladek
  2022-04-27 22:49 ` [PATCH 20/30] panic: Add the panic informational " Guilherme G. Piccoli
                   ` (10 subsequent siblings)
  29 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David Gow, David S. Miller, Dexuan Cui,
	Doug Berger, Evan Green, Florian Fainelli, Haiyang Zhang,
	Hari Bathini, Heiko Carstens, Julius Werner, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Scott Branden, Sebastian Reichel, Shile Zhang, Stephen Hemminger,
	Sven Schnelle, Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik,
	Wang ShaoBo, Wei Liu, zhenwei pi

The goal of this new panic notifier is to allow its users to register
callbacks to run very early in the panic path. This aims hypervisor/FW
notification mechanisms as well as simple LED functions, and any other
simple and safe mechanism that should run early in the panic path; more
dangerous callbacks should execute later.

For now, the patch is almost a no-op (although it changes a bit the
ordering in which some panic notifiers are executed). In a subsequent
patch, the panic path will be refactored, then the panic hypervisor
notifiers will effectively run very early in the panic path.

We also defer documenting it all properly in the subsequent refactor
patch. While at it, we removed some useless header inclusions and
fixed some notifiers return too (by using the standard NOTIFY_DONE).

Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: David Gow <davidgow@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Evan Green <evgreen@chromium.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Julius Werner <jwerner@chromium.org>
Cc: Justin Chen <justinpopo6@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Markus Mayer <mmayer@broadcom.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Mihai Carabas <mihai.carabas@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Scott Branden <scott.branden@broadcom.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/mips/sgi-ip22/ip22-reset.c          | 2 +-
 arch/mips/sgi-ip32/ip32-reset.c          | 3 +--
 arch/powerpc/kernel/setup-common.c       | 2 +-
 arch/sparc/kernel/sstate.c               | 3 +--
 drivers/firmware/google/gsmi.c           | 4 ++--
 drivers/hv/vmbus_drv.c                   | 4 ++--
 drivers/leds/trigger/ledtrig-activity.c  | 4 ++--
 drivers/leds/trigger/ledtrig-heartbeat.c | 4 ++--
 drivers/misc/bcm-vk/bcm_vk_dev.c         | 6 +++---
 drivers/misc/pvpanic/pvpanic.c           | 4 ++--
 drivers/power/reset/ltc2952-poweroff.c   | 4 ++--
 drivers/s390/char/zcore.c                | 5 +++--
 drivers/soc/bcm/brcmstb/pm/pm-arm.c      | 2 +-
 include/linux/panic_notifier.h           | 1 +
 kernel/panic.c                           | 4 ++++
 15 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/arch/mips/sgi-ip22/ip22-reset.c b/arch/mips/sgi-ip22/ip22-reset.c
index 8f0861c58080..3023848acbf1 100644
--- a/arch/mips/sgi-ip22/ip22-reset.c
+++ b/arch/mips/sgi-ip22/ip22-reset.c
@@ -195,7 +195,7 @@ static int __init reboot_setup(void)
 	}
 
 	timer_setup(&blink_timer, blink_timeout, 0);
-	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
+	atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
 
 	return 0;
 }
diff --git a/arch/mips/sgi-ip32/ip32-reset.c b/arch/mips/sgi-ip32/ip32-reset.c
index 18d1c115cd53..9ee1302c9d13 100644
--- a/arch/mips/sgi-ip32/ip32-reset.c
+++ b/arch/mips/sgi-ip32/ip32-reset.c
@@ -15,7 +15,6 @@
 #include <linux/panic_notifier.h>
 #include <linux/sched.h>
 #include <linux/sched/signal.h>
-#include <linux/notifier.h>
 #include <linux/delay.h>
 #include <linux/rtc/ds1685.h>
 #include <linux/interrupt.h>
@@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
 	pm_power_off = ip32_machine_halt;
 
 	timer_setup(&blink_timer, blink_timeout, 0);
-	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
+	atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
 
 	return 0;
 }
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 52f96b209a96..1468c3937bf4 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -753,7 +753,7 @@ static struct notifier_block ppc_panic_block = {
 void __init setup_panic(void)
 {
 	/* Hard-disables IRQs + deal with FW-assisted dump (fadump) */
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_hypervisor_list,
 				       &ppc_fadump_block);
 
 	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
diff --git a/arch/sparc/kernel/sstate.c b/arch/sparc/kernel/sstate.c
index 3bcc4ddc6911..82b7b68e0bdc 100644
--- a/arch/sparc/kernel/sstate.c
+++ b/arch/sparc/kernel/sstate.c
@@ -5,7 +5,6 @@
  */
 
 #include <linux/kernel.h>
-#include <linux/notifier.h>
 #include <linux/panic_notifier.h>
 #include <linux/reboot.h>
 #include <linux/init.h>
@@ -106,7 +105,7 @@ static int __init sstate_init(void)
 
 	do_set_sstate(HV_SOFT_STATE_TRANSITION, booting_msg);
 
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_hypervisor_list,
 				       &sstate_panic_block);
 	register_reboot_notifier(&sstate_reboot_notifier);
 
diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
index b01ed02e4a87..ff0bebe2f444 100644
--- a/drivers/firmware/google/gsmi.c
+++ b/drivers/firmware/google/gsmi.c
@@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
 
 	register_reboot_notifier(&gsmi_reboot_notifier);
 	register_die_notifier(&gsmi_die_notifier);
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_hypervisor_list,
 				       &gsmi_panic_notifier);
 
 	printk(KERN_INFO "gsmi version " DRIVER_VERSION " loaded\n");
@@ -1061,7 +1061,7 @@ static void __exit gsmi_exit(void)
 {
 	unregister_reboot_notifier(&gsmi_reboot_notifier);
 	unregister_die_notifier(&gsmi_die_notifier);
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_hypervisor_list,
 					 &gsmi_panic_notifier);
 #ifdef CONFIG_EFI
 	efivars_unregister(&efivars);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index f37f12d48001..901b97034308 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1614,7 +1614,7 @@ static int vmbus_bus_init(void)
 			hv_kmsg_dump_register();
 
 		register_die_notifier(&hyperv_die_report_block);
-		atomic_notifier_chain_register(&panic_notifier_list,
+		atomic_notifier_chain_register(&panic_hypervisor_list,
 						&hyperv_panic_report_block);
 	}
 
@@ -2843,7 +2843,7 @@ static void __exit vmbus_exit(void)
 	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
 		kmsg_dump_unregister(&hv_kmsg_dumper);
 		unregister_die_notifier(&hyperv_die_report_block);
-		atomic_notifier_chain_unregister(&panic_notifier_list,
+		atomic_notifier_chain_unregister(&panic_hypervisor_list,
 						&hyperv_panic_report_block);
 	}
 
diff --git a/drivers/leds/trigger/ledtrig-activity.c b/drivers/leds/trigger/ledtrig-activity.c
index 30bc9df03636..bbbcf3bc17e3 100644
--- a/drivers/leds/trigger/ledtrig-activity.c
+++ b/drivers/leds/trigger/ledtrig-activity.c
@@ -247,7 +247,7 @@ static int __init activity_init(void)
 	int rc = led_trigger_register(&activity_led_trigger);
 
 	if (!rc) {
-		atomic_notifier_chain_register(&panic_notifier_list,
+		atomic_notifier_chain_register(&panic_hypervisor_list,
 					       &activity_panic_nb);
 		register_reboot_notifier(&activity_reboot_nb);
 	}
@@ -257,7 +257,7 @@ static int __init activity_init(void)
 static void __exit activity_exit(void)
 {
 	unregister_reboot_notifier(&activity_reboot_nb);
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_hypervisor_list,
 					 &activity_panic_nb);
 	led_trigger_unregister(&activity_led_trigger);
 }
diff --git a/drivers/leds/trigger/ledtrig-heartbeat.c b/drivers/leds/trigger/ledtrig-heartbeat.c
index 7fe0a05574d2..a1ed25e83c8f 100644
--- a/drivers/leds/trigger/ledtrig-heartbeat.c
+++ b/drivers/leds/trigger/ledtrig-heartbeat.c
@@ -190,7 +190,7 @@ static int __init heartbeat_trig_init(void)
 	int rc = led_trigger_register(&heartbeat_led_trigger);
 
 	if (!rc) {
-		atomic_notifier_chain_register(&panic_notifier_list,
+		atomic_notifier_chain_register(&panic_hypervisor_list,
 					       &heartbeat_panic_nb);
 		register_reboot_notifier(&heartbeat_reboot_nb);
 	}
@@ -200,7 +200,7 @@ static int __init heartbeat_trig_init(void)
 static void __exit heartbeat_trig_exit(void)
 {
 	unregister_reboot_notifier(&heartbeat_reboot_nb);
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_hypervisor_list,
 					 &heartbeat_panic_nb);
 	led_trigger_unregister(&heartbeat_led_trigger);
 }
diff --git a/drivers/misc/bcm-vk/bcm_vk_dev.c b/drivers/misc/bcm-vk/bcm_vk_dev.c
index a16b99bdaa13..d9d5199cdb2b 100644
--- a/drivers/misc/bcm-vk/bcm_vk_dev.c
+++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
@@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	/* register for panic notifier */
 	vk->panic_nb.notifier_call = bcm_vk_on_panic;
-	err = atomic_notifier_chain_register(&panic_notifier_list,
+	err = atomic_notifier_chain_register(&panic_hypervisor_list,
 					     &vk->panic_nb);
 	if (err) {
 		dev_err(dev, "Fail to register panic notifier\n");
@@ -1486,7 +1486,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	bcm_vk_tty_exit(vk);
 
 err_unregister_panic_notifier:
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_hypervisor_list,
 					 &vk->panic_nb);
 
 err_destroy_workqueue:
@@ -1559,7 +1559,7 @@ static void bcm_vk_remove(struct pci_dev *pdev)
 	usleep_range(BCM_VK_UCODE_BOOT_US, BCM_VK_UCODE_BOOT_MAX_US);
 
 	/* unregister panic notifier */
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_hypervisor_list,
 					 &vk->panic_nb);
 
 	bcm_vk_msg_remove(vk);
diff --git a/drivers/misc/pvpanic/pvpanic.c b/drivers/misc/pvpanic/pvpanic.c
index 049a12006348..233a71d89477 100644
--- a/drivers/misc/pvpanic/pvpanic.c
+++ b/drivers/misc/pvpanic/pvpanic.c
@@ -101,7 +101,7 @@ static int pvpanic_init(void)
 	INIT_LIST_HEAD(&pvpanic_list);
 	spin_lock_init(&pvpanic_lock);
 
-	atomic_notifier_chain_register(&panic_notifier_list, &pvpanic_panic_nb);
+	atomic_notifier_chain_register(&panic_hypervisor_list, &pvpanic_panic_nb);
 
 	return 0;
 }
@@ -109,7 +109,7 @@ module_init(pvpanic_init);
 
 static void pvpanic_exit(void)
 {
-	atomic_notifier_chain_unregister(&panic_notifier_list, &pvpanic_panic_nb);
+	atomic_notifier_chain_unregister(&panic_hypervisor_list, &pvpanic_panic_nb);
 
 }
 module_exit(pvpanic_exit);
diff --git a/drivers/power/reset/ltc2952-poweroff.c b/drivers/power/reset/ltc2952-poweroff.c
index 65d9528cc989..fb5078ba3a69 100644
--- a/drivers/power/reset/ltc2952-poweroff.c
+++ b/drivers/power/reset/ltc2952-poweroff.c
@@ -279,7 +279,7 @@ static int ltc2952_poweroff_probe(struct platform_device *pdev)
 	pm_power_off = ltc2952_poweroff_kill;
 
 	data->panic_notifier.notifier_call = ltc2952_poweroff_notify_panic;
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_hypervisor_list,
 				       &data->panic_notifier);
 	dev_info(&pdev->dev, "probe successful\n");
 
@@ -293,7 +293,7 @@ static int ltc2952_poweroff_remove(struct platform_device *pdev)
 	pm_power_off = NULL;
 	hrtimer_cancel(&data->timer_trigger);
 	hrtimer_cancel(&data->timer_wde);
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_hypervisor_list,
 					 &data->panic_notifier);
 	return 0;
 }
diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
index 516783ba950f..768a8a3a9046 100644
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -246,7 +246,7 @@ static int zcore_reboot_and_on_panic_handler(struct notifier_block *self,
 	if (hsa_available)
 		release_hsa();
 
-	return NOTIFY_OK;
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block zcore_reboot_notifier = {
@@ -322,7 +322,8 @@ static int __init zcore_init(void)
 					     NULL, &zcore_hsa_fops);
 
 	register_reboot_notifier(&zcore_reboot_notifier);
-	atomic_notifier_chain_register(&panic_notifier_list, &zcore_on_panic_notifier);
+	atomic_notifier_chain_register(&panic_hypervisor_list,
+				       &zcore_on_panic_notifier);
 
 	return 0;
 fail:
diff --git a/drivers/soc/bcm/brcmstb/pm/pm-arm.c b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
index 870686ae042b..babca66c7862 100644
--- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
+++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
@@ -814,7 +814,7 @@ static int brcmstb_pm_probe(struct platform_device *pdev)
 		goto out;
 	}
 
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_hypervisor_list,
 				       &brcmstb_pm_panic_nb);
 
 	pm_power_off = brcmstb_pm_poweroff;
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 07dced83a783..0bb9dc0dea04 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 
 extern struct atomic_notifier_head panic_notifier_list;
+extern struct atomic_notifier_head panic_hypervisor_list;
 
 extern bool crash_kexec_post_notifiers;
 
diff --git a/kernel/panic.c b/kernel/panic.c
index 523bc9ccd0e9..ef76f3f9c44d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -73,6 +73,9 @@ ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
 
 EXPORT_SYMBOL(panic_notifier_list);
 
+ATOMIC_NOTIFIER_HEAD(panic_hypervisor_list);
+EXPORT_SYMBOL(panic_hypervisor_list);
+
 static long no_blink(int state)
 {
 	return 0;
@@ -287,6 +290,7 @@ void panic(const char *fmt, ...)
 	 * Run any panic handlers, including those that might need to
 	 * add information to the kmsg dump output.
 	 */
+	atomic_notifier_call_chain(&panic_hypervisor_list, PANIC_NOTIFIER, buf);
 	atomic_notifier_call_chain(&panic_notifier_list, PANIC_NOTIFIER, buf);
 
 	panic_print_sys_info(false);
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 20/30] panic: Add the panic informational notifier list
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (18 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 19/30] panic: Add the panic hypervisor notifier list Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-27 23:49   ` Paul E. McKenney
                     ` (2 more replies)
  2022-04-27 22:49 ` [PATCH 21/30] panic: Introduce the panic pre-reboot " Guilherme G. Piccoli
                   ` (9 subsequent siblings)
  29 siblings, 3 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Benjamin Herrenschmidt, Catalin Marinas, Florian Fainelli,
	Frederic Weisbecker, H. Peter Anvin, Hari Bathini,
	Joel Fernandes, Jonathan Hunter, Josh Triplett, Lai Jiangshan,
	Leo Yan, Mathieu Desnoyers, Mathieu Poirier, Michael Ellerman,
	Mike Leach, Mikko Perttunen, Neeraj Upadhyay, Nicholas Piggin,
	Paul Mackerras, Suzuki K Poulose, Thierry Reding,
	Thomas Bogendoerfer

The goal of this new panic notifier is to allow its users to
register callbacks to run earlier in the panic path than they
currently do. This aims at informational mechanisms, like dumping
kernel offsets and showing device error data (in case it's simple
registers reading, for example) as well as mechanisms to disable
log flooding (like hung_task detector / RCU warnings) and the
tracing dump_on_oops (when enabled).

Any (non-invasive) information that should be provided before
kmsg_dump() as well as log flooding preventing code should fit
here, as long it offers relatively low risk for kdump.

For now, the patch is almost a no-op, although it changes a bit
the ordering in which some panic notifiers are executed - specially
affected by this are the notifiers responsible for disabling the
hung_task detector / RCU warnings, which now run first. In a
subsequent patch, the panic path will be refactored, then the
panic informational notifiers will effectively run earlier,
before ksmg_dump() (and usually before kdump as well).

We also defer documenting it all properly in the subsequent
refactor patch. Finally, while at it, we removed some useless
header inclusions too.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mikko Perttunen <mperttunen@nvidia.com>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/arm64/kernel/setup.c                         | 2 +-
 arch/mips/kernel/relocate.c                       | 2 +-
 arch/powerpc/kernel/setup-common.c                | 2 +-
 arch/x86/kernel/setup.c                           | 2 +-
 drivers/bus/brcmstb_gisb.c                        | 2 +-
 drivers/hwtracing/coresight/coresight-cpu-debug.c | 4 ++--
 drivers/soc/tegra/ari-tegra186.c                  | 3 ++-
 include/linux/panic_notifier.h                    | 1 +
 kernel/hung_task.c                                | 3 ++-
 kernel/panic.c                                    | 4 ++++
 kernel/rcu/tree.c                                 | 1 -
 kernel/rcu/tree_stall.h                           | 3 ++-
 kernel/trace/trace.c                              | 2 +-
 13 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 3505789cf4bd..ac2c7e8c9c6a 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -444,7 +444,7 @@ static struct notifier_block arm64_panic_block = {
 
 static int __init register_arm64_panic_block(void)
 {
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_info_list,
 				       &arm64_panic_block);
 	return 0;
 }
diff --git a/arch/mips/kernel/relocate.c b/arch/mips/kernel/relocate.c
index 56b51de2dc51..650811f2436a 100644
--- a/arch/mips/kernel/relocate.c
+++ b/arch/mips/kernel/relocate.c
@@ -459,7 +459,7 @@ static struct notifier_block kernel_location_notifier = {
 
 static int __init register_kernel_offset_dumper(void)
 {
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_info_list,
 				       &kernel_location_notifier);
 	return 0;
 }
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 1468c3937bf4..d04b8bf8dbc7 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -757,7 +757,7 @@ void __init setup_panic(void)
 				       &ppc_fadump_block);
 
 	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
-		atomic_notifier_chain_register(&panic_notifier_list,
+		atomic_notifier_chain_register(&panic_info_list,
 					       &kernel_offset_notifier);
 
 	/* Low-level platform-specific routines that should run on panic */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c95b9ac5a457..599b25346964 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1266,7 +1266,7 @@ static struct notifier_block kernel_offset_notifier = {
 
 static int __init register_kernel_offset_dumper(void)
 {
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_info_list,
 					&kernel_offset_notifier);
 	return 0;
 }
diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 1ea7b015e225..c64e087fba7a 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -486,7 +486,7 @@ static int __init brcmstb_gisb_arb_probe(struct platform_device *pdev)
 
 	if (list_is_singular(&brcmstb_gisb_arb_device_list)) {
 		register_die_notifier(&gisb_die_notifier);
-		atomic_notifier_chain_register(&panic_notifier_list,
+		atomic_notifier_chain_register(&panic_info_list,
 					       &gisb_panic_notifier);
 	}
 
diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 1874df7c6a73..7b1012454525 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -535,7 +535,7 @@ static int debug_func_init(void)
 			    &debug_func_knob_fops);
 
 	/* Register function to be called for panic */
-	ret = atomic_notifier_chain_register(&panic_notifier_list,
+	ret = atomic_notifier_chain_register(&panic_info_list,
 					     &debug_notifier);
 	if (ret) {
 		pr_err("%s: unable to register notifier: %d\n",
@@ -552,7 +552,7 @@ static int debug_func_init(void)
 
 static void debug_func_exit(void)
 {
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_info_list,
 					 &debug_notifier);
 	debugfs_remove_recursive(debug_debugfs_dir);
 }
diff --git a/drivers/soc/tegra/ari-tegra186.c b/drivers/soc/tegra/ari-tegra186.c
index 02577853ec49..4ef05ebed739 100644
--- a/drivers/soc/tegra/ari-tegra186.c
+++ b/drivers/soc/tegra/ari-tegra186.c
@@ -73,7 +73,8 @@ static struct notifier_block tegra186_ari_panic_nb = {
 static int __init tegra186_ari_init(void)
 {
 	if (of_machine_is_compatible("nvidia,tegra186"))
-		atomic_notifier_chain_register(&panic_notifier_list, &tegra186_ari_panic_nb);
+		atomic_notifier_chain_register(&panic_info_list,
+					       &tegra186_ari_panic_nb);
 
 	return 0;
 }
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 0bb9dc0dea04..7364a346bcb0 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -7,6 +7,7 @@
 
 extern struct atomic_notifier_head panic_notifier_list;
 extern struct atomic_notifier_head panic_hypervisor_list;
+extern struct atomic_notifier_head panic_info_list;
 
 extern bool crash_kexec_post_notifiers;
 
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 52501e5f7655..1b2d7111d5ac 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -85,6 +85,7 @@ hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
 
 static struct notifier_block panic_block = {
 	.notifier_call = hung_task_panic,
+	.priority = INT_MAX, /* run early to prevent potential log flood */
 };
 
 static void check_hung_task(struct task_struct *t, unsigned long timeout)
@@ -378,7 +379,7 @@ static int watchdog(void *dummy)
 
 static int __init hung_task_init(void)
 {
-	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
+	atomic_notifier_chain_register(&panic_info_list, &panic_block);
 
 	/* Disable hung task detector on suspend */
 	pm_notifier(hungtask_pm_notify, 0);
diff --git a/kernel/panic.c b/kernel/panic.c
index ef76f3f9c44d..73ca1bc44e30 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -76,6 +76,9 @@ EXPORT_SYMBOL(panic_notifier_list);
 ATOMIC_NOTIFIER_HEAD(panic_hypervisor_list);
 EXPORT_SYMBOL(panic_hypervisor_list);
 
+ATOMIC_NOTIFIER_HEAD(panic_info_list);
+EXPORT_SYMBOL(panic_info_list);
+
 static long no_blink(int state)
 {
 	return 0;
@@ -291,6 +294,7 @@ void panic(const char *fmt, ...)
 	 * add information to the kmsg dump output.
 	 */
 	atomic_notifier_call_chain(&panic_hypervisor_list, PANIC_NOTIFIER, buf);
+	atomic_notifier_call_chain(&panic_info_list, PANIC_NOTIFIER, buf);
 	atomic_notifier_call_chain(&panic_notifier_list, PANIC_NOTIFIER, buf);
 
 	panic_print_sys_info(false);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4b8189455d5..d5a2674ae81c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -35,7 +35,6 @@
 #include <linux/panic.h>
 #include <linux/panic_notifier.h>
 #include <linux/percpu.h>
-#include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/mutex.h>
 #include <linux/time.h>
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 0c5d8516516a..d8a5840aad5d 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -97,11 +97,12 @@ static int rcu_panic(struct notifier_block *this, unsigned long ev, void *ptr)
 
 static struct notifier_block rcu_panic_block = {
 	.notifier_call = rcu_panic,
+	.priority = INT_MAX, /* run early to prevent potential log flood */
 };
 
 static int __init check_cpu_stall_init(void)
 {
-	atomic_notifier_chain_register(&panic_notifier_list, &rcu_panic_block);
+	atomic_notifier_chain_register(&panic_info_list, &rcu_panic_block);
 	return 0;
 }
 early_initcall(check_cpu_stall_init);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c1d8a3622ccc..7d02f7a66bb1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10138,7 +10138,7 @@ __init static int tracer_alloc_buffers(void)
 	/* All seems OK, enable tracing */
 	tracing_disabled = 0;
 
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_info_list,
 				       &trace_panic_notifier);
 
 	register_die_notifier(&trace_die_notifier);
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (19 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 20/30] panic: Add the panic informational " Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-28 14:13   ` Alex Elder
                     ` (3 more replies)
  2022-04-27 22:49 ` [PATCH 22/30] panic: Introduce the panic post-reboot " Guilherme G. Piccoli
                   ` (8 subsequent siblings)
  29 siblings, 4 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alex Elder, Alexander Gordeev, Anton Ivanov,
	Benjamin Herrenschmidt, Bjorn Andersson, Boris Ostrovsky,
	Chris Zankel, Christian Borntraeger, Corey Minyard, Dexuan Cui,
	H. Peter Anvin, Haiyang Zhang, Heiko Carstens, Helge Deller,
	Ivan Kokshaysky, James E.J. Bottomley, James Morse,
	Johannes Berg, K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Henderson,
	Richard Weinberger, Robert Richter, Stefano Stabellini,
	Stephen Hemminger, Sven Schnelle, Tony Luck, Vasily Gorbik,
	Wei Liu

This patch renames the panic_notifier_list to panic_pre_reboot_list;
the idea is that a subsequent patch will refactor the panic path
in order to better split the notifiers, running some of them very
early, some of them not so early [but still before kmsg_dump()] and
finally, the rest should execute late, after kdump. The latter ones
are now in the panic pre-reboot list - the name comes from the idea
that these notifiers execute before panic() attempts rebooting the
machine (if that option is set).

We also took the opportunity to clean-up useless header inclusions,
improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
and more important, change some priorities - we hereby set 2 notifiers
to run late in the list [iss_panic_event() and the IPMI panic_event()]
due to the risks they offer (may not return, for example).
Proper documentation is going to be provided in a subsequent patch,
that effectively refactors the panic path.

Cc: Alex Elder <elder@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: James Morse <james.morse@arm.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Robert Richter <rric@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---

Notice that, with this name change, out-of-tree code that relies in the global
exported "panic_notifier_list" will fail to build. We could easily keep the
retro-compatibility by making the old symbol to still exist and point to the
pre_reboot list (or even, keep the old naming).

But our design choice was to allow the breakage, making users rethink their
notifiers, adding them in the list that fits best. If that wasn't a good
decision, we're open to change it, of course.
Thanks in advance for the review!

 arch/alpha/kernel/setup.c             |  4 ++--
 arch/parisc/kernel/pdc_chassis.c      |  3 +--
 arch/powerpc/kernel/setup-common.c    |  2 +-
 arch/s390/kernel/ipl.c                |  4 ++--
 arch/um/drivers/mconsole_kern.c       |  2 +-
 arch/um/kernel/um_arch.c              |  2 +-
 arch/x86/xen/enlighten.c              |  2 +-
 arch/xtensa/platforms/iss/setup.c     |  4 ++--
 drivers/char/ipmi/ipmi_msghandler.c   | 12 +++++++-----
 drivers/edac/altera_edac.c            |  3 +--
 drivers/hv/vmbus_drv.c                |  4 ++--
 drivers/leds/trigger/ledtrig-panic.c  |  3 +--
 drivers/misc/ibmasm/heartbeat.c       | 16 +++++++++-------
 drivers/net/ipa/ipa_smp2p.c           |  5 ++---
 drivers/parisc/power.c                |  4 ++--
 drivers/remoteproc/remoteproc_core.c  |  6 ++++--
 drivers/s390/char/con3215.c           |  2 +-
 drivers/s390/char/con3270.c           |  2 +-
 drivers/s390/char/sclp_con.c          |  2 +-
 drivers/s390/char/sclp_vt220.c        |  2 +-
 drivers/staging/olpc_dcon/olpc_dcon.c |  6 ++++--
 drivers/video/fbdev/hyperv_fb.c       |  4 ++--
 include/linux/panic_notifier.h        |  2 +-
 kernel/panic.c                        |  9 ++++-----
 24 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index d88bdf852753..8ace0d7113b6 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -472,8 +472,8 @@ setup_arch(char **cmdline_p)
 	}
 
 	/* Register a call for panic conditions. */
-	atomic_notifier_chain_register(&panic_notifier_list,
-			&alpha_panic_block);
+	atomic_notifier_chain_register(&panic_pre_reboot_list,
+					&alpha_panic_block);
 
 #ifndef alpha_using_srm
 	/* Assume that we've booted from SRM if we haven't booted from MILO.
diff --git a/arch/parisc/kernel/pdc_chassis.c b/arch/parisc/kernel/pdc_chassis.c
index da154406d368..0fd8d87fb4f9 100644
--- a/arch/parisc/kernel/pdc_chassis.c
+++ b/arch/parisc/kernel/pdc_chassis.c
@@ -22,7 +22,6 @@
 #include <linux/kernel.h>
 #include <linux/panic_notifier.h>
 #include <linux/reboot.h>
-#include <linux/notifier.h>
 #include <linux/cache.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -135,7 +134,7 @@ void __init parisc_pdc_chassis_init(void)
 				PDC_CHASSIS_VER);
 
 		/* initialize panic notifier chain */
-		atomic_notifier_chain_register(&panic_notifier_list,
+		atomic_notifier_chain_register(&panic_pre_reboot_list,
 				&pdc_chassis_panic_block);
 
 		/* initialize reboot notifier chain */
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index d04b8bf8dbc7..3518bebc10ad 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -762,7 +762,7 @@ void __init setup_panic(void)
 
 	/* Low-level platform-specific routines that should run on panic */
 	if (ppc_md.panic)
-		atomic_notifier_chain_register(&panic_notifier_list,
+		atomic_notifier_chain_register(&panic_pre_reboot_list,
 					       &ppc_panic_block);
 }
 
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index 1cc85b8ff42e..4a88c5bb6e15 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -2034,7 +2034,7 @@ static int on_panic_notify(struct notifier_block *self,
 			   unsigned long event, void *data)
 {
 	do_panic();
-	return NOTIFY_OK;
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block on_panic_nb = {
@@ -2069,7 +2069,7 @@ void __init setup_ipl(void)
 		/* We have no info to copy */
 		break;
 	}
-	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
+	atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
 }
 
 void s390_reset_system(void)
diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 2ea0421bcc3f..21c13b4e24a3 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -855,7 +855,7 @@ static struct notifier_block panic_exit_notifier = {
 
 static int add_notifier(void)
 {
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_pre_reboot_list,
 			&panic_exit_notifier);
 	return 0;
 }
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 4485b1a7c8e4..fc6e443299da 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -257,7 +257,7 @@ static struct notifier_block panic_exit_notifier = {
 
 void uml_finishsetup(void)
 {
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_pre_reboot_list,
 				       &panic_exit_notifier);
 
 	uml_postsetup();
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 30c6e986a6cd..d4f4de239a21 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -290,7 +290,7 @@ static struct notifier_block xen_panic_block = {
 
 int xen_panic_handler_init(void)
 {
-	atomic_notifier_chain_register(&panic_notifier_list, &xen_panic_block);
+	atomic_notifier_chain_register(&panic_pre_reboot_list, &xen_panic_block);
 	return 0;
 }
 
diff --git a/arch/xtensa/platforms/iss/setup.c b/arch/xtensa/platforms/iss/setup.c
index d3433e1bb94e..eeeeb6cff6bd 100644
--- a/arch/xtensa/platforms/iss/setup.c
+++ b/arch/xtensa/platforms/iss/setup.c
@@ -13,7 +13,6 @@
  */
 #include <linux/init.h>
 #include <linux/kernel.h>
-#include <linux/notifier.h>
 #include <linux/panic_notifier.h>
 #include <linux/printk.h>
 #include <linux/string.h>
@@ -53,6 +52,7 @@ iss_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
 
 static struct notifier_block iss_panic_block = {
 	.notifier_call = iss_panic_event,
+	.priority = INT_MIN, /* run as late as possible, may not return */
 };
 
 void __init platform_setup(char **p_cmdline)
@@ -81,5 +81,5 @@ void __init platform_setup(char **p_cmdline)
 		}
 	}
 
-	atomic_notifier_chain_register(&panic_notifier_list, &iss_panic_block);
+	atomic_notifier_chain_register(&panic_pre_reboot_list, &iss_panic_block);
 }
diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index c59265146e9c..6c4770949c01 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -25,7 +25,6 @@
 #include <linux/slab.h>
 #include <linux/ipmi.h>
 #include <linux/ipmi_smi.h>
-#include <linux/notifier.h>
 #include <linux/init.h>
 #include <linux/proc_fs.h>
 #include <linux/rcupdate.h>
@@ -5375,10 +5374,13 @@ static int ipmi_register_driver(void)
 	return rv;
 }
 
+/*
+ * we should execute this panic callback late, since it involves
+ * a complex call-chain and panic() runs in atomic context.
+ */
 static struct notifier_block panic_block = {
 	.notifier_call	= panic_event,
-	.next		= NULL,
-	.priority	= 200	/* priority: INT_MAX >= x >= 0 */
+	.priority	= INT_MIN + 1,
 };
 
 static int ipmi_init_msghandler(void)
@@ -5406,7 +5408,7 @@ static int ipmi_init_msghandler(void)
 	timer_setup(&ipmi_timer, ipmi_timeout, 0);
 	mod_timer(&ipmi_timer, jiffies + IPMI_TIMEOUT_JIFFIES);
 
-	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
+	atomic_notifier_chain_register(&panic_pre_reboot_list, &panic_block);
 
 	initialized = true;
 
@@ -5438,7 +5440,7 @@ static void __exit cleanup_ipmi(void)
 	if (initialized) {
 		destroy_workqueue(remove_work_wq);
 
-		atomic_notifier_chain_unregister(&panic_notifier_list,
+		atomic_notifier_chain_unregister(&panic_pre_reboot_list,
 						 &panic_block);
 
 		/*
diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
index e7e8e624a436..4890e9cba6fb 100644
--- a/drivers/edac/altera_edac.c
+++ b/drivers/edac/altera_edac.c
@@ -16,7 +16,6 @@
 #include <linux/kernel.h>
 #include <linux/mfd/altera-sysmgr.h>
 #include <linux/mfd/syscon.h>
-#include <linux/notifier.h>
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
 #include <linux/of_platform.h>
@@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
 		int dberror, err_addr;
 
 		edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
-		atomic_notifier_chain_register(&panic_notifier_list,
+		atomic_notifier_chain_register(&panic_pre_reboot_list,
 					       &edac->panic_notifier);
 
 		/* Printout a message if uncorrectable error previously. */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 901b97034308..3717c323aa36 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1622,7 +1622,7 @@ static int vmbus_bus_init(void)
 	 * Always register the vmbus unload panic notifier because we
 	 * need to shut the VMbus channel connection on panic.
 	 */
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_pre_reboot_list,
 			       &hyperv_panic_vmbus_unload_block);
 
 	vmbus_request_offers();
@@ -2851,7 +2851,7 @@ static void __exit vmbus_exit(void)
 	 * The vmbus panic notifier is always registered, hence we should
 	 * also unconditionally unregister it here as well.
 	 */
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
 					&hyperv_panic_vmbus_unload_block);
 
 	free_page((unsigned long)hv_panic_page);
diff --git a/drivers/leds/trigger/ledtrig-panic.c b/drivers/leds/trigger/ledtrig-panic.c
index 64abf2e91608..34fd5170723f 100644
--- a/drivers/leds/trigger/ledtrig-panic.c
+++ b/drivers/leds/trigger/ledtrig-panic.c
@@ -7,7 +7,6 @@
 
 #include <linux/kernel.h>
 #include <linux/init.h>
-#include <linux/notifier.h>
 #include <linux/panic_notifier.h>
 #include <linux/leds.h>
 #include "../leds.h"
@@ -64,7 +63,7 @@ static long led_panic_blink(int state)
 
 static int __init ledtrig_panic_init(void)
 {
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_pre_reboot_list,
 				       &led_trigger_panic_nb);
 
 	led_trigger_register_simple("panic", &trigger);
diff --git a/drivers/misc/ibmasm/heartbeat.c b/drivers/misc/ibmasm/heartbeat.c
index 59c9a0d95659..d6acae88b722 100644
--- a/drivers/misc/ibmasm/heartbeat.c
+++ b/drivers/misc/ibmasm/heartbeat.c
@@ -8,7 +8,6 @@
  * Author: Max Asböck <amax@us.ibm.com>
  */
 
-#include <linux/notifier.h>
 #include <linux/panic_notifier.h>
 #include "ibmasm.h"
 #include "dot_command.h"
@@ -24,7 +23,7 @@ static int suspend_heartbeats = 0;
  * In the case of a panic the interrupt handler continues to work and thus
  * continues to respond to heartbeats, making the service processor believe
  * the OS is still running and thus preventing a reboot.
- * To prevent this from happening a callback is added the panic_notifier_list.
+ * To prevent this from happening a callback is added in a panic notifier list.
  * Before responding to a heartbeat the driver checks if a panic has happened,
  * if yes it suspends heartbeat, causing the service processor to reboot as
  * expected.
@@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
 static int panic_happened(struct notifier_block *n, unsigned long val, void *v)
 {
 	suspend_heartbeats = 1;
-	return 0;
+	return NOTIFY_DONE;
 }
 
-static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
+static struct notifier_block panic_notifier = {
+	.notifier_call = panic_happened,
+};
 
 void ibmasm_register_panic_notifier(void)
 {
-	atomic_notifier_chain_register(&panic_notifier_list, &panic_notifier);
+	atomic_notifier_chain_register(&panic_pre_reboot_list,
+					&panic_notifier);
 }
 
 void ibmasm_unregister_panic_notifier(void)
 {
-	atomic_notifier_chain_unregister(&panic_notifier_list,
-			&panic_notifier);
+	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
+					&panic_notifier);
 }
 
 
diff --git a/drivers/net/ipa/ipa_smp2p.c b/drivers/net/ipa/ipa_smp2p.c
index 211233612039..92cdf6e0637c 100644
--- a/drivers/net/ipa/ipa_smp2p.c
+++ b/drivers/net/ipa/ipa_smp2p.c
@@ -7,7 +7,6 @@
 #include <linux/types.h>
 #include <linux/device.h>
 #include <linux/interrupt.h>
-#include <linux/notifier.h>
 #include <linux/panic_notifier.h>
 #include <linux/pm_runtime.h>
 #include <linux/soc/qcom/smem.h>
@@ -138,13 +137,13 @@ static int ipa_smp2p_panic_notifier_register(struct ipa_smp2p *smp2p)
 	smp2p->panic_notifier.notifier_call = ipa_smp2p_panic_notifier;
 	smp2p->panic_notifier.priority = INT_MAX;	/* Do it early */
 
-	return atomic_notifier_chain_register(&panic_notifier_list,
+	return atomic_notifier_chain_register(&panic_pre_reboot_list,
 					      &smp2p->panic_notifier);
 }
 
 static void ipa_smp2p_panic_notifier_unregister(struct ipa_smp2p *smp2p)
 {
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
 					 &smp2p->panic_notifier);
 }
 
diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
index 8512884de2cf..5bb0868f5f08 100644
--- a/drivers/parisc/power.c
+++ b/drivers/parisc/power.c
@@ -233,7 +233,7 @@ static int __init power_init(void)
 	}
 
 	/* Register a call for panic conditions. */
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_pre_reboot_list,
 			&parisc_panic_block);
 
 	return 0;
@@ -243,7 +243,7 @@ static void __exit power_exit(void)
 {
 	kthread_stop(power_task);
 
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
 			&parisc_panic_block);
 
 	pdc_soft_power_button(0);
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c510125769b9..24799ff239e6 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -2795,12 +2795,14 @@ static int rproc_panic_handler(struct notifier_block *nb, unsigned long event,
 static void __init rproc_init_panic(void)
 {
 	rproc_panic_nb.notifier_call = rproc_panic_handler;
-	atomic_notifier_chain_register(&panic_notifier_list, &rproc_panic_nb);
+	atomic_notifier_chain_register(&panic_pre_reboot_list,
+				       &rproc_panic_nb);
 }
 
 static void __exit rproc_exit_panic(void)
 {
-	atomic_notifier_chain_unregister(&panic_notifier_list, &rproc_panic_nb);
+	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
+					 &rproc_panic_nb);
 }
 
 static int __init remoteproc_init(void)
diff --git a/drivers/s390/char/con3215.c b/drivers/s390/char/con3215.c
index 192198bd3dc4..07379dd3f1f3 100644
--- a/drivers/s390/char/con3215.c
+++ b/drivers/s390/char/con3215.c
@@ -867,7 +867,7 @@ static int __init con3215_init(void)
 		raw3215[0] = NULL;
 		return -ENODEV;
 	}
-	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
+	atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
 	register_reboot_notifier(&on_reboot_nb);
 	register_console(&con3215);
 	return 0;
diff --git a/drivers/s390/char/con3270.c b/drivers/s390/char/con3270.c
index 476202f3d8a0..e79bf3e7bde3 100644
--- a/drivers/s390/char/con3270.c
+++ b/drivers/s390/char/con3270.c
@@ -645,7 +645,7 @@ con3270_init(void)
 	condev->cline->len = 0;
 	con3270_create_status(condev);
 	condev->input = alloc_string(&condev->freemem, 80);
-	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
+	atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
 	register_reboot_notifier(&on_reboot_nb);
 	register_console(&con3270);
 	return 0;
diff --git a/drivers/s390/char/sclp_con.c b/drivers/s390/char/sclp_con.c
index e5d947c763ea..7ca9d4c45d60 100644
--- a/drivers/s390/char/sclp_con.c
+++ b/drivers/s390/char/sclp_con.c
@@ -288,7 +288,7 @@ sclp_console_init(void)
 	timer_setup(&sclp_con_timer, sclp_console_timeout, 0);
 
 	/* enable printk-access to this driver */
-	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
+	atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
 	register_reboot_notifier(&on_reboot_nb);
 	register_console(&sclp_console);
 	return 0;
diff --git a/drivers/s390/char/sclp_vt220.c b/drivers/s390/char/sclp_vt220.c
index a32f34a1c6d2..97cf9e290c28 100644
--- a/drivers/s390/char/sclp_vt220.c
+++ b/drivers/s390/char/sclp_vt220.c
@@ -838,7 +838,7 @@ sclp_vt220_con_init(void)
 	if (rc)
 		return rc;
 	/* Attach linux console */
-	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
+	atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
 	register_reboot_notifier(&on_reboot_nb);
 	register_console(&sclp_vt220_console);
 	return 0;
diff --git a/drivers/staging/olpc_dcon/olpc_dcon.c b/drivers/staging/olpc_dcon/olpc_dcon.c
index 7284cb4ac395..cb50471f2246 100644
--- a/drivers/staging/olpc_dcon/olpc_dcon.c
+++ b/drivers/staging/olpc_dcon/olpc_dcon.c
@@ -653,7 +653,8 @@ static int dcon_probe(struct i2c_client *client, const struct i2c_device_id *id)
 	}
 
 	register_reboot_notifier(&dcon->reboot_nb);
-	atomic_notifier_chain_register(&panic_notifier_list, &dcon_panic_nb);
+	atomic_notifier_chain_register(&panic_pre_reboot_list,
+				       &dcon_panic_nb);
 
 	return 0;
 
@@ -676,7 +677,8 @@ static int dcon_remove(struct i2c_client *client)
 	struct dcon_priv *dcon = i2c_get_clientdata(client);
 
 	unregister_reboot_notifier(&dcon->reboot_nb);
-	atomic_notifier_chain_unregister(&panic_notifier_list, &dcon_panic_nb);
+	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
+					 &dcon_panic_nb);
 
 	free_irq(DCON_IRQ, dcon);
 
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index f3494b868a64..ec21e63592be 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -1253,7 +1253,7 @@ static int hvfb_probe(struct hv_device *hdev,
 	 */
 	par->hvfb_panic_nb.notifier_call = hvfb_on_panic;
 	par->hvfb_panic_nb.priority = INT_MIN + 10,
-	atomic_notifier_chain_register(&panic_notifier_list,
+	atomic_notifier_chain_register(&panic_pre_reboot_list,
 				       &par->hvfb_panic_nb);
 
 	return 0;
@@ -1276,7 +1276,7 @@ static int hvfb_remove(struct hv_device *hdev)
 	struct fb_info *info = hv_get_drvdata(hdev);
 	struct hvfb_par *par = info->par;
 
-	atomic_notifier_chain_unregister(&panic_notifier_list,
+	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
 					 &par->hvfb_panic_nb);
 
 	par->update = false;
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 7364a346bcb0..7912aacbc0e5 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -5,9 +5,9 @@
 #include <linux/notifier.h>
 #include <linux/types.h>
 
-extern struct atomic_notifier_head panic_notifier_list;
 extern struct atomic_notifier_head panic_hypervisor_list;
 extern struct atomic_notifier_head panic_info_list;
+extern struct atomic_notifier_head panic_pre_reboot_list;
 
 extern bool crash_kexec_post_notifiers;
 
diff --git a/kernel/panic.c b/kernel/panic.c
index 73ca1bc44e30..a9d43b98b05b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -69,16 +69,15 @@ EXPORT_SYMBOL_GPL(panic_timeout);
 #define PANIC_PRINT_ALL_CPU_BT		0x00000040
 unsigned long panic_print;
 
-ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
-
-EXPORT_SYMBOL(panic_notifier_list);
-
 ATOMIC_NOTIFIER_HEAD(panic_hypervisor_list);
 EXPORT_SYMBOL(panic_hypervisor_list);
 
 ATOMIC_NOTIFIER_HEAD(panic_info_list);
 EXPORT_SYMBOL(panic_info_list);
 
+ATOMIC_NOTIFIER_HEAD(panic_pre_reboot_list);
+EXPORT_SYMBOL(panic_pre_reboot_list);
+
 static long no_blink(int state)
 {
 	return 0;
@@ -295,7 +294,7 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_hypervisor_list, PANIC_NOTIFIER, buf);
 	atomic_notifier_call_chain(&panic_info_list, PANIC_NOTIFIER, buf);
-	atomic_notifier_call_chain(&panic_notifier_list, PANIC_NOTIFIER, buf);
+	atomic_notifier_call_chain(&panic_pre_reboot_list, PANIC_NOTIFIER, buf);
 
 	panic_print_sys_info(false);
 
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 22/30] panic: Introduce the panic post-reboot notifier list
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (20 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 21/30] panic: Introduce the panic pre-reboot " Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-05-09 14:16   ` Guilherme G. Piccoli
  2022-05-16 14:45   ` Petr Mladek
  2022-04-27 22:49 ` [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers Guilherme G. Piccoli
                   ` (7 subsequent siblings)
  29 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Christian Borntraeger, David S. Miller,
	Heiko Carstens, Sven Schnelle, Vasily Gorbik

Currently we have 3 notifier lists in the panic path, which will
be wired in a way to allow the notifier callbacks to run in
different moments at panic time, in a subsequent patch.

But there is also an odd set of architecture calls hardcoded in
the end of panic path, after the restart machinery. They're
responsible for late time tunings / events, like enabling a stop
button (Sparc) or effectively stopping the machine (s390).

This patch introduces yet another notifier list to offer the
architectures a way to add callbacks in such late moment on
panic path without the need of ifdefs / hardcoded approaches.

Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/s390/kernel/setup.c       | 19 ++++++++++++++++++-
 arch/sparc/kernel/setup_32.c   | 27 +++++++++++++++++++++++----
 arch/sparc/kernel/setup_64.c   | 29 ++++++++++++++++++++++++-----
 include/linux/panic_notifier.h |  1 +
 kernel/panic.c                 | 19 +++++++------------
 5 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index d860ac300919..d816b2045f1e 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -39,7 +39,6 @@
 #include <linux/kernel_stat.h>
 #include <linux/dma-map-ops.h>
 #include <linux/device.h>
-#include <linux/notifier.h>
 #include <linux/pfn.h>
 #include <linux/ctype.h>
 #include <linux/reboot.h>
@@ -51,6 +50,7 @@
 #include <linux/start_kernel.h>
 #include <linux/hugetlb.h>
 #include <linux/kmemleak.h>
+#include <linux/panic_notifier.h>
 
 #include <asm/boot_data.h>
 #include <asm/ipl.h>
@@ -943,6 +943,20 @@ static void __init log_component_list(void)
 	}
 }
 
+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void *unused)
+{
+	disabled_wait();
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block post_reboot_panic_block = {
+	.notifier_call = panic_event,
+};
+
 /*
  * Setup function called from init/main.c just after the banner
  * was printed.
@@ -1058,4 +1072,7 @@ void __init setup_arch(char **cmdline_p)
 
 	/* Add system specific data to the random pool */
 	setup_randomness();
+
+	atomic_notifier_chain_register(&panic_post_reboot_list,
+				       &post_reboot_panic_block);
 }
diff --git a/arch/sparc/kernel/setup_32.c b/arch/sparc/kernel/setup_32.c
index c8e0dd99f370..4e2428972f76 100644
--- a/arch/sparc/kernel/setup_32.c
+++ b/arch/sparc/kernel/setup_32.c
@@ -34,6 +34,7 @@
 #include <linux/kdebug.h>
 #include <linux/export.h>
 #include <linux/start_kernel.h>
+#include <linux/panic_notifier.h>
 #include <uapi/linux/mount.h>
 
 #include <asm/io.h>
@@ -51,6 +52,7 @@
 
 #include "kernel.h"
 
+int stop_a_enabled = 1;
 struct screen_info screen_info = {
 	0, 0,			/* orig-x, orig-y */
 	0,			/* unused */
@@ -293,6 +295,24 @@ void __init sparc32_start_kernel(struct linux_romvec *rp)
 	start_kernel();
 }
 
+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void *unused)
+{
+	/* Make sure the user can actually press Stop-A (L1-A) */
+	stop_a_enabled = 1;
+	pr_emerg("Press Stop-A (L1-A) from sun keyboard or send break\n"
+		"twice on console to return to the boot prom\n");
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block post_reboot_panic_block = {
+	.notifier_call = panic_event,
+};
+
 void __init setup_arch(char **cmdline_p)
 {
 	int i;
@@ -368,9 +388,10 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 
 	smp_setup_cpu_possible_map();
-}
 
-extern int stop_a_enabled;
+	atomic_notifier_chain_register(&panic_post_reboot_list,
+				       &post_reboot_panic_block);
+}
 
 void sun_do_break(void)
 {
@@ -384,8 +405,6 @@ void sun_do_break(void)
 }
 EXPORT_SYMBOL(sun_do_break);
 
-int stop_a_enabled = 1;
-
 static int __init topology_init(void)
 {
 	int i, ncpus, err;
diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 48abee4eee29..9066c25ecc07 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -33,6 +33,7 @@
 #include <linux/module.h>
 #include <linux/start_kernel.h>
 #include <linux/memblock.h>
+#include <linux/panic_notifier.h>
 #include <uapi/linux/mount.h>
 
 #include <asm/io.h>
@@ -62,6 +63,8 @@
 #include "entry.h"
 #include "kernel.h"
 
+int stop_a_enabled = 1;
+
 /* Used to synchronize accesses to NatSemi SUPER I/O chip configure
  * operations in asm/ns87303.h
  */
@@ -632,6 +635,24 @@ void __init alloc_irqstack_bootmem(void)
 	}
 }
 
+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void *unused)
+{
+	/* Make sure the user can actually press Stop-A (L1-A) */
+	stop_a_enabled = 1;
+	pr_emerg("Press Stop-A (L1-A) from sun keyboard or send break\n"
+		"twice on console to return to the boot prom\n");
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block post_reboot_panic_block = {
+	.notifier_call = panic_event,
+};
+
 void __init setup_arch(char **cmdline_p)
 {
 	/* Initialize PROM console and command line. */
@@ -691,9 +712,10 @@ void __init setup_arch(char **cmdline_p)
 	 * allocate the IRQ stacks.
 	 */
 	alloc_irqstack_bootmem();
-}
 
-extern int stop_a_enabled;
+	atomic_notifier_chain_register(&panic_post_reboot_list,
+				       &post_reboot_panic_block);
+}
 
 void sun_do_break(void)
 {
@@ -706,6 +728,3 @@ void sun_do_break(void)
 	prom_cmdline();
 }
 EXPORT_SYMBOL(sun_do_break);
-
-int stop_a_enabled = 1;
-EXPORT_SYMBOL(stop_a_enabled);
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 7912aacbc0e5..bcf6a5ea9d7f 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -8,6 +8,7 @@
 extern struct atomic_notifier_head panic_hypervisor_list;
 extern struct atomic_notifier_head panic_info_list;
 extern struct atomic_notifier_head panic_pre_reboot_list;
+extern struct atomic_notifier_head panic_post_reboot_list;
 
 extern bool crash_kexec_post_notifiers;
 
diff --git a/kernel/panic.c b/kernel/panic.c
index a9d43b98b05b..bf792102b43e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -78,6 +78,9 @@ EXPORT_SYMBOL(panic_info_list);
 ATOMIC_NOTIFIER_HEAD(panic_pre_reboot_list);
 EXPORT_SYMBOL(panic_pre_reboot_list);
 
+ATOMIC_NOTIFIER_HEAD(panic_post_reboot_list);
+EXPORT_SYMBOL(panic_post_reboot_list);
+
 static long no_blink(int state)
 {
 	return 0;
@@ -359,18 +362,10 @@ void panic(const char *fmt, ...)
 			reboot_mode = panic_reboot_mode;
 		emergency_restart();
 	}
-#ifdef __sparc__
-	{
-		extern int stop_a_enabled;
-		/* Make sure the user can actually press Stop-A (L1-A) */
-		stop_a_enabled = 1;
-		pr_emerg("Press Stop-A (L1-A) from sun keyboard or send break\n"
-			 "twice on console to return to the boot prom\n");
-	}
-#endif
-#if defined(CONFIG_S390)
-	disabled_wait();
-#endif
+
+	atomic_notifier_call_chain(&panic_post_reboot_list,
+				   PANIC_NOTIFIER, buf);
+
 	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
 
 	/* Do not scroll important messages printed above */
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (21 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 22/30] panic: Introduce the panic post-reboot " Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-05-10 17:40   ` Steven Rostedt
  2022-04-27 22:49 ` [PATCH 24/30] panic: Refactor the panic path Guilherme G. Piccoli
                   ` (6 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

Currently we don't have a way to check if there are dumpers set,
except counting the list members maybe. This patch introduces a very
simple helper to provide this information, by just keeping track of
registered/unregistered kmsg dumpers. It's going to be used on the
panic path in the subsequent patch.

Notice that the spinlock guarding kmsg_dumpers list also guards
increment/decrement of the dumper's counter, but there's no need
for that when reading the counter in the panic path, since that is
an atomic path and there's no other (planned) user.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 include/linux/kmsg_dump.h |  7 +++++++
 kernel/printk/printk.c    | 14 ++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 906521c2329c..abea1974bff8 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -65,6 +65,8 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 
 void kmsg_dump_rewind(struct kmsg_dump_iter *iter);
 
+bool kmsg_has_dumpers(void);
+
 int kmsg_dump_register(struct kmsg_dumper *dumper);
 
 int kmsg_dump_unregister(struct kmsg_dumper *dumper);
@@ -91,6 +93,11 @@ static inline void kmsg_dump_rewind(struct kmsg_dump_iter *iter)
 {
 }
 
+static inline bool kmsg_has_dumpers(void)
+{
+	return false;
+}
+
 static inline int kmsg_dump_register(struct kmsg_dumper *dumper)
 {
 	return -EINVAL;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index da03c15ecc89..e3a1c429fbbc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3399,6 +3399,18 @@ EXPORT_SYMBOL(printk_timed_ratelimit);
 
 static DEFINE_SPINLOCK(dump_list_lock);
 static LIST_HEAD(dump_list);
+static int num_dumpers;
+
+/**
+ * kmsg_has_dumpers - inform if there is any kmsg dumper registered.
+ *
+ * Returns true if there's at least one registered dumper, or false
+ * if otherwise.
+ */
+bool kmsg_has_dumpers(void)
+{
+	return num_dumpers ? true : false;
+}
 
 /**
  * kmsg_dump_register - register a kernel log dumper.
@@ -3423,6 +3435,7 @@ int kmsg_dump_register(struct kmsg_dumper *dumper)
 		dumper->registered = 1;
 		list_add_tail_rcu(&dumper->list, &dump_list);
 		err = 0;
+		num_dumpers++;
 	}
 	spin_unlock_irqrestore(&dump_list_lock, flags);
 
@@ -3447,6 +3460,7 @@ int kmsg_dump_unregister(struct kmsg_dumper *dumper)
 		dumper->registered = 0;
 		list_del_rcu(&dumper->list);
 		err = 0;
+		num_dumpers--;
 	}
 	spin_unlock_irqrestore(&dump_list_lock, flags);
 	synchronize_rcu();
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 24/30] panic: Refactor the panic path
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (22 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-28  0:28   ` Randy Dunlap
                     ` (4 more replies)
  2022-04-27 22:49 ` [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier Guilherme G. Piccoli
                   ` (5 subsequent siblings)
  29 siblings, 5 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

The panic() function is somewhat convoluted - a lot of changes were
made over the years, adding comments that might be misleading/outdated
now, it has a code structure that is a bit complex to follow, with
lots of conditionals, for example. The panic notifier list is something
else - a single list, with multiple callbacks of different purposes,
that run in a non-deterministic order and may affect hardly kdump
reliability - see the "crash_kexec_post_notifiers" workaround-ish flag.

This patch proposes a major refactor on the panic path based on Petr's
idea [0] - basically we split the notifiers list in three, having a set
of different call points in the panic path. Below a list of changes
proposed in this patch, culminating in the panic notifiers level
concept:

(a) First of all, we improved comments all over the function
and removed useless variables / includes. Also, as part of this
clean-up we concentrate the console flushing functions in a helper.

(b) As mentioned before, there is a split of the panic notifier list
in three, based on the purpose of the callback. The code contains
good documentation in form of comments, but a summary of the three
lists follows:

- the hypervisor list aims low-risk procedures to inform hypervisors
or firmware about the panic event, also includes LED-related functions;

- the informational list contains callbacks that provide more details,
like kernel offset or trace dump (if enabled) and also includes the
callbacks aimed at reducing log pollution or warns, like the RCU and
hung task disable callbacks;

- finally, the pre_reboot list is the old notifier list renamed,
containing the more risky callbacks that didn't fit the previous
lists. There is also a 4th list (the post_reboot one), but it's not
related with the original list - it contains late time architecture
callbacks aimed at stopping the machine, for example.

The 3 notifiers lists execute in different moments, hypervisor being
the first, followed by informational and finally the pre_reboot list.

(c) But then, there is the ordering problem of the notifiers against
the crash_kernel() call - kdump must be as reliable as possible.
For that, a simple binary "switch" as "crash_kexec_post_notifiers"
is not enough, hence we introduce here concept of panic notifier
levels: there are 5 levels, from 0 (no notifier executes before
kdump) until 4 (all notifiers run before kdump); the default level
is 2, in which the hypervisor and (iff we have any kmsg dumper)
the informational notifiers execute before kdump.

The detailed documentation of the levels is present in code comments
and in the kernel-parameters.txt file; as an analogy with the previous
panic() implementation, the level 0 is exactly the same as the old
behavior of notifiers, running all after kdump, and the level 4 is
the same as "crash_kexec_post_notifiers=Y" (we kept this parameter as
a deprecated one).

(d) Finally, an important change made here: we now use only the
function "crash_smp_send_stop()" to shut all the secondary CPUs
in the panic path. Before, there was a case of using the regular
"smp_send_stop()", but the better approach is to simplify the
code and try to use the function which was created exclusively
for the panic path. Experiments showed that it works fine, and
code was very simplified with that.

Functional change is expected from this refactor, since now we
call some notifiers by default before kdump, but the goal here
besides code clean-up is to have a better panic path, more
reliable and deterministic, but also very customizable.

[0] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---

Special thanks to Petr and Baoquan for the suggestion and feedback in a previous
email thread. There's some important design decisions that worth mentioning and
discussing:

* The default panic notifiers level is 2, based on Petr Mladek's suggestion,
which makes a lot of sense. Of course, this is customizable through the
parameter, but would be something worthwhile to have a KConfig option to set
the default level? It would help distros that want the old behavior
(no notifiers before kdump) as default.

* The implementation choice was to _avoid_ intricate if conditionals in the
panic path, which would _definitely_ be present with the panic notifiers levels
idea; so, instead of lots of if conditionals, the set/clear bits approach with
functions called in 2 points (but executing only in one of them) is much easier
to follow an was used here; the ordering helper function and the comments also
help a lot to avoid confusion (hopefully).

* Choice was to *always* use crash_smp_send_stop() instead of sometimes making
use of the regular smp_send_stop(); for most architectures they are the same,
including Xen (on x86). For the ones that override it, all should work fine,
in the powerpc case it's even more correct (see the subsequent patch
"powerpc: Do not force all panic notifiers to execute before kdump")

There seems to be 2 cases that requires some plumbing to work 100% right:
- ARM doesn't disable local interrupts / FIQs in the crash version of
send_stop(); we patched that early in this series;
- x86 could face an issue if we have VMX and do use crash_smp_send_stop()
_without_ kdump, but this is fixed in the first patch of the series (and
it's a bug present even before this refactor).

* Notice we didn't add a sysrq for panic notifiers level - should have it?
Alejandro proposed recently to add a sysrq for "crash_kexec_post_notifiers",
let me know if you feel the need here Alejandro, since the core parameters are
present in /sys, I didn't consider much gain in having a sysrq, but of course
I'm open to suggestions!

Thanks advance for the review!

 .../admin-guide/kernel-parameters.txt         |  42 ++-
 include/linux/panic_notifier.h                |   1 +
 kernel/kexec_core.c                           |   8 +-
 kernel/panic.c                                | 292 +++++++++++++-----
 .../selftests/pstore/pstore_crash_test        |   5 +-
 5 files changed, 252 insertions(+), 96 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3f1cc5e317ed..8d3524060ce3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -829,6 +829,13 @@
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.
 
+	crash_kexec_post_notifiers
+			This was DEPRECATED - users should always prefer the
+			parameter "panic_notifiers_level" - check its entry
+			in this documentation for details on how it works.
+			Setting this parameter is exactly the same as setting
+			"panic_notifiers_level=4".
+
 	cryptomgr.notests
 			[KNL] Disable crypto self-tests
 
@@ -3784,6 +3791,33 @@
 			timeout < 0: reboot immediately
 			Format: <timeout>
 
+	panic_notifiers_level=
+			[KNL] Set the panic notifiers execution order.
+			Format: <unsigned int>
+			We currently have 4 lists of panic notifiers; based
+			on the functionality and risk (for panic success) the
+			callbacks are added in a given list. The lists are:
+			- hypervisor/FW notification list (low risk);
+			- informational list (low/medium risk);
+			- pre_reboot list (higher risk);
+			- post_reboot list (only run late in panic and after
+			kdump, not configurable for now).
+			This parameter defines the ordering of the first 3
+			lists with regards to kdump; the levels determine
+			which set of notifiers execute before kdump. The
+			accepted levels are:
+			0: kdump is the first thing to run, NO list is
+			executed before kdump.
+			1: only the hypervisor list is executed before kdump.
+			2 (default level): the hypervisor list and (*if*
+			there's any kmsg_dumper defined) the informational
+			list are executed before kdump.
+			3: both the hypervisor and the informational lists
+			(always) execute before kdump.
+			4: the 3 lists (hypervisor, info and pre_reboot)
+			execute before kdump - this behavior is analog to the
+			deprecated parameter "crash_kexec_post_notifiers".
+
 	panic_print=	Bitmask for printing system info when panic happens.
 			User can chose combination of the following bits:
 			bit 0: print all tasks info
@@ -3814,14 +3848,6 @@
 	panic_on_warn	panic() instead of WARN().  Useful to cause kdump
 			on a WARN().
 
-	crash_kexec_post_notifiers
-			Run kdump after running panic-notifiers and dumping
-			kmsg. This only for the users who doubt kdump always
-			succeeds in any situation.
-			Note that this also increases risks of kdump failure,
-			because some panic notifiers can make the crashed
-			kernel more unstable.
-
 	parkbd.port=	[HW] Parallel port number the keyboard adapter is
 			connected to, default is 0.
 			Format: <parport#>
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index bcf6a5ea9d7f..b5041132321d 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -10,6 +10,7 @@ extern struct atomic_notifier_head panic_info_list;
 extern struct atomic_notifier_head panic_pre_reboot_list;
 extern struct atomic_notifier_head panic_post_reboot_list;
 
+bool panic_notifiers_before_kdump(void);
 extern bool crash_kexec_post_notifiers;
 
 enum panic_notifier_val {
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 68480f731192..f8906db8ca72 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -74,11 +74,11 @@ struct resource crashk_low_res = {
 int kexec_should_crash(struct task_struct *p)
 {
 	/*
-	 * If crash_kexec_post_notifiers is enabled, don't run
-	 * crash_kexec() here yet, which must be run after panic
-	 * notifiers in panic().
+	 * In case any panic notifiers are set to execute before kexec,
+	 * don't run crash_kexec() here yet, which must run after these
+	 * panic notifiers are executed, in the panic() path.
 	 */
-	if (crash_kexec_post_notifiers)
+	if (panic_notifiers_before_kdump())
 		return 0;
 	/*
 	 * There are 4 panic() calls in make_task_dead() path, each of which
diff --git a/kernel/panic.c b/kernel/panic.c
index bf792102b43e..b7c055d4f87f 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -15,7 +15,6 @@
 #include <linux/kgdb.h>
 #include <linux/kmsg_dump.h>
 #include <linux/kallsyms.h>
-#include <linux/notifier.h>
 #include <linux/vt_kern.h>
 #include <linux/module.h>
 #include <linux/random.h>
@@ -52,14 +51,23 @@ static unsigned long tainted_mask =
 static int pause_on_oops;
 static int pause_on_oops_flag;
 static DEFINE_SPINLOCK(pause_on_oops_lock);
-bool crash_kexec_post_notifiers;
+
 int panic_on_warn __read_mostly;
+bool panic_on_taint_nousertaint;
 unsigned long panic_on_taint;
-bool panic_on_taint_nousertaint = false;
 
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
 
+/* Initialized with all notifiers set to run before kdump */
+static unsigned long panic_notifiers_bits = 15;
+
+/* Default level is 2, see kernel-parameters.txt */
+unsigned int panic_notifiers_level = 2;
+
+/* DEPRECATED in favor of panic_notifiers_level */
+bool crash_kexec_post_notifiers;
+
 #define PANIC_PRINT_TASK_INFO		0x00000001
 #define PANIC_PRINT_MEM_INFO		0x00000002
 #define PANIC_PRINT_TIMER_INFO		0x00000004
@@ -109,10 +117,14 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs)
 }
 
 /*
- * Stop other CPUs in panic.  Architecture dependent code may override this
- * with more suitable version.  For example, if the architecture supports
- * crash dump, it should save registers of each stopped CPU and disable
- * per-CPU features such as virtualization extensions.
+ * Stop other CPUs in panic context.
+ *
+ * Architecture dependent code may override this with more suitable version.
+ * For example, if the architecture supports crash dump, it should save the
+ * registers of each stopped CPU and disable per-CPU features such as
+ * virtualization extensions. When not overridden in arch code (and for
+ * x86/xen), this is exactly the same as execute smp_send_stop(), but
+ * guarded against duplicate execution.
  */
 void __weak crash_smp_send_stop(void)
 {
@@ -183,6 +195,112 @@ static void panic_print_sys_info(bool console_flush)
 		ftrace_dump(DUMP_ALL);
 }
 
+/*
+ * Helper that accumulates all console flushing routines executed on panic.
+ */
+static void console_flushing(void)
+{
+#ifdef CONFIG_VT
+	unblank_screen();
+#endif
+	console_unblank();
+
+	/*
+	 * In this point, we may have disabled other CPUs, hence stopping the
+	 * CPU holding the lock while still having some valuable data in the
+	 * console buffer.
+	 *
+	 * Try to acquire the lock then release it regardless of the result.
+	 * The release will also print the buffers out. Locks debug should
+	 * be disabled to avoid reporting bad unlock balance when panic()
+	 * is not being called from OOPS.
+	 */
+	debug_locks_off();
+	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+
+	panic_print_sys_info(true);
+}
+
+#define PN_HYPERVISOR_BIT	0
+#define PN_INFO_BIT		1
+#define PN_PRE_REBOOT_BIT	2
+#define PN_POST_REBOOT_BIT	3
+
+/*
+ * Determine the order of panic notifiers with regards to kdump.
+ *
+ * This function relies in the "panic_notifiers_level" kernel parameter
+ * to determine how to order the notifiers with regards to kdump. We
+ * have currently 5 levels. For details, please check the kernel docs for
+ * "panic_notifiers_level" at Documentation/admin-guide/kernel-parameters.txt.
+ *
+ * Default level is 2, which means the panic hypervisor and informational
+ * (unless we don't have any kmsg_dumper) lists will execute before kdump.
+ */
+static void order_panic_notifiers_and_kdump(void)
+{
+	/*
+	 * The parameter "crash_kexec_post_notifiers" is deprecated, but
+	 * valid. Users that set it want really all panic notifiers to
+	 * execute before kdump, so it's effectively the same as setting
+	 * the panic notifiers level to 4.
+	 */
+	if (panic_notifiers_level >= 4 || crash_kexec_post_notifiers)
+		return;
+
+	/*
+	 * Based on the level configured (smaller than 4), we clear the
+	 * proper bits in "panic_notifiers_bits". Notice that this bitfield
+	 * is initialized with all notifiers set.
+	 */
+	switch (panic_notifiers_level) {
+	case 3:
+		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
+		break;
+	case 2:
+		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
+
+		if (!kmsg_has_dumpers())
+			clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
+		break;
+	case 1:
+		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
+		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
+		break;
+	case 0:
+		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
+		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
+		clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);
+		break;
+	}
+}
+
+/*
+ * Set of helpers to execute the panic notifiers only once.
+ * Just the informational notifier cares about the return.
+ */
+static inline bool notifier_run_once(struct atomic_notifier_head head,
+				     char *buf, long bit)
+{
+	if (test_and_change_bit(bit, &panic_notifiers_bits)) {
+		atomic_notifier_call_chain(&head, PANIC_NOTIFIER, buf);
+		return true;
+	}
+	return false;
+}
+
+#define panic_notifier_hypervisor_once(buf)\
+	notifier_run_once(panic_hypervisor_list, buf, PN_HYPERVISOR_BIT)
+
+#define panic_notifier_info_once(buf)\
+	notifier_run_once(panic_info_list, buf, PN_INFO_BIT)
+
+#define panic_notifier_pre_reboot_once(buf)\
+	notifier_run_once(panic_pre_reboot_list, buf, PN_PRE_REBOOT_BIT)
+
+#define panic_notifier_post_reboot_once(buf)\
+	notifier_run_once(panic_post_reboot_list, buf, PN_POST_REBOOT_BIT)
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -198,32 +316,29 @@ void panic(const char *fmt, ...)
 	long i, i_next = 0, len;
 	int state = 0;
 	int old_cpu, this_cpu;
-	bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
 
-	if (panic_on_warn) {
-		/*
-		 * This thread may hit another WARN() in the panic path.
-		 * Resetting this prevents additional WARN() from panicking the
-		 * system on this thread.  Other threads are blocked by the
-		 * panic_mutex in panic().
-		 */
-		panic_on_warn = 0;
-	}
+	/*
+	 * This thread may hit another WARN() in the panic path, so
+	 * resetting this option prevents additional WARN() from
+	 * re-panicking the system here.
+	 */
+	panic_on_warn = 0;
 
 	/*
 	 * Disable local interrupts. This will prevent panic_smp_self_stop
-	 * from deadlocking the first cpu that invokes the panic, since
-	 * there is nothing to prevent an interrupt handler (that runs
-	 * after setting panic_cpu) from invoking panic() again.
+	 * from deadlocking the first cpu that invokes the panic, since there
+	 * is nothing to prevent an interrupt handler (that runs after setting
+	 * panic_cpu) from invoking panic() again. Also disables preemption
+	 * here - notice it's not safe to rely on interrupt disabling to avoid
+	 * preemption, since any cond_resched() or cond_resched_lock() might
+	 * trigger a reschedule if the preempt count is 0 (for reference, see
+	 * Documentation/locking/preempt-locking.rst). Some functions called
+	 * from here want preempt disabled, so no point enabling it later.
 	 */
 	local_irq_disable();
 	preempt_disable_notrace();
 
 	/*
-	 * It's possible to come here directly from a panic-assertion and
-	 * not have preempt disabled. Some functions called from here want
-	 * preempt to be disabled. No point enabling it later though...
-	 *
 	 * Only one CPU is allowed to execute the panic code from here. For
 	 * multiple parallel invocations of panic, all other CPUs either
 	 * stop themself or will wait until they are stopped by the 1st CPU
@@ -266,73 +381,75 @@ void panic(const char *fmt, ...)
 	kgdb_panic(buf);
 
 	/*
-	 * If we have crashed and we have a crash kernel loaded let it handle
-	 * everything else.
-	 * If we want to run this after calling panic_notifiers, pass
-	 * the "crash_kexec_post_notifiers" option to the kernel.
+	 * Here lies one of the most subtle parts of the panic path,
+	 * the panic notifiers and their order with regards to kdump.
+	 * We currently have 4 sets of notifiers:
 	 *
-	 * Bypass the panic_cpu check and call __crash_kexec directly.
+	 *  - the hypervisor list is composed by callbacks that are related
+	 *  to warn the FW / hypervisor about panic, or non-invasive LED
+	 *  controlling functions - (hopefully) low-risk for kdump, should
+	 *  run early if possible.
+	 *
+	 *  - the informational list is composed by functions dumping data
+	 *  like kernel offsets, device error registers or tracing buffer;
+	 *  also log flooding prevention callbacks fit in this list. It is
+	 *  relatively safe to run before kdump.
+	 *
+	 *  - the pre_reboot list basically is everything else, all the
+	 *  callbacks that don't fit in the 2 previous lists. It should
+	 *  run *after* kdump if possible, as it contains high-risk
+	 *  functions that may break kdump.
+	 *
+	 *  - we also have a 4th list of notifiers, the post_reboot
+	 *  callbacks. This is not strongly related to kdump since it's
+	 *  always executed late in the panic path, after the restart
+	 *  mechanism (if set); its goal is to provide a way for
+	 *  architecture code effectively power-off/disable the system.
+	 *
+	 *  The kernel provides the "panic_notifiers_level" parameter
+	 *  to adjust the ordering in which these notifiers should run
+	 *  with regards to kdump - the default level is 2, so both the
+	 *  hypervisor and informational notifiers should execute before
+	 *  the __crash_kexec(); the info notifier won't run by default
+	 *  unless there's some kmsg_dumper() registered. For details
+	 *  about it, check Documentation/admin-guide/kernel-parameters.txt.
+	 *
+	 *  Notice that the code relies in bits set/clear operations to
+	 *  determine the ordering, functions *_once() execute only one
+	 *  time, as their name implies. The goal is to prevent too much
+	 *  if conditionals and more confusion. Finally, regarding CPUs
+	 *  disabling: unless NO panic notifier executes before kdump,
+	 *  we always disable secondary CPUs before __crash_kexec() and
+	 *  the notifiers execute.
 	 */
-	if (!_crash_kexec_post_notifiers) {
+	order_panic_notifiers_and_kdump();
+
+	/* If no level, we should kdump ASAP. */
+	if (!panic_notifiers_level)
 		__crash_kexec(NULL);
 
-		/*
-		 * Note smp_send_stop is the usual smp shutdown function, which
-		 * unfortunately means it may not be hardened to work in a
-		 * panic situation.
-		 */
-		smp_send_stop();
-	} else {
-		/*
-		 * If we want to do crash dump after notifier calls and
-		 * kmsg_dump, we will need architecture dependent extra
-		 * works in addition to stopping other CPUs.
-		 */
-		crash_smp_send_stop();
-	}
+	crash_smp_send_stop();
+	panic_notifier_hypervisor_once(buf);
 
-	/*
-	 * Run any panic handlers, including those that might need to
-	 * add information to the kmsg dump output.
-	 */
-	atomic_notifier_call_chain(&panic_hypervisor_list, PANIC_NOTIFIER, buf);
-	atomic_notifier_call_chain(&panic_info_list, PANIC_NOTIFIER, buf);
-	atomic_notifier_call_chain(&panic_pre_reboot_list, PANIC_NOTIFIER, buf);
+	if (panic_notifier_info_once(buf)) {
+		panic_print_sys_info(false);
+		kmsg_dump(KMSG_DUMP_PANIC);
+	}
 
-	panic_print_sys_info(false);
+	panic_notifier_pre_reboot_once(buf);
 
-	kmsg_dump(KMSG_DUMP_PANIC);
+	__crash_kexec(NULL);
 
-	/*
-	 * If you doubt kdump always works fine in any situation,
-	 * "crash_kexec_post_notifiers" offers you a chance to run
-	 * panic_notifiers and dumping kmsg before kdump.
-	 * Note: since some panic_notifiers can make crashed kernel
-	 * more unstable, it can increase risks of the kdump failure too.
-	 *
-	 * Bypass the panic_cpu check and call __crash_kexec directly.
-	 */
-	if (_crash_kexec_post_notifiers)
-		__crash_kexec(NULL);
+	panic_notifier_hypervisor_once(buf);
 
-#ifdef CONFIG_VT
-	unblank_screen();
-#endif
-	console_unblank();
-
-	/*
-	 * We may have ended up stopping the CPU holding the lock (in
-	 * smp_send_stop()) while still having some valuable data in the console
-	 * buffer.  Try to acquire the lock then release it regardless of the
-	 * result.  The release will also print the buffers out.  Locks debug
-	 * should be disabled to avoid reporting bad unlock balance when
-	 * panic() is not being callled from OOPS.
-	 */
-	debug_locks_off();
-	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+	if (panic_notifier_info_once(buf)) {
+		panic_print_sys_info(false);
+		kmsg_dump(KMSG_DUMP_PANIC);
+	}
 
-	panic_print_sys_info(true);
+	panic_notifier_pre_reboot_once(buf);
 
+	console_flushing();
 	if (!panic_blink)
 		panic_blink = no_blink;
 
@@ -363,8 +480,7 @@ void panic(const char *fmt, ...)
 		emergency_restart();
 	}
 
-	atomic_notifier_call_chain(&panic_post_reboot_list,
-				   PANIC_NOTIFIER, buf);
+	panic_notifier_post_reboot_once(buf);
 
 	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
 
@@ -383,6 +499,15 @@ void panic(const char *fmt, ...)
 
 EXPORT_SYMBOL(panic);
 
+/*
+ * Helper used in the kexec code, to validate if any
+ * panic notifier is set to execute early, before kdump.
+ */
+inline bool panic_notifiers_before_kdump(void)
+{
+	return panic_notifiers_level || crash_kexec_post_notifiers;
+}
+
 /*
  * TAINT_FORCED_RMMOD could be a per-module flag but the module
  * is being removed anyway.
@@ -692,6 +817,9 @@ core_param(panic, panic_timeout, int, 0644);
 core_param(panic_print, panic_print, ulong, 0644);
 core_param(pause_on_oops, pause_on_oops, int, 0644);
 core_param(panic_on_warn, panic_on_warn, int, 0644);
+core_param(panic_notifiers_level, panic_notifiers_level, uint, 0644);
+
+/* DEPRECATED in favor of panic_notifiers_level */
 core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0644);
 
 static int __init oops_setup(char *s)
diff --git a/tools/testing/selftests/pstore/pstore_crash_test b/tools/testing/selftests/pstore/pstore_crash_test
index 2a329bbb4aca..1e60ce4501aa 100755
--- a/tools/testing/selftests/pstore/pstore_crash_test
+++ b/tools/testing/selftests/pstore/pstore_crash_test
@@ -25,6 +25,7 @@ touch $REBOOT_FLAG
 sync
 
 # cause crash
-# Note: If you use kdump and want to see kmesg-* files after reboot, you should
-#       specify 'crash_kexec_post_notifiers' in 1st kernel's cmdline.
+# Note: If you use kdump and want to see kmsg-* files after reboot, you should
+#       be sure that the parameter "panic_notifiers_level" is more than '2' (the
+#       default value for this parameter is '2') in the first kernel's cmdline.
 echo c > /proc/sysrq-trigger
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (23 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 24/30] panic: Refactor the panic path Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-05-16 14:56   ` Petr Mladek
  2022-04-27 22:49 ` [PATCH 26/30] Drivers: hv: Do not force all panic notifiers to execute before kdump Guilherme G. Piccoli
                   ` (4 subsequent siblings)
  29 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

Currently the parameter "panic_print" relies in a function called
directly on panic path; one of the flags the users can set for
panic_print triggers a console replay mechanism, to show the
entire kernel log buffer (from the beginning) in a panic event.

Two problems with that: the dual nature of the panic_print
isn't really appropriate, the function was originally meant
to allow users dumping system information on panic events,
and was "overridden" to also force a console flush of the full
kernel log buffer. It also turns the code a bit more complex
and duplicate than it needs to be.

This patch proposes 2 changes: first, we decouple panic_print
from the console flushing mechanism, in the form of a new kernel
core parameter (panic_console_replay); we kept the functionality
on panic_print to avoid userspace breakage, although we comment
in both code and documentation that this panic_print usage is
deprecated.

We converted panic_print function to a panic notifier too, adding
it on the panic informational notifier list, executed as the final
callback. This allows a more clear code and makes sense, as
panic_print_sys_info() is really a panic-time only function.
We also moved its code to kernel/printk.c, it seems to make more
sense given it's related to printing stuff.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 .../admin-guide/kernel-parameters.txt         | 12 +++-
 Documentation/admin-guide/sysctl/kernel.rst   |  5 +-
 include/linux/console.h                       |  2 +
 include/linux/panic.h                         |  1 -
 include/linux/printk.h                        |  1 +
 kernel/panic.c                                | 51 +++------------
 kernel/printk/printk.c                        | 62 +++++++++++++++++++
 7 files changed, 87 insertions(+), 47 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8d3524060ce3..c99da8b2b216 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3791,6 +3791,14 @@
 			timeout < 0: reboot immediately
 			Format: <timeout>
 
+	panic_console_replay
+			[KNL] Force a kernel log replay in the console on
+			panic event. Notice that there is already a flush
+			mechanism for pending messages; this option is meant
+			for users that wish to replay the *full* buffer.
+			It deprecates the bit 5 setting on "panic_print",
+			both having the same functionality.
+
 	panic_notifiers_level=
 			[KNL] Set the panic notifiers execution order.
 			Format: <unsigned int>
@@ -3825,12 +3833,14 @@
 			bit 2: print timer info
 			bit 3: print locks info if CONFIG_LOCKDEP is on
 			bit 4: print ftrace buffer
-			bit 5: print all printk messages in buffer
+			bit 5: print all printk messages in buffer (DEPRECATED)
 			bit 6: print all CPUs backtrace (if available in the arch)
 			*Be aware* that this option may print a _lot_ of lines,
 			so there are risks of losing older messages in the log.
 			Use this option carefully, maybe worth to setup a
 			bigger log buffer with "log_buf_len" along with this.
+			Also, notice that bit 5 was deprecated in favor of the
+			parameter "panic_console_replay".
 
 	panic_on_taint=	Bitmask for conditionally calling panic() in add_taint()
 			Format: <hex>[,nousertaint]
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 1144ea3229a3..17b293a0e566 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -763,10 +763,13 @@ bit 1  print system memory info
 bit 2  print timer info
 bit 3  print locks info if ``CONFIG_LOCKDEP`` is on
 bit 4  print ftrace buffer
-bit 5  print all printk messages in buffer
+bit 5  print all printk messages in buffer (DEPRECATED)
 bit 6  print all CPUs backtrace (if available in the arch)
 =====  ============================================
 
+Notice that bit 5 was deprecated in favor of kernel core parameter
+"panic_console_replay" (see kernel-parameters.txt documentation).
+
 So for example to print tasks and memory info on panic, user can::
 
   echo 3 > /proc/sys/kernel/panic_print
diff --git a/include/linux/console.h b/include/linux/console.h
index 7cd758a4f44e..351c14f623ad 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -169,6 +169,8 @@ enum con_flush_mode {
 	CONSOLE_REPLAY_ALL,
 };
 
+extern bool panic_console_replay;
+
 extern int add_preferred_console(char *name, int idx, char *options);
 extern void register_console(struct console *);
 extern int unregister_console(struct console *);
diff --git a/include/linux/panic.h b/include/linux/panic.h
index f5844908a089..34175d0188d0 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -22,7 +22,6 @@ extern unsigned int sysctl_oops_all_cpu_backtrace;
 #endif /* CONFIG_SMP */
 
 extern int panic_timeout;
-extern unsigned long panic_print;
 extern int panic_on_oops;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1522df223c0f..aee2e8ebd541 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -13,6 +13,7 @@
 extern const char linux_banner[];
 extern const char linux_proc_banner[];
 
+extern unsigned long panic_print;
 extern int oops_in_progress;	/* If set, an oops, panic(), BUG() or die() is in progress */
 
 #define PRINTK_MAX_SINGLE_HEADER_LEN 2
diff --git a/kernel/panic.c b/kernel/panic.c
index b7c055d4f87f..ff257bd8f81b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -68,15 +68,6 @@ unsigned int panic_notifiers_level = 2;
 /* DEPRECATED in favor of panic_notifiers_level */
 bool crash_kexec_post_notifiers;
 
-#define PANIC_PRINT_TASK_INFO		0x00000001
-#define PANIC_PRINT_MEM_INFO		0x00000002
-#define PANIC_PRINT_TIMER_INFO		0x00000004
-#define PANIC_PRINT_LOCK_INFO		0x00000008
-#define PANIC_PRINT_FTRACE_INFO		0x00000010
-#define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
-#define PANIC_PRINT_ALL_CPU_BT		0x00000040
-unsigned long panic_print;
-
 ATOMIC_NOTIFIER_HEAD(panic_hypervisor_list);
 EXPORT_SYMBOL(panic_hypervisor_list);
 
@@ -168,33 +159,6 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
 }
 EXPORT_SYMBOL(nmi_panic);
 
-static void panic_print_sys_info(bool console_flush)
-{
-	if (console_flush) {
-		if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
-			console_flush_on_panic(CONSOLE_REPLAY_ALL);
-		return;
-	}
-
-	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
-		trigger_all_cpu_backtrace();
-
-	if (panic_print & PANIC_PRINT_TASK_INFO)
-		show_state();
-
-	if (panic_print & PANIC_PRINT_MEM_INFO)
-		show_mem(0, NULL);
-
-	if (panic_print & PANIC_PRINT_TIMER_INFO)
-		sysrq_timer_list_show();
-
-	if (panic_print & PANIC_PRINT_LOCK_INFO)
-		debug_show_all_locks();
-
-	if (panic_print & PANIC_PRINT_FTRACE_INFO)
-		ftrace_dump(DUMP_ALL);
-}
-
 /*
  * Helper that accumulates all console flushing routines executed on panic.
  */
@@ -218,7 +182,11 @@ static void console_flushing(void)
 	debug_locks_off();
 	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
 
-	panic_print_sys_info(true);
+	/* In case users wish to replay the full log buffer... */
+	if (panic_console_replay) {
+		pr_warn("Replaying the log buffer from the beginning\n");
+		console_flush_on_panic(CONSOLE_REPLAY_ALL);
+	}
 }
 
 #define PN_HYPERVISOR_BIT	0
@@ -431,10 +399,8 @@ void panic(const char *fmt, ...)
 	crash_smp_send_stop();
 	panic_notifier_hypervisor_once(buf);
 
-	if (panic_notifier_info_once(buf)) {
-		panic_print_sys_info(false);
+	if (panic_notifier_info_once(buf))
 		kmsg_dump(KMSG_DUMP_PANIC);
-	}
 
 	panic_notifier_pre_reboot_once(buf);
 
@@ -442,10 +408,8 @@ void panic(const char *fmt, ...)
 
 	panic_notifier_hypervisor_once(buf);
 
-	if (panic_notifier_info_once(buf)) {
-		panic_print_sys_info(false);
+	if (panic_notifier_info_once(buf))
 		kmsg_dump(KMSG_DUMP_PANIC);
-	}
 
 	panic_notifier_pre_reboot_once(buf);
 
@@ -814,7 +778,6 @@ EXPORT_SYMBOL(__stack_chk_fail);
 #endif
 
 core_param(panic, panic_timeout, int, 0644);
-core_param(panic_print, panic_print, ulong, 0644);
 core_param(pause_on_oops, pause_on_oops, int, 0644);
 core_param(panic_on_warn, panic_on_warn, int, 0644);
 core_param(panic_notifiers_level, panic_notifiers_level, uint, 0644);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e3a1c429fbbc..ad91d4c04246 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -35,6 +35,7 @@
 #include <linux/memblock.h>
 #include <linux/syscalls.h>
 #include <linux/crash_core.h>
+#include <linux/panic_notifier.h>
 #include <linux/ratelimit.h>
 #include <linux/kmsg_dump.h>
 #include <linux/syslog.h>
@@ -3234,6 +3235,61 @@ void __init console_init(void)
 	}
 }
 
+#define PANIC_PRINT_TASK_INFO		0x00000001
+#define PANIC_PRINT_MEM_INFO		0x00000002
+#define PANIC_PRINT_TIMER_INFO		0x00000004
+#define PANIC_PRINT_LOCK_INFO		0x00000008
+#define PANIC_PRINT_FTRACE_INFO		0x00000010
+
+/* DEPRECATED - please use "panic_console_replay" */
+#define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
+
+#define PANIC_PRINT_ALL_CPU_BT		0x00000040
+
+unsigned long panic_print;
+bool panic_console_replay;
+
+static int panic_print_sys_info(struct notifier_block *self,
+				unsigned long ev, void *unused)
+{
+	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+		trigger_all_cpu_backtrace();
+
+	if (panic_print & PANIC_PRINT_TASK_INFO)
+		show_state();
+
+	if (panic_print & PANIC_PRINT_MEM_INFO)
+		show_mem(0, NULL);
+
+	if (panic_print & PANIC_PRINT_TIMER_INFO)
+		sysrq_timer_list_show();
+
+	if (panic_print & PANIC_PRINT_LOCK_INFO)
+		debug_show_all_locks();
+
+	if (panic_print & PANIC_PRINT_FTRACE_INFO)
+		ftrace_dump(DUMP_ALL);
+
+	/*
+	 * This is legacy/deprecated feature from panic_print,
+	 * the console force flushing. We have now the parameter
+	 * "panic_console_replay", but we need to keep the
+	 * retro-compatibility with the old stuff...
+	 */
+	if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
+		panic_console_replay = true;
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block panic_print_nb = {
+	.notifier_call = panic_print_sys_info,
+	.priority = INT_MIN, /* defer to run as late as possible */
+};
+
+core_param(panic_print, panic_print, ulong, 0644);
+core_param(panic_console_replay, panic_console_replay, bool, 0644);
+
 /*
  * Some boot consoles access data that is in the init section and which will
  * be discarded after the initcalls have been run. To make sure that no code
@@ -3253,6 +3309,12 @@ static int __init printk_late_init(void)
 	struct console *con;
 	int ret;
 
+	/*
+	 * Register the panic notifier to print user information
+	 * in case the user have that set.
+	 */
+	atomic_notifier_chain_register(&panic_info_list, &panic_print_nb);
+
 	for_each_console(con) {
 		if (!(con->flags & CON_BOOT))
 			continue;
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 26/30] Drivers: hv: Do not force all panic notifiers to execute before kdump
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (24 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-27 22:49 ` [PATCH 27/30] powerpc: " Guilherme G. Piccoli
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Andrea Parri, Dexuan Cui, Haiyang Zhang, K. Y. Srinivasan,
	Stephen Brennan, Stephen Hemminger, Tianyu Lan, Wei Liu

Since commit a11589563e96 ("x86/Hyper-V: Report crash register
data or kmsg before running crash kernel") Hyper-V forcibly sets
the kernel parameter "crash_kexec_post_notifiers"; with that, it
did enforce the execution of *all* panic notifiers before kdump.
The main reason behind that is that Hyper-V has an hypervisor
notification mechanism that has the ability of warning the
hypervisor when the guest panics.

Happens that after the panic notifiers refactor, we now have 3 lists
and a level mechanism that defines the ordering of the notifiers
execution with regards to kdump. And for Hyper-V, the specific
notifier to inform the hypervisor about a panic lies in the first
list, which *by default* is set to execute before kdump. Hence,
this patch removes the hardcoded setting, effectively reverting
the aforementioned commit.

One of the problems with the forced approach was greatly exposed by
commit d57d6fe5bf34 ("drivers: hv: log when enabling crash_kexec_post_notifiers")
which ended-up confusing the user that didn't expect the notifiers
to execute before kdump, since it's a user setting and wasn't
enabled by such user. With the patch hereby proposed, that kind
of issue doesn't happen anymore, the panic notifiers level is
well-documented and users can expect a predictable behavior.

Fixes: a11589563e96 ("x86/Hyper-V: Report crash register data or kmsg before running crash kernel")
Fixes: d57d6fe5bf34 ("drivers: hv: log when enabling crash_kexec_post_notifiers"
Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Tested-by: Fabio A M Martins <fabiomirmar@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---

Special thanks to Michael Kelley for the good information about the Hyper-V
panic path in email threads some months ago, and to Fabio for the testing
performed.

 drivers/hv/hv_common.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index ae68298c0dca..af59793de523 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -73,18 +73,6 @@ int __init hv_common_init(void)
 {
 	int i;
 
-	/*
-	 * Hyper-V expects to get crash register data or kmsg when
-	 * crash enlightment is available and system crashes. Set
-	 * crash_kexec_post_notifiers to be true to make sure that
-	 * calling crash enlightment interface before running kdump
-	 * kernel.
-	 */
-	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
-		crash_kexec_post_notifiers = true;
-		pr_info("Hyper-V: enabling crash_kexec_post_notifiers\n");
-	}
-
 	/*
 	 * Allocate the per-CPU state for the hypercall input arg.
 	 * If this allocation fails, we will not be able to setup
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 27/30] powerpc: Do not force all panic notifiers to execute before kdump
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (25 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 26/30] Drivers: hv: Do not force all panic notifiers to execute before kdump Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-27 22:49 ` [PATCH 28/30] panic: Unexport crash_kexec_post_notifiers Guilherme G. Piccoli
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Benjamin Herrenschmidt, Hari Bathini, Michael Ellerman,
	Nicholas Piggin, Paul Mackerras

Commit 06e629c25daa ("powerpc/fadump: Fix inaccurate CPU state info in
vmcore generated with panic") introduced a hardcoded setting of kernel
parameter "crash_kexec_post_notifiers", effectively forcing all the
panic notifiers to execute earlier in the panic path, before kdump.

The reason for that was a fadump issue on collecting data accurately,
due to smp_send_stop() setting all CPUs offline, so the net effect
desired with this change was to avoid calling the regular CPU
shutdown function, and instead rely on crash_smp_send_stop(), which
copes fine with fadump. The collateral effect was to increase the
risk for kdump if fadump is not used, since it forces all panic
notifiers to execute early, before kdump.

Happens that, after a panic refactor, crash_smp_send_stop() is
now used by default in the panic path, so there is no reason to
mess with the notifiers ordering (which was also improved in the
refactor) from within arch code.

Fixes: 06e629c25daa ("powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic")
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---

We'd like to thanks specially the MiniCloud infrastructure [0] maintainers,
that allow us to test PowerPC code in a very complete, functional and FREE
environment.

[0] https://openpower.ic.unicamp.br/minicloud

 arch/powerpc/kernel/fadump.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 65562c4a0a69..35ae8c09af66 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1649,14 +1649,6 @@ int __init setup_fadump(void)
 		register_fadump();
 	}
 
-	/*
-	 * In case of panic, fadump is triggered via ppc_panic_event()
-	 * panic notifier. Setting crash_kexec_post_notifiers to 'true'
-	 * lets panic() function take crash friendly path before panic
-	 * notifiers are invoked.
-	 */
-	crash_kexec_post_notifiers = true;
-
 	return 1;
 }
 /*
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 28/30] panic: Unexport crash_kexec_post_notifiers
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (26 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 27/30] powerpc: " Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-27 22:49 ` [PATCH 29/30] powerpc: ps3, pseries: Avoid duplicate call to kmsg_dump() on panic Guilherme G. Piccoli
  2022-04-27 22:49 ` [PATCH 30/30] um: Avoid duplicate call to kmsg_dump() Guilherme G. Piccoli
  29 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

There is no users anymore of this variable that requires
it to be "exported" in the headers; also, it was deprecated
by the kernel parameter "panic_notifiers_level".

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 include/linux/panic.h          | 2 --
 include/linux/panic_notifier.h | 1 -
 2 files changed, 3 deletions(-)

diff --git a/include/linux/panic.h b/include/linux/panic.h
index 34175d0188d0..d301db07a8af 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -34,8 +34,6 @@ extern int sysctl_panic_on_rcu_stall;
 extern int sysctl_max_rcu_stall_to_panic;
 extern int sysctl_panic_on_stackoverflow;
 
-extern bool crash_kexec_post_notifiers;
-
 /*
  * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
  * holds a CPU number which is executing panic() currently. A value of
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index b5041132321d..8fda7045e2f7 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -11,7 +11,6 @@ extern struct atomic_notifier_head panic_pre_reboot_list;
 extern struct atomic_notifier_head panic_post_reboot_list;
 
 bool panic_notifiers_before_kdump(void);
-extern bool crash_kexec_post_notifiers;
 
 enum panic_notifier_val {
 	PANIC_UNUSED,
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 29/30] powerpc: ps3, pseries: Avoid duplicate call to kmsg_dump() on panic
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (27 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 28/30] panic: Unexport crash_kexec_post_notifiers Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  2022-04-27 22:49 ` [PATCH 30/30] um: Avoid duplicate call to kmsg_dump() Guilherme G. Piccoli
  29 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Benjamin Herrenschmidt, Hari Bathini, Michael Ellerman,
	Nicholas Piggin, Paul Mackerras

Currently both pseries and ps3 are platforms that define special
panic notifiers that run as callbacks inside powerpc generic panic
notifier. In both cases kmsg_dump() is called, and the reason seems
to be that both of these callbacks aims to effectively stop the
machine, so nothing would execute after that - hence, both force
a series of console flushing related operations, after calling
the kmsg dumpers.

Happens that recently the panic path was refactored, and now
kmsg_dump() is *certainly* called before the pre_reboot panic
notifiers, category in which both pseries/ps3 callbacks belong.
In other words: kmsg_dump() will execute twice in both platforms,
on panic path.

This patch prevents that by disabling the kmsg_dump() for both
platform's notifiers. But worth to notice that PowerNV still
has a legit use for executing kmsg_dump() in its unrecoverable
error path, so we rely in parameter passing to differentiate
both cases. Also, since the pre_reboot notifiers still run
earlier than console flushing routines, we kept that for
both pseries and ps3 platforms, only skipping kmsg_dump().

Fixes: 35adacd6fc48 ("powerpc/pseries, ps3: panic flush kernel messages before halting system")
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---

We'd like to thanks specially the MiniCloud infrastructure [0] maintainers,
that allow us to test PowerPC code in a very complete, functional and FREE
environment.

[0] https://openpower.ic.unicamp.br/minicloud

 arch/powerpc/include/asm/bug.h         | 2 +-
 arch/powerpc/kernel/traps.c            | 6 ++++--
 arch/powerpc/platforms/powernv/opal.c  | 2 +-
 arch/powerpc/platforms/ps3/setup.c     | 2 +-
 arch/powerpc/platforms/pseries/setup.c | 2 +-
 5 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index ecbae1832de3..49e5f6f86869 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -166,7 +166,7 @@ extern void die(const char *, struct pt_regs *, long);
 void die_mce(const char *str, struct pt_regs *regs, long err);
 extern bool die_will_crash(void);
 extern void panic_flush_kmsg_start(void);
-extern void panic_flush_kmsg_end(void);
+extern void panic_flush_kmsg_end(bool dump);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a08bb7cefdc5..837a5ed98d62 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -169,9 +169,11 @@ extern void panic_flush_kmsg_start(void)
 	bust_spinlocks(1);
 }
 
-extern void panic_flush_kmsg_end(void)
+extern void panic_flush_kmsg_end(bool dump)
 {
-	kmsg_dump(KMSG_DUMP_PANIC);
+	if (dump)
+		kmsg_dump(KMSG_DUMP_PANIC);
+
 	bust_spinlocks(0);
 	debug_locks_off();
 	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 55a8fbfdb5b2..d172ceedece2 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -641,7 +641,7 @@ void __noreturn pnv_platform_error_reboot(struct pt_regs *regs, const char *msg)
 		show_regs(regs);
 	smp_send_stop();
 
-	panic_flush_kmsg_end();
+	panic_flush_kmsg_end(true);
 
 	/*
 	 * Don't bother to shut things down because this will
diff --git a/arch/powerpc/platforms/ps3/setup.c b/arch/powerpc/platforms/ps3/setup.c
index 3de9145c20bc..7cb78e508fb3 100644
--- a/arch/powerpc/platforms/ps3/setup.c
+++ b/arch/powerpc/platforms/ps3/setup.c
@@ -102,7 +102,7 @@ static void ps3_panic(char *str)
 	printk("   System does not reboot automatically.\n");
 	printk("   Please press POWER button.\n");
 	printk("\n");
-	panic_flush_kmsg_end();
+	panic_flush_kmsg_end(false);
 
 	while(1)
 		lv1_pause(1);
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 955ff8aa1644..d6eea473eafd 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -856,7 +856,7 @@ static void __init pSeries_setup_arch(void)
 
 static void pseries_panic(char *str)
 {
-	panic_flush_kmsg_end();
+	panic_flush_kmsg_end(false);
 	rtas_os_term(str);
 }
 
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 30/30] um: Avoid duplicate call to kmsg_dump()
  2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
                   ` (28 preceding siblings ...)
  2022-04-27 22:49 ` [PATCH 29/30] powerpc: ps3, pseries: Avoid duplicate call to kmsg_dump() on panic Guilherme G. Piccoli
@ 2022-04-27 22:49 ` Guilherme G. Piccoli
  29 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-27 22:49 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	gpiccoli, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Anton Ivanov, Johannes Berg, Richard Weinberger

Currently the panic notifier panic_exit() calls kmsg_dump() and
some console flushing routines - this makes sense since such
panic notifier exits UserMode Linux and never returns.

Happens that after a panic refactor, kmsg_dump() is now always
called *before* the pre_reboot list of panic notifiers, in which
panic_exit() belongs, leading to a double call situation.

This patch changes that by removing such call from the panic
notifier, but leaving the console flushing calls since the
pre_reboot list still runs before console flushing on panic().

Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/um/kernel/um_arch.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index fc6e443299da..651310e3e86f 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -241,7 +241,6 @@ static void __init uml_postsetup(void)
 static int panic_exit(struct notifier_block *self, unsigned long unused1,
 		      void *unused2)
 {
-	kmsg_dump(KMSG_DUMP_PANIC);
 	bust_spinlocks(1);
 	bust_spinlocks(0);
 	uml_exitcode = 1;
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH 20/30] panic: Add the panic informational notifier list
  2022-04-27 22:49 ` [PATCH 20/30] panic: Add the panic informational " Guilherme G. Piccoli
@ 2022-04-27 23:49   ` Paul E. McKenney
  2022-04-28  8:14   ` Suzuki K Poulose
  2022-05-16 14:11   ` Petr Mladek
  2 siblings, 0 replies; 183+ messages in thread
From: Paul E. McKenney @ 2022-04-27 23:49 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Benjamin Herrenschmidt, Catalin Marinas,
	Florian Fainelli, Frederic Weisbecker, H. Peter Anvin,
	Hari Bathini, Joel Fernandes, Jonathan Hunter, Josh Triplett,
	Lai Jiangshan, Leo Yan, Mathieu Desnoyers, Mathieu Poirier,
	Michael Ellerman, Mike Leach, Mikko Perttunen, Neeraj Upadhyay,
	Nicholas Piggin, Paul Mackerras, Suzuki K Poulose,
	Thierry Reding, Thomas Bogendoerfer

On Wed, Apr 27, 2022 at 07:49:14PM -0300, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to
> register callbacks to run earlier in the panic path than they
> currently do. This aims at informational mechanisms, like dumping
> kernel offsets and showing device error data (in case it's simple
> registers reading, for example) as well as mechanisms to disable
> log flooding (like hung_task detector / RCU warnings) and the
> tracing dump_on_oops (when enabled).
> 
> Any (non-invasive) information that should be provided before
> kmsg_dump() as well as log flooding preventing code should fit
> here, as long it offers relatively low risk for kdump.
> 
> For now, the patch is almost a no-op, although it changes a bit
> the ordering in which some panic notifiers are executed - specially
> affected by this are the notifiers responsible for disabling the
> hung_task detector / RCU warnings, which now run first. In a
> subsequent patch, the panic path will be refactored, then the
> panic informational notifiers will effectively run earlier,
> before ksmg_dump() (and usually before kdump as well).
> 
> We also defer documenting it all properly in the subsequent
> refactor patch. Finally, while at it, we removed some useless
> header inclusions too.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Jonathan Hunter <jonathanh@nvidia.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Leo Yan <leo.yan@linaro.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mikko Perttunen <mperttunen@nvidia.com>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Thierry Reding <thierry.reding@gmail.com>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

From an RCU perspective:

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  arch/arm64/kernel/setup.c                         | 2 +-
>  arch/mips/kernel/relocate.c                       | 2 +-
>  arch/powerpc/kernel/setup-common.c                | 2 +-
>  arch/x86/kernel/setup.c                           | 2 +-
>  drivers/bus/brcmstb_gisb.c                        | 2 +-
>  drivers/hwtracing/coresight/coresight-cpu-debug.c | 4 ++--
>  drivers/soc/tegra/ari-tegra186.c                  | 3 ++-
>  include/linux/panic_notifier.h                    | 1 +
>  kernel/hung_task.c                                | 3 ++-
>  kernel/panic.c                                    | 4 ++++
>  kernel/rcu/tree.c                                 | 1 -
>  kernel/rcu/tree_stall.h                           | 3 ++-
>  kernel/trace/trace.c                              | 2 +-
>  13 files changed, 19 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 3505789cf4bd..ac2c7e8c9c6a 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -444,7 +444,7 @@ static struct notifier_block arm64_panic_block = {
>  
>  static int __init register_arm64_panic_block(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_info_list,
>  				       &arm64_panic_block);
>  	return 0;
>  }
> diff --git a/arch/mips/kernel/relocate.c b/arch/mips/kernel/relocate.c
> index 56b51de2dc51..650811f2436a 100644
> --- a/arch/mips/kernel/relocate.c
> +++ b/arch/mips/kernel/relocate.c
> @@ -459,7 +459,7 @@ static struct notifier_block kernel_location_notifier = {
>  
>  static int __init register_kernel_offset_dumper(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_info_list,
>  				       &kernel_location_notifier);
>  	return 0;
>  }
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 1468c3937bf4..d04b8bf8dbc7 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -757,7 +757,7 @@ void __init setup_panic(void)
>  				       &ppc_fadump_block);
>  
>  	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
> -		atomic_notifier_chain_register(&panic_notifier_list,
> +		atomic_notifier_chain_register(&panic_info_list,
>  					       &kernel_offset_notifier);
>  
>  	/* Low-level platform-specific routines that should run on panic */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index c95b9ac5a457..599b25346964 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1266,7 +1266,7 @@ static struct notifier_block kernel_offset_notifier = {
>  
>  static int __init register_kernel_offset_dumper(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_info_list,
>  					&kernel_offset_notifier);
>  	return 0;
>  }
> diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
> index 1ea7b015e225..c64e087fba7a 100644
> --- a/drivers/bus/brcmstb_gisb.c
> +++ b/drivers/bus/brcmstb_gisb.c
> @@ -486,7 +486,7 @@ static int __init brcmstb_gisb_arb_probe(struct platform_device *pdev)
>  
>  	if (list_is_singular(&brcmstb_gisb_arb_device_list)) {
>  		register_die_notifier(&gisb_die_notifier);
> -		atomic_notifier_chain_register(&panic_notifier_list,
> +		atomic_notifier_chain_register(&panic_info_list,
>  					       &gisb_panic_notifier);
>  	}
>  
> diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> index 1874df7c6a73..7b1012454525 100644
> --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
> +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> @@ -535,7 +535,7 @@ static int debug_func_init(void)
>  			    &debug_func_knob_fops);
>  
>  	/* Register function to be called for panic */
> -	ret = atomic_notifier_chain_register(&panic_notifier_list,
> +	ret = atomic_notifier_chain_register(&panic_info_list,
>  					     &debug_notifier);
>  	if (ret) {
>  		pr_err("%s: unable to register notifier: %d\n",
> @@ -552,7 +552,7 @@ static int debug_func_init(void)
>  
>  static void debug_func_exit(void)
>  {
> -	atomic_notifier_chain_unregister(&panic_notifier_list,
> +	atomic_notifier_chain_unregister(&panic_info_list,
>  					 &debug_notifier);
>  	debugfs_remove_recursive(debug_debugfs_dir);
>  }
> diff --git a/drivers/soc/tegra/ari-tegra186.c b/drivers/soc/tegra/ari-tegra186.c
> index 02577853ec49..4ef05ebed739 100644
> --- a/drivers/soc/tegra/ari-tegra186.c
> +++ b/drivers/soc/tegra/ari-tegra186.c
> @@ -73,7 +73,8 @@ static struct notifier_block tegra186_ari_panic_nb = {
>  static int __init tegra186_ari_init(void)
>  {
>  	if (of_machine_is_compatible("nvidia,tegra186"))
> -		atomic_notifier_chain_register(&panic_notifier_list, &tegra186_ari_panic_nb);
> +		atomic_notifier_chain_register(&panic_info_list,
> +					       &tegra186_ari_panic_nb);
>  
>  	return 0;
>  }
> diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
> index 0bb9dc0dea04..7364a346bcb0 100644
> --- a/include/linux/panic_notifier.h
> +++ b/include/linux/panic_notifier.h
> @@ -7,6 +7,7 @@
>  
>  extern struct atomic_notifier_head panic_notifier_list;
>  extern struct atomic_notifier_head panic_hypervisor_list;
> +extern struct atomic_notifier_head panic_info_list;
>  
>  extern bool crash_kexec_post_notifiers;
>  
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 52501e5f7655..1b2d7111d5ac 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -85,6 +85,7 @@ hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
>  
>  static struct notifier_block panic_block = {
>  	.notifier_call = hung_task_panic,
> +	.priority = INT_MAX, /* run early to prevent potential log flood */
>  };
>  
>  static void check_hung_task(struct task_struct *t, unsigned long timeout)
> @@ -378,7 +379,7 @@ static int watchdog(void *dummy)
>  
>  static int __init hung_task_init(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> +	atomic_notifier_chain_register(&panic_info_list, &panic_block);
>  
>  	/* Disable hung task detector on suspend */
>  	pm_notifier(hungtask_pm_notify, 0);
> diff --git a/kernel/panic.c b/kernel/panic.c
> index ef76f3f9c44d..73ca1bc44e30 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -76,6 +76,9 @@ EXPORT_SYMBOL(panic_notifier_list);
>  ATOMIC_NOTIFIER_HEAD(panic_hypervisor_list);
>  EXPORT_SYMBOL(panic_hypervisor_list);
>  
> +ATOMIC_NOTIFIER_HEAD(panic_info_list);
> +EXPORT_SYMBOL(panic_info_list);
> +
>  static long no_blink(int state)
>  {
>  	return 0;
> @@ -291,6 +294,7 @@ void panic(const char *fmt, ...)
>  	 * add information to the kmsg dump output.
>  	 */
>  	atomic_notifier_call_chain(&panic_hypervisor_list, PANIC_NOTIFIER, buf);
> +	atomic_notifier_call_chain(&panic_info_list, PANIC_NOTIFIER, buf);
>  	atomic_notifier_call_chain(&panic_notifier_list, PANIC_NOTIFIER, buf);
>  
>  	panic_print_sys_info(false);
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a4b8189455d5..d5a2674ae81c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -35,7 +35,6 @@
>  #include <linux/panic.h>
>  #include <linux/panic_notifier.h>
>  #include <linux/percpu.h>
> -#include <linux/notifier.h>
>  #include <linux/cpu.h>
>  #include <linux/mutex.h>
>  #include <linux/time.h>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 0c5d8516516a..d8a5840aad5d 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -97,11 +97,12 @@ static int rcu_panic(struct notifier_block *this, unsigned long ev, void *ptr)
>  
>  static struct notifier_block rcu_panic_block = {
>  	.notifier_call = rcu_panic,
> +	.priority = INT_MAX, /* run early to prevent potential log flood */
>  };
>  
>  static int __init check_cpu_stall_init(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list, &rcu_panic_block);
> +	atomic_notifier_chain_register(&panic_info_list, &rcu_panic_block);
>  	return 0;
>  }
>  early_initcall(check_cpu_stall_init);
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c1d8a3622ccc..7d02f7a66bb1 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -10138,7 +10138,7 @@ __init static int tracer_alloc_buffers(void)
>  	/* All seems OK, enable tracing */
>  	tracing_disabled = 0;
>  
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_info_list,
>  				       &trace_panic_notifier);
>  
>  	register_die_notifier(&trace_die_notifier);
> -- 
> 2.36.0
> 

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-04-27 22:49 ` [PATCH 24/30] panic: Refactor the panic path Guilherme G. Piccoli
@ 2022-04-28  0:28   ` Randy Dunlap
  2022-04-29 16:04     ` Guilherme G. Piccoli
  2022-04-29 17:53   ` Michael Kelley (LINUX)
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 183+ messages in thread
From: Randy Dunlap @ 2022-04-28  0:28 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will



On 4/27/22 15:49, Guilherme G. Piccoli wrote:
> +	crash_kexec_post_notifiers
> +			This was DEPRECATED - users should always prefer the

			This is DEPRECATED - users should always prefer the

> +			parameter "panic_notifiers_level" - check its entry
> +			in this documentation for details on how it works.
> +			Setting this parameter is exactly the same as setting
> +			"panic_notifiers_level=4".

-- 
~Randy

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
  2022-04-27 22:49 ` [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set Guilherme G. Piccoli
@ 2022-04-28  1:01   ` Xiaoming Ni
  2022-04-29 19:38     ` Guilherme G. Piccoli
  2022-05-10 17:29     ` Steven Rostedt
  2022-04-29 16:27   ` Michael Kelley (LINUX)
  1 sibling, 2 replies; 183+ messages in thread
From: Xiaoming Ni @ 2022-04-28  1:01 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Arjan van de Ven, Cong Wang, Sebastian Andrzej Siewior,
	Valentin Schneider

On 2022/4/28 6:49, Guilherme G. Piccoli wrote:
> Currently we have a debug infrastructure in the notifiers file, but
> it's very simple/limited. This patch extends it by:
> 
> (a) Showing all registered/unregistered notifiers' callback names;
> 
> (b) Adding a dynamic debug tuning to allow showing called notifiers'
> function names. Notice that this should be guarded as a tunable since
> it can flood the kernel log buffer.
> 
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Xiaoming Ni <nixiaoming@huawei.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 
> We have some design decisions that worth discussing here:
> 
> (a) First of call, using C99 helps a lot to write clear and concise code, but
> due to commit 4d94f910e79a ("Kbuild: use -Wdeclaration-after-statement") we
> have a warning if mixing variable declarations with code. For this patch though,
> doing that makes the code way clear, so decision was to add the debug code
> inside brackets whenever this warning pops up. We can change that, but that'll
> cause more ifdefs in the same function.
> 
> (b) In the symbol lookup helper function, we modify the parameter passed but
> even more, we return it as well! This is unusual and seems unnecessary, but was
> the strategy taken to allow embedding such function in the pr_debug() call.
> 
> Not doing that would likely requiring 3 symbol_name variables to avoid
> concurrency (registering notifier A while calling notifier B) - we rely in
> local variables as a serialization mechanism.
> 
> We're open for suggestions in case this design is not appropriate;
> thanks in advance!
> 
>   kernel/notifier.c | 48 +++++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/notifier.c b/kernel/notifier.c
> index ba005ebf4730..21032ebcde57 100644
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -7,6 +7,22 @@
>   #include <linux/vmalloc.h>
>   #include <linux/reboot.h>
>   
> +#ifdef CONFIG_DEBUG_NOTIFIERS
> +#include <linux/kallsyms.h>
> +
> +/*
> + *	Helper to get symbol names in case DEBUG_NOTIFIERS is set.
> + *	Return the modified parameter is a strategy used to achieve
> + *	the pr_debug() functionality - with this, function is only
> + *	executed if the dynamic debug tuning is effectively set.
> + */
> +static inline char *notifier_name(struct notifier_block *nb, char *sym_name)
> +{
> +	lookup_symbol_name((unsigned long)(nb->notifier_call), sym_name);
> +	return sym_name;
> +}
> +#endif
> +
>   /*
>    *	Notifier list for kernel code which wants to be called
>    *	at shutdown. This is used to stop any idling DMA operations
> @@ -34,20 +50,41 @@ static int notifier_chain_register(struct notifier_block **nl,
>   	}
>   	n->next = *nl;
>   	rcu_assign_pointer(*nl, n);
> +
> +#ifdef CONFIG_DEBUG_NOTIFIERS
> +	{
> +		char sym_name[KSYM_NAME_LEN];
> +
> +		pr_info("notifiers: registered %s()\n",
> +			notifier_name(n, sym_name));
> +	}

Duplicate Code.

Is it better to use __func__ and %pS?

pr_info("%s: %pS\n", __func__, n->notifier_call);


> +#endif
>   	return 0;
>   }
>   
>   static int notifier_chain_unregister(struct notifier_block **nl,
>   		struct notifier_block *n)
>   {
> +	int ret = -ENOENT;
> +
>   	while ((*nl) != NULL) {
>   		if ((*nl) == n) {
>   			rcu_assign_pointer(*nl, n->next);
> -			return 0;
> +			ret = 0;
> +			break;
>   		}
>   		nl = &((*nl)->next);
>   	}
> -	return -ENOENT;
> +
> +#ifdef CONFIG_DEBUG_NOTIFIERS
> +	if (!ret) {
> +		char sym_name[KSYM_NAME_LEN];
> +
> +		pr_info("notifiers: unregistered %s()\n",
> +			notifier_name(n, sym_name));
> +	}
Duplicate Code.

Is it better to use __func__ and %pS?

pr_info("%s: %pS\n", __func__, n->notifier_call);
> +#endif
> +	return ret;
>   }
>   
>   /**
> @@ -80,6 +117,13 @@ static int notifier_call_chain(struct notifier_block **nl,
>   			nb = next_nb;
>   			continue;
>   		}
> +
Is the "#ifdef" missing here?
> +		{
> +			char sym_name[KSYM_NAME_LEN];
> +
> +			pr_debug("notifiers: calling %s()\n",
> +				 notifier_name(nb, sym_name));
Duplicate Code.

Is it better to use __func__ and %pS?

pr_info("%s: %pS\n", __func__, n->notifier_call);
> +		}
>   #endif
>   		ret = nb->notifier_call(nb, val, v);
>   
> 

Thanks
Xiaoming Ni

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
  2022-04-27 22:49 ` [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier Guilherme G. Piccoli
@ 2022-04-28  8:11   ` Suzuki K Poulose
  2022-04-29 14:01     ` Guilherme G. Piccoli
  2022-05-09 13:09     ` Guilherme G. Piccoli
  0 siblings, 2 replies; 183+ messages in thread
From: Suzuki K Poulose @ 2022-04-28  8:11 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Leo Yan, Mathieu Poirier, Mike Leach

Hi Guilherme,

On 27/04/2022 23:49, Guilherme G. Piccoli wrote:
> The panic notifier infrastructure executes registered callbacks when
> a panic event happens - such callbacks are executed in atomic context,
> with interrupts and preemption disabled in the running CPU and all other
> CPUs disabled. That said, mutexes in such context are not a good idea.
> 
> This patch replaces a regular mutex with a mutex_trylock safer approach;
> given the nature of the mutex used in the driver, it should be pretty
> uncommon being unable to acquire such mutex in the panic path, hence
> no functional change should be observed (and if it is, that would be
> likely a deadlock with the regular mutex).
> 
> Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
> Cc: Leo Yan <leo.yan@linaro.org>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

How would you like to proceed with queuing this ? I am happy
either way. In case you plan to push this as part of this
series (I don't see any potential conflicts) :

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 20/30] panic: Add the panic informational notifier list
  2022-04-27 22:49 ` [PATCH 20/30] panic: Add the panic informational " Guilherme G. Piccoli
  2022-04-27 23:49   ` Paul E. McKenney
@ 2022-04-28  8:14   ` Suzuki K Poulose
  2022-04-29 14:50     ` Guilherme G. Piccoli
  2022-05-16 14:11   ` Petr Mladek
  2 siblings, 1 reply; 183+ messages in thread
From: Suzuki K Poulose @ 2022-04-28  8:14 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Benjamin Herrenschmidt, Catalin Marinas, Florian Fainelli,
	Frederic Weisbecker, H. Peter Anvin, Hari Bathini,
	Joel Fernandes, Jonathan Hunter, Josh Triplett, Lai Jiangshan,
	Leo Yan, Mathieu Desnoyers, Mathieu Poirier, Michael Ellerman,
	Mike Leach, Mikko Perttunen, Neeraj Upadhyay, Nicholas Piggin,
	Paul Mackerras, Thierry Reding, Thomas Bogendoerfer

On 27/04/2022 23:49, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to
> register callbacks to run earlier in the panic path than they
> currently do. This aims at informational mechanisms, like dumping
> kernel offsets and showing device error data (in case it's simple
> registers reading, for example) as well as mechanisms to disable
> log flooding (like hung_task detector / RCU warnings) and the
> tracing dump_on_oops (when enabled).
> 
> Any (non-invasive) information that should be provided before
> kmsg_dump() as well as log flooding preventing code should fit
> here, as long it offers relatively low risk for kdump.
> 
> For now, the patch is almost a no-op, although it changes a bit
> the ordering in which some panic notifiers are executed - specially
> affected by this are the notifiers responsible for disabling the
> hung_task detector / RCU warnings, which now run first. In a
> subsequent patch, the panic path will be refactored, then the
> panic informational notifiers will effectively run earlier,
> before ksmg_dump() (and usually before kdump as well).
> 
> We also defer documenting it all properly in the subsequent
> refactor patch. Finally, while at it, we removed some useless
> header inclusions too.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Jonathan Hunter <jonathanh@nvidia.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Leo Yan <leo.yan@linaro.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mikko Perttunen <mperttunen@nvidia.com>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Thierry Reding <thierry.reding@gmail.com>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>   arch/arm64/kernel/setup.c                         | 2 +-
>   arch/mips/kernel/relocate.c                       | 2 +-
>   arch/powerpc/kernel/setup-common.c                | 2 +-
>   arch/x86/kernel/setup.c                           | 2 +-
>   drivers/bus/brcmstb_gisb.c                        | 2 +-
>   drivers/hwtracing/coresight/coresight-cpu-debug.c | 4 ++--
>   drivers/soc/tegra/ari-tegra186.c                  | 3 ++-
>   include/linux/panic_notifier.h                    | 1 +
>   kernel/hung_task.c                                | 3 ++-
>   kernel/panic.c                                    | 4 ++++
>   kernel/rcu/tree.c                                 | 1 -
>   kernel/rcu/tree_stall.h                           | 3 ++-
>   kernel/trace/trace.c                              | 2 +-
>   13 files changed, 19 insertions(+), 12 deletions(-)
> 

...

> diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> index 1874df7c6a73..7b1012454525 100644
> --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
> +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> @@ -535,7 +535,7 @@ static int debug_func_init(void)
>   			    &debug_func_knob_fops);
>   
>   	/* Register function to be called for panic */
> -	ret = atomic_notifier_chain_register(&panic_notifier_list,
> +	ret = atomic_notifier_chain_register(&panic_info_list,
>   					     &debug_notifier);
>   	if (ret) {
>   		pr_err("%s: unable to register notifier: %d\n",
> @@ -552,7 +552,7 @@ static int debug_func_init(void)
>   
>   static void debug_func_exit(void)
>   {
> -	atomic_notifier_chain_unregister(&panic_notifier_list,
> +	atomic_notifier_chain_unregister(&panic_info_list,
>   					 &debug_notifier);
>   	debugfs_remove_recursive(debug_debugfs_dir);
>   }

Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering
  2022-04-27 22:49 ` [PATCH 11/30] um: Improve panic notifiers consistency and ordering Guilherme G. Piccoli
@ 2022-04-28  8:30   ` Johannes Berg
  2022-04-29 15:46     ` Guilherme G. Piccoli
  2022-05-10 14:28   ` Petr Mladek
  1 sibling, 1 reply; 183+ messages in thread
From: Johannes Berg @ 2022-04-28  8:30 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-kernel, linux-um@lists.infradead.orgAnton Ivanov,
	Richard Weinberger

[trimming massive CC list]

On Wed, 2022-04-27 at 19:49 -0300, Guilherme G. Piccoli wrote:
> 
> Also, we remove a useless header inclusion.

I wouldn't say it's useless, generally we try not to rely on implicit
includes so much? And you at least now use NOTIFY_DONE from it.

Otherwise looks fine to me.

johannes

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-04-27 22:49 ` [PATCH 21/30] panic: Introduce the panic pre-reboot " Guilherme G. Piccoli
@ 2022-04-28 14:13   ` Alex Elder
  2022-04-28 16:26   ` Corey Minyard
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 183+ messages in thread
From: Alex Elder @ 2022-04-28 14:13 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alex Elder, Alexander Gordeev, Anton Ivanov,
	Benjamin Herrenschmidt, Bjorn Andersson, Boris Ostrovsky,
	Chris Zankel, Christian Borntraeger, Corey Minyard, Dexuan Cui,
	H. Peter Anvin, Haiyang Zhang, Heiko Carstens, Helge Deller,
	Ivan Kokshaysky, James E.J. Bottomley, James Morse,
	Johannes Berg, K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Henderson,
	Richard Weinberger, Robert Richter, Stefano Stabellini,
	Stephen Hemminger, Sven Schnelle, Tony Luck, Vasily Gorbik,
	Wei Liu

On 4/27/22 5:49 PM, Guilherme G. Piccoli wrote:
> This patch renames the panic_notifier_list to panic_pre_reboot_list;
> the idea is that a subsequent patch will refactor the panic path
> in order to better split the notifiers, running some of them very
> early, some of them not so early [but still before kmsg_dump()] and
> finally, the rest should execute late, after kdump. The latter ones
> are now in the panic pre-reboot list - the name comes from the idea
> that these notifiers execute before panic() attempts rebooting the
> machine (if that option is set).
> 
> We also took the opportunity to clean-up useless header inclusions,
> improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
> and more important, change some priorities - we hereby set 2 notifiers
> to run late in the list [iss_panic_event() and the IPMI panic_event()]
> due to the risks they offer (may not return, for example).
> Proper documentation is going to be provided in a subsequent patch,
> that effectively refactors the panic path.
> 
> Cc: Alex Elder <elder@kernel.org>

For "drivers/net/ipa/ipa_smp2p.c":

Acked-by: Alex Elder <elder@kernel.org>

> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Chris Zankel <chris@zankel.net>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Corey Minyard <minyard@acm.org>
> Cc: Dexuan Cui <decui@microsoft.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Matt Turner <mattst88@gmail.com>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Richard Weinberger <richard@nod.at>
> Cc: Robert Richter <rric@kernel.org>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 

. . .

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-04-27 22:49 ` [PATCH 21/30] panic: Introduce the panic pre-reboot " Guilherme G. Piccoli
  2022-04-28 14:13   ` Alex Elder
@ 2022-04-28 16:26   ` Corey Minyard
  2022-04-29 15:18     ` Guilherme G. Piccoli
  2022-04-29 16:04   ` Max Filippov
  2022-05-16 14:33   ` Petr Mladek
  3 siblings, 1 reply; 183+ messages in thread
From: Corey Minyard @ 2022-04-28 16:26 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alex Elder, Alexander Gordeev,
	Anton Ivanov, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Chris Zankel, Christian Borntraeger, Dexuan Cui,
	H. Peter Anvin, Haiyang Zhang, Heiko Carstens, Helge Deller,
	Ivan Kokshaysky, James E.J. Bottomley, James Morse,
	Johannes Berg, K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Henderson,
	Richard Weinberger, Robert Richter, Stefano Stabellini,
	Stephen Hemminger, Sven Schnelle, Tony Luck, Vasily Gorbik,
	Wei Liu

On Wed, Apr 27, 2022 at 07:49:15PM -0300, Guilherme G. Piccoli wrote:
> This patch renames the panic_notifier_list to panic_pre_reboot_list;
> the idea is that a subsequent patch will refactor the panic path
> in order to better split the notifiers, running some of them very
> early, some of them not so early [but still before kmsg_dump()] and
> finally, the rest should execute late, after kdump. The latter ones
> are now in the panic pre-reboot list - the name comes from the idea
> that these notifiers execute before panic() attempts rebooting the
> machine (if that option is set).
> 
> We also took the opportunity to clean-up useless header inclusions,
> improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
> and more important, change some priorities - we hereby set 2 notifiers
> to run late in the list [iss_panic_event() and the IPMI panic_event()]
> due to the risks they offer (may not return, for example).
> Proper documentation is going to be provided in a subsequent patch,
> that effectively refactors the panic path.

For the IPMI portion:

Acked-by: Corey Minyard <cminyard@mvista.com>

Note that the IPMI panic_event() should always return, but it may take
some time, especially if the IPMI controller is no longer functional.
So the risk of a long delay is there and it makes sense to move it very
late.

-corey

> 
> Cc: Alex Elder <elder@kernel.org>
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Chris Zankel <chris@zankel.net>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Corey Minyard <minyard@acm.org>
> Cc: Dexuan Cui <decui@microsoft.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Matt Turner <mattst88@gmail.com>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Richard Weinberger <richard@nod.at>
> Cc: Robert Richter <rric@kernel.org>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 
> Notice that, with this name change, out-of-tree code that relies in the global
> exported "panic_notifier_list" will fail to build. We could easily keep the
> retro-compatibility by making the old symbol to still exist and point to the
> pre_reboot list (or even, keep the old naming).
> 
> But our design choice was to allow the breakage, making users rethink their
> notifiers, adding them in the list that fits best. If that wasn't a good
> decision, we're open to change it, of course.
> Thanks in advance for the review!
> 
>  arch/alpha/kernel/setup.c             |  4 ++--
>  arch/parisc/kernel/pdc_chassis.c      |  3 +--
>  arch/powerpc/kernel/setup-common.c    |  2 +-
>  arch/s390/kernel/ipl.c                |  4 ++--
>  arch/um/drivers/mconsole_kern.c       |  2 +-
>  arch/um/kernel/um_arch.c              |  2 +-
>  arch/x86/xen/enlighten.c              |  2 +-
>  arch/xtensa/platforms/iss/setup.c     |  4 ++--
>  drivers/char/ipmi/ipmi_msghandler.c   | 12 +++++++-----
>  drivers/edac/altera_edac.c            |  3 +--
>  drivers/hv/vmbus_drv.c                |  4 ++--
>  drivers/leds/trigger/ledtrig-panic.c  |  3 +--
>  drivers/misc/ibmasm/heartbeat.c       | 16 +++++++++-------
>  drivers/net/ipa/ipa_smp2p.c           |  5 ++---
>  drivers/parisc/power.c                |  4 ++--
>  drivers/remoteproc/remoteproc_core.c  |  6 ++++--
>  drivers/s390/char/con3215.c           |  2 +-
>  drivers/s390/char/con3270.c           |  2 +-
>  drivers/s390/char/sclp_con.c          |  2 +-
>  drivers/s390/char/sclp_vt220.c        |  2 +-
>  drivers/staging/olpc_dcon/olpc_dcon.c |  6 ++++--
>  drivers/video/fbdev/hyperv_fb.c       |  4 ++--
>  include/linux/panic_notifier.h        |  2 +-
>  kernel/panic.c                        |  9 ++++-----
>  24 files changed, 54 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
> index d88bdf852753..8ace0d7113b6 100644
> --- a/arch/alpha/kernel/setup.c
> +++ b/arch/alpha/kernel/setup.c
> @@ -472,8 +472,8 @@ setup_arch(char **cmdline_p)
>  	}
>  
>  	/* Register a call for panic conditions. */
> -	atomic_notifier_chain_register(&panic_notifier_list,
> -			&alpha_panic_block);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
> +					&alpha_panic_block);
>  
>  #ifndef alpha_using_srm
>  	/* Assume that we've booted from SRM if we haven't booted from MILO.
> diff --git a/arch/parisc/kernel/pdc_chassis.c b/arch/parisc/kernel/pdc_chassis.c
> index da154406d368..0fd8d87fb4f9 100644
> --- a/arch/parisc/kernel/pdc_chassis.c
> +++ b/arch/parisc/kernel/pdc_chassis.c
> @@ -22,7 +22,6 @@
>  #include <linux/kernel.h>
>  #include <linux/panic_notifier.h>
>  #include <linux/reboot.h>
> -#include <linux/notifier.h>
>  #include <linux/cache.h>
>  #include <linux/proc_fs.h>
>  #include <linux/seq_file.h>
> @@ -135,7 +134,7 @@ void __init parisc_pdc_chassis_init(void)
>  				PDC_CHASSIS_VER);
>  
>  		/* initialize panic notifier chain */
> -		atomic_notifier_chain_register(&panic_notifier_list,
> +		atomic_notifier_chain_register(&panic_pre_reboot_list,
>  				&pdc_chassis_panic_block);
>  
>  		/* initialize reboot notifier chain */
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index d04b8bf8dbc7..3518bebc10ad 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -762,7 +762,7 @@ void __init setup_panic(void)
>  
>  	/* Low-level platform-specific routines that should run on panic */
>  	if (ppc_md.panic)
> -		atomic_notifier_chain_register(&panic_notifier_list,
> +		atomic_notifier_chain_register(&panic_pre_reboot_list,
>  					       &ppc_panic_block);
>  }
>  
> diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
> index 1cc85b8ff42e..4a88c5bb6e15 100644
> --- a/arch/s390/kernel/ipl.c
> +++ b/arch/s390/kernel/ipl.c
> @@ -2034,7 +2034,7 @@ static int on_panic_notify(struct notifier_block *self,
>  			   unsigned long event, void *data)
>  {
>  	do_panic();
> -	return NOTIFY_OK;
> +	return NOTIFY_DONE;
>  }
>  
>  static struct notifier_block on_panic_nb = {
> @@ -2069,7 +2069,7 @@ void __init setup_ipl(void)
>  		/* We have no info to copy */
>  		break;
>  	}
> -	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
>  }
>  
>  void s390_reset_system(void)
> diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
> index 2ea0421bcc3f..21c13b4e24a3 100644
> --- a/arch/um/drivers/mconsole_kern.c
> +++ b/arch/um/drivers/mconsole_kern.c
> @@ -855,7 +855,7 @@ static struct notifier_block panic_exit_notifier = {
>  
>  static int add_notifier(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
>  			&panic_exit_notifier);
>  	return 0;
>  }
> diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
> index 4485b1a7c8e4..fc6e443299da 100644
> --- a/arch/um/kernel/um_arch.c
> +++ b/arch/um/kernel/um_arch.c
> @@ -257,7 +257,7 @@ static struct notifier_block panic_exit_notifier = {
>  
>  void uml_finishsetup(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
>  				       &panic_exit_notifier);
>  
>  	uml_postsetup();
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 30c6e986a6cd..d4f4de239a21 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -290,7 +290,7 @@ static struct notifier_block xen_panic_block = {
>  
>  int xen_panic_handler_init(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list, &xen_panic_block);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list, &xen_panic_block);
>  	return 0;
>  }
>  
> diff --git a/arch/xtensa/platforms/iss/setup.c b/arch/xtensa/platforms/iss/setup.c
> index d3433e1bb94e..eeeeb6cff6bd 100644
> --- a/arch/xtensa/platforms/iss/setup.c
> +++ b/arch/xtensa/platforms/iss/setup.c
> @@ -13,7 +13,6 @@
>   */
>  #include <linux/init.h>
>  #include <linux/kernel.h>
> -#include <linux/notifier.h>
>  #include <linux/panic_notifier.h>
>  #include <linux/printk.h>
>  #include <linux/string.h>
> @@ -53,6 +52,7 @@ iss_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
>  
>  static struct notifier_block iss_panic_block = {
>  	.notifier_call = iss_panic_event,
> +	.priority = INT_MIN, /* run as late as possible, may not return */
>  };
>  
>  void __init platform_setup(char **p_cmdline)
> @@ -81,5 +81,5 @@ void __init platform_setup(char **p_cmdline)
>  		}
>  	}
>  
> -	atomic_notifier_chain_register(&panic_notifier_list, &iss_panic_block);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list, &iss_panic_block);
>  }
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index c59265146e9c..6c4770949c01 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -25,7 +25,6 @@
>  #include <linux/slab.h>
>  #include <linux/ipmi.h>
>  #include <linux/ipmi_smi.h>
> -#include <linux/notifier.h>
>  #include <linux/init.h>
>  #include <linux/proc_fs.h>
>  #include <linux/rcupdate.h>
> @@ -5375,10 +5374,13 @@ static int ipmi_register_driver(void)
>  	return rv;
>  }
>  
> +/*
> + * we should execute this panic callback late, since it involves
> + * a complex call-chain and panic() runs in atomic context.
> + */
>  static struct notifier_block panic_block = {
>  	.notifier_call	= panic_event,
> -	.next		= NULL,
> -	.priority	= 200	/* priority: INT_MAX >= x >= 0 */
> +	.priority	= INT_MIN + 1,
>  };
>  
>  static int ipmi_init_msghandler(void)
> @@ -5406,7 +5408,7 @@ static int ipmi_init_msghandler(void)
>  	timer_setup(&ipmi_timer, ipmi_timeout, 0);
>  	mod_timer(&ipmi_timer, jiffies + IPMI_TIMEOUT_JIFFIES);
>  
> -	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list, &panic_block);
>  
>  	initialized = true;
>  
> @@ -5438,7 +5440,7 @@ static void __exit cleanup_ipmi(void)
>  	if (initialized) {
>  		destroy_workqueue(remove_work_wq);
>  
> -		atomic_notifier_chain_unregister(&panic_notifier_list,
> +		atomic_notifier_chain_unregister(&panic_pre_reboot_list,
>  						 &panic_block);
>  
>  		/*
> diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
> index e7e8e624a436..4890e9cba6fb 100644
> --- a/drivers/edac/altera_edac.c
> +++ b/drivers/edac/altera_edac.c
> @@ -16,7 +16,6 @@
>  #include <linux/kernel.h>
>  #include <linux/mfd/altera-sysmgr.h>
>  #include <linux/mfd/syscon.h>
> -#include <linux/notifier.h>
>  #include <linux/of_address.h>
>  #include <linux/of_irq.h>
>  #include <linux/of_platform.h>
> @@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
>  		int dberror, err_addr;
>  
>  		edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
> -		atomic_notifier_chain_register(&panic_notifier_list,
> +		atomic_notifier_chain_register(&panic_pre_reboot_list,
>  					       &edac->panic_notifier);
>  
>  		/* Printout a message if uncorrectable error previously. */
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 901b97034308..3717c323aa36 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1622,7 +1622,7 @@ static int vmbus_bus_init(void)
>  	 * Always register the vmbus unload panic notifier because we
>  	 * need to shut the VMbus channel connection on panic.
>  	 */
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
>  			       &hyperv_panic_vmbus_unload_block);
>  
>  	vmbus_request_offers();
> @@ -2851,7 +2851,7 @@ static void __exit vmbus_exit(void)
>  	 * The vmbus panic notifier is always registered, hence we should
>  	 * also unconditionally unregister it here as well.
>  	 */
> -	atomic_notifier_chain_unregister(&panic_notifier_list,
> +	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
>  					&hyperv_panic_vmbus_unload_block);
>  
>  	free_page((unsigned long)hv_panic_page);
> diff --git a/drivers/leds/trigger/ledtrig-panic.c b/drivers/leds/trigger/ledtrig-panic.c
> index 64abf2e91608..34fd5170723f 100644
> --- a/drivers/leds/trigger/ledtrig-panic.c
> +++ b/drivers/leds/trigger/ledtrig-panic.c
> @@ -7,7 +7,6 @@
>  
>  #include <linux/kernel.h>
>  #include <linux/init.h>
> -#include <linux/notifier.h>
>  #include <linux/panic_notifier.h>
>  #include <linux/leds.h>
>  #include "../leds.h"
> @@ -64,7 +63,7 @@ static long led_panic_blink(int state)
>  
>  static int __init ledtrig_panic_init(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
>  				       &led_trigger_panic_nb);
>  
>  	led_trigger_register_simple("panic", &trigger);
> diff --git a/drivers/misc/ibmasm/heartbeat.c b/drivers/misc/ibmasm/heartbeat.c
> index 59c9a0d95659..d6acae88b722 100644
> --- a/drivers/misc/ibmasm/heartbeat.c
> +++ b/drivers/misc/ibmasm/heartbeat.c
> @@ -8,7 +8,6 @@
>   * Author: Max Asböck <amax@us.ibm.com>
>   */
>  
> -#include <linux/notifier.h>
>  #include <linux/panic_notifier.h>
>  #include "ibmasm.h"
>  #include "dot_command.h"
> @@ -24,7 +23,7 @@ static int suspend_heartbeats = 0;
>   * In the case of a panic the interrupt handler continues to work and thus
>   * continues to respond to heartbeats, making the service processor believe
>   * the OS is still running and thus preventing a reboot.
> - * To prevent this from happening a callback is added the panic_notifier_list.
> + * To prevent this from happening a callback is added in a panic notifier list.
>   * Before responding to a heartbeat the driver checks if a panic has happened,
>   * if yes it suspends heartbeat, causing the service processor to reboot as
>   * expected.
> @@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
>  static int panic_happened(struct notifier_block *n, unsigned long val, void *v)
>  {
>  	suspend_heartbeats = 1;
> -	return 0;
> +	return NOTIFY_DONE;
>  }
>  
> -static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
> +static struct notifier_block panic_notifier = {
> +	.notifier_call = panic_happened,
> +};
>  
>  void ibmasm_register_panic_notifier(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list, &panic_notifier);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
> +					&panic_notifier);
>  }
>  
>  void ibmasm_unregister_panic_notifier(void)
>  {
> -	atomic_notifier_chain_unregister(&panic_notifier_list,
> -			&panic_notifier);
> +	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
> +					&panic_notifier);
>  }
>  
>  
> diff --git a/drivers/net/ipa/ipa_smp2p.c b/drivers/net/ipa/ipa_smp2p.c
> index 211233612039..92cdf6e0637c 100644
> --- a/drivers/net/ipa/ipa_smp2p.c
> +++ b/drivers/net/ipa/ipa_smp2p.c
> @@ -7,7 +7,6 @@
>  #include <linux/types.h>
>  #include <linux/device.h>
>  #include <linux/interrupt.h>
> -#include <linux/notifier.h>
>  #include <linux/panic_notifier.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/soc/qcom/smem.h>
> @@ -138,13 +137,13 @@ static int ipa_smp2p_panic_notifier_register(struct ipa_smp2p *smp2p)
>  	smp2p->panic_notifier.notifier_call = ipa_smp2p_panic_notifier;
>  	smp2p->panic_notifier.priority = INT_MAX;	/* Do it early */
>  
> -	return atomic_notifier_chain_register(&panic_notifier_list,
> +	return atomic_notifier_chain_register(&panic_pre_reboot_list,
>  					      &smp2p->panic_notifier);
>  }
>  
>  static void ipa_smp2p_panic_notifier_unregister(struct ipa_smp2p *smp2p)
>  {
> -	atomic_notifier_chain_unregister(&panic_notifier_list,
> +	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
>  					 &smp2p->panic_notifier);
>  }
>  
> diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
> index 8512884de2cf..5bb0868f5f08 100644
> --- a/drivers/parisc/power.c
> +++ b/drivers/parisc/power.c
> @@ -233,7 +233,7 @@ static int __init power_init(void)
>  	}
>  
>  	/* Register a call for panic conditions. */
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
>  			&parisc_panic_block);
>  
>  	return 0;
> @@ -243,7 +243,7 @@ static void __exit power_exit(void)
>  {
>  	kthread_stop(power_task);
>  
> -	atomic_notifier_chain_unregister(&panic_notifier_list,
> +	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
>  			&parisc_panic_block);
>  
>  	pdc_soft_power_button(0);
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index c510125769b9..24799ff239e6 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -2795,12 +2795,14 @@ static int rproc_panic_handler(struct notifier_block *nb, unsigned long event,
>  static void __init rproc_init_panic(void)
>  {
>  	rproc_panic_nb.notifier_call = rproc_panic_handler;
> -	atomic_notifier_chain_register(&panic_notifier_list, &rproc_panic_nb);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
> +				       &rproc_panic_nb);
>  }
>  
>  static void __exit rproc_exit_panic(void)
>  {
> -	atomic_notifier_chain_unregister(&panic_notifier_list, &rproc_panic_nb);
> +	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
> +					 &rproc_panic_nb);
>  }
>  
>  static int __init remoteproc_init(void)
> diff --git a/drivers/s390/char/con3215.c b/drivers/s390/char/con3215.c
> index 192198bd3dc4..07379dd3f1f3 100644
> --- a/drivers/s390/char/con3215.c
> +++ b/drivers/s390/char/con3215.c
> @@ -867,7 +867,7 @@ static int __init con3215_init(void)
>  		raw3215[0] = NULL;
>  		return -ENODEV;
>  	}
> -	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
>  	register_reboot_notifier(&on_reboot_nb);
>  	register_console(&con3215);
>  	return 0;
> diff --git a/drivers/s390/char/con3270.c b/drivers/s390/char/con3270.c
> index 476202f3d8a0..e79bf3e7bde3 100644
> --- a/drivers/s390/char/con3270.c
> +++ b/drivers/s390/char/con3270.c
> @@ -645,7 +645,7 @@ con3270_init(void)
>  	condev->cline->len = 0;
>  	con3270_create_status(condev);
>  	condev->input = alloc_string(&condev->freemem, 80);
> -	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
>  	register_reboot_notifier(&on_reboot_nb);
>  	register_console(&con3270);
>  	return 0;
> diff --git a/drivers/s390/char/sclp_con.c b/drivers/s390/char/sclp_con.c
> index e5d947c763ea..7ca9d4c45d60 100644
> --- a/drivers/s390/char/sclp_con.c
> +++ b/drivers/s390/char/sclp_con.c
> @@ -288,7 +288,7 @@ sclp_console_init(void)
>  	timer_setup(&sclp_con_timer, sclp_console_timeout, 0);
>  
>  	/* enable printk-access to this driver */
> -	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
>  	register_reboot_notifier(&on_reboot_nb);
>  	register_console(&sclp_console);
>  	return 0;
> diff --git a/drivers/s390/char/sclp_vt220.c b/drivers/s390/char/sclp_vt220.c
> index a32f34a1c6d2..97cf9e290c28 100644
> --- a/drivers/s390/char/sclp_vt220.c
> +++ b/drivers/s390/char/sclp_vt220.c
> @@ -838,7 +838,7 @@ sclp_vt220_con_init(void)
>  	if (rc)
>  		return rc;
>  	/* Attach linux console */
> -	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list, &on_panic_nb);
>  	register_reboot_notifier(&on_reboot_nb);
>  	register_console(&sclp_vt220_console);
>  	return 0;
> diff --git a/drivers/staging/olpc_dcon/olpc_dcon.c b/drivers/staging/olpc_dcon/olpc_dcon.c
> index 7284cb4ac395..cb50471f2246 100644
> --- a/drivers/staging/olpc_dcon/olpc_dcon.c
> +++ b/drivers/staging/olpc_dcon/olpc_dcon.c
> @@ -653,7 +653,8 @@ static int dcon_probe(struct i2c_client *client, const struct i2c_device_id *id)
>  	}
>  
>  	register_reboot_notifier(&dcon->reboot_nb);
> -	atomic_notifier_chain_register(&panic_notifier_list, &dcon_panic_nb);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
> +				       &dcon_panic_nb);
>  
>  	return 0;
>  
> @@ -676,7 +677,8 @@ static int dcon_remove(struct i2c_client *client)
>  	struct dcon_priv *dcon = i2c_get_clientdata(client);
>  
>  	unregister_reboot_notifier(&dcon->reboot_nb);
> -	atomic_notifier_chain_unregister(&panic_notifier_list, &dcon_panic_nb);
> +	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
> +					 &dcon_panic_nb);
>  
>  	free_irq(DCON_IRQ, dcon);
>  
> diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
> index f3494b868a64..ec21e63592be 100644
> --- a/drivers/video/fbdev/hyperv_fb.c
> +++ b/drivers/video/fbdev/hyperv_fb.c
> @@ -1253,7 +1253,7 @@ static int hvfb_probe(struct hv_device *hdev,
>  	 */
>  	par->hvfb_panic_nb.notifier_call = hvfb_on_panic;
>  	par->hvfb_panic_nb.priority = INT_MIN + 10,
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
>  				       &par->hvfb_panic_nb);
>  
>  	return 0;
> @@ -1276,7 +1276,7 @@ static int hvfb_remove(struct hv_device *hdev)
>  	struct fb_info *info = hv_get_drvdata(hdev);
>  	struct hvfb_par *par = info->par;
>  
> -	atomic_notifier_chain_unregister(&panic_notifier_list,
> +	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
>  					 &par->hvfb_panic_nb);
>  
>  	par->update = false;
> diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
> index 7364a346bcb0..7912aacbc0e5 100644
> --- a/include/linux/panic_notifier.h
> +++ b/include/linux/panic_notifier.h
> @@ -5,9 +5,9 @@
>  #include <linux/notifier.h>
>  #include <linux/types.h>
>  
> -extern struct atomic_notifier_head panic_notifier_list;
>  extern struct atomic_notifier_head panic_hypervisor_list;
>  extern struct atomic_notifier_head panic_info_list;
> +extern struct atomic_notifier_head panic_pre_reboot_list;
>  
>  extern bool crash_kexec_post_notifiers;
>  
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 73ca1bc44e30..a9d43b98b05b 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -69,16 +69,15 @@ EXPORT_SYMBOL_GPL(panic_timeout);
>  #define PANIC_PRINT_ALL_CPU_BT		0x00000040
>  unsigned long panic_print;
>  
> -ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
> -
> -EXPORT_SYMBOL(panic_notifier_list);
> -
>  ATOMIC_NOTIFIER_HEAD(panic_hypervisor_list);
>  EXPORT_SYMBOL(panic_hypervisor_list);
>  
>  ATOMIC_NOTIFIER_HEAD(panic_info_list);
>  EXPORT_SYMBOL(panic_info_list);
>  
> +ATOMIC_NOTIFIER_HEAD(panic_pre_reboot_list);
> +EXPORT_SYMBOL(panic_pre_reboot_list);
> +
>  static long no_blink(int state)
>  {
>  	return 0;
> @@ -295,7 +294,7 @@ void panic(const char *fmt, ...)
>  	 */
>  	atomic_notifier_call_chain(&panic_hypervisor_list, PANIC_NOTIFIER, buf);
>  	atomic_notifier_call_chain(&panic_info_list, PANIC_NOTIFIER, buf);
> -	atomic_notifier_call_chain(&panic_notifier_list, PANIC_NOTIFIER, buf);
> +	atomic_notifier_call_chain(&panic_pre_reboot_list, PANIC_NOTIFIER, buf);
>  
>  	panic_print_sys_info(false);
>  
> -- 
> 2.36.0
> 

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path
  2022-04-27 22:49 ` [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path Guilherme G. Piccoli
@ 2022-04-28 16:55   ` Helge Deller
  2022-04-29 14:34     ` Guilherme G. Piccoli
  2022-05-23 20:40     ` Guilherme G. Piccoli
  0 siblings, 2 replies; 183+ messages in thread
From: Helge Deller @ 2022-04-28 16:55 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	James E.J. Bottomley

On 4/28/22 00:49, Guilherme G. Piccoli wrote:
> The panic notifiers' callbacks execute in an atomic context, with
> interrupts/preemption disabled, and all CPUs not running the panic
> function are off, so it's very dangerous to wait on a regular
> spinlock, there's a risk of deadlock.
>
> This patch refactors the panic notifier of parisc/power driver
> to make use of spin_trylock - for that, we've added a second
> version of the soft-power function. Also, some comments were
> reorganized and trailing white spaces, useless header inclusion
> and blank lines were removed.
>
> Cc: Helge Deller <deller@gmx.de>
> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

You may add:
Acked-by: Helge Deller <deller@gmx.de> # parisc

Helge


> ---
>  arch/parisc/include/asm/pdc.h |  1 +
>  arch/parisc/kernel/firmware.c | 27 +++++++++++++++++++++++----
>  drivers/parisc/power.c        | 17 ++++++++++-------
>  3 files changed, 34 insertions(+), 11 deletions(-)
>
> diff --git a/arch/parisc/include/asm/pdc.h b/arch/parisc/include/asm/pdc.h
> index b643092d4b98..7a106008e258 100644
> --- a/arch/parisc/include/asm/pdc.h
> +++ b/arch/parisc/include/asm/pdc.h
> @@ -83,6 +83,7 @@ int pdc_do_firm_test_reset(unsigned long ftc_bitmap);
>  int pdc_do_reset(void);
>  int pdc_soft_power_info(unsigned long *power_reg);
>  int pdc_soft_power_button(int sw_control);
> +int pdc_soft_power_button_panic(int sw_control);
>  void pdc_io_reset(void);
>  void pdc_io_reset_devices(void);
>  int pdc_iodc_getc(void);
> diff --git a/arch/parisc/kernel/firmware.c b/arch/parisc/kernel/firmware.c
> index 6a7e315bcc2e..0e2f70b592f4 100644
> --- a/arch/parisc/kernel/firmware.c
> +++ b/arch/parisc/kernel/firmware.c
> @@ -1232,15 +1232,18 @@ int __init pdc_soft_power_info(unsigned long *power_reg)
>  }
>
>  /*
> - * pdc_soft_power_button - Control the soft power button behaviour
> - * @sw_control: 0 for hardware control, 1 for software control
> + * pdc_soft_power_button{_panic} - Control the soft power button behaviour
> + * @sw_control: 0 for hardware control, 1 for software control
>   *
>   *
>   * This PDC function places the soft power button under software or
>   * hardware control.
> - * Under software control the OS may control to when to allow to shut
> - * down the system. Under hardware control pressing the power button
> + * Under software control the OS may control to when to allow to shut
> + * down the system. Under hardware control pressing the power button
>   * powers off the system immediately.
> + *
> + * The _panic version relies in spin_trylock to prevent deadlock
> + * on panic path.
>   */
>  int pdc_soft_power_button(int sw_control)
>  {
> @@ -1254,6 +1257,22 @@ int pdc_soft_power_button(int sw_control)
>  	return retval;
>  }
>
> +int pdc_soft_power_button_panic(int sw_control)
> +{
> +	int retval;
> +	unsigned long flags;
> +
> +	if (!spin_trylock_irqsave(&pdc_lock, flags)) {
> +		pr_emerg("Couldn't enable soft power button\n");
> +		return -EBUSY; /* ignored by the panic notifier */
> +	}
> +
> +	retval = mem_pdc_call(PDC_SOFT_POWER, PDC_SOFT_POWER_ENABLE, __pa(pdc_result), sw_control);
> +	spin_unlock_irqrestore(&pdc_lock, flags);
> +
> +	return retval;
> +}
> +
>  /*
>   * pdc_io_reset - Hack to avoid overlapping range registers of Bridges devices.
>   * Primarily a problem on T600 (which parisc-linux doesn't support) but
> diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
> index 456776bd8ee6..8512884de2cf 100644
> --- a/drivers/parisc/power.c
> +++ b/drivers/parisc/power.c
> @@ -37,7 +37,6 @@
>  #include <linux/module.h>
>  #include <linux/init.h>
>  #include <linux/kernel.h>
> -#include <linux/notifier.h>
>  #include <linux/panic_notifier.h>
>  #include <linux/reboot.h>
>  #include <linux/sched/signal.h>
> @@ -175,16 +174,21 @@ static void powerfail_interrupt(int code, void *x)
>
>
>
> -/* parisc_panic_event() is called by the panic handler.
> - * As soon as a panic occurs, our tasklets above will not be
> - * executed any longer. This function then re-enables the
> - * soft-power switch and allows the user to switch off the system
> +/*
> + * parisc_panic_event() is called by the panic handler.
> + *
> + * As soon as a panic occurs, our tasklets above will not
> + * be executed any longer. This function then re-enables
> + * the soft-power switch and allows the user to switch off
> + * the system. We rely in pdc_soft_power_button_panic()
> + * since this version spin_trylocks (instead of regular
> + * spinlock), preventing deadlocks on panic path.
>   */
>  static int parisc_panic_event(struct notifier_block *this,
>  		unsigned long event, void *ptr)
>  {
>  	/* re-enable the soft-power switch */
> -	pdc_soft_power_button(0);
> +	pdc_soft_power_button_panic(0);
>  	return NOTIFY_DONE;
>  }
>
> @@ -193,7 +197,6 @@ static struct notifier_block parisc_panic_block = {
>  	.priority	= INT_MAX,
>  };
>
> -
>  static int __init power_init(void)
>  {
>  	unsigned long ret;


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 17/30] tracing: Improve panic/die notifiers
  2022-04-27 22:49 ` [PATCH 17/30] tracing: Improve panic/die notifiers Guilherme G. Piccoli
@ 2022-04-29  9:22   ` Sergei Shtylyov
  2022-04-29 13:23     ` Steven Rostedt
  2022-05-11 11:45   ` Petr Mladek
  1 sibling, 1 reply; 183+ messages in thread
From: Sergei Shtylyov @ 2022-04-29  9:22 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

Hello!

On 4/28/22 1:49 AM, Guilherme G. Piccoli wrote:

> Currently the tracing dump_on_oops feature is implemented
> through separate notifiers, one for die/oops and the other
> for panic. With the addition of panic notifier "id", this
> patch makes use of such "id" to unify both functions.
> 
> It also comments the function and changes the priority of the
> notifier blocks, in order they run early compared to other
> notifiers, to prevent useless trace data (like the callback
> names for the other notifiers). Finally, we also removed an
> unnecessary header inclusion.
> 
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  kernel/trace/trace.c | 57 +++++++++++++++++++++++++-------------------
>  1 file changed, 32 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index f4de111fa18f..c1d8a3622ccc 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
[...]
> @@ -9767,38 +9766,46 @@ static __init int tracer_init_tracefs(void)
>  
>  fs_initcall(tracer_init_tracefs);
>  
> -static int trace_panic_handler(struct notifier_block *this,
> -			       unsigned long event, void *unused)
> +/*
> + * The idea is to execute the following die/panic callback early, in order
> + * to avoid showing irrelevant information in the trace (like other panic
> + * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
> + * warnings get disabled (to prevent potential log flooding).
> + */
> +static int trace_die_panic_handler(struct notifier_block *self,
> +				unsigned long ev, void *unused)
>  {
> -	if (ftrace_dump_on_oops)
> +	int do_dump;

   bool?

> +
> +	if (!ftrace_dump_on_oops)
> +		return NOTIFY_DONE;
> +
> +	switch (ev) {
> +	case DIE_OOPS:
> +		do_dump = 1;
> +		break;
> +	case PANIC_NOTIFIER:
> +		do_dump = 1;
> +		break;

   Why not:

	case DIE_OOPS:
	case PANIC_NOTIFIER:
		do_dump = 1;
		break;

> +	default:
> +		do_dump = 0;
> +		break;
> +	}
> +
> +	if (do_dump)
>  		ftrace_dump(ftrace_dump_on_oops);
> -	return NOTIFY_OK;
> +
> +	return NOTIFY_DONE;
>  }
[...]

MBR, Sergey

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 17/30] tracing: Improve panic/die notifiers
  2022-04-29  9:22   ` Sergei Shtylyov
@ 2022-04-29 13:23     ` Steven Rostedt
  2022-04-29 13:46       ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-04-29 13:23 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On Fri, 29 Apr 2022 12:22:44 +0300
Sergei Shtylyov <sergei.shtylyov@gmail.com> wrote:

> > +	switch (ev) {
> > +	case DIE_OOPS:
> > +		do_dump = 1;
> > +		break;
> > +	case PANIC_NOTIFIER:
> > +		do_dump = 1;
> > +		break;  
> 
>    Why not:
> 
> 	case DIE_OOPS:
> 	case PANIC_NOTIFIER:
> 		do_dump = 1;
> 		break;

Agreed.

Other than that.

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 17/30] tracing: Improve panic/die notifiers
  2022-04-29 13:23     ` Steven Rostedt
@ 2022-04-29 13:46       ` Guilherme G. Piccoli
  2022-04-29 13:56         ` Steven Rostedt
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 13:46 UTC (permalink / raw)
  To: Steven Rostedt, Sergei Shtylyov
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On 29/04/2022 10:23, Steven Rostedt wrote:
> On Fri, 29 Apr 2022 12:22:44 +0300
> Sergei Shtylyov <sergei.shtylyov@gmail.com> wrote:
> 
>>> +	switch (ev) {
>>> +	case DIE_OOPS:
>>> +		do_dump = 1;
>>> +		break;
>>> +	case PANIC_NOTIFIER:
>>> +		do_dump = 1;
>>> +		break;  
>>
>>    Why not:
>>
>> 	case DIE_OOPS:
>> 	case PANIC_NOTIFIER:
>> 		do_dump = 1;
>> 		break;
> 
> Agreed.
> 
> Other than that.
> 
> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> 
> -- Steve

Thanks Sergei and Steven, good idea! I thought about the switch change
you propose, but I confess I got a bit confused by the "fallthrough"
keyword - do I need to use it?

About the s/int/bool, for sure! Not sure why I didn't use bool at
first...heheh

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 17/30] tracing: Improve panic/die notifiers
  2022-04-29 13:46       ` Guilherme G. Piccoli
@ 2022-04-29 13:56         ` Steven Rostedt
  2022-04-29 14:44           ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-04-29 13:56 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Sergei Shtylyov, akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On Fri, 29 Apr 2022 10:46:35 -0300
"Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:

> Thanks Sergei and Steven, good idea! I thought about the switch change
> you propose, but I confess I got a bit confused by the "fallthrough"
> keyword - do I need to use it?

No. The fallthrough keyword is only needed when there's code between case
labels. As it is very common to list multiple cases for the same code path.
That is:

	case DIE_OOPS:
 	case PANIC_NOTIFIER:
 		do_dump = 1;
 		break;

Does not need a fall through label, as there's no code between the DIE_OOPS
and the PANIC_NOTIFIER. But if you had:

	case DIE_OOPS:
		x = true;
 	case PANIC_NOTIFIER:
 		do_dump = 1;
 		break;

Then you do.

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
  2022-04-28  8:11   ` Suzuki K Poulose
@ 2022-04-29 14:01     ` Guilherme G. Piccoli
  2022-05-09 13:09     ` Guilherme G. Piccoli
  1 sibling, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 14:01 UTC (permalink / raw)
  To: Suzuki K Poulose, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Leo Yan, Mathieu Poirier, Mike Leach

On 28/04/2022 05:11, Suzuki K Poulose wrote:
> Hi Guilherme,
> [...] 
> How would you like to proceed with queuing this ? I am happy
> either way. In case you plan to push this as part of this
> series (I don't see any potential conflicts) :
> 
> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Thanks for your review Suzuki, much appreciated!

About your question, I'm not sure yet - in case the core changes would
take a while (like if community find them polemic, require many changes,
etc) I might split this series in 2 parts, the fixes part vs the
improvements per se. Either way, a V2 is going to happen for sure, and
in that moment, I'll let you know what I think it's best.

But either way, any choice you prefer is fine by me as well (like if you
want to merge it now or postpone to get merged in the future), this is
not an urgent fix I think =)
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path
  2022-04-28 16:55   ` Helge Deller
@ 2022-04-29 14:34     ` Guilherme G. Piccoli
  2022-05-23 20:40     ` Guilherme G. Piccoli
  1 sibling, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 14:34 UTC (permalink / raw)
  To: Helge Deller, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, James E.J. Bottomley

On 28/04/2022 13:55, Helge Deller wrote:
> [...]
> You may add:
> Acked-by: Helge Deller <deller@gmx.de> # parisc
> 
> Helge

Thanks Helge, added!
Cheers,


Guilherme

> 
> 
>> ---
>>  arch/parisc/include/asm/pdc.h |  1 +
>>  arch/parisc/kernel/firmware.c | 27 +++++++++++++++++++++++----
>>  drivers/parisc/power.c        | 17 ++++++++++-------
>>  3 files changed, 34 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/parisc/include/asm/pdc.h b/arch/parisc/include/asm/pdc.h
>> index b643092d4b98..7a106008e258 100644
>> --- a/arch/parisc/include/asm/pdc.h
>> +++ b/arch/parisc/include/asm/pdc.h
>> @@ -83,6 +83,7 @@ int pdc_do_firm_test_reset(unsigned long ftc_bitmap);
>>  int pdc_do_reset(void);
>>  int pdc_soft_power_info(unsigned long *power_reg);
>>  int pdc_soft_power_button(int sw_control);
>> +int pdc_soft_power_button_panic(int sw_control);
>>  void pdc_io_reset(void);
>>  void pdc_io_reset_devices(void);
>>  int pdc_iodc_getc(void);
>> diff --git a/arch/parisc/kernel/firmware.c b/arch/parisc/kernel/firmware.c
>> index 6a7e315bcc2e..0e2f70b592f4 100644
>> --- a/arch/parisc/kernel/firmware.c
>> +++ b/arch/parisc/kernel/firmware.c
>> @@ -1232,15 +1232,18 @@ int __init pdc_soft_power_info(unsigned long *power_reg)
>>  }
>>
>>  /*
>> - * pdc_soft_power_button - Control the soft power button behaviour
>> - * @sw_control: 0 for hardware control, 1 for software control
>> + * pdc_soft_power_button{_panic} - Control the soft power button behaviour
>> + * @sw_control: 0 for hardware control, 1 for software control
>>   *
>>   *
>>   * This PDC function places the soft power button under software or
>>   * hardware control.
>> - * Under software control the OS may control to when to allow to shut
>> - * down the system. Under hardware control pressing the power button
>> + * Under software control the OS may control to when to allow to shut
>> + * down the system. Under hardware control pressing the power button
>>   * powers off the system immediately.
>> + *
>> + * The _panic version relies in spin_trylock to prevent deadlock
>> + * on panic path.
>>   */
>>  int pdc_soft_power_button(int sw_control)
>>  {
>> @@ -1254,6 +1257,22 @@ int pdc_soft_power_button(int sw_control)
>>  	return retval;
>>  }
>>
>> +int pdc_soft_power_button_panic(int sw_control)
>> +{
>> +	int retval;
>> +	unsigned long flags;
>> +
>> +	if (!spin_trylock_irqsave(&pdc_lock, flags)) {
>> +		pr_emerg("Couldn't enable soft power button\n");
>> +		return -EBUSY; /* ignored by the panic notifier */
>> +	}
>> +
>> +	retval = mem_pdc_call(PDC_SOFT_POWER, PDC_SOFT_POWER_ENABLE, __pa(pdc_result), sw_control);
>> +	spin_unlock_irqrestore(&pdc_lock, flags);
>> +
>> +	return retval;
>> +}
>> +
>>  /*
>>   * pdc_io_reset - Hack to avoid overlapping range registers of Bridges devices.
>>   * Primarily a problem on T600 (which parisc-linux doesn't support) but
>> diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
>> index 456776bd8ee6..8512884de2cf 100644
>> --- a/drivers/parisc/power.c
>> +++ b/drivers/parisc/power.c
>> @@ -37,7 +37,6 @@
>>  #include <linux/module.h>
>>  #include <linux/init.h>
>>  #include <linux/kernel.h>
>> -#include <linux/notifier.h>
>>  #include <linux/panic_notifier.h>
>>  #include <linux/reboot.h>
>>  #include <linux/sched/signal.h>
>> @@ -175,16 +174,21 @@ static void powerfail_interrupt(int code, void *x)
>>
>>
>>
>> -/* parisc_panic_event() is called by the panic handler.
>> - * As soon as a panic occurs, our tasklets above will not be
>> - * executed any longer. This function then re-enables the
>> - * soft-power switch and allows the user to switch off the system
>> +/*
>> + * parisc_panic_event() is called by the panic handler.
>> + *
>> + * As soon as a panic occurs, our tasklets above will not
>> + * be executed any longer. This function then re-enables
>> + * the soft-power switch and allows the user to switch off
>> + * the system. We rely in pdc_soft_power_button_panic()
>> + * since this version spin_trylocks (instead of regular
>> + * spinlock), preventing deadlocks on panic path.
>>   */
>>  static int parisc_panic_event(struct notifier_block *this,
>>  		unsigned long event, void *ptr)
>>  {
>>  	/* re-enable the soft-power switch */
>> -	pdc_soft_power_button(0);
>> +	pdc_soft_power_button_panic(0);
>>  	return NOTIFY_DONE;
>>  }
>>
>> @@ -193,7 +197,6 @@ static struct notifier_block parisc_panic_block = {
>>  	.priority	= INT_MAX,
>>  };
>>
>> -
>>  static int __init power_init(void)
>>  {
>>  	unsigned long ret;
> 

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 17/30] tracing: Improve panic/die notifiers
  2022-04-29 13:56         ` Steven Rostedt
@ 2022-04-29 14:44           ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 14:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergei Shtylyov, akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, senozhatsky, stern, tglx, vgoyal,
	vkuznets, will

On 29/04/2022 10:56, Steven Rostedt wrote:
> [...]
> No. The fallthrough keyword is only needed when there's code between case
> labels. As it is very common to list multiple cases for the same code path.
> That is:
> 
> 	case DIE_OOPS:
>  	case PANIC_NOTIFIER:
>  		do_dump = 1;
>  		break;
> 
> Does not need a fall through label, as there's no code between the DIE_OOPS
> and the PANIC_NOTIFIER. But if you had:
> 
> 	case DIE_OOPS:
> 		x = true;
>  	case PANIC_NOTIFIER:
>  		do_dump = 1;
>  		break;
> 
> Then you do.
> 
> -- Steve

Thanks a bunch for the clarification, changed that for V2 =)

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 20/30] panic: Add the panic informational notifier list
  2022-04-28  8:14   ` Suzuki K Poulose
@ 2022-04-29 14:50     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 14:50 UTC (permalink / raw)
  To: Suzuki K Poulose, paulmck
  Cc: linux-kernel, akpm, bcm-kernel-feedback-list, coresight,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, kexec,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, peterz, rostedt,
	senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Benjamin Herrenschmidt, Catalin Marinas, Florian Fainelli,
	Frederic Weisbecker, H. Peter Anvin, Hari Bathini,
	Joel Fernandes, Jonathan Hunter, Josh Triplett, Lai Jiangshan,
	Leo Yan, Mathieu Desnoyers, Mathieu Poirier, Michael Ellerman,
	Mike Leach, Mikko Perttunen, Neeraj Upadhyay, Nicholas Piggin,
	Paul Mackerras, Thierry Reding, Thomas Bogendoerfer, pmladek,
	bhe

Thanks Paul and Suzuki for the ACKs.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-04-28 16:26   ` Corey Minyard
@ 2022-04-29 15:18     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 15:18 UTC (permalink / raw)
  To: minyard, elder, Alex Elder, cminyard
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Alexander Gordeev, Anton Ivanov,
	Benjamin Herrenschmidt, Bjorn Andersson, Boris Ostrovsky,
	Chris Zankel, Christian Borntraeger, Dexuan Cui, H. Peter Anvin,
	Haiyang Zhang, Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Henderson,
	Richard Weinberger, Robert Richter, Stefano Stabellini,
	Stephen Hemminger, Sven Schnelle, Tony Luck, Vasily Gorbik,
	Wei Liu

On 28/04/2022 13:26, Corey Minyard wrote:
> [...]
> 
> For the IPMI portion:
> 
> Acked-by: Corey Minyard <cminyard@mvista.com>

Thanks Alex and Corey for the ACKs!

> 
> Note that the IPMI panic_event() should always return, but it may take
> some time, especially if the IPMI controller is no longer functional.
> So the risk of a long delay is there and it makes sense to move it very
> late.
> 

Thanks, I agree - the patch moves it to the (latest - 1) position, since
some arch code might run as the latest and effectively stops the machine.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering
  2022-04-28  8:30   ` Johannes Berg
@ 2022-04-29 15:46     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 15:46 UTC (permalink / raw)
  To: Johannes Berg
  Cc: linux-kernel, linux-um@lists.infradead.orgAnton Ivanov,
	Richard Weinberger

On 28/04/2022 05:30, Johannes Berg wrote:
> [trimming massive CC list]
> 
> On Wed, 2022-04-27 at 19:49 -0300, Guilherme G. Piccoli wrote:
>>
>> Also, we remove a useless header inclusion.
> 
> I wouldn't say it's useless, generally we try not to rely on implicit
> includes so much? And you at least now use NOTIFY_DONE from it.
> 
> Otherwise looks fine to me.
> 
> johannes

Hi Johannes, thanks for your prompt response, and for clearing a bit the
*huge* CC list, it got really humongous...

I agree with you here - I missed that there's also a reboot notifier in
"mconsole_kern.c", so seems it makes sense to keep the header. I'll fix
that for V2, ok? If it was only a panic notifier I would consider it
useless maybe, but we have non-panic notifier as well heh

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-04-28  0:28   ` Randy Dunlap
@ 2022-04-29 16:04     ` Guilherme G. Piccoli
  2022-05-09 14:25       ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 16:04 UTC (permalink / raw)
  To: Randy Dunlap, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

On 27/04/2022 21:28, Randy Dunlap wrote:
> 
> 
> On 4/27/22 15:49, Guilherme G. Piccoli wrote:
>> +	crash_kexec_post_notifiers
>> +			This was DEPRECATED - users should always prefer the
> 
> 			This is DEPRECATED - users should always prefer the
> 
>> +			parameter "panic_notifiers_level" - check its entry
>> +			in this documentation for details on how it works.
>> +			Setting this parameter is exactly the same as setting
>> +			"panic_notifiers_level=4".
> 

Thanks Randy, for your suggestion - but I confess I couldn't understand
it properly. It's related to spaces/tabs, right? What you suggest me to
change in this formatting? Just by looking the email I can't parse.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-04-27 22:49 ` [PATCH 21/30] panic: Introduce the panic pre-reboot " Guilherme G. Piccoli
  2022-04-28 14:13   ` Alex Elder
  2022-04-28 16:26   ` Corey Minyard
@ 2022-04-29 16:04   ` Max Filippov
  2022-04-29 19:34     ` Guilherme G. Piccoli
  2022-05-16 14:33   ` Petr Mladek
  3 siblings, 1 reply; 183+ messages in thread
From: Max Filippov @ 2022-04-29 16:04 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Andrew Morton, bhe, Petr Mladek, kexec, LKML,
	bcm-kernel-feedback-list, coresight, linuxppc-dev,
	open list:ALPHA PORT, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, open list:PARISC ARCHITECTURE, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	open list:TENSILICA XTENSA PORT (xtensa),
	netdev, openipmi-developer, rcu, open list:SPARC + UltraSPAR...,
	xen-devel, maintainer:X86 ARCHITECTURE...,
	kernel-dev, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	Andy Shevchenko, Arnd Bergmann, Borislav Petkov, Jonathan Corbet,
	d.hatayama, Dave Hansen, dyoung, feng.tang, Greg Kroah-Hartman,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, Kees Cook,
	Andrew Lutomirski, Masami Hiramatsu, Ingo Molnar, paulmck,
	Peter Zijlstra, Steven Rostedt, Sergey Senozhatsky, stern,
	Thomas Gleixner, vgoyal, vkuznets, Will Deacon, Alex Elder,
	Alexander Gordeev, Anton Ivanov, Benjamin Herrenschmidt,
	Bjorn Andersson, Boris Ostrovsky, Chris Zankel,
	Christian Borntraeger, Corey Minyard, Dexuan Cui, H. Peter Anvin,
	Haiyang Zhang, Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Michael Ellerman, Paul Mackerras,
	Pavel Machek, Richard Henderson, Richard Weinberger,
	Robert Richter, Stefano Stabellini, Stephen Hemminger,
	Sven Schnelle, Tony Luck, Vasily Gorbik, Wei Liu

On Wed, Apr 27, 2022 at 3:55 PM Guilherme G. Piccoli
<gpiccoli@igalia.com> wrote:
>
> This patch renames the panic_notifier_list to panic_pre_reboot_list;
> the idea is that a subsequent patch will refactor the panic path
> in order to better split the notifiers, running some of them very
> early, some of them not so early [but still before kmsg_dump()] and
> finally, the rest should execute late, after kdump. The latter ones
> are now in the panic pre-reboot list - the name comes from the idea
> that these notifiers execute before panic() attempts rebooting the
> machine (if that option is set).
>
> We also took the opportunity to clean-up useless header inclusions,
> improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
> and more important, change some priorities - we hereby set 2 notifiers
> to run late in the list [iss_panic_event() and the IPMI panic_event()]
> due to the risks they offer (may not return, for example).
> Proper documentation is going to be provided in a subsequent patch,
> that effectively refactors the panic path.

[...]

>  arch/xtensa/platforms/iss/setup.c     |  4 ++--For xtensa:

For xtensa:
Acked-by: Max Filippov <jcmvbkbc@gmail.com>

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
  2022-04-27 22:48 ` [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path Guilherme G. Piccoli
@ 2022-04-29 16:26   ` Michael Kelley (LINUX)
  2022-04-29 18:20   ` Marc Zyngier
  1 sibling, 0 replies; 183+ messages in thread
From: Michael Kelley (LINUX) @ 2022-04-29 16:26 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Marc Zyngier, Russell King

From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022 3:49 PM
> 
> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
> is responsible for that. This makes sense, since we're turning off
> such CPUs, putting them in an endless busy-wait loop.
> 
> Problem is that there is an alternative path for disabling CPUs,
> in the form of function crash_smp_send_stop(), used for kexec/panic
> paths. This functions relies in a SMP call that also triggers a

s/functions relies in/function relies on/

> busy-wait loop [at machine_crash_nonpanic_core()], but *without*
> disabling interrupts. This might lead to odd scenarios, like early
> interrupts in the boot of kexec'd kernel or even interrupts in
> other CPUs while the main one still works in the panic path and
> assumes all secondary CPUs are (really!) off.
> 
> This patch mimics the ipi_cpu_stop() interrupt disable mechanism
> in the crash CPU shutdown path, hence disabling IRQs/FIQs in all
> secondary CPUs in the kexec/panic path as well.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  arch/arm/kernel/machine_kexec.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index f567032a09c0..ef788ee00519 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused)
>  	set_cpu_online(smp_processor_id(), false);
>  	atomic_dec(&waiting_for_crash_ipi);
> 
> +	local_fiq_disable();
> +	local_irq_disable();
> +
>  	while (1) {
>  		cpu_relax();
>  		wfe();
> --
> 2.36.0


^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
  2022-04-27 22:49 ` [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set Guilherme G. Piccoli
  2022-04-28  1:01   ` Xiaoming Ni
@ 2022-04-29 16:27   ` Michael Kelley (LINUX)
  1 sibling, 0 replies; 183+ messages in thread
From: Michael Kelley (LINUX) @ 2022-04-29 16:27 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Arjan van de Ven, Cong Wang, Sebastian Andrzej Siewior,
	Valentin Schneider, Xiaoming Ni

From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022 3:49 PM
> 
> Currently we have a debug infrastructure in the notifiers file, but
> it's very simple/limited. This patch extends it by:
> 
> (a) Showing all registered/unregistered notifiers' callback names;
> 
> (b) Adding a dynamic debug tuning to allow showing called notifiers'
> function names. Notice that this should be guarded as a tunable since
> it can flood the kernel log buffer.
> 
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Xiaoming Ni <nixiaoming@huawei.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 
> We have some design decisions that worth discussing here:
> 
> (a) First of call, using C99 helps a lot to write clear and concise code, but

s/call/all/

> due to commit 4d94f910e79a ("Kbuild: use -Wdeclaration-after-statement") we
> have a warning if mixing variable declarations with code. For this patch though,
> doing that makes the code way clear, so decision was to add the debug code
> inside brackets whenever this warning pops up. We can change that, but that'll
> cause more ifdefs in the same function.
> 
> (b) In the symbol lookup helper function, we modify the parameter passed but
> even more, we return it as well! This is unusual and seems unnecessary, but was
> the strategy taken to allow embedding such function in the pr_debug() call.
> 
> Not doing that would likely requiring 3 symbol_name variables to avoid
> concurrency (registering notifier A while calling notifier B) - we rely in
> local variables as a serialization mechanism.
> 
> We're open for suggestions in case this design is not appropriate;
> thanks in advance!
> 
>  kernel/notifier.c | 48 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/notifier.c b/kernel/notifier.c
> index ba005ebf4730..21032ebcde57 100644
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -7,6 +7,22 @@
>  #include <linux/vmalloc.h>
>  #include <linux/reboot.h>
> 
> +#ifdef CONFIG_DEBUG_NOTIFIERS
> +#include <linux/kallsyms.h>
> +
> +/*
> + *	Helper to get symbol names in case DEBUG_NOTIFIERS is set.
> + *	Return the modified parameter is a strategy used to achieve
> + *	the pr_debug() functionality - with this, function is only
> + *	executed if the dynamic debug tuning is effectively set.
> + */
> +static inline char *notifier_name(struct notifier_block *nb, char *sym_name)
> +{
> +	lookup_symbol_name((unsigned long)(nb->notifier_call), sym_name);
> +	return sym_name;
> +}
> +#endif
> +
>  /*
>   *	Notifier list for kernel code which wants to be called
>   *	at shutdown. This is used to stop any idling DMA operations
> @@ -34,20 +50,41 @@ static int notifier_chain_register(struct notifier_block **nl,
>  	}
>  	n->next = *nl;
>  	rcu_assign_pointer(*nl, n);
> +
> +#ifdef CONFIG_DEBUG_NOTIFIERS
> +	{
> +		char sym_name[KSYM_NAME_LEN];
> +
> +		pr_info("notifiers: registered %s()\n",
> +			notifier_name(n, sym_name));
> +	}
> +#endif
>  	return 0;
>  }
> 
>  static int notifier_chain_unregister(struct notifier_block **nl,
>  		struct notifier_block *n)
>  {
> +	int ret = -ENOENT;
> +
>  	while ((*nl) != NULL) {
>  		if ((*nl) == n) {
>  			rcu_assign_pointer(*nl, n->next);
> -			return 0;
> +			ret = 0;
> +			break;
>  		}
>  		nl = &((*nl)->next);
>  	}
> -	return -ENOENT;
> +
> +#ifdef CONFIG_DEBUG_NOTIFIERS
> +	if (!ret) {
> +		char sym_name[KSYM_NAME_LEN];
> +
> +		pr_info("notifiers: unregistered %s()\n",
> +			notifier_name(n, sym_name));
> +	}
> +#endif
> +	return ret;
>  }
> 
>  /**
> @@ -80,6 +117,13 @@ static int notifier_call_chain(struct notifier_block **nl,
>  			nb = next_nb;
>  			continue;
>  		}
> +
> +		{
> +			char sym_name[KSYM_NAME_LEN];
> +
> +			pr_debug("notifiers: calling %s()\n",
> +				 notifier_name(nb, sym_name));
> +		}
>  #endif
>  		ret = nb->notifier_call(nb, val, v);
> 
> --
> 2.36.0


^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers
  2022-04-27 22:49 ` [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers Guilherme G. Piccoli
@ 2022-04-29 17:16   ` Michael Kelley (LINUX)
  2022-04-29 22:35     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Michael Kelley (LINUX) @ 2022-04-29 17:16 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Andrea Parri, Dexuan Cui, Haiyang Zhang, KY Srinivasan,
	Stephen Hemminger, Tianyu Lan, Wei Liu

From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022 3:49 PM
> 
> Currently Hyper-V guests are among the most relevant users of the panic
> infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
> both in cleaning-up procedures (closing a hypervisor <-> guest connection,
> disabling a paravirtualized timer) as well as to data collection (sending
> panic information to the hypervisor) and framebuffer management.
> 
> The thing is: some notifiers are related to others, ordering matters, some
> functionalities are duplicated and there are lots of conditionals behind
> sending panic information to the hypervisor. This patch, as part of an
> effort to clean-up the panic notifiers mechanism and better document
> things, address some of the issues/complexities of Hyper-V panic handling
> through the following changes:
> 
> (a) We have die and panic notifiers on vmbus_drv.c and both have goals of
> sending panic information to the hypervisor, though the panic notifier is
> also responsible for a cleaning-up procedure.
> 
> This commit clears the code by splitting the panic notifier in two, one
> for closing the vmbus connection whereas the other is only for sending
> panic info to hypervisor. With that, it was possible to merge the die and
> panic notifiers in a single/well-documented function, and clear some
> conditional complexities on sending such information to the hypervisor.
> 
> (b) The new panic notifier created after (a) is only doing a single thing:
> cleaning the vmbus connection. This procedure might cause a delay (due to
> hypervisor I/O completion), so we postpone that to run late. But more
> relevant: this *same* vmbus unloading happens in the crash_shutdown()
> handler, so if kdump is set, we can safely skip this panic notifier and
> defer such clean-up to the kexec crash handler.

While the last sentence is true for Hyper-V on x86/x64, it's not true for
Hyper-V on ARM64.  x86/x64 has the 'machine_ops' data structure
with the ability to provide a custom crash_shutdown() function, which
Hyper-V does in the form of hv_machine_crash_shutdown().  But ARM64
has no mechanism to provide such a custom function that will eventually
do the needed vmbus_initiate_unload() before running kdump.

I'm not immediately sure what the best solution is for ARM64.  At this
point, I'm just pointing out the problem and will think about the tradeoffs
for various possible solutions.  Please do the same yourself. :-)

> 
> (c) There is also a Hyper-V framebuffer panic notifier, which relies in
> doing a vmbus operation that demands a valid connection. So, we must
> order this notifier with the panic notifier from vmbus_drv.c, in order to
> guarantee that the framebuffer code executes before the vmbus connection
> is unloaded.

Patch 21 of this set puts the Hyper-V FB panic notifier on the pre_reboot
notifier list, which means it won't execute before the VMbus connection
unload in the case of kdump.   This notifier is making sure that Hyper-V
is notified about the last updates made to the frame buffer before the
panic, so maybe it needs to be put on the hypervisor notifier list.  It
sends a message to Hyper-V over its existing VMbus channel, but it
does not wait for a reply.  It does, however, obtain a spin lock on the
ring buffer used to communicate with Hyper-V.   Unless someone has
a better suggestion, I'm inclined to take the risk of blocking on that
spin lock.

> 
> Also, this commit removes a useless header.
> 
> Although there is code rework and re-ordering, we expect that this change
> has no functional regressions but instead optimize the path and increase
> panic reliability on Hyper-V. This was tested on Hyper-V with success.
> 
> Fixes: 792f232d57ff ("Drivers: hv: vmbus: Fix potential crash on module unload")
> Fixes: 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback")

The "Fixes:" tags imply that these changes should be backported to older
longterm kernel versions, which I don't think is the case.  There is a
dependency on Patch 14 of your series where PANIC_NOTIFIER is
introduced.

> Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> Cc: Dexuan Cui <decui@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Michael Kelley <mikelley@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Tested-by: Fabio A M Martins <fabiomirmar@gmail.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 
> Special thanks to Michael Kelley for the good information about the Hyper-V
> panic path in email threads some months ago, and to Fabio for the testing
> performed.
> 
> Michael and all Microsoft folks: a careful analysis to double-check our changes
> and assumptions here is really appreciated, this code is complex and intricate,
> it is possible some corner case might have been overlooked.
> 
> Thanks in advance!
> 
>  drivers/hv/vmbus_drv.c          | 109 ++++++++++++++++++++------------
>  drivers/video/fbdev/hyperv_fb.c |   8 +++
>  2 files changed, 76 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 14de17087864..f37f12d48001 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -24,11 +24,11 @@
>  #include <linux/sched/task_stack.h>
> 
>  #include <linux/delay.h>
> -#include <linux/notifier.h>
>  #include <linux/panic_notifier.h>
>  #include <linux/ptrace.h>
>  #include <linux/screen_info.h>
>  #include <linux/kdebug.h>
> +#include <linux/kexec.h>
>  #include <linux/efi.h>
>  #include <linux/random.h>
>  #include <linux/kernel.h>
> @@ -68,51 +68,75 @@ static int hyperv_report_reg(void)
>  	return !sysctl_record_panic_msg || !hv_panic_page;
>  }
> 
> -static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
> +/*
> + * The panic notifier below is responsible solely for unloading the
> + * vmbus connection, which is necessary in a panic event. But notice
> + * that this same unloading procedure is executed in the Hyper-V
> + * crash_shutdown() handler [see hv_crash_handler()], which basically
> + * means that we can postpone its execution if we have kdump set,
> + * since it will run the crash_shutdown() handler anyway. Even more
> + * intrincated is the relation of this notifier with Hyper-V framebuffer

s/intrincated/intricate/

> + * panic notifier - we need vmbus connection alive there in order to
> + * succeed, so we need to order both with each other [for reference see
> + * hvfb_on_panic()] - this is done using notifiers' priorities.
> + */
> +static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long val,
>  			      void *args)
> +{
> +	if (!kexec_crash_loaded())

I'm not clear on the purpose of this condition.  I think it means
we will skip the vmbus_initiate_unload() if a panic occurs in the
kdump kernel.  Is there a reason a panic in the kdump kernel
should be treated differently?  Or am I misunderstanding?

> +		vmbus_initiate_unload(true);
> +
> +	return NOTIFY_DONE;
> +}
> +static struct notifier_block hyperv_panic_vmbus_unload_block = {
> +	.notifier_call	= hv_panic_vmbus_unload,
> +	.priority	= INT_MIN + 1, /* almost the latest one to execute */
> +};
> +
> +/*
> + * The following callback works both as die and panic notifier; its
> + * goal is to provide panic information to the hypervisor unless the
> + * kmsg dumper is gonna be used [see hv_kmsg_dump()], which provides
> + * more information but is not always available.
> + *
> + * Notice that both the panic/die report notifiers are registered only
> + * if we have the capability HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE set.
> + */
> +static int hv_die_panic_notify_crash(struct notifier_block *nb,
> +				     unsigned long val, void *args)
>  {
>  	struct pt_regs *regs;
> +	bool is_die;
> 
> -	vmbus_initiate_unload(true);
> -
> -	/*
> -	 * Hyper-V should be notified only once about a panic.  If we will be
> -	 * doing hv_kmsg_dump() with kmsg data later, don't do the notification
> -	 * here.
> -	 */
> -	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE
> -	    && hyperv_report_reg()) {
> +	/* Don't notify Hyper-V unless we have a die oops event or panic. */
> +	switch (val) {
> +	case DIE_OOPS:
> +		is_die = true;
> +		regs = ((struct die_args *)args)->regs;
> +		break;
> +	case PANIC_NOTIFIER:
> +		is_die = false;
>  		regs = current_pt_regs();
> -		hyperv_report_panic(regs, val, false);
> -	}
> -	return NOTIFY_DONE;
> -}
> -
> -static int hyperv_die_event(struct notifier_block *nb, unsigned long val,
> -			    void *args)
> -{
> -	struct die_args *die = args;
> -	struct pt_regs *regs = die->regs;
> -
> -	/* Don't notify Hyper-V if the die event is other than oops */
> -	if (val != DIE_OOPS)
> +		break;
> +	default:
>  		return NOTIFY_DONE;
> +	}
> 
>  	/*
> -	 * Hyper-V should be notified only once about a panic.  If we will be
> -	 * doing hv_kmsg_dump() with kmsg data later, don't do the notification
> -	 * here.
> +	 * Hyper-V should be notified only once about a panic/die. If we will
> +	 * be calling hv_kmsg_dump() later with kmsg data, don't do the
> +	 * notification here.
>  	 */
>  	if (hyperv_report_reg())
> -		hyperv_report_panic(regs, val, true);
> +		hyperv_report_panic(regs, val, is_die);
> +
>  	return NOTIFY_DONE;
>  }
> -
> -static struct notifier_block hyperv_die_block = {
> -	.notifier_call = hyperv_die_event,
> +static struct notifier_block hyperv_die_report_block = {
> +	.notifier_call = hv_die_panic_notify_crash,
>  };
> -static struct notifier_block hyperv_panic_block = {
> -	.notifier_call = hyperv_panic_event,
> +static struct notifier_block hyperv_panic_report_block = {
> +	.notifier_call = hv_die_panic_notify_crash,
>  };
> 
>  static const char *fb_mmio_name = "fb_range";
> @@ -1589,16 +1613,17 @@ static int vmbus_bus_init(void)
>  		if (hyperv_crash_ctl & HV_CRASH_CTL_CRASH_NOTIFY_MSG)
>  			hv_kmsg_dump_register();
> 
> -		register_die_notifier(&hyperv_die_block);
> +		register_die_notifier(&hyperv_die_report_block);
> +		atomic_notifier_chain_register(&panic_notifier_list,
> +						&hyperv_panic_report_block);
>  	}
> 
>  	/*
> -	 * Always register the panic notifier because we need to unload
> -	 * the VMbus channel connection to prevent any VMbus
> -	 * activity after the VM panics.
> +	 * Always register the vmbus unload panic notifier because we
> +	 * need to shut the VMbus channel connection on panic.
>  	 */
>  	atomic_notifier_chain_register(&panic_notifier_list,
> -			       &hyperv_panic_block);
> +			       &hyperv_panic_vmbus_unload_block);
> 
>  	vmbus_request_offers();
> 
> @@ -2817,15 +2842,17 @@ static void __exit vmbus_exit(void)
> 
>  	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
>  		kmsg_dump_unregister(&hv_kmsg_dumper);
> -		unregister_die_notifier(&hyperv_die_block);
> +		unregister_die_notifier(&hyperv_die_report_block);
> +		atomic_notifier_chain_unregister(&panic_notifier_list,
> +						&hyperv_panic_report_block);
>  	}
> 
>  	/*
> -	 * The panic notifier is always registered, hence we should
> +	 * The vmbus panic notifier is always registered, hence we should
>  	 * also unconditionally unregister it here as well.
>  	 */
>  	atomic_notifier_chain_unregister(&panic_notifier_list,
> -					 &hyperv_panic_block);
> +					&hyperv_panic_vmbus_unload_block);
> 
>  	free_page((unsigned long)hv_panic_page);
>  	unregister_sysctl_table(hv_ctl_table_hdr);
> diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
> index c8e0ea27caf1..f3494b868a64 100644
> --- a/drivers/video/fbdev/hyperv_fb.c
> +++ b/drivers/video/fbdev/hyperv_fb.c
> @@ -1244,7 +1244,15 @@ static int hvfb_probe(struct hv_device *hdev,
>  	par->fb_ready = true;
> 
>  	par->synchronous_fb = false;
> +
> +	/*
> +	 * We need to be sure this panic notifier runs _before_ the
> +	 * vmbus disconnect, so order it by priority. It must execute
> +	 * before the function hv_panic_vmbus_unload() [drivers/hv/vmbus_drv.c],
> +	 * which is almost at the end of list, with priority = INT_MIN + 1.
> +	 */
>  	par->hvfb_panic_nb.notifier_call = hvfb_on_panic;
> +	par->hvfb_panic_nb.priority = INT_MIN + 10,
>  	atomic_notifier_chain_register(&panic_notifier_list,
>  				       &par->hvfb_panic_nb);
> 
> --
> 2.36.0


^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-04-27 22:49 ` [PATCH 19/30] panic: Add the panic hypervisor notifier list Guilherme G. Piccoli
@ 2022-04-29 17:30   ` Michael Kelley (LINUX)
  2022-04-29 18:04     ` Guilherme G. Piccoli
  2022-05-16 14:01   ` Petr Mladek
  1 sibling, 1 reply; 183+ messages in thread
From: Michael Kelley (LINUX) @ 2022-04-29 17:30 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David Gow, David S. Miller, Dexuan Cui,
	Doug Berger, Evan Green, Florian Fainelli, Haiyang Zhang,
	Hari Bathini, Heiko Carstens, Julius Werner, Justin Chen,
	KY Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Scott Branden, Sebastian Reichel, Shile Zhang, Stephen Hemminger,
	Sven Schnelle, Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik,
	Wang ShaoBo, Wei Liu, zhenwei pi

From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022 3:49 PM
> 
> The goal of this new panic notifier is to allow its users to register
> callbacks to run very early in the panic path. This aims hypervisor/FW
> notification mechanisms as well as simple LED functions, and any other
> simple and safe mechanism that should run early in the panic path; more
> dangerous callbacks should execute later.
> 
> For now, the patch is almost a no-op (although it changes a bit the
> ordering in which some panic notifiers are executed). In a subsequent
> patch, the panic path will be refactored, then the panic hypervisor
> notifiers will effectively run very early in the panic path.
> 
> We also defer documenting it all properly in the subsequent refactor
> patch. While at it, we removed some useless header inclusions and
> fixed some notifiers return too (by using the standard NOTIFY_DONE).
> 
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Brian Norris <computersforpeace@gmail.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> Cc: David Gow <davidgow@google.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Dexuan Cui <decui@microsoft.com>
> Cc: Doug Berger <opendmb@gmail.com>
> Cc: Evan Green <evgreen@chromium.org>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Julius Werner <jwerner@chromium.org>
> Cc: Justin Chen <justinpopo6@gmail.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Lee Jones <lee.jones@linaro.org>
> Cc: Markus Mayer <mmayer@broadcom.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Michael Kelley <mikelley@microsoft.com>
> Cc: Mihai Carabas <mihai.carabas@oracle.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Scott Branden <scott.branden@broadcom.com>
> Cc: Sebastian Reichel <sre@kernel.org>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: zhenwei pi <pizhenwei@bytedance.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  arch/mips/sgi-ip22/ip22-reset.c          | 2 +-
>  arch/mips/sgi-ip32/ip32-reset.c          | 3 +--
>  arch/powerpc/kernel/setup-common.c       | 2 +-
>  arch/sparc/kernel/sstate.c               | 3 +--
>  drivers/firmware/google/gsmi.c           | 4 ++--
>  drivers/hv/vmbus_drv.c                   | 4 ++--
>  drivers/leds/trigger/ledtrig-activity.c  | 4 ++--
>  drivers/leds/trigger/ledtrig-heartbeat.c | 4 ++--
>  drivers/misc/bcm-vk/bcm_vk_dev.c         | 6 +++---
>  drivers/misc/pvpanic/pvpanic.c           | 4 ++--
>  drivers/power/reset/ltc2952-poweroff.c   | 4 ++--
>  drivers/s390/char/zcore.c                | 5 +++--
>  drivers/soc/bcm/brcmstb/pm/pm-arm.c      | 2 +-
>  include/linux/panic_notifier.h           | 1 +
>  kernel/panic.c                           | 4 ++++
>  15 files changed, 28 insertions(+), 24 deletions(-)

[ snip]

> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index f37f12d48001..901b97034308 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1614,7 +1614,7 @@ static int vmbus_bus_init(void)
>  			hv_kmsg_dump_register();
> 
>  		register_die_notifier(&hyperv_die_report_block);
> -		atomic_notifier_chain_register(&panic_notifier_list,
> +		atomic_notifier_chain_register(&panic_hypervisor_list,
>  						&hyperv_panic_report_block);
>  	}
> 
> @@ -2843,7 +2843,7 @@ static void __exit vmbus_exit(void)
>  	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
>  		kmsg_dump_unregister(&hv_kmsg_dumper);
>  		unregister_die_notifier(&hyperv_die_report_block);
> -		atomic_notifier_chain_unregister(&panic_notifier_list,
> +		atomic_notifier_chain_unregister(&panic_hypervisor_list,
>  						&hyperv_panic_report_block);
>  	}
> 

Using the hypervisor_list here produces a bit of a mismatch.  In many cases
this notifier will do nothing, and will defer to the kmsg_dump() mechanism
to notify the hypervisor about the panic.   Running the kmsg_dump()
mechanism is linked to the info_list, so I'm thinking the Hyper-V panic report
notifier should be on the info_list as well.  That way the reporting behavior
is triggered at the same point in the panic path regardless of which
reporting mechanism is used.




^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 24/30] panic: Refactor the panic path
  2022-04-27 22:49 ` [PATCH 24/30] panic: Refactor the panic path Guilherme G. Piccoli
  2022-04-28  0:28   ` Randy Dunlap
@ 2022-04-29 17:53   ` Michael Kelley (LINUX)
  2022-04-29 20:38     ` Guilherme G. Piccoli
  2022-05-09 15:16   ` d.hatayama
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 183+ messages in thread
From: Michael Kelley (LINUX) @ 2022-04-29 17:53 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022 3:49 PM
> 
> The panic() function is somewhat convoluted - a lot of changes were
> made over the years, adding comments that might be misleading/outdated
> now, it has a code structure that is a bit complex to follow, with
> lots of conditionals, for example. The panic notifier list is something
> else - a single list, with multiple callbacks of different purposes,
> that run in a non-deterministic order and may affect hardly kdump
> reliability - see the "crash_kexec_post_notifiers" workaround-ish flag.
> 
> This patch proposes a major refactor on the panic path based on Petr's
> idea [0] - basically we split the notifiers list in three, having a set
> of different call points in the panic path. Below a list of changes
> proposed in this patch, culminating in the panic notifiers level
> concept:
> 
> (a) First of all, we improved comments all over the function
> and removed useless variables / includes. Also, as part of this
> clean-up we concentrate the console flushing functions in a helper.
> 
> (b) As mentioned before, there is a split of the panic notifier list
> in three, based on the purpose of the callback. The code contains
> good documentation in form of comments, but a summary of the three
> lists follows:
> 
> - the hypervisor list aims low-risk procedures to inform hypervisors
> or firmware about the panic event, also includes LED-related functions;
> 
> - the informational list contains callbacks that provide more details,
> like kernel offset or trace dump (if enabled) and also includes the
> callbacks aimed at reducing log pollution or warns, like the RCU and
> hung task disable callbacks;
> 
> - finally, the pre_reboot list is the old notifier list renamed,
> containing the more risky callbacks that didn't fit the previous
> lists. There is also a 4th list (the post_reboot one), but it's not
> related with the original list - it contains late time architecture
> callbacks aimed at stopping the machine, for example.
> 
> The 3 notifiers lists execute in different moments, hypervisor being
> the first, followed by informational and finally the pre_reboot list.
> 
> (c) But then, there is the ordering problem of the notifiers against
> the crash_kernel() call - kdump must be as reliable as possible.
> For that, a simple binary "switch" as "crash_kexec_post_notifiers"
> is not enough, hence we introduce here concept of panic notifier
> levels: there are 5 levels, from 0 (no notifier executes before
> kdump) until 4 (all notifiers run before kdump); the default level
> is 2, in which the hypervisor and (iff we have any kmsg dumper)
> the informational notifiers execute before kdump.
> 
> The detailed documentation of the levels is present in code comments
> and in the kernel-parameters.txt file; as an analogy with the previous
> panic() implementation, the level 0 is exactly the same as the old
> behavior of notifiers, running all after kdump, and the level 4 is
> the same as "crash_kexec_post_notifiers=Y" (we kept this parameter as
> a deprecated one).
> 
> (d) Finally, an important change made here: we now use only the
> function "crash_smp_send_stop()" to shut all the secondary CPUs
> in the panic path. Before, there was a case of using the regular
> "smp_send_stop()", but the better approach is to simplify the
> code and try to use the function which was created exclusively
> for the panic path. Experiments showed that it works fine, and
> code was very simplified with that.
> 
> Functional change is expected from this refactor, since now we
> call some notifiers by default before kdump, but the goal here
> besides code clean-up is to have a better panic path, more
> reliable and deterministic, but also very customizable.
> 
> [0] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 
> Special thanks to Petr and Baoquan for the suggestion and feedback in a previous
> email thread. There's some important design decisions that worth mentioning and
> discussing:
> 
> * The default panic notifiers level is 2, based on Petr Mladek's suggestion,
> which makes a lot of sense. Of course, this is customizable through the
> parameter, but would be something worthwhile to have a KConfig option to set
> the default level? It would help distros that want the old behavior
> (no notifiers before kdump) as default.
> 
> * The implementation choice was to _avoid_ intricate if conditionals in the
> panic path, which would _definitely_ be present with the panic notifiers levels
> idea; so, instead of lots of if conditionals, the set/clear bits approach with
> functions called in 2 points (but executing only in one of them) is much easier
> to follow an was used here; the ordering helper function and the comments also
> help a lot to avoid confusion (hopefully).
> 
> * Choice was to *always* use crash_smp_send_stop() instead of sometimes making
> use of the regular smp_send_stop(); for most architectures they are the same,
> including Xen (on x86). For the ones that override it, all should work fine,
> in the powerpc case it's even more correct (see the subsequent patch
> "powerpc: Do not force all panic notifiers to execute before kdump")
> 
> There seems to be 2 cases that requires some plumbing to work 100% right:
> - ARM doesn't disable local interrupts / FIQs in the crash version of
> send_stop(); we patched that early in this series;
> - x86 could face an issue if we have VMX and do use crash_smp_send_stop()
> _without_ kdump, but this is fixed in the first patch of the series (and
> it's a bug present even before this refactor).
> 
> * Notice we didn't add a sysrq for panic notifiers level - should have it?
> Alejandro proposed recently to add a sysrq for "crash_kexec_post_notifiers",
> let me know if you feel the need here Alejandro, since the core parameters are
> present in /sys, I didn't consider much gain in having a sysrq, but of course
> I'm open to suggestions!
> 
> Thanks advance for the review!
> 
>  .../admin-guide/kernel-parameters.txt         |  42 ++-
>  include/linux/panic_notifier.h                |   1 +
>  kernel/kexec_core.c                           |   8 +-
>  kernel/panic.c                                | 292 +++++++++++++-----
>  .../selftests/pstore/pstore_crash_test        |   5 +-
>  5 files changed, 252 insertions(+), 96 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt
> b/Documentation/admin-guide/kernel-parameters.txt
> index 3f1cc5e317ed..8d3524060ce3 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -829,6 +829,13 @@
>  			It will be ignored when crashkernel=X,high is not used
>  			or memory reserved is below 4G.
> 
> +	crash_kexec_post_notifiers
> +			This was DEPRECATED - users should always prefer the
> +			parameter "panic_notifiers_level" - check its entry
> +			in this documentation for details on how it works.
> +			Setting this parameter is exactly the same as setting
> +			"panic_notifiers_level=4".
> +
>  	cryptomgr.notests
>  			[KNL] Disable crypto self-tests
> 
> @@ -3784,6 +3791,33 @@
>  			timeout < 0: reboot immediately
>  			Format: <timeout>
> 
> +	panic_notifiers_level=
> +			[KNL] Set the panic notifiers execution order.
> +			Format: <unsigned int>
> +			We currently have 4 lists of panic notifiers; based
> +			on the functionality and risk (for panic success) the
> +			callbacks are added in a given list. The lists are:
> +			- hypervisor/FW notification list (low risk);
> +			- informational list (low/medium risk);
> +			- pre_reboot list (higher risk);
> +			- post_reboot list (only run late in panic and after
> +			kdump, not configurable for now).
> +			This parameter defines the ordering of the first 3
> +			lists with regards to kdump; the levels determine
> +			which set of notifiers execute before kdump. The
> +			accepted levels are:
> +			0: kdump is the first thing to run, NO list is
> +			executed before kdump.
> +			1: only the hypervisor list is executed before kdump.
> +			2 (default level): the hypervisor list and (*if*
> +			there's any kmsg_dumper defined) the informational
> +			list are executed before kdump.
> +			3: both the hypervisor and the informational lists
> +			(always) execute before kdump.

I'm not clear on why level 2 exists.  What is the scenario where
execution of the info list before kdump should be conditional on the
existence of a kmsg_dumper?   Maybe the scenario is described
somewhere in the patch set and I just missed it.

> +			4: the 3 lists (hypervisor, info and pre_reboot)
> +			execute before kdump - this behavior is analog to the
> +			deprecated parameter "crash_kexec_post_notifiers".
> +
>  	panic_print=	Bitmask for printing system info when panic happens.
>  			User can chose combination of the following bits:
>  			bit 0: print all tasks info
> @@ -3814,14 +3848,6 @@
>  	panic_on_warn	panic() instead of WARN().  Useful to cause kdump
>  			on a WARN().
> 
> -	crash_kexec_post_notifiers
> -			Run kdump after running panic-notifiers and dumping
> -			kmsg. This only for the users who doubt kdump always
> -			succeeds in any situation.
> -			Note that this also increases risks of kdump failure,
> -			because some panic notifiers can make the crashed
> -			kernel more unstable.
> -
>  	parkbd.port=	[HW] Parallel port number the keyboard adapter is
>  			connected to, default is 0.
>  			Format: <parport#>
> diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
> index bcf6a5ea9d7f..b5041132321d 100644
> --- a/include/linux/panic_notifier.h
> +++ b/include/linux/panic_notifier.h
> @@ -10,6 +10,7 @@ extern struct atomic_notifier_head panic_info_list;
>  extern struct atomic_notifier_head panic_pre_reboot_list;
>  extern struct atomic_notifier_head panic_post_reboot_list;
> 
> +bool panic_notifiers_before_kdump(void);
>  extern bool crash_kexec_post_notifiers;
> 
>  enum panic_notifier_val {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 68480f731192..f8906db8ca72 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -74,11 +74,11 @@ struct resource crashk_low_res = {
>  int kexec_should_crash(struct task_struct *p)
>  {
>  	/*
> -	 * If crash_kexec_post_notifiers is enabled, don't run
> -	 * crash_kexec() here yet, which must be run after panic
> -	 * notifiers in panic().
> +	 * In case any panic notifiers are set to execute before kexec,
> +	 * don't run crash_kexec() here yet, which must run after these
> +	 * panic notifiers are executed, in the panic() path.
>  	 */
> -	if (crash_kexec_post_notifiers)
> +	if (panic_notifiers_before_kdump())
>  		return 0;
>  	/*
>  	 * There are 4 panic() calls in make_task_dead() path, each of which
> diff --git a/kernel/panic.c b/kernel/panic.c
> index bf792102b43e..b7c055d4f87f 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -15,7 +15,6 @@
>  #include <linux/kgdb.h>
>  #include <linux/kmsg_dump.h>
>  #include <linux/kallsyms.h>
> -#include <linux/notifier.h>
>  #include <linux/vt_kern.h>
>  #include <linux/module.h>
>  #include <linux/random.h>
> @@ -52,14 +51,23 @@ static unsigned long tainted_mask =
>  static int pause_on_oops;
>  static int pause_on_oops_flag;
>  static DEFINE_SPINLOCK(pause_on_oops_lock);
> -bool crash_kexec_post_notifiers;
> +
>  int panic_on_warn __read_mostly;
> +bool panic_on_taint_nousertaint;
>  unsigned long panic_on_taint;
> -bool panic_on_taint_nousertaint = false;
> 
>  int panic_timeout = CONFIG_PANIC_TIMEOUT;
>  EXPORT_SYMBOL_GPL(panic_timeout);
> 
> +/* Initialized with all notifiers set to run before kdump */
> +static unsigned long panic_notifiers_bits = 15;
> +
> +/* Default level is 2, see kernel-parameters.txt */
> +unsigned int panic_notifiers_level = 2;
> +
> +/* DEPRECATED in favor of panic_notifiers_level */
> +bool crash_kexec_post_notifiers;
> +
>  #define PANIC_PRINT_TASK_INFO		0x00000001
>  #define PANIC_PRINT_MEM_INFO		0x00000002
>  #define PANIC_PRINT_TIMER_INFO		0x00000004
> @@ -109,10 +117,14 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs)
>  }
> 
>  /*
> - * Stop other CPUs in panic.  Architecture dependent code may override this
> - * with more suitable version.  For example, if the architecture supports
> - * crash dump, it should save registers of each stopped CPU and disable
> - * per-CPU features such as virtualization extensions.
> + * Stop other CPUs in panic context.
> + *
> + * Architecture dependent code may override this with more suitable version.
> + * For example, if the architecture supports crash dump, it should save the
> + * registers of each stopped CPU and disable per-CPU features such as
> + * virtualization extensions. When not overridden in arch code (and for
> + * x86/xen), this is exactly the same as execute smp_send_stop(), but
> + * guarded against duplicate execution.
>   */
>  void __weak crash_smp_send_stop(void)
>  {
> @@ -183,6 +195,112 @@ static void panic_print_sys_info(bool console_flush)
>  		ftrace_dump(DUMP_ALL);
>  }
> 
> +/*
> + * Helper that accumulates all console flushing routines executed on panic.
> + */
> +static void console_flushing(void)
> +{
> +#ifdef CONFIG_VT
> +	unblank_screen();
> +#endif
> +	console_unblank();
> +
> +	/*
> +	 * In this point, we may have disabled other CPUs, hence stopping the
> +	 * CPU holding the lock while still having some valuable data in the
> +	 * console buffer.
> +	 *
> +	 * Try to acquire the lock then release it regardless of the result.
> +	 * The release will also print the buffers out. Locks debug should
> +	 * be disabled to avoid reporting bad unlock balance when panic()
> +	 * is not being called from OOPS.
> +	 */
> +	debug_locks_off();
> +	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +
> +	panic_print_sys_info(true);
> +}
> +
> +#define PN_HYPERVISOR_BIT	0
> +#define PN_INFO_BIT		1
> +#define PN_PRE_REBOOT_BIT	2
> +#define PN_POST_REBOOT_BIT	3
> +
> +/*
> + * Determine the order of panic notifiers with regards to kdump.
> + *
> + * This function relies in the "panic_notifiers_level" kernel parameter
> + * to determine how to order the notifiers with regards to kdump. We
> + * have currently 5 levels. For details, please check the kernel docs for
> + * "panic_notifiers_level" at Documentation/admin-guide/kernel-parameters.txt.
> + *
> + * Default level is 2, which means the panic hypervisor and informational
> + * (unless we don't have any kmsg_dumper) lists will execute before kdump.
> + */
> +static void order_panic_notifiers_and_kdump(void)
> +{
> +	/*
> +	 * The parameter "crash_kexec_post_notifiers" is deprecated, but
> +	 * valid. Users that set it want really all panic notifiers to
> +	 * execute before kdump, so it's effectively the same as setting
> +	 * the panic notifiers level to 4.
> +	 */
> +	if (panic_notifiers_level >= 4 || crash_kexec_post_notifiers)
> +		return;
> +
> +	/*
> +	 * Based on the level configured (smaller than 4), we clear the
> +	 * proper bits in "panic_notifiers_bits". Notice that this bitfield
> +	 * is initialized with all notifiers set.
> +	 */
> +	switch (panic_notifiers_level) {
> +	case 3:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +		break;
> +	case 2:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +
> +		if (!kmsg_has_dumpers())
> +			clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> +		break;
> +	case 1:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> +		break;
> +	case 0:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> +		clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);
> +		break;
> +	}

I think the above switch statement could be done as follows:

if (panic_notifiers_level <= 3)
	clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
if (panic_notifiers_level <= 2)
	if (!kmsg_has_dumpers())
		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
if (panic_notifiers_level <=1)
	clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
if (panic_notifiers_level == 0)
	clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);

That's about half the lines of code.  It's somewhat a matter of style,
so treat this as just a suggestion to consider.  I just end up looking
for a better solution when I see the same line of code repeated
3 or 4 times!

> +}
> +
> +/*
> + * Set of helpers to execute the panic notifiers only once.
> + * Just the informational notifier cares about the return.
> + */
> +static inline bool notifier_run_once(struct atomic_notifier_head head,
> +				     char *buf, long bit)
> +{
> +	if (test_and_change_bit(bit, &panic_notifiers_bits)) {
> +		atomic_notifier_call_chain(&head, PANIC_NOTIFIER, buf);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +#define panic_notifier_hypervisor_once(buf)\
> +	notifier_run_once(panic_hypervisor_list, buf, PN_HYPERVISOR_BIT)
> +
> +#define panic_notifier_info_once(buf)\
> +	notifier_run_once(panic_info_list, buf, PN_INFO_BIT)
> +
> +#define panic_notifier_pre_reboot_once(buf)\
> +	notifier_run_once(panic_pre_reboot_list, buf, PN_PRE_REBOOT_BIT)
> +
> +#define panic_notifier_post_reboot_once(buf)\
> +	notifier_run_once(panic_post_reboot_list, buf, PN_POST_REBOOT_BIT)
> +
>  /**
>   *	panic - halt the system
>   *	@fmt: The text string to print
> @@ -198,32 +316,29 @@ void panic(const char *fmt, ...)
>  	long i, i_next = 0, len;
>  	int state = 0;
>  	int old_cpu, this_cpu;
> -	bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
> 
> -	if (panic_on_warn) {
> -		/*
> -		 * This thread may hit another WARN() in the panic path.
> -		 * Resetting this prevents additional WARN() from panicking the
> -		 * system on this thread.  Other threads are blocked by the
> -		 * panic_mutex in panic().
> -		 */
> -		panic_on_warn = 0;
> -	}
> +	/*
> +	 * This thread may hit another WARN() in the panic path, so
> +	 * resetting this option prevents additional WARN() from
> +	 * re-panicking the system here.
> +	 */
> +	panic_on_warn = 0;
> 
>  	/*
>  	 * Disable local interrupts. This will prevent panic_smp_self_stop
> -	 * from deadlocking the first cpu that invokes the panic, since
> -	 * there is nothing to prevent an interrupt handler (that runs
> -	 * after setting panic_cpu) from invoking panic() again.
> +	 * from deadlocking the first cpu that invokes the panic, since there
> +	 * is nothing to prevent an interrupt handler (that runs after setting
> +	 * panic_cpu) from invoking panic() again. Also disables preemption
> +	 * here - notice it's not safe to rely on interrupt disabling to avoid
> +	 * preemption, since any cond_resched() or cond_resched_lock() might
> +	 * trigger a reschedule if the preempt count is 0 (for reference, see
> +	 * Documentation/locking/preempt-locking.rst). Some functions called
> +	 * from here want preempt disabled, so no point enabling it later.
>  	 */
>  	local_irq_disable();
>  	preempt_disable_notrace();
> 
>  	/*
> -	 * It's possible to come here directly from a panic-assertion and
> -	 * not have preempt disabled. Some functions called from here want
> -	 * preempt to be disabled. No point enabling it later though...
> -	 *
>  	 * Only one CPU is allowed to execute the panic code from here. For
>  	 * multiple parallel invocations of panic, all other CPUs either
>  	 * stop themself or will wait until they are stopped by the 1st CPU
> @@ -266,73 +381,75 @@ void panic(const char *fmt, ...)
>  	kgdb_panic(buf);
> 
>  	/*
> -	 * If we have crashed and we have a crash kernel loaded let it handle
> -	 * everything else.
> -	 * If we want to run this after calling panic_notifiers, pass
> -	 * the "crash_kexec_post_notifiers" option to the kernel.
> +	 * Here lies one of the most subtle parts of the panic path,
> +	 * the panic notifiers and their order with regards to kdump.
> +	 * We currently have 4 sets of notifiers:
>  	 *
> -	 * Bypass the panic_cpu check and call __crash_kexec directly.
> +	 *  - the hypervisor list is composed by callbacks that are related
> +	 *  to warn the FW / hypervisor about panic, or non-invasive LED
> +	 *  controlling functions - (hopefully) low-risk for kdump, should
> +	 *  run early if possible.
> +	 *
> +	 *  - the informational list is composed by functions dumping data
> +	 *  like kernel offsets, device error registers or tracing buffer;
> +	 *  also log flooding prevention callbacks fit in this list. It is
> +	 *  relatively safe to run before kdump.
> +	 *
> +	 *  - the pre_reboot list basically is everything else, all the
> +	 *  callbacks that don't fit in the 2 previous lists. It should
> +	 *  run *after* kdump if possible, as it contains high-risk
> +	 *  functions that may break kdump.
> +	 *
> +	 *  - we also have a 4th list of notifiers, the post_reboot
> +	 *  callbacks. This is not strongly related to kdump since it's
> +	 *  always executed late in the panic path, after the restart
> +	 *  mechanism (if set); its goal is to provide a way for
> +	 *  architecture code effectively power-off/disable the system.
> +	 *
> +	 *  The kernel provides the "panic_notifiers_level" parameter
> +	 *  to adjust the ordering in which these notifiers should run
> +	 *  with regards to kdump - the default level is 2, so both the
> +	 *  hypervisor and informational notifiers should execute before
> +	 *  the __crash_kexec(); the info notifier won't run by default
> +	 *  unless there's some kmsg_dumper() registered. For details
> +	 *  about it, check Documentation/admin-guide/kernel-parameters.txt.
> +	 *
> +	 *  Notice that the code relies in bits set/clear operations to
> +	 *  determine the ordering, functions *_once() execute only one
> +	 *  time, as their name implies. The goal is to prevent too much
> +	 *  if conditionals and more confusion. Finally, regarding CPUs
> +	 *  disabling: unless NO panic notifier executes before kdump,
> +	 *  we always disable secondary CPUs before __crash_kexec() and
> +	 *  the notifiers execute.
>  	 */
> -	if (!_crash_kexec_post_notifiers) {
> +	order_panic_notifiers_and_kdump();
> +
> +	/* If no level, we should kdump ASAP. */
> +	if (!panic_notifiers_level)
>  		__crash_kexec(NULL);
> 
> -		/*
> -		 * Note smp_send_stop is the usual smp shutdown function, which
> -		 * unfortunately means it may not be hardened to work in a
> -		 * panic situation.
> -		 */
> -		smp_send_stop();
> -	} else {
> -		/*
> -		 * If we want to do crash dump after notifier calls and
> -		 * kmsg_dump, we will need architecture dependent extra
> -		 * works in addition to stopping other CPUs.
> -		 */
> -		crash_smp_send_stop();
> -	}
> +	crash_smp_send_stop();
> +	panic_notifier_hypervisor_once(buf);
> 
> -	/*
> -	 * Run any panic handlers, including those that might need to
> -	 * add information to the kmsg dump output.
> -	 */
> -	atomic_notifier_call_chain(&panic_hypervisor_list, PANIC_NOTIFIER, buf);
> -	atomic_notifier_call_chain(&panic_info_list, PANIC_NOTIFIER, buf);
> -	atomic_notifier_call_chain(&panic_pre_reboot_list, PANIC_NOTIFIER, buf);
> +	if (panic_notifier_info_once(buf)) {
> +		panic_print_sys_info(false);
> +		kmsg_dump(KMSG_DUMP_PANIC);
> +	}
> 
> -	panic_print_sys_info(false);
> +	panic_notifier_pre_reboot_once(buf);
> 
> -	kmsg_dump(KMSG_DUMP_PANIC);
> +	__crash_kexec(NULL);
> 
> -	/*
> -	 * If you doubt kdump always works fine in any situation,
> -	 * "crash_kexec_post_notifiers" offers you a chance to run
> -	 * panic_notifiers and dumping kmsg before kdump.
> -	 * Note: since some panic_notifiers can make crashed kernel
> -	 * more unstable, it can increase risks of the kdump failure too.
> -	 *
> -	 * Bypass the panic_cpu check and call __crash_kexec directly.
> -	 */
> -	if (_crash_kexec_post_notifiers)
> -		__crash_kexec(NULL);
> +	panic_notifier_hypervisor_once(buf);
> 
> -#ifdef CONFIG_VT
> -	unblank_screen();
> -#endif
> -	console_unblank();
> -
> -	/*
> -	 * We may have ended up stopping the CPU holding the lock (in
> -	 * smp_send_stop()) while still having some valuable data in the console
> -	 * buffer.  Try to acquire the lock then release it regardless of the
> -	 * result.  The release will also print the buffers out.  Locks debug
> -	 * should be disabled to avoid reporting bad unlock balance when
> -	 * panic() is not being callled from OOPS.
> -	 */
> -	debug_locks_off();
> -	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +	if (panic_notifier_info_once(buf)) {
> +		panic_print_sys_info(false);
> +		kmsg_dump(KMSG_DUMP_PANIC);
> +	}
> 
> -	panic_print_sys_info(true);
> +	panic_notifier_pre_reboot_once(buf);
> 
> +	console_flushing();
>  	if (!panic_blink)
>  		panic_blink = no_blink;
> 
> @@ -363,8 +480,7 @@ void panic(const char *fmt, ...)
>  		emergency_restart();
>  	}
> 
> -	atomic_notifier_call_chain(&panic_post_reboot_list,
> -				   PANIC_NOTIFIER, buf);
> +	panic_notifier_post_reboot_once(buf);
> 
>  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
> 
> @@ -383,6 +499,15 @@ void panic(const char *fmt, ...)
> 
>  EXPORT_SYMBOL(panic);
> 
> +/*
> + * Helper used in the kexec code, to validate if any
> + * panic notifier is set to execute early, before kdump.
> + */
> +inline bool panic_notifiers_before_kdump(void)
> +{
> +	return panic_notifiers_level || crash_kexec_post_notifiers;
> +}
> +
>  /*
>   * TAINT_FORCED_RMMOD could be a per-module flag but the module
>   * is being removed anyway.
> @@ -692,6 +817,9 @@ core_param(panic, panic_timeout, int, 0644);
>  core_param(panic_print, panic_print, ulong, 0644);
>  core_param(pause_on_oops, pause_on_oops, int, 0644);
>  core_param(panic_on_warn, panic_on_warn, int, 0644);
> +core_param(panic_notifiers_level, panic_notifiers_level, uint, 0644);
> +
> +/* DEPRECATED in favor of panic_notifiers_level */
>  core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0644);
> 
>  static int __init oops_setup(char *s)
> diff --git a/tools/testing/selftests/pstore/pstore_crash_test
> b/tools/testing/selftests/pstore/pstore_crash_test
> index 2a329bbb4aca..1e60ce4501aa 100755
> --- a/tools/testing/selftests/pstore/pstore_crash_test
> +++ b/tools/testing/selftests/pstore/pstore_crash_test
> @@ -25,6 +25,7 @@ touch $REBOOT_FLAG
>  sync
> 
>  # cause crash
> -# Note: If you use kdump and want to see kmesg-* files after reboot, you should
> -#       specify 'crash_kexec_post_notifiers' in 1st kernel's cmdline.
> +# Note: If you use kdump and want to see kmsg-* files after reboot, you should
> +#       be sure that the parameter "panic_notifiers_level" is more than '2' (the
> +#       default value for this parameter is '2') in the first kernel's cmdline.
>  echo c > /proc/sysrq-trigger
> --
> 2.36.0


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-04-29 17:30   ` Michael Kelley (LINUX)
@ 2022-04-29 18:04     ` Guilherme G. Piccoli
  2022-05-03 17:44       ` Michael Kelley (LINUX)
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 18:04 UTC (permalink / raw)
  To: Michael Kelley (LINUX), akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Alexander Gordeev, Andrea Parri,
	Ard Biesheuvel, Benjamin Herrenschmidt, Brian Norris,
	Christian Borntraeger, Christophe JAILLET, David Gow,
	David S. Miller, Dexuan Cui, Doug Berger, Evan Green,
	Florian Fainelli, Haiyang Zhang, Hari Bathini, Heiko Carstens,
	Julius Werner, Justin Chen, KY Srinivasan, Lee Jones,
	Markus Mayer, Michael Ellerman, Mihai Carabas, Nicholas Piggin,
	Paul Mackerras, Pavel Machek, Scott Branden, Sebastian Reichel,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On 29/04/2022 14:30, Michael Kelley (LINUX) wrote:
> From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022 3:49 PM
>> [...]
>>
>> @@ -2843,7 +2843,7 @@ static void __exit vmbus_exit(void)
>>  	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
>>  		kmsg_dump_unregister(&hv_kmsg_dumper);
>>  		unregister_die_notifier(&hyperv_die_report_block);
>> -		atomic_notifier_chain_unregister(&panic_notifier_list,
>> +		atomic_notifier_chain_unregister(&panic_hypervisor_list,
>>  						&hyperv_panic_report_block);
>>  	}
>>
> 
> Using the hypervisor_list here produces a bit of a mismatch.  In many cases
> this notifier will do nothing, and will defer to the kmsg_dump() mechanism
> to notify the hypervisor about the panic.   Running the kmsg_dump()
> mechanism is linked to the info_list, so I'm thinking the Hyper-V panic report
> notifier should be on the info_list as well.  That way the reporting behavior
> is triggered at the same point in the panic path regardless of which
> reporting mechanism is used.
> 

Hi Michael, thanks for your feedback! I agree that your idea could work,
but...there is one downside: imagine the kmsg_dump() approach is not set
in some Hyper-V guest, then we would rely in the regular notification
mechanism [hv_die_panic_notify_crash()], right?
But...you want then to run this notifier in the informational list,
which...won't execute *by default* before kdump if no kmsg_dump() is
set. So, this logic is convoluted when you mix it with the default level
concept + kdump.

May I suggest something? If possible, take a run with this patch set +
DEBUG_NOTIFIER=y, in *both* cases (with and without the kmsg_dump()
set). I did that and they run almost at the same time...I've checked the
notifiers called, it's like almost nothing runs in-between.

I feel the panic notification mechanism does really fit with a
hypervisor list, it's a good match with the nature of the list, which
aims at informing the panic notification to the hypervisor/FW.
Of course we can modify it if you prefer...but please take into account
the kdump case and how it complicates the logic.

Let me know your considerations, in case you can experiment with the
patch set as-is.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
  2022-04-27 22:48 ` [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path Guilherme G. Piccoli
  2022-04-29 16:26   ` Michael Kelley (LINUX)
@ 2022-04-29 18:20   ` Marc Zyngier
  2022-04-29 21:38     ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Marc Zyngier @ 2022-04-29 18:20 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Russell King

On Wed, 27 Apr 2022 23:48:56 +0100,
"Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
> 
> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
> is responsible for that. This makes sense, since we're turning off
> such CPUs, putting them in an endless busy-wait loop.
> 
> Problem is that there is an alternative path for disabling CPUs,
> in the form of function crash_smp_send_stop(), used for kexec/panic
> paths. This functions relies in a SMP call that also triggers a
> busy-wait loop [at machine_crash_nonpanic_core()], but *without*
> disabling interrupts. This might lead to odd scenarios, like early
> interrupts in the boot of kexec'd kernel or even interrupts in
> other CPUs while the main one still works in the panic path and
> assumes all secondary CPUs are (really!) off.
> 
> This patch mimics the ipi_cpu_stop() interrupt disable mechanism
> in the crash CPU shutdown path, hence disabling IRQs/FIQs in all
> secondary CPUs in the kexec/panic path as well.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  arch/arm/kernel/machine_kexec.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index f567032a09c0..ef788ee00519 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused)
>  	set_cpu_online(smp_processor_id(), false);
>  	atomic_dec(&waiting_for_crash_ipi);
>  
> +	local_fiq_disable();
> +	local_irq_disable();
> +

My expectations would be that, since we're getting here using an IPI,
interrupts are already masked. So what reenabled them the first place?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 13/30] s390/consoles: Improve panic notifiers reliability
  2022-04-27 22:49 ` [PATCH 13/30] s390/consoles: Improve panic notifiers reliability Guilherme G. Piccoli
@ 2022-04-29 18:46   ` Heiko Carstens
  2022-04-29 19:31     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Heiko Carstens @ 2022-04-29 18:46 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Vasily Gorbik

On Wed, Apr 27, 2022 at 07:49:07PM -0300, Guilherme G. Piccoli wrote:
> Currently many console drivers for s390 rely on panic/reboot notifiers
> to invoke callbacks on these events. The panic() function disables local
> IRQs, secondary CPUs and preemption, so callbacks invoked on panic are
> effectively running in atomic context.
> 
> Happens that most of these console callbacks from s390 doesn't take the
> proper care with regards to atomic context, like taking spinlocks that
> might be taken in other function/CPU and hence will cause a lockup
> situation.
> 
> The goal for this patch is to improve the notifiers reliability, acting
> on 4 console drivers, as detailed below:
> 
> (1) con3215: changed a regular spinlock to the trylock alternative.
> 
> (2) con3270: also changed a regular spinlock to its trylock counterpart,
> but here we also have another problem: raw3270_activate_view() takes a
> different spinlock. So, we worked a helper to validate if this other lock
> is safe to acquire, and if so, raw3270_activate_view() should be safe.
> 
> Notice though that there is a functional change here: it's now possible
> to continue the notifier code [reaching con3270_wait_write() and
> con3270_rebuild_update()] without executing raw3270_activate_view().
> 
> (3) sclp: a global lock is used heavily in the functions called from
> the notifier, so we added a check here - if the lock is taken already,
> we just bail-out, preventing the lockup.
> 
> (4) sclp_vt220: same as (3), a lock validation was added to prevent the
> potential lockup problem.
> 
> Besides (1)-(4), we also removed useless void functions, adding the
> code called from the notifier inside its own body, and changed the
> priority of such notifiers to execute late, since they are "heavyweight"
> for the panic environment, so we aim to reduce risks here.
> Changed return values to NOTIFY_DONE as well, the standard one.
> 
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 
> As a design choice, the option used here to verify a given spinlock is taken
> was the function "spin_is_locked()" - but we noticed that it is not often used.
> An alternative would to take the lock with a spin_trylock() and if it succeeds,
> just release the spinlock and continue the code. But that seemed weird...
> 
> Also, we'd like to ask a good validation of case (2) potential functionality
> change from the s390 console experts - far from expert here, and in our naive
> code observation, that seems fine, but that analysis might be missing some
> corner case.
> 
> Thanks in advance!
> 
>  drivers/s390/char/con3215.c    | 36 +++++++++++++++--------------
>  drivers/s390/char/con3270.c    | 34 +++++++++++++++------------
>  drivers/s390/char/raw3270.c    | 18 +++++++++++++++
>  drivers/s390/char/raw3270.h    |  1 +
>  drivers/s390/char/sclp_con.c   | 28 +++++++++++++----------
>  drivers/s390/char/sclp_vt220.c | 42 +++++++++++++++++++---------------
>  6 files changed, 96 insertions(+), 63 deletions(-)

Code looks good, and everything still seems to work. I applied this
internally for the time being, and if it passes testing, I'll schedule
it for the next merge window.

Thanks!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 13/30] s390/consoles: Improve panic notifiers reliability
  2022-04-29 18:46   ` Heiko Carstens
@ 2022-04-29 19:31     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 19:31 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Vasily Gorbik

On 29/04/2022 15:46, Heiko Carstens wrote:
> [...]
> 
> Code looks good, and everything still seems to work. I applied this
> internally for the time being, and if it passes testing, I'll schedule
> it for the next merge window.
> 
> Thanks!

Perfect Heiko, thanks a bunch for your review and tests!
Let me know if anything breaks heh
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-04-29 16:04   ` Max Filippov
@ 2022-04-29 19:34     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 19:34 UTC (permalink / raw)
  To: Max Filippov
  Cc: Andrew Morton, bhe, Petr Mladek, kexec, LKML,
	bcm-kernel-feedback-list, linuxppc-dev, open list:ALPHA PORT,
	linux-edac, linux-hyperv, linux-leds, linux-mips,
	open list:PARISC ARCHITECTURE, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um,
	open list:TENSILICA XTENSA PORT (xtensa),
	netdev, openipmi-developer, rcu, open list:SPARC + UltraSPAR...,
	xen-devel, maintainer:X86 ARCHITECTURE...,
	kernel-dev, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	Andy Shevchenko, Arnd Bergmann, Borislav Petkov, Jonathan Corbet,
	d.hatayama, Dave Hansen, dyoung, feng.tang, Greg Kroah-Hartman,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, Kees Cook,
	Andrew Lutomirski, Masami Hiramatsu, Ingo Molnar, paulmck,
	Peter Zijlstra, Steven Rostedt, Sergey Senozhatsky, stern,
	Thomas Gleixner, vgoyal, vkuznets, Will Deacon, Alex Elder,
	Alexander Gordeev, Anton Ivanov, Benjamin Herrenschmidt,
	Bjorn Andersson, Boris Ostrovsky, Chris Zankel,
	Christian Borntraeger, Corey Minyard, Dexuan Cui, H. Peter Anvin,
	Haiyang Zhang, Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Michael Ellerman, Paul Mackerras,
	Pavel Machek, Richard Henderson, Richard Weinberger,
	Robert Richter, Stefano Stabellini, Stephen Hemminger,
	Sven Schnelle, Tony Luck, Vasily Gorbik, Wei Liu

On 29/04/2022 13:04, Max Filippov wrote:
> [...]
>>  arch/xtensa/platforms/iss/setup.c     |  4 ++--For xtensa:
> 
> For xtensa:
> Acked-by: Max Filippov <jcmvbkbc@gmail.com>
> 

Perfect, thanks Max!
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
  2022-04-28  1:01   ` Xiaoming Ni
@ 2022-04-29 19:38     ` Guilherme G. Piccoli
  2022-05-10 17:29     ` Steven Rostedt
  1 sibling, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 19:38 UTC (permalink / raw)
  To: Xiaoming Ni, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Arjan van de Ven, Cong Wang,
	Sebastian Andrzej Siewior, Valentin Schneider

On 27/04/2022 22:01, Xiaoming Ni wrote:
> [...]
> Duplicate Code.
> 
> Is it better to use __func__ and %pS?
> 
> pr_info("%s: %pS\n", __func__, n->notifier_call);
> 
> 

This is a great suggestion Xiaoming, much appreciated!
I feel like reinventing the wheel here - with your idea, code was super
clear and concise, very nice suggestion!!

The only 2 things that diverge from your idea: I'm using '%ps' (not
showing offsets) and also, kept the wording "(un)registered/calling",
not using __func__ - I feel it's a bit odd in the output.
OK for you?

I'm definitely using your idea in V2 heh
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-04-29 17:53   ` Michael Kelley (LINUX)
@ 2022-04-29 20:38     ` Guilherme G. Piccoli
  2022-05-03 17:31       ` Michael Kelley (LINUX)
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 20:38 UTC (permalink / raw)
  To: Michael Kelley (LINUX), akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On 29/04/2022 14:53, Michael Kelley (LINUX) wrote:
> From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022 3:49 PM
>> [...]
>> +	panic_notifiers_level=
>> +			[KNL] Set the panic notifiers execution order.
>> +			Format: <unsigned int>
>> +			We currently have 4 lists of panic notifiers; based
>> +			on the functionality and risk (for panic success) the
>> +			callbacks are added in a given list. The lists are:
>> +			- hypervisor/FW notification list (low risk);
>> +			- informational list (low/medium risk);
>> +			- pre_reboot list (higher risk);
>> +			- post_reboot list (only run late in panic and after
>> +			kdump, not configurable for now).
>> +			This parameter defines the ordering of the first 3
>> +			lists with regards to kdump; the levels determine
>> +			which set of notifiers execute before kdump. The
>> +			accepted levels are:
>> +			0: kdump is the first thing to run, NO list is
>> +			executed before kdump.
>> +			1: only the hypervisor list is executed before kdump.
>> +			2 (default level): the hypervisor list and (*if*
>> +			there's any kmsg_dumper defined) the informational
>> +			list are executed before kdump.
>> +			3: both the hypervisor and the informational lists
>> +			(always) execute before kdump.
> 
> I'm not clear on why level 2 exists.  What is the scenario where
> execution of the info list before kdump should be conditional on the
> existence of a kmsg_dumper?   Maybe the scenario is described
> somewhere in the patch set and I just missed it.
> 

Hi Michael, thanks for your review/consideration. So, this idea started
kind of some time ago. It all started with a need of exposing more
information on kernel log *before* kdump and *before* pstore -
specifically, we're talking about panic_print. But this cause some
reactions, Baoquan was very concerned with that [0]. Soon after, I've
proposed a panic notifiers filter (orthogonal) approach, to which Petr
suggested instead doing a major refactor [1] - it finally is alive in
the form of this series.

The theory behind the level 2 is to allow a scenario of kdump with the
minimum amount of notifiers - what is the point in printing more
information if the user doesn't care, since it's going to kdump? Now, if
there is a kmsg dumper, it means that there is likely some interest in
collecting information, and that might as well be required before the
potential kdump (which is my case, hence the proposal on [0]).

Instead of forcing one of the two behaviors (level 1 or level 3), we
have a middle-term/compromise: if there's interest in collecting such
data (in the form of a kmsg dumper), we then execute the informational
notifiers before kdump. If not, why to increase (even slightly) the risk
for kdump?

I'm OK in removing the level 2 if people prefer, but I don't feel it's a
burden, quite opposite - seems a good way to accommodate the somewhat
antagonistic ideas (jump to kdump ASAP vs collecting more info in the
panicked kernel log).

[0] https://lore.kernel.org/lkml/20220126052246.GC2086@MiWiFi-R3L-srv/

[1] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/


>[...]
>> +	 * Based on the level configured (smaller than 4), we clear the
>> +	 * proper bits in "panic_notifiers_bits". Notice that this bitfield
>> +	 * is initialized with all notifiers set.
>> +	 */
>> +	switch (panic_notifiers_level) {
>> +	case 3:
>> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
>> +		break;
>> +	case 2:
>> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
>> +
>> +		if (!kmsg_has_dumpers())
>> +			clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
>> +		break;
>> +	case 1:
>> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
>> +		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
>> +		break;
>> +	case 0:
>> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
>> +		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
>> +		clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);
>> +		break;
>> +	}
> 
> I think the above switch statement could be done as follows:
> 
> if (panic_notifiers_level <= 3)
> 	clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> if (panic_notifiers_level <= 2)
> 	if (!kmsg_has_dumpers())
> 		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> if (panic_notifiers_level <=1)
> 	clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> if (panic_notifiers_level == 0)
> 	clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);
> 
> That's about half the lines of code.  It's somewhat a matter of style,
> so treat this as just a suggestion to consider.  I just end up looking
> for a better solution when I see the same line of code repeated
> 3 or 4 times!
> 

It's a good idea - I liked your code. The switch seems more
natural/explicit for me, even duplicating some lines, but in case more
people prefer your way, I can definitely change the code - thanks for
the suggestion.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
  2022-04-29 18:20   ` Marc Zyngier
@ 2022-04-29 21:38     ` Guilherme G. Piccoli
  2022-04-29 21:45       ` Russell King (Oracle)
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 21:38 UTC (permalink / raw)
  To: Marc Zyngier, Michael Kelley (LINUX)
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Russell King

Thanks Marc and Michael for the review/discussion.

On 29/04/2022 15:20, Marc Zyngier wrote:
> [...]

> My expectations would be that, since we're getting here using an IPI,
> interrupts are already masked. So what reenabled them the first place?
> 
> Thanks,
> 
> 	M.
> 

Marc, I did some investigation in the code (and tried/failed in the ARM
documentation as well heh), but this is still not 100% clear for me.

You're saying IPI calls disable IRQs/FIQs by default in the the target
CPUs? Where does it happen? I'm a bit confused if this a processor
mechanism, or it's in code.

Looking the smp_send_stop() in arch/arm/, it does IPI the CPUs, with the
flag IPI_CPU_STOP, eventually calling ipi_cpu_stop(), and the latter
does disable IRQ/FIQ in code - that's where I stole my code from.

But crash_smp_send_stop() is different, it seems to IPI the other CPUs
with the flag IPI_CALL_FUNC, which leads to calling
generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
as well? I couldn't find it.

Appreciate your clarifications about that, thanks again.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
  2022-04-29 21:38     ` Guilherme G. Piccoli
@ 2022-04-29 21:45       ` Russell King (Oracle)
  2022-04-29 21:56         ` Guilherme G. Piccoli
  2022-04-29 22:00         ` Marc Zyngier
  0 siblings, 2 replies; 183+ messages in thread
From: Russell King (Oracle) @ 2022-04-29 21:45 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Marc Zyngier, Michael Kelley (LINUX),
	akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On Fri, Apr 29, 2022 at 06:38:19PM -0300, Guilherme G. Piccoli wrote:
> Thanks Marc and Michael for the review/discussion.
> 
> On 29/04/2022 15:20, Marc Zyngier wrote:
> > [...]
> 
> > My expectations would be that, since we're getting here using an IPI,
> > interrupts are already masked. So what reenabled them the first place?
> > 
> > Thanks,
> > 
> > 	M.
> > 
> 
> Marc, I did some investigation in the code (and tried/failed in the ARM
> documentation as well heh), but this is still not 100% clear for me.
> 
> You're saying IPI calls disable IRQs/FIQs by default in the the target
> CPUs? Where does it happen? I'm a bit confused if this a processor
> mechanism, or it's in code.

When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are
themselves interrupts, so IRQs will be masked while the IPI is being
processed. Therefore, there should be no need to re-disable the
already disabled interrupts.

> But crash_smp_send_stop() is different, it seems to IPI the other CPUs
> with the flag IPI_CALL_FUNC, which leads to calling
> generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
> as well? I couldn't find it.

It's buried in the architecture behaviour. When the CPU takes an
interrupt and jumps to the interrupt vector in the vectors page, it is
architecturally defined that interrupts will be disabled. If they
weren't architecturally disabled at this point, then as soon as the
first instruction is processed (at the interrupt vector, likely a
branch) the CPU would immediately take another jump to the interrupt
vector, and this process would continue indefinitely, making interrupt
handling utterly useless.

So, you won't find an explicit instruction in the code path from the
vectors to the IPI handler that disables interrupts - because it's
written into the architecture that this is what must happen.

IRQs are a lower priority than FIQs, so FIQs remain unmasked.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
  2022-04-29 21:45       ` Russell King (Oracle)
@ 2022-04-29 21:56         ` Guilherme G. Piccoli
  2022-04-29 22:00         ` Marc Zyngier
  1 sibling, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 21:56 UTC (permalink / raw)
  To: Russell King (Oracle), Marc Zyngier
  Cc: Michael Kelley (LINUX),
	akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On 29/04/2022 18:45, Russell King (Oracle) wrote:
> [...]
>> Marc, I did some investigation in the code (and tried/failed in the ARM
>> documentation as well heh), but this is still not 100% clear for me.
>>
>> You're saying IPI calls disable IRQs/FIQs by default in the the target
>> CPUs? Where does it happen? I'm a bit confused if this a processor
>> mechanism, or it's in code.
> 
> When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are
> themselves interrupts, so IRQs will be masked while the IPI is being
> processed. Therefore, there should be no need to re-disable the
> already disabled interrupts.
> 
>> But crash_smp_send_stop() is different, it seems to IPI the other CPUs
>> with the flag IPI_CALL_FUNC, which leads to calling
>> generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
>> as well? I couldn't find it.
> 
> It's buried in the architecture behaviour. When the CPU takes an
> interrupt and jumps to the interrupt vector in the vectors page, it is
> architecturally defined that interrupts will be disabled. If they
> weren't architecturally disabled at this point, then as soon as the
> first instruction is processed (at the interrupt vector, likely a
> branch) the CPU would immediately take another jump to the interrupt
> vector, and this process would continue indefinitely, making interrupt
> handling utterly useless.
> 
> So, you won't find an explicit instruction in the code path from the
> vectors to the IPI handler that disables interrupts - because it's
> written into the architecture that this is what must happen.
> 
> IRQs are a lower priority than FIQs, so FIQs remain unmasked.
> 

Thanks a lot for the *great* explanation Russell, much appreciated.
So, this leads to the both following questions:

a) Shall we then change the patch to only disable FIQs, since it's panic
path and we don't want secondary CPUs getting interrupted, but only
spinning quietly "forever"?

b) How about cleaning ipi_cpu_stop() then, by dropping the call to
local_irq_disable() there, to avoid the double IRQ disabling?

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
  2022-04-29 21:45       ` Russell King (Oracle)
  2022-04-29 21:56         ` Guilherme G. Piccoli
@ 2022-04-29 22:00         ` Marc Zyngier
  1 sibling, 0 replies; 183+ messages in thread
From: Marc Zyngier @ 2022-04-29 22:00 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Guilherme G. Piccoli, Michael Kelley (LINUX),
	akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On Fri, 29 Apr 2022 22:45:14 +0100,
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> On Fri, Apr 29, 2022 at 06:38:19PM -0300, Guilherme G. Piccoli wrote:
> > Thanks Marc and Michael for the review/discussion.
> > 
> > On 29/04/2022 15:20, Marc Zyngier wrote:
> > > [...]
> > 
> > > My expectations would be that, since we're getting here using an IPI,
> > > interrupts are already masked. So what reenabled them the first place?
> > > 
> > > Thanks,
> > > 
> > > 	M.
> > > 
> > 
> > Marc, I did some investigation in the code (and tried/failed in the ARM
> > documentation as well heh), but this is still not 100% clear for me.
> > 
> > You're saying IPI calls disable IRQs/FIQs by default in the the target
> > CPUs? Where does it happen? I'm a bit confused if this a processor
> > mechanism, or it's in code.
> 
> When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are
> themselves interrupts, so IRQs will be masked while the IPI is being
> processed. Therefore, there should be no need to re-disable the
> already disabled interrupts.
> 
> > But crash_smp_send_stop() is different, it seems to IPI the other CPUs
> > with the flag IPI_CALL_FUNC, which leads to calling
> > generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
> > as well? I couldn't find it.
> 
> It's buried in the architecture behaviour. When the CPU takes an
> interrupt and jumps to the interrupt vector in the vectors page, it is
> architecturally defined that interrupts will be disabled. If they
> weren't architecturally disabled at this point, then as soon as the
> first instruction is processed (at the interrupt vector, likely a
> branch) the CPU would immediately take another jump to the interrupt
> vector, and this process would continue indefinitely, making interrupt
> handling utterly useless.
> 
> So, you won't find an explicit instruction in the code path from the
> vectors to the IPI handler that disables interrupts - because it's
> written into the architecture that this is what must happen.
> 
> IRQs are a lower priority than FIQs, so FIQs remain unmasked.

Ah, you're of course right. That's one of the huge differences between
AArch32 and AArch64, where the former has per target mode masking
rules, and the later masks everything on entry...

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers
  2022-04-29 17:16   ` Michael Kelley (LINUX)
@ 2022-04-29 22:35     ` Guilherme G. Piccoli
  2022-05-03 18:13       ` Michael Kelley (LINUX)
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-04-29 22:35 UTC (permalink / raw)
  To: Michael Kelley (LINUX), akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Andrea Parri, Dexuan Cui, Haiyang Zhang, KY Srinivasan,
	Stephen Hemminger, Tianyu Lan, Wei Liu

Hi Michael, first of all thanks for the great review, much appreciated.
Some comments inline below:

On 29/04/2022 14:16, Michael Kelley (LINUX) wrote:
> [...]
>> hypervisor I/O completion), so we postpone that to run late. But more
>> relevant: this *same* vmbus unloading happens in the crash_shutdown()
>> handler, so if kdump is set, we can safely skip this panic notifier and
>> defer such clean-up to the kexec crash handler.
> 
> While the last sentence is true for Hyper-V on x86/x64, it's not true for
> Hyper-V on ARM64.  x86/x64 has the 'machine_ops' data structure
> with the ability to provide a custom crash_shutdown() function, which
> Hyper-V does in the form of hv_machine_crash_shutdown().  But ARM64
> has no mechanism to provide such a custom function that will eventually
> do the needed vmbus_initiate_unload() before running kdump.
> 
> I'm not immediately sure what the best solution is for ARM64.  At this
> point, I'm just pointing out the problem and will think about the tradeoffs
> for various possible solutions.  Please do the same yourself. :-)
> 

Oh, you're totally right! I just assumed ARM64 would the the same, my
bad. Just to propose some alternatives, so you/others can also discuss
here and we can reach a consensus about the trade-offs:

(a) We could forget about this change, and always do the clean-up here,
not relying in machine_crash_shutdown().
Pro: really simple, behaves the same as it is doing currently.
Con: less elegant/concise, doesn't allow arm64 customization.

(b) Add a way to allow ARM64 customization of shutdown crash handler.
Pro: matches x86, more customizable, improves arm64 arch code.
Con: A tad more complex.

Also, a question that came-up: if ARM64 has no way of calling special
crash shutdown handler, how can you execute hv_stimer_cleanup() and
hv_synic_disable_regs() there? Or are they not required in ARM64?


>>
>> (c) There is also a Hyper-V framebuffer panic notifier, which relies in
>> doing a vmbus operation that demands a valid connection. So, we must
>> order this notifier with the panic notifier from vmbus_drv.c, in order to
>> guarantee that the framebuffer code executes before the vmbus connection
>> is unloaded.
> 
> Patch 21 of this set puts the Hyper-V FB panic notifier on the pre_reboot
> notifier list, which means it won't execute before the VMbus connection
> unload in the case of kdump.   This notifier is making sure that Hyper-V
> is notified about the last updates made to the frame buffer before the
> panic, so maybe it needs to be put on the hypervisor notifier list.  It
> sends a message to Hyper-V over its existing VMbus channel, but it
> does not wait for a reply.  It does, however, obtain a spin lock on the
> ring buffer used to communicate with Hyper-V.   Unless someone has
> a better suggestion, I'm inclined to take the risk of blocking on that
> spin lock.

The logic behind that was: when kdump is set, we'd skip the vmbus
disconnect on notifiers, deferring that to crash_shutdown(), logic this
one refuted in the above discussion on ARM64 (one more Pro argument to
the idea of refactoring aarch64 code to allow a custom crash shutdown
handler heh). But you're right, for the default level 2, we skip the
pre_reboot notifiers on kdump, effectively skipping this notifier.

Some ideas of what we can do here:

I) we could change the framebuffer notifier to rely on trylocks, instead
of risking a lockup scenario, and with that, we can execute it before
the vmbus disconnect in the hypervisor list;

II) we ignore the hypervisor notifier in case of kdump _by default_, and
if the users don't want that, they can always set the panic notifier
level to 4 and run all notifiers prior to kdump; would that be terrible
you think? Kdump users might don't care about the framebuffer...

III) we go with approach (b) above and refactor arm64 code to allow the
custom crash handler on kdump time, then [with point (I) above] the
logic proposed in this series is still valid - seems more and more the
most correct/complete solution.

In any case, I guess we should avoid workarounds if possible and do the
things the best way we can, to encompass all (or almost all) the
possible scenarios and don't force things on users (like enforcing panic
notifier level 4 for Hyper-V or something like this...)

More feedback from you / Hyper-V folks is pretty welcome about this.


> 
>> [...]
> The "Fixes:" tags imply that these changes should be backported to older
> longterm kernel versions, which I don't think is the case.  There is a
> dependency on Patch 14 of your series where PANIC_NOTIFIER is
> introduced.
> 

Oh, this was more related with archeology of the kernel. When I'm
investigating stuff, I really want to understand why code was added and
that usually require some time git blaming stuff, so having that pronto
in the commit message is a bonus.

But of course we don't need to use the Fixes tag for that, easy to only
mention it in the text. A secondary benefit by using this tag is to
indicate this is a _real fix_ to some code, and not an improvement, but
as you say, I agree we shouldn't backport it to previous releases having
or not the Fixes tag (AFAIK it's not mandatory to backport stuff with
Fixes tag).


>> [...]
>> + * intrincated is the relation of this notifier with Hyper-V framebuffer
> 
> s/intrincated/intricate/

Thanks, fixed in V2!


>
>> [...]
>> +static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long val,
>>  			      void *args)
>> +{
>> +	if (!kexec_crash_loaded())
> 
> I'm not clear on the purpose of this condition.  I think it means
> we will skip the vmbus_initiate_unload() if a panic occurs in the
> kdump kernel.  Is there a reason a panic in the kdump kernel
> should be treated differently?  Or am I misunderstanding?

This is really related with the point discussed in the top of this
response - I assumed both ARM64/x86_64 would behave the same and
disconnect the vmbus through the custom crash handler when kdump is set,
so worth skipping it here in the notifier. But that's not true for ARM64
as you pointed, so this guard against kexec is really part of the
decision/discussion on what to do with ARM64 heh

Cheers!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers
  2022-04-27 22:49 ` [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers Guilherme G. Piccoli
@ 2022-05-02 15:38   ` Florian Fainelli
  2022-05-02 15:50     ` Guilherme G. Piccoli
  2022-05-10 15:28   ` Petr Mladek
  1 sibling, 1 reply; 183+ messages in thread
From: Florian Fainelli @ 2022-05-02 15:38 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Brian Norris, Florian Fainelli



On 4/27/2022 3:49 PM, Guilherme G. Piccoli wrote:
> This patch improves the panic/die notifiers in this driver by
> making use of a passed "id" instead of comparing pointer
> address; also, it removes an useless prototype declaration
> and unnecessary header inclusion.
> 
> This is part of a panic notifiers refactor - this notifier in
> the future will be moved to a new list, that encompass the
> information notifiers only.
> 
> Fixes: 9eb60880d9a9 ("bus: brcmstb_gisb: add notifier handling")
> Cc: Brian Norris <computersforpeace@gmail.com>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

Not sure if the Fixes tag is warranted however as this is a clean up, 
and not really fixing a bug.
-- 
Florian

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header
  2022-04-27 22:49 ` [PATCH 06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header Guilherme G. Piccoli
@ 2022-05-02 15:38   ` Florian Fainelli
  2022-05-02 15:47     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Florian Fainelli @ 2022-05-02 15:38 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Brian Norris, Doug Berger, Florian Fainelli, Justin Chen,
	Lee Jones, Markus Mayer



On 4/27/2022 3:49 PM, Guilherme G. Piccoli wrote:
> The panic notifier of this driver is very simple code-wise, just a memory
> write to a special position with some numeric code. But this is not clear
> from the semantic point-of-view, and there is no public documentation
> about that either.
> 
> After discussing this in the mailing-lists [0] and having Florian explained
> it very well, this patch just document that in the code for the future
> generations asking the same questions. Also, it removes a useless header.
> 
> [0] https://lore.kernel.org/lkml/781cafb0-8d06-8b56-907a-5175c2da196a@gmail.com
> 
> Fixes: 0b741b8234c8 ("soc: bcm: brcmstb: Add support for S2/S3/S5 suspend states (ARM)")
> Cc: Brian Norris <computersforpeace@gmail.com>
> Cc: Doug Berger <opendmb@gmail.com>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: Justin Chen <justinpopo6@gmail.com>
> Cc: Lee Jones <lee.jones@linaro.org>
> Cc: Markus Mayer <mmayer@broadcom.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

Likewise, I am not sure if the Fixes tag is necessary here.
-- 
Florian

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header
  2022-05-02 15:38   ` Florian Fainelli
@ 2022-05-02 15:47     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-02 15:47 UTC (permalink / raw)
  To: Florian Fainelli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Brian Norris, Doug Berger, Justin Chen, Lee Jones, Markus Mayer

On 02/05/2022 12:38, Florian Fainelli wrote:
> [...] 
> Acked-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Likewise, I am not sure if the Fixes tag is necessary here.

Perfect Florian, thanks!  I'll add your Acked-by tag and remove the
fixes for V2 =)
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers
  2022-05-02 15:38   ` Florian Fainelli
@ 2022-05-02 15:50     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-02 15:50 UTC (permalink / raw)
  To: Florian Fainelli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Brian Norris

On 02/05/2022 12:38, Florian Fainelli wrote:
> [...] 
> 
> Acked-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Not sure if the Fixes tag is warranted however as this is a clean up, 
> and not really fixing a bug.

Perfect, thanks Florian. I'll add your ACK and remove the fixes tag in V2.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 24/30] panic: Refactor the panic path
  2022-04-29 20:38     ` Guilherme G. Piccoli
@ 2022-05-03 17:31       ` Michael Kelley (LINUX)
  2022-05-03 18:06         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Michael Kelley (LINUX) @ 2022-05-03 17:31 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Friday, April 29, 2022 1:38 PM
> 
> On 29/04/2022 14:53, Michael Kelley (LINUX) wrote:
> > From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022
> 3:49 PM
> >> [...]
> >> +	panic_notifiers_level=
> >> +			[KNL] Set the panic notifiers execution order.
> >> +			Format: <unsigned int>
> >> +			We currently have 4 lists of panic notifiers; based
> >> +			on the functionality and risk (for panic success) the
> >> +			callbacks are added in a given list. The lists are:
> >> +			- hypervisor/FW notification list (low risk);
> >> +			- informational list (low/medium risk);
> >> +			- pre_reboot list (higher risk);
> >> +			- post_reboot list (only run late in panic and after
> >> +			kdump, not configurable for now).
> >> +			This parameter defines the ordering of the first 3
> >> +			lists with regards to kdump; the levels determine
> >> +			which set of notifiers execute before kdump. The
> >> +			accepted levels are:
> >> +			0: kdump is the first thing to run, NO list is
> >> +			executed before kdump.
> >> +			1: only the hypervisor list is executed before kdump.
> >> +			2 (default level): the hypervisor list and (*if*
> >> +			there's any kmsg_dumper defined) the informational
> >> +			list are executed before kdump.
> >> +			3: both the hypervisor and the informational lists
> >> +			(always) execute before kdump.
> >
> > I'm not clear on why level 2 exists.  What is the scenario where
> > execution of the info list before kdump should be conditional on the
> > existence of a kmsg_dumper?   Maybe the scenario is described
> > somewhere in the patch set and I just missed it.
> >
> 
> Hi Michael, thanks for your review/consideration. So, this idea started
> kind of some time ago. It all started with a need of exposing more
> information on kernel log *before* kdump and *before* pstore -
> specifically, we're talking about panic_print. But this cause some
> reactions, Baoquan was very concerned with that [0]. Soon after, I've
> proposed a panic notifiers filter (orthogonal) approach, to which Petr
> suggested instead doing a major refactor [1] - it finally is alive in
> the form of this series.
> 
> The theory behind the level 2 is to allow a scenario of kdump with the
> minimum amount of notifiers - what is the point in printing more
> information if the user doesn't care, since it's going to kdump? Now, if
> there is a kmsg dumper, it means that there is likely some interest in
> collecting information, and that might as well be required before the
> potential kdump (which is my case, hence the proposal on [0]).
> 
> Instead of forcing one of the two behaviors (level 1 or level 3), we
> have a middle-term/compromise: if there's interest in collecting such
> data (in the form of a kmsg dumper), we then execute the informational
> notifiers before kdump. If not, why to increase (even slightly) the risk
> for kdump?
> 
> I'm OK in removing the level 2 if people prefer, but I don't feel it's a
> burden, quite opposite - seems a good way to accommodate the somewhat
> antagonistic ideas (jump to kdump ASAP vs collecting more info in the
> panicked kernel log).
> 
> [0] https://lore.kernel.org/lkml/20220126052246.GC2086@MiWiFi-R3L-srv/
> 
> [1] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/
> 

To me, it's a weak correlation between having a kmsg dumper, and
wanting or not wanting the info level output to come before kdump.
Hyper-V is one of only a few places that register a kmsg dumper, so most
Linux instances outside of Hyper-V guest (and PowerPC systems?) will have
the info level output after kdump.  It seems like anyone who cared strongly
about the info level output would set the panic_notifier_level to 1 or to 3
so that the result is more deterministic.  But that's just my opinion, and
it's probably an opinion that is not as well informed on the topic as some
others in the discussion. So keeping things as in your patch set is not a
show-stopper for me.

However, I would request a clarification in the documentation.   The
panic_notifier_level affects not only the hypervisor, informational,
and pre_reboot lists, but it also affects panic_print_sys_info() and
kmsg_dump().  Specifically, at level 1, panic_print_sys_info() and
kmsg_dump() will not be run before kdump.  At level 3, they will
always be run before kdump.  Your documentation above mentions
"informational lists" (plural), which I take to vaguely include
kmsg_dump() and panic_print_sys_info(), but being explicit about
the effect would be better.

Michael

> 
> >[...]
> >> +	 * Based on the level configured (smaller than 4), we clear the
> >> +	 * proper bits in "panic_notifiers_bits". Notice that this bitfield
> >> +	 * is initialized with all notifiers set.
> >> +	 */
> >> +	switch (panic_notifiers_level) {
> >> +	case 3:
> >> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> >> +		break;
> >> +	case 2:
> >> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> >> +
> >> +		if (!kmsg_has_dumpers())
> >> +			clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> >> +		break;
> >> +	case 1:
> >> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> >> +		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> >> +		break;
> >> +	case 0:
> >> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> >> +		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> >> +		clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);
> >> +		break;
> >> +	}
> >
> > I think the above switch statement could be done as follows:
> >
> > if (panic_notifiers_level <= 3)
> > 	clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> > if (panic_notifiers_level <= 2)
> > 	if (!kmsg_has_dumpers())
> > 		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> > if (panic_notifiers_level <=1)
> > 	clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> > if (panic_notifiers_level == 0)
> > 	clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);
> >
> > That's about half the lines of code.  It's somewhat a matter of style,
> > so treat this as just a suggestion to consider.  I just end up looking
> > for a better solution when I see the same line of code repeated
> > 3 or 4 times!
> >
> 
> It's a good idea - I liked your code. The switch seems more
> natural/explicit for me, even duplicating some lines, but in case more
> people prefer your way, I can definitely change the code - thanks for
> the suggestion.
> Cheers,
> 
> 
> Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-04-29 18:04     ` Guilherme G. Piccoli
@ 2022-05-03 17:44       ` Michael Kelley (LINUX)
  2022-05-03 17:56         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Michael Kelley (LINUX) @ 2022-05-03 17:44 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Alexander Gordeev, Andrea Parri,
	Ard Biesheuvel, Benjamin Herrenschmidt, Brian Norris,
	Christian Borntraeger, Christophe JAILLET, David Gow,
	David S. Miller, Dexuan Cui, Doug Berger, Evan Green,
	Florian Fainelli, Haiyang Zhang, Hari Bathini, Heiko Carstens,
	Julius Werner, Justin Chen, KY Srinivasan, Lee Jones,
	Markus Mayer, Michael Ellerman, Mihai Carabas, Nicholas Piggin,
	Paul Mackerras, Pavel Machek, Scott Branden, Sebastian Reichel,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Friday, April 29, 2022 11:04 AM
> 
> On 29/04/2022 14:30, Michael Kelley (LINUX) wrote:
> > From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022
> 3:49 PM
> >> [...]
> >>
> >> @@ -2843,7 +2843,7 @@ static void __exit vmbus_exit(void)
> >>  	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
> >>  		kmsg_dump_unregister(&hv_kmsg_dumper);
> >>  		unregister_die_notifier(&hyperv_die_report_block);
> >> -		atomic_notifier_chain_unregister(&panic_notifier_list,
> >> +		atomic_notifier_chain_unregister(&panic_hypervisor_list,
> >>  						&hyperv_panic_report_block);
> >>  	}
> >>
> >
> > Using the hypervisor_list here produces a bit of a mismatch.  In many cases
> > this notifier will do nothing, and will defer to the kmsg_dump() mechanism
> > to notify the hypervisor about the panic.   Running the kmsg_dump()
> > mechanism is linked to the info_list, so I'm thinking the Hyper-V panic report
> > notifier should be on the info_list as well.  That way the reporting behavior
> > is triggered at the same point in the panic path regardless of which
> > reporting mechanism is used.
> >
> 
> Hi Michael, thanks for your feedback! I agree that your idea could work,
> but...there is one downside: imagine the kmsg_dump() approach is not set
> in some Hyper-V guest, then we would rely in the regular notification
> mechanism [hv_die_panic_notify_crash()], right?
> But...you want then to run this notifier in the informational list,
> which...won't execute *by default* before kdump if no kmsg_dump() is
> set. So, this logic is convoluted when you mix it with the default level
> concept + kdump.

Yes, you are right.  But to me that speaks as much to the linkage
between the informational list and kmsg_dump() being the core
problem.  But as I described in my reply to Patch 24, I can live with
the linkage as-is.

FWIW, guests on newer versions of Hyper-V will always register a
kmsg dumper.  The flags that are tested to decide whether to
register provide compatibility with older versions of Hyper-V that 
don’t support the 4K bytes of notification info.

> 
> May I suggest something? If possible, take a run with this patch set +
> DEBUG_NOTIFIER=y, in *both* cases (with and without the kmsg_dump()
> set). I did that and they run almost at the same time...I've checked the
> notifiers called, it's like almost nothing runs in-between.
> 
> I feel the panic notification mechanism does really fit with a
> hypervisor list, it's a good match with the nature of the list, which
> aims at informing the panic notification to the hypervisor/FW.
> Of course we can modify it if you prefer...but please take into account
> the kdump case and how it complicates the logic.

I agree that the runtime effect of one list vs. the other is nil.  The
code works and can stay as you written it.

I was trying to align from a conceptual standpoint.  It was a bit
unexpected that one path would be on the hypervisor list, and the
other path effectively on the informational list.  When I see
conceptual mismatches like that, I tend to want to understand why,
and if there is something more fundamental that is out-of-whack.


> 
> Let me know your considerations, in case you can experiment with the
> patch set as-is.
> Cheers,
> 
> 
> Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-03 17:44       ` Michael Kelley (LINUX)
@ 2022-05-03 17:56         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-03 17:56 UTC (permalink / raw)
  To: Michael Kelley (LINUX), akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Alexander Gordeev, Andrea Parri,
	Ard Biesheuvel, Benjamin Herrenschmidt, Brian Norris,
	Christian Borntraeger, Christophe JAILLET, David Gow,
	David S. Miller, Dexuan Cui, Doug Berger, Evan Green,
	Florian Fainelli, Haiyang Zhang, Hari Bathini, Heiko Carstens,
	Julius Werner, Justin Chen, KY Srinivasan, Lee Jones,
	Markus Mayer, Michael Ellerman, Mihai Carabas, Nicholas Piggin,
	Paul Mackerras, Pavel Machek, Scott Branden, Sebastian Reichel,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On 03/05/2022 14:44, Michael Kelley (LINUX) wrote:
> [...]
>>
>> Hi Michael, thanks for your feedback! I agree that your idea could work,
>> but...there is one downside: imagine the kmsg_dump() approach is not set
>> in some Hyper-V guest, then we would rely in the regular notification
>> mechanism [hv_die_panic_notify_crash()], right?
>> But...you want then to run this notifier in the informational list,
>> which...won't execute *by default* before kdump if no kmsg_dump() is
>> set. So, this logic is convoluted when you mix it with the default level
>> concept + kdump.
> 
> Yes, you are right.  But to me that speaks as much to the linkage
> between the informational list and kmsg_dump() being the core
> problem.  But as I described in my reply to Patch 24, I can live with
> the linkage as-is.

Thanks for the feedback Michael!

> [...] 
>> I feel the panic notification mechanism does really fit with a
>> hypervisor list, it's a good match with the nature of the list, which
>> aims at informing the panic notification to the hypervisor/FW.
>> Of course we can modify it if you prefer...but please take into account
>> the kdump case and how it complicates the logic.
> 
> I agree that the runtime effect of one list vs. the other is nil.  The
> code works and can stay as you written it.
> 
> I was trying to align from a conceptual standpoint.  It was a bit
> unexpected that one path would be on the hypervisor list, and the
> other path effectively on the informational list.  When I see
> conceptual mismatches like that, I tend to want to understand why,
> and if there is something more fundamental that is out-of-whack.
> 

Totally agree with you here, I am like that as well - try to really
understand the details, this is very important specially in this patch
set, since it's a refactor and affects every user of the notifiers
infrastructure.

Again, just to double-say it: feel free to suggest any change for the
Hyper-V portion (might as well for any patch in the series, indeed) -
you and the other Hyper-V maintainers own this code and I'd be glad to
align with your needs, you are honor citizens in the panic notifiers
area, being one the most heavy users for that =)

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
  2022-04-27 22:48 ` [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path Guilherme G. Piccoli
@ 2022-05-03 18:03   ` Evan Green
  2022-05-03 19:12     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Evan Green @ 2022-05-03 18:03 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Andrew Morton, bhe, pmladek, kexec, LKML,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm Mailing List, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, Linux PM, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, Andy Shevchenko, Arnd Bergmann,
	Borislav Petkov, Jonathan Corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, Greg Kroah-Hartman, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, Kees Cook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Ard Biesheuvel, David Gow, Julius Werner

On Wed, Apr 27, 2022 at 3:51 PM Guilherme G. Piccoli
<gpiccoli@igalia.com> wrote:
>
> Currently the gsmi driver registers a panic notifier as well as
> reboot and die notifiers. The callbacks registered are called in
> atomic and very limited context - for instance, panic disables
> preemption, local IRQs and all other CPUs that aren't running the
> current panic function.
>
> With that said, taking a spinlock in this scenario is a
> dangerous invitation for a deadlock scenario. So, we fix
> that in this commit by changing the regular spinlock with
> a trylock, which is a safer approach.
>
> Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: David Gow <davidgow@google.com>
> Cc: Evan Green <evgreen@chromium.org>
> Cc: Julius Werner <jwerner@chromium.org>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  drivers/firmware/google/gsmi.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
> index adaa492c3d2d..b01ed02e4a87 100644
> --- a/drivers/firmware/google/gsmi.c
> +++ b/drivers/firmware/google/gsmi.c
> @@ -629,7 +629,10 @@ static int gsmi_shutdown_reason(int reason)
>         if (saved_reason & (1 << reason))
>                 return 0;
>
> -       spin_lock_irqsave(&gsmi_dev.lock, flags);
> +       if (!spin_trylock_irqsave(&gsmi_dev.lock, flags)) {
> +               rc = -EBUSY;
> +               goto out;
> +       }

gsmi_shutdown_reason() is a common function called in other scenarios
as well, like reboot and thermal trip, where it may still make sense
to wait to acquire a spinlock. Maybe we should add a parameter to
gsmi_shutdown_reason() so that you can get your change on panic, but
we don't convert other callbacks into try-fail scenarios causing us to
miss logs.

Though thinking more about it, is this really a Good Change (TM)? The
spinlock itself already disables interrupts, meaning the only case
where this change makes a difference is if the panic happens from
within the function that grabbed the spinlock (in which case the
callback is also likely to panic), or in an NMI that panics within
that window. The downside of this change is that if one core was
politely working through an event with the lock held, and another core
panics, we now might lose the panic log, even though it probably would
have gone through fine assuming the other core has a chance to
continue.

-Evan

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-03 17:31       ` Michael Kelley (LINUX)
@ 2022-05-03 18:06         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-03 18:06 UTC (permalink / raw)
  To: Michael Kelley (LINUX), akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On 03/05/2022 14:31, Michael Kelley (LINUX) wrote:
> [...]
> 
> To me, it's a weak correlation between having a kmsg dumper, and
> wanting or not wanting the info level output to come before kdump.
> Hyper-V is one of only a few places that register a kmsg dumper, so most
> Linux instances outside of Hyper-V guest (and PowerPC systems?) will have
> the info level output after kdump.  It seems like anyone who cared strongly
> about the info level output would set the panic_notifier_level to 1 or to 3
> so that the result is more deterministic.  But that's just my opinion, and
> it's probably an opinion that is not as well informed on the topic as some
> others in the discussion. So keeping things as in your patch set is not a
> show-stopper for me.
> 
> However, I would request a clarification in the documentation.   The
> panic_notifier_level affects not only the hypervisor, informational,
> and pre_reboot lists, but it also affects panic_print_sys_info() and
> kmsg_dump().  Specifically, at level 1, panic_print_sys_info() and
> kmsg_dump() will not be run before kdump.  At level 3, they will
> always be run before kdump.  Your documentation above mentions
> "informational lists" (plural), which I take to vaguely include
> kmsg_dump() and panic_print_sys_info(), but being explicit about
> the effect would be better.
> 
> Michael

Thanks again Michael, to express your points and concerns - great idea
of documentation improvement here, I'll do that for V2, for sure.

The idea of "defaulting" to skip the info list on kdump (if no
kmsg_dump() is set) is again a mechanism that aims at accommodating all
users and concerns of antagonistic goals, kdump vs notifier lists.

Before this patch set, by default no notifier executed before kdump. So,
the "pendulum"  was strongly on kdump side, and clearly this was a
sub-optimal decision - proof of that is that both Hyper-V / PowerPC code
forcibly set the "crash_kexec_post_notifiers". The goal here is to have
a more lightweight list that by default runs before kdump, a secondary
list that only runs before kdump if there's usage for that (either user
sets that or kmsg_dumper set is considered a valid user), and the
remaining notifiers run by default only after kdump, all of that very
customizable through the levels idea.

Now, one thing we could do to improve consistency for the hyper-v case:
having a kmsg_dump_once() helper, and *for Hyper-V only*, call it on the
hypervisor list, within the info notifier (that would be moved to
hypervisor list, ofc).
Let's wait for more feedback on that, just throwing some ideas in order
we can have everyone happy with the end-result!

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers
  2022-04-29 22:35     ` Guilherme G. Piccoli
@ 2022-05-03 18:13       ` Michael Kelley (LINUX)
  2022-05-03 18:57         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Michael Kelley (LINUX) @ 2022-05-03 18:13 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Andrea Parri, Dexuan Cui, Haiyang Zhang, KY Srinivasan,
	Stephen Hemminger, Tianyu Lan, Wei Liu

From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Friday, April 29, 2022 3:35 PM
> 
> Hi Michael, first of all thanks for the great review, much appreciated.
> Some comments inline below:
> 
> On 29/04/2022 14:16, Michael Kelley (LINUX) wrote:
> > [...]
> >> hypervisor I/O completion), so we postpone that to run late. But more
> >> relevant: this *same* vmbus unloading happens in the crash_shutdown()
> >> handler, so if kdump is set, we can safely skip this panic notifier and
> >> defer such clean-up to the kexec crash handler.
> >
> > While the last sentence is true for Hyper-V on x86/x64, it's not true for
> > Hyper-V on ARM64.  x86/x64 has the 'machine_ops' data structure
> > with the ability to provide a custom crash_shutdown() function, which
> > Hyper-V does in the form of hv_machine_crash_shutdown().  But ARM64
> > has no mechanism to provide such a custom function that will eventually
> > do the needed vmbus_initiate_unload() before running kdump.
> >
> > I'm not immediately sure what the best solution is for ARM64.  At this
> > point, I'm just pointing out the problem and will think about the tradeoffs
> > for various possible solutions.  Please do the same yourself. :-)
> >
> 
> Oh, you're totally right! I just assumed ARM64 would the the same, my
> bad. Just to propose some alternatives, so you/others can also discuss
> here and we can reach a consensus about the trade-offs:
> 
> (a) We could forget about this change, and always do the clean-up here,
> not relying in machine_crash_shutdown().
> Pro: really simple, behaves the same as it is doing currently.
> Con: less elegant/concise, doesn't allow arm64 customization.
> 
> (b) Add a way to allow ARM64 customization of shutdown crash handler.
> Pro: matches x86, more customizable, improves arm64 arch code.
> Con: A tad more complex.
> 
> Also, a question that came-up: if ARM64 has no way of calling special
> crash shutdown handler, how can you execute hv_stimer_cleanup() and
> hv_synic_disable_regs() there? Or are they not required in ARM64?
> 

My suggestion is to do (a) for now.  I suspect (b) could be a more
extended discussion and I wouldn't want your patch set to get held
up on that discussion.  I don't know what the sense of the ARM64
maintainers would be toward (b).  They have tried to avoid picking
up code warts like have accumulated on the x86/x64 side over the
years, and I agree with that effort.  But as more and varied
hypervisors become available for ARM64, it seems like a framework
for supporting a custom shutdown handler may become necessary.
But that could take a little time.

You are right about hv_stimer_cleanup() and hv_synic_disable_regs().
We are not running these when a panic occurs on ARM64, and we
should be, though the risk is small.   We will pursue (b) and add
these additional cleanups as part of that.  But again, I would suggest
doing (a) for now, and we will switch back to your solution once
(b) is in place.

> 
> >>
> >> (c) There is also a Hyper-V framebuffer panic notifier, which relies in
> >> doing a vmbus operation that demands a valid connection. So, we must
> >> order this notifier with the panic notifier from vmbus_drv.c, in order to
> >> guarantee that the framebuffer code executes before the vmbus connection
> >> is unloaded.
> >
> > Patch 21 of this set puts the Hyper-V FB panic notifier on the pre_reboot
> > notifier list, which means it won't execute before the VMbus connection
> > unload in the case of kdump.   This notifier is making sure that Hyper-V
> > is notified about the last updates made to the frame buffer before the
> > panic, so maybe it needs to be put on the hypervisor notifier list.  It
> > sends a message to Hyper-V over its existing VMbus channel, but it
> > does not wait for a reply.  It does, however, obtain a spin lock on the
> > ring buffer used to communicate with Hyper-V.   Unless someone has
> > a better suggestion, I'm inclined to take the risk of blocking on that
> > spin lock.
> 
> The logic behind that was: when kdump is set, we'd skip the vmbus
> disconnect on notifiers, deferring that to crash_shutdown(), logic this
> one refuted in the above discussion on ARM64 (one more Pro argument to
> the idea of refactoring aarch64 code to allow a custom crash shutdown
> handler heh). But you're right, for the default level 2, we skip the
> pre_reboot notifiers on kdump, effectively skipping this notifier.
> 
> Some ideas of what we can do here:
> 
> I) we could change the framebuffer notifier to rely on trylocks, instead
> of risking a lockup scenario, and with that, we can execute it before
> the vmbus disconnect in the hypervisor list;

I think we have to do this approach for now.

> 
> II) we ignore the hypervisor notifier in case of kdump _by default_, and
> if the users don't want that, they can always set the panic notifier
> level to 4 and run all notifiers prior to kdump; would that be terrible
> you think? Kdump users might don't care about the framebuffer...
> 
> III) we go with approach (b) above and refactor arm64 code to allow the
> custom crash handler on kdump time, then [with point (I) above] the
> logic proposed in this series is still valid - seems more and more the
> most correct/complete solution.

But even when/if we get approach (b) implemented, having the
framebuffer notifier on the pre_reboot list is still too late with the
default of panic_notifier_level = 2.  The kdump path will reset the
VMbus connection and then the framebuffer notifier won't work.

> 
> In any case, I guess we should avoid workarounds if possible and do the
> things the best way we can, to encompass all (or almost all) the
> possible scenarios and don't force things on users (like enforcing panic
> notifier level 4 for Hyper-V or something like this...)
> 
> More feedback from you / Hyper-V folks is pretty welcome about this.
> 
> 
> >
> >> [...]
> > The "Fixes:" tags imply that these changes should be backported to older
> > longterm kernel versions, which I don't think is the case.  There is a
> > dependency on Patch 14 of your series where PANIC_NOTIFIER is
> > introduced.
> >
> 
> Oh, this was more related with archeology of the kernel. When I'm
> investigating stuff, I really want to understand why code was added and
> that usually require some time git blaming stuff, so having that pronto
> in the commit message is a bonus.
> 
> But of course we don't need to use the Fixes tag for that, easy to only
> mention it in the text. A secondary benefit by using this tag is to
> indicate this is a _real fix_ to some code, and not an improvement, but
> as you say, I agree we shouldn't backport it to previous releases having
> or not the Fixes tag (AFAIK it's not mandatory to backport stuff with
> Fixes tag).
> 
> 
> >> [...]
> >> + * intrincated is the relation of this notifier with Hyper-V framebuffer
> >
> > s/intrincated/intricate/
> 
> Thanks, fixed in V2!
> 
> 
> >
> >> [...]
> >> +static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long val,
> >>  			      void *args)
> >> +{
> >> +	if (!kexec_crash_loaded())
> >
> > I'm not clear on the purpose of this condition.  I think it means
> > we will skip the vmbus_initiate_unload() if a panic occurs in the
> > kdump kernel.  Is there a reason a panic in the kdump kernel
> > should be treated differently?  Or am I misunderstanding?
> 
> This is really related with the point discussed in the top of this
> response - I assumed both ARM64/x86_64 would behave the same and
> disconnect the vmbus through the custom crash handler when kdump is set,
> so worth skipping it here in the notifier. But that's not true for ARM64
> as you pointed, so this guard against kexec is really part of the
> decision/discussion on what to do with ARM64 heh

But note that vmbus_initiate_unload() already has a guard built-in.
If the intent of this test is just as a guard against running twice,
then it isn't needed.

> 
> Cheers!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers
  2022-05-03 18:13       ` Michael Kelley (LINUX)
@ 2022-05-03 18:57         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-03 18:57 UTC (permalink / raw)
  To: Michael Kelley (LINUX), akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Andrea Parri, Dexuan Cui, Haiyang Zhang, KY Srinivasan,
	Stephen Hemminger, Tianyu Lan, Wei Liu

On 03/05/2022 15:13, Michael Kelley (LINUX) wrote:
> [...]
>> (a) We could forget about this change, and always do the clean-up here,
>> not relying in machine_crash_shutdown().
>> Pro: really simple, behaves the same as it is doing currently.
>> Con: less elegant/concise, doesn't allow arm64 customization.
>>
>> (b) Add a way to allow ARM64 customization of shutdown crash handler.
>> Pro: matches x86, more customizable, improves arm64 arch code.
>> Con: A tad more complex.
>>
>> Also, a question that came-up: if ARM64 has no way of calling special
>> crash shutdown handler, how can you execute hv_stimer_cleanup() and
>> hv_synic_disable_regs() there? Or are they not required in ARM64?
>>
> 
> My suggestion is to do (a) for now.  I suspect (b) could be a more
> extended discussion and I wouldn't want your patch set to get held
> up on that discussion.  I don't know what the sense of the ARM64
> maintainers would be toward (b).  They have tried to avoid picking
> up code warts like have accumulated on the x86/x64 side over the
> years, and I agree with that effort.  But as more and varied
> hypervisors become available for ARM64, it seems like a framework
> for supporting a custom shutdown handler may become necessary.
> But that could take a little time.
> 
> You are right about hv_stimer_cleanup() and hv_synic_disable_regs().
> We are not running these when a panic occurs on ARM64, and we
> should be, though the risk is small.   We will pursue (b) and add
> these additional cleanups as part of that.  But again, I would suggest
> doing (a) for now, and we will switch back to your solution once
> (b) is in place.
> 

Thanks again Michael, I'll stick with (a) for now. I'll check with ARM64
community about that, and I might even try to implement something in
parallel (if you are not already working on that - lemme know please),
so we don't get stuck here. As you said, I feel that this is more and
more relevant as the number of panic/crash/kexec scenarios tend to
increase in ARM64.


>> [...]
>> Some ideas of what we can do here:
>>
>> I) we could change the framebuffer notifier to rely on trylocks, instead
>> of risking a lockup scenario, and with that, we can execute it before
>> the vmbus disconnect in the hypervisor list;
> 
> I think we have to do this approach for now.
> 
>>
>> II) we ignore the hypervisor notifier in case of kdump _by default_, and
>> if the users don't want that, they can always set the panic notifier
>> level to 4 and run all notifiers prior to kdump; would that be terrible
>> you think? Kdump users might don't care about the framebuffer...
>>
>> III) we go with approach (b) above and refactor arm64 code to allow the
>> custom crash handler on kdump time, then [with point (I) above] the
>> logic proposed in this series is still valid - seems more and more the
>> most correct/complete solution.
> 
> But even when/if we get approach (b) implemented, having the
> framebuffer notifier on the pre_reboot list is still too late with the
> default of panic_notifier_level = 2.  The kdump path will reset the
> VMbus connection and then the framebuffer notifier won't work.
> 

OK, perfect! I'll work something along these lines in V2, allowing the
FB notifier to always run in the hypervisor list before the vmbus unload
mechanism.


>> [...]
>>>> +static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long val,
>>>>  			      void *args)
>>>> +{
>>>> +	if (!kexec_crash_loaded())
>>>
>>> I'm not clear on the purpose of this condition.  I think it means
>>> we will skip the vmbus_initiate_unload() if a panic occurs in the
>>> kdump kernel.  Is there a reason a panic in the kdump kernel
>>> should be treated differently?  Or am I misunderstanding?
>>
>> This is really related with the point discussed in the top of this
>> response - I assumed both ARM64/x86_64 would behave the same and
>> disconnect the vmbus through the custom crash handler when kdump is set,
>> so worth skipping it here in the notifier. But that's not true for ARM64
>> as you pointed, so this guard against kexec is really part of the
>> decision/discussion on what to do with ARM64 heh
> 
> But note that vmbus_initiate_unload() already has a guard built-in.
> If the intent of this test is just as a guard against running twice,
> then it isn't needed.

Since we're going to avoid relying in the custom crash_shutdown(), due
to the lack of ARM64 support for now, this check will be removed in V2.

Its purpose was to skip the notifier *proactively* in case kexec is set,
given that...once kexec happens, the custom crash_shutdown() would run
the same function (wrong assumption for ARM64, my bad).

Postponing that slightly would maybe gain us some time while the
hypervisor finish its work, so we'd delay less in the vmbus unload path
- that was the rationale behind this check.


Cheers!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
  2022-05-03 18:03   ` Evan Green
@ 2022-05-03 19:12     ` Guilherme G. Piccoli
  2022-05-03 21:56       ` Evan Green
  2022-05-10 11:38       ` Petr Mladek
  0 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-03 19:12 UTC (permalink / raw)
  To: Evan Green
  Cc: Andrew Morton, bhe, pmladek, kexec, LKML,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, Linux PM,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, Andy Shevchenko, Arnd Bergmann,
	Borislav Petkov, Jonathan Corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, Greg Kroah-Hartman, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, Kees Cook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Ard Biesheuvel, David Gow, Julius Werner

On 03/05/2022 15:03, Evan Green wrote:
> [...]
> gsmi_shutdown_reason() is a common function called in other scenarios
> as well, like reboot and thermal trip, where it may still make sense
> to wait to acquire a spinlock. Maybe we should add a parameter to
> gsmi_shutdown_reason() so that you can get your change on panic, but
> we don't convert other callbacks into try-fail scenarios causing us to
> miss logs.
> 

Hi Evan, thanks for your feedback, much appreciated!
What I've done in other cases like this was to have a helper checking
the spinlock in the panic notifier - if we can acquire that, go ahead
but if not, bail out. For a proper example of an implementation, check
patch 13 of the series:
https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli@igalia.com/ .

Do you agree with that, or prefer really a parameter in
gsmi_shutdown_reason() ? I'll follow your choice =)


> Though thinking more about it, is this really a Good Change (TM)? The
> spinlock itself already disables interrupts, meaning the only case
> where this change makes a difference is if the panic happens from
> within the function that grabbed the spinlock (in which case the
> callback is also likely to panic), or in an NMI that panics within
> that window. The downside of this change is that if one core was
> politely working through an event with the lock held, and another core
> panics, we now might lose the panic log, even though it probably would
> have gone through fine assuming the other core has a chance to
> continue.

My feeling is that this is a good change, indeed - a lot of places are
getting changed like this, in this series.

Reasoning: the problem with your example is that, by default, secondary
CPUs are disabled in the panic path, through an IPI mechanism. IPIs take
precedence and interrupt the work in these CPUs, effectively
interrupting the "polite work" with the lock held heh

Then, such CPU is put to sleep and we finally reach the panic notifier
hereby discussed, in the main CPU. If the other CPU was shut-off *with
the lock held*, it's never finishing such work, so the lock is never to
be released. Conclusion: the spinlock can't be acquired, hence we broke
the machine (which is already broken, given it's panic) in the path of
this notifier.
This should be really rare, but..possible. So I think we should protect
against this scenario.

We can grab others' feedback if you prefer, and of course you have the
rights to refuse this change in the gsmi code, but from my
point-of-view, I don't see any advantage in just assume the risk,
specially since the change is very very simple.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
  2022-05-03 19:12     ` Guilherme G. Piccoli
@ 2022-05-03 21:56       ` Evan Green
  2022-05-04 12:45         ` Guilherme G. Piccoli
  2022-05-10 11:38       ` Petr Mladek
  1 sibling, 1 reply; 183+ messages in thread
From: Evan Green @ 2022-05-03 21:56 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Andrew Morton, bhe, pmladek, kexec, LKML,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, Linux PM,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, Andy Shevchenko, Arnd Bergmann,
	Borislav Petkov, Jonathan Corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, Greg Kroah-Hartman, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, Kees Cook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Ard Biesheuvel, David Gow, Julius Werner

Hi Guilherme,

On Tue, May 3, 2022 at 12:12 PM Guilherme G. Piccoli
<gpiccoli@igalia.com> wrote:
>
> On 03/05/2022 15:03, Evan Green wrote:
> > [...]
> > gsmi_shutdown_reason() is a common function called in other scenarios
> > as well, like reboot and thermal trip, where it may still make sense
> > to wait to acquire a spinlock. Maybe we should add a parameter to
> > gsmi_shutdown_reason() so that you can get your change on panic, but
> > we don't convert other callbacks into try-fail scenarios causing us to
> > miss logs.
> >
>
> Hi Evan, thanks for your feedback, much appreciated!
> What I've done in other cases like this was to have a helper checking
> the spinlock in the panic notifier - if we can acquire that, go ahead
> but if not, bail out. For a proper example of an implementation, check
> patch 13 of the series:
> https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli@igalia.com/ .
>
> Do you agree with that, or prefer really a parameter in
> gsmi_shutdown_reason() ? I'll follow your choice =)

I'm fine with either, thanks for the link. Mostly I want to make sure
other paths to gsmi_shutdown_reason() aren't also converted to a try.

>
>
> > Though thinking more about it, is this really a Good Change (TM)? The
> > spinlock itself already disables interrupts, meaning the only case
> > where this change makes a difference is if the panic happens from
> > within the function that grabbed the spinlock (in which case the
> > callback is also likely to panic), or in an NMI that panics within
> > that window. The downside of this change is that if one core was
> > politely working through an event with the lock held, and another core
> > panics, we now might lose the panic log, even though it probably would
> > have gone through fine assuming the other core has a chance to
> > continue.
>
> My feeling is that this is a good change, indeed - a lot of places are
> getting changed like this, in this series.
>
> Reasoning: the problem with your example is that, by default, secondary
> CPUs are disabled in the panic path, through an IPI mechanism. IPIs take
> precedence and interrupt the work in these CPUs, effectively
> interrupting the "polite work" with the lock held heh

The IPI can only interrupt a CPU with irqs disabled if the IPI is an
NMI. I haven't looked before to see if we use NMI IPIs to corral the
other CPUs on panic. On x86, I grepped my way down to
native_stop_other_cpus(), which looks like it does a normal IPI, waits
1 second, then does an NMI IPI. So, if a secondary CPU has the lock
held, on x86 it has roughly 1s to finish what it's doing and re-enable
interrupts before smp_send_stop() brings the NMI hammer down. I think
this should be more than enough time for the secondary CPU to get out
and release the lock.

So then it makes sense to me that you're fixing cases where we
panicked with the lock held, or hung with the lock held. Given the 1
second grace period x86 gives us, I'm on board, as that helps mitigate
the risk that we bailed out early with the try and should have spun a
bit longer instead. Thanks.

-Evan

>
> Then, such CPU is put to sleep and we finally reach the panic notifier
> hereby discussed, in the main CPU. If the other CPU was shut-off *with
> the lock held*, it's never finishing such work, so the lock is never to
> be released. Conclusion: the spinlock can't be acquired, hence we broke
> the machine (which is already broken, given it's panic) in the path of
> this notifier.
> This should be really rare, but..possible. So I think we should protect
> against this scenario.
>
> We can grab others' feedback if you prefer, and of course you have the
> rights to refuse this change in the gsmi code, but from my
> point-of-view, I don't see any advantage in just assume the risk,
> specially since the change is very very simple.
>
> Cheers,
>
>
> Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
  2022-05-03 21:56       ` Evan Green
@ 2022-05-04 12:45         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-04 12:45 UTC (permalink / raw)
  To: Evan Green
  Cc: Andrew Morton, bhe, pmladek, kexec, LKML,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, Linux PM,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, Andy Shevchenko, Arnd Bergmann,
	Borislav Petkov, Jonathan Corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, Greg Kroah-Hartman, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, Kees Cook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Ard Biesheuvel, David Gow, Julius Werner

On 03/05/2022 18:56, Evan Green wrote:
> Hi Guilherme,
> [...] 
>> Do you agree with that, or prefer really a parameter in
>> gsmi_shutdown_reason() ? I'll follow your choice =)
> 
> I'm fine with either, thanks for the link. Mostly I want to make sure
> other paths to gsmi_shutdown_reason() aren't also converted to a try.

Hi Evan, thanks for the prompt response! So, I'll proceed like I did in
s390, for consistency.

> [...]
>> Reasoning: the problem with your example is that, by default, secondary
>> CPUs are disabled in the panic path, through an IPI mechanism. IPIs take
>> precedence and interrupt the work in these CPUs, effectively
>> interrupting the "polite work" with the lock held heh
> 
> The IPI can only interrupt a CPU with irqs disabled if the IPI is an
> NMI. I haven't looked before to see if we use NMI IPIs to corral the
> other CPUs on panic. On x86, I grepped my way down to
> native_stop_other_cpus(), which looks like it does a normal IPI, waits
> 1 second, then does an NMI IPI. So, if a secondary CPU has the lock
> held, on x86 it has roughly 1s to finish what it's doing and re-enable
> interrupts before smp_send_stop() brings the NMI hammer down. I think
> this should be more than enough time for the secondary CPU to get out
> and release the lock.
> 
> So then it makes sense to me that you're fixing cases where we
> panicked with the lock held, or hung with the lock held. Given the 1
> second grace period x86 gives us, I'm on board, as that helps mitigate
> the risk that we bailed out early with the try and should have spun a
> bit longer instead. Thanks.
> 
> -Evan

Well, in the old path without "crash_kexec_post_notifiers", we indeed
end-up relying on native_stop_other_cpus() for x86 as you said, and the
"1s rule" makes sense. But after this series (or even before, if the
kernel parameter "crash_kexec_post_notifiers" was used) the function
used to stop CPUs in the panic path is crash_smp_send_stop(), and the
call chain is like:

Main CPU:
crash_smp_send_stop()
--kdump_nmi_shootdown_cpus()
----nmi_shootdown_cpus()

Then, in each CPU (except the main one, running panic() path),
we execute kdump_nmi_callback() in NMI context.

So, we seem to indeed interrupt any context (even with IRQs disabled),
increasing the likelihood of the potential lockups due to stopped CPUs
holding the locks heheh

Thanks again for the good discussion, let me know if anything I'm saying
doesn't make sense - this crash path is a bit convoluted, specially in
x86, I might have understood something wrongly =)
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 07/30] mips: ip22: Reword PANICED to PANICKED and remove useless header
  2022-04-27 22:49 ` [PATCH 07/30] mips: ip22: Reword PANICED to PANICKED " Guilherme G. Piccoli
@ 2022-05-04 20:32   ` Thomas Bogendoerfer
  2022-05-04 21:26     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Thomas Bogendoerfer @ 2022-05-04 20:32 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

On Wed, Apr 27, 2022 at 07:49:01PM -0300, Guilherme G. Piccoli wrote:
> Many other place in the kernel prefer the latter, so let's keep
> it consistent in MIPS code as well. Also, removes a useless header.
> 
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  arch/mips/sgi-ip22/ip22-reset.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)

applied to mips-next.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 07/30] mips: ip22: Reword PANICED to PANICKED and remove useless header
  2022-05-04 20:32   ` Thomas Bogendoerfer
@ 2022-05-04 21:26     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-04 21:26 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On 04/05/2022 17:32, Thomas Bogendoerfer wrote:
> [...]
> 
> applied to mips-next.
> 
> Thomas.
> 

Thanks a bunch Thomas =)

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers
  2022-04-27 22:49 ` [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers Guilherme G. Piccoli
@ 2022-05-05 18:55   ` Hari Bathini
  2022-05-05 19:28     ` Guilherme G. Piccoli
  2022-05-09 12:50     ` Guilherme G. Piccoli
  0 siblings, 2 replies; 183+ messages in thread
From: Hari Bathini @ 2022-05-05 18:55 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Benjamin Herrenschmidt, Michael Ellerman, Nicholas Piggin,
	Paul Mackerras



On 28/04/22 4:19 am, Guilherme G. Piccoli wrote:
> The panic notifiers infrastructure is a bit limited in the scope of
> the callbacks - basically every kind of functionality is dropped
> in a list that runs in the same point during the kernel panic path.
> This is not really on par with the complexities and particularities
> of architecture / hypervisors' needs, and a refactor is ongoing.
> 
> As part of this refactor, it was observed that powerpc has 2 notifiers,
> with mixed goals: one is just a KASLR offset dumper, whereas the other
> aims to hard-disable IRQs (necessary on panic path), warn firmware of
> the panic event (fadump) and run low-level platform-specific machinery
> that might stop kernel execution and never come back.
> 
> Clearly, the 2nd notifier has opposed goals: disable IRQs / fadump
> should run earlier while low-level platform actions should
> run late since it might not even return. Hence, this patch decouples
> the notifiers splitting them in three:
> 
> - First one is responsible for hard-disable IRQs and fadump,
> should run early;
> 
> - The kernel KASLR offset dumper is really an informative notifier,
> harmless and may run at any moment in the panic path;
> 
> - The last notifier should run last, since it aims to perform
> low-level actions for specific platforms, and might never return.
> It is also only registered for 2 platforms, pseries and ps3.
> 
> The patch better documents the notifiers and clears the code too,
> also removing a useless header.
> 
> Currently no functionality change should be observed, but after
> the planned panic refactor we should expect more panic reliability
> with this patch.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

The change looks good. I have tested it on an LPAR (ppc64).

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>

> ---
> 
> We'd like to thanks specially the MiniCloud infrastructure [0] maintainers,
> that allow us to test PowerPC code in a very complete, functional and FREE
> environment (there's no need even for adding a credit card, like many "free"
> clouds require ¬¬ ).
> 
> [0] https://openpower.ic.unicamp.br/minicloud
> 
>   arch/powerpc/kernel/setup-common.c | 74 ++++++++++++++++++++++--------
>   1 file changed, 54 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 518ae5aa9410..52f96b209a96 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -23,7 +23,6 @@
>   #include <linux/console.h>
>   #include <linux/screen_info.h>
>   #include <linux/root_dev.h>
> -#include <linux/notifier.h>
>   #include <linux/cpu.h>
>   #include <linux/unistd.h>
>   #include <linux/serial.h>
> @@ -680,8 +679,25 @@ int check_legacy_ioport(unsigned long base_port)
>   }
>   EXPORT_SYMBOL(check_legacy_ioport);
>   
> -static int ppc_panic_event(struct notifier_block *this,
> -                             unsigned long event, void *ptr)
> +/*
> + * Panic notifiers setup
> + *
> + * We have 3 notifiers for powerpc, each one from a different "nature":
> + *
> + * - ppc_panic_fadump_handler() is a hypervisor notifier, which hard-disables
> + *   IRQs and deal with the Firmware-Assisted dump, when it is configured;
> + *   should run early in the panic path.
> + *
> + * - dump_kernel_offset() is an informative notifier, just showing the KASLR
> + *   offset if we have RANDOMIZE_BASE set.
> + *
> + * - ppc_panic_platform_handler() is a low-level handler that's registered
> + *   only if the platform wishes to perform final actions in the panic path,
> + *   hence it should run late and might not even return. Currently, only
> + *   pseries and ps3 platforms register callbacks.
> + */
> +static int ppc_panic_fadump_handler(struct notifier_block *this,
> +				    unsigned long event, void *ptr)
>   {
>   	/*
>   	 * panic does a local_irq_disable, but we really
> @@ -691,45 +707,63 @@ static int ppc_panic_event(struct notifier_block *this,
>   
>   	/*
>   	 * If firmware-assisted dump has been registered then trigger
> -	 * firmware-assisted dump and let firmware handle everything else.
> +	 * its callback and let the firmware handles everything else.
>   	 */
>   	crash_fadump(NULL, ptr);
> -	if (ppc_md.panic)
> -		ppc_md.panic(ptr);  /* May not return */
> +
>   	return NOTIFY_DONE;
>   }
>   
> -static struct notifier_block ppc_panic_block = {
> -	.notifier_call = ppc_panic_event,
> -	.priority = INT_MIN /* may not return; must be done last */
> -};
> -
> -/*
> - * Dump out kernel offset information on panic.
> - */
>   static int dump_kernel_offset(struct notifier_block *self, unsigned long v,
>   			      void *p)
>   {
>   	pr_emerg("Kernel Offset: 0x%lx from 0x%lx\n",
>   		 kaslr_offset(), KERNELBASE);
>   
> -	return 0;
> +	return NOTIFY_DONE;
>   }
>   
> +static int ppc_panic_platform_handler(struct notifier_block *this,
> +				      unsigned long event, void *ptr)
> +{
> +	/*
> +	 * This handler is only registered if we have a panic callback
> +	 * on ppc_md, hence NULL check is not needed.
> +	 * Also, it may not return, so it runs really late on panic path.
> +	 */
> +	ppc_md.panic(ptr);
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block ppc_fadump_block = {
> +	.notifier_call = ppc_panic_fadump_handler,
> +	.priority = INT_MAX, /* run early, to notify the firmware ASAP */
> +};
> +
>   static struct notifier_block kernel_offset_notifier = {
> -	.notifier_call = dump_kernel_offset
> +	.notifier_call = dump_kernel_offset,
> +};
> +
> +static struct notifier_block ppc_panic_block = {
> +	.notifier_call = ppc_panic_platform_handler,
> +	.priority = INT_MIN, /* may not return; must be done last */
>   };
>   
>   void __init setup_panic(void)
>   {
> +	/* Hard-disables IRQs + deal with FW-assisted dump (fadump) */
> +	atomic_notifier_chain_register(&panic_notifier_list,
> +				       &ppc_fadump_block);
> +
>   	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
>   		atomic_notifier_chain_register(&panic_notifier_list,
>   					       &kernel_offset_notifier);
>   
> -	/* PPC64 always does a hard irq disable in its panic handler */
> -	if (!IS_ENABLED(CONFIG_PPC64) && !ppc_md.panic)
> -		return;
> -	atomic_notifier_chain_register(&panic_notifier_list, &ppc_panic_block);
> +	/* Low-level platform-specific routines that should run on panic */
> +	if (ppc_md.panic)
> +		atomic_notifier_chain_register(&panic_notifier_list,
> +					       &ppc_panic_block);
>   }
>   
>   #ifdef CONFIG_CHECK_CACHE_COHERENCY

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers
  2022-05-05 18:55   ` Hari Bathini
@ 2022-05-05 19:28     ` Guilherme G. Piccoli
  2022-05-09 12:50     ` Guilherme G. Piccoli
  1 sibling, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-05 19:28 UTC (permalink / raw)
  To: Hari Bathini, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Benjamin Herrenschmidt,
	Michael Ellerman, Nicholas Piggin, Paul Mackerras

On 05/05/2022 15:55, Hari Bathini wrote:
> [...] 
> 
> The change looks good. I have tested it on an LPAR (ppc64).
> 
> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>

Thanks a bunch Hari, much appreciated!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart
  2022-04-27 22:48 ` [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart Guilherme G. Piccoli
@ 2022-05-09 12:32   ` Guilherme G. Piccoli
  2022-05-09 15:52   ` Sean Christopherson
  1 sibling, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-09 12:32 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, vkuznets
  Cc: kexec, pmladek, bhe, akpm, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, will, David P . Reed

On 27/04/2022 19:48, Guilherme G. Piccoli wrote:
> In the panic path we have a list of functions to be called, the panic
> notifiers - such callbacks perform various actions in the machine's
> last breath, and sometimes users want them to run before kdump. We
> have the parameter "crash_kexec_post_notifiers" for that. When such
> parameter is used, the function "crash_smp_send_stop()" is executed
> to poweroff all secondary CPUs through the NMI-shootdown mechanism;
> part of this process involves disabling virtualization features in
> all CPUs (except the main one).
> 
> Now, in the emergency restart procedure we have also a way of
> disabling VMX in all CPUs, using the same NMI-shootdown mechanism;
> what happens though is that in case we already NMI-disabled all CPUs,
> the emergency restart fails due to a second addition of the same items
> in the NMI list, as per the following log output:
> 
> sysrq: Trigger a crash
> Kernel panic - not syncing: sysrq triggered crash
> [...]
> Rebooting in 2 seconds..
> list_add double add: new=<addr1>, prev=<addr2>, next=<addr1>.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:29!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI
> 
> In order to reproduce the problem, users just need to set the kernel
> parameter "crash_kexec_post_notifiers" *without* kdump set in any
> system with the VMX feature present.
> 
> Since there is no benefit in re-disabling VMX in all CPUs in case
> it was already done, this patch prevents that by guarding the restart
> routine against doubly issuing NMIs unnecessarily. Notice we still
> need to disable VMX locally in the emergency restart.
> 
> Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported)
> Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path")
> Cc: David P. Reed <dpreed@deepplum.com>
> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  arch/x86/include/asm/cpu.h |  1 +
>  arch/x86/kernel/crash.c    |  8 ++++----
>  arch/x86/kernel/reboot.c   | 14 ++++++++++++--
>  3 files changed, 17 insertions(+), 6 deletions(-)
> 

Hi Paolo / Sean / Vitaly, sorry for the ping.
But do you think this fix is OK from the VMX point-of-view?

I'd like to send a V2 of this set soon, so any review here is highly
appreciated!

Cheers,


Guilherme


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers
  2022-05-05 18:55   ` Hari Bathini
  2022-05-05 19:28     ` Guilherme G. Piccoli
@ 2022-05-09 12:50     ` Guilherme G. Piccoli
  2022-05-10 13:53       ` Michael Ellerman
  1 sibling, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-09 12:50 UTC (permalink / raw)
  To: Hari Bathini, Michael Ellerman
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, pmladek, kexec, bhe,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Benjamin Herrenschmidt, Nicholas Piggin, Paul Mackerras, akpm

On 05/05/2022 15:55, Hari Bathini wrote:
> [...] 
> The change looks good. I have tested it on an LPAR (ppc64).
> 
> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
> 

Hi Michael. do you think it's possible to add this one to powerpc/next
(or something like that), or do you prefer a V2 with his tag?
Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
  2022-04-28  8:11   ` Suzuki K Poulose
  2022-04-29 14:01     ` Guilherme G. Piccoli
@ 2022-05-09 13:09     ` Guilherme G. Piccoli
  2022-05-09 16:14       ` Suzuki K Poulose
  1 sibling, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-09 13:09 UTC (permalink / raw)
  To: Suzuki K Poulose, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Leo Yan, Mathieu Poirier, Mike Leach

On 28/04/2022 05:11, Suzuki K Poulose wrote:
> Hi Guilherme,
> 
> On 27/04/2022 23:49, Guilherme G. Piccoli wrote:
>> The panic notifier infrastructure executes registered callbacks when
>> a panic event happens - such callbacks are executed in atomic context,
>> with interrupts and preemption disabled in the running CPU and all other
>> CPUs disabled. That said, mutexes in such context are not a good idea.
>>
>> This patch replaces a regular mutex with a mutex_trylock safer approach;
>> given the nature of the mutex used in the driver, it should be pretty
>> uncommon being unable to acquire such mutex in the panic path, hence
>> no functional change should be observed (and if it is, that would be
>> likely a deadlock with the regular mutex).
>>
>> Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
>> Cc: Leo Yan <leo.yan@linaro.org>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> How would you like to proceed with queuing this ? I am happy
> either way. In case you plan to push this as part of this
> series (I don't see any potential conflicts) :
> 
> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Hi Suzuki, some other maintainers are taking the patches to their next
branches for example. I'm working on V2, and I guess in the end would be
nice to reduce the size of the series a bit.

So, do you think you could pick this one for your coresight/next branch
(or even for rc cycle, your call - this is really a fix)?
This way, I won't re-submit this one in V2, since it's gonna be merged
already in your branch.

Thanks in advance,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 10/30] alpha: Clean-up the panic notifier code
  2022-04-27 22:49 ` [PATCH 10/30] alpha: Clean-up the panic notifier code Guilherme G. Piccoli
@ 2022-05-09 14:13   ` Guilherme G. Piccoli
  2022-05-10 14:16     ` Petr Mladek
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-09 14:13 UTC (permalink / raw)
  To: Ivan Kokshaysky, Matt Turner, rth
  Cc: akpm, linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, bhe, kexec, linux-edac, pmladek, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> The alpha panic notifier has some code issues, not following
> the conventions of other notifiers. Also, it might halt the
> machine but still it is set to run as early as possible, which
> doesn't seem to be a good idea.
> 
> This patch cleans the code, and set the notifier to run as the
> latest, following the same approach other architectures are doing.
> Also, we remove the unnecessary include of a header already
> included indirectly.
> 
> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> Cc: Matt Turner <mattst88@gmail.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  arch/alpha/kernel/setup.c | 36 +++++++++++++++---------------------
>  1 file changed, 15 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
> index b4fbbba30aa2..d88bdf852753 100644
> --- a/arch/alpha/kernel/setup.c
> +++ b/arch/alpha/kernel/setup.c
> @@ -41,19 +41,11 @@
>  #include <linux/sysrq.h>
>  #include <linux/reboot.h>
>  #endif
> -#include <linux/notifier.h>
>  #include <asm/setup.h>
>  #include <asm/io.h>
>  #include <linux/log2.h>
>  #include <linux/export.h>
>  
> -static int alpha_panic_event(struct notifier_block *, unsigned long, void *);
> -static struct notifier_block alpha_panic_block = {
> -	alpha_panic_event,
> -        NULL,
> -        INT_MAX /* try to do it first */
> -};
> -
>  #include <linux/uaccess.h>
>  #include <asm/hwrpb.h>
>  #include <asm/dma.h>
> @@ -435,6 +427,21 @@ static const struct sysrq_key_op srm_sysrq_reboot_op = {
>  };
>  #endif
>  
> +static int alpha_panic_event(struct notifier_block *this,
> +			     unsigned long event, void *ptr)
> +{
> +	/* If we are using SRM and serial console, just hard halt here. */
> +	if (alpha_using_srm && srmcons_output)
> +		__halt();
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block alpha_panic_block = {
> +	.notifier_call = alpha_panic_event,
> +	.priority = INT_MIN, /* may not return, do it last */
> +};
> +
>  void __init
>  setup_arch(char **cmdline_p)
>  {
> @@ -1427,19 +1434,6 @@ const struct seq_operations cpuinfo_op = {
>  	.show	= show_cpuinfo,
>  };
>  
> -
> -static int
> -alpha_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
> -{
> -#if 1
> -	/* FIXME FIXME FIXME */
> -	/* If we are using SRM and serial console, just hard halt here. */
> -	if (alpha_using_srm && srmcons_output)
> -		__halt();
> -#endif
> -        return NOTIFY_DONE;
> -}
> -
>  static __init int add_pcspkr(void)
>  {
>  	struct platform_device *pd;


Hi folks, I'm updating Richard's email and re-sending the V1, any
reviews are greatly appreciated!

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list
  2022-04-27 22:49 ` [PATCH 22/30] panic: Introduce the panic post-reboot " Guilherme G. Piccoli
@ 2022-05-09 14:16   ` Guilherme G. Piccoli
  2022-05-11 16:45     ` Heiko Carstens
  2022-05-16 14:45   ` Petr Mladek
  1 sibling, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-09 14:16 UTC (permalink / raw)
  To: Alexander Gordeev, Christian Borntraeger, David S. Miller,
	Heiko Carstens, Sven Schnelle, Vasily Gorbik
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, pmladek, bhe,
	akpm, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, kexec, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> Currently we have 3 notifier lists in the panic path, which will
> be wired in a way to allow the notifier callbacks to run in
> different moments at panic time, in a subsequent patch.
> 
> But there is also an odd set of architecture calls hardcoded in
> the end of panic path, after the restart machinery. They're
> responsible for late time tunings / events, like enabling a stop
> button (Sparc) or effectively stopping the machine (s390).
> 
> This patch introduces yet another notifier list to offer the
> architectures a way to add callbacks in such late moment on
> panic path without the need of ifdefs / hardcoded approaches.
> 
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

Hey S390/SPARC folks, sorry for the ping!

Any reviews on this V1 would be greatly appreciated, I'm working on V2
and seeking feedback in the non-reviewed patches.

Thanks in advance,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-04-29 16:04     ` Guilherme G. Piccoli
@ 2022-05-09 14:25       ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-09 14:25 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, bhe, akpm, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, kexec, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, pmladek

On 29/04/2022 13:04, Guilherme G. Piccoli wrote:
> On 27/04/2022 21:28, Randy Dunlap wrote:
>>
>>
>> On 4/27/22 15:49, Guilherme G. Piccoli wrote:
>>> +	crash_kexec_post_notifiers
>>> +			This was DEPRECATED - users should always prefer the
>>
>> 			This is DEPRECATED - users should always prefer the
>>
>>> +			parameter "panic_notifiers_level" - check its entry
>>> +			in this documentation for details on how it works.
>>> +			Setting this parameter is exactly the same as setting
>>> +			"panic_notifiers_level=4".
>>
> 
> Thanks Randy, for your suggestion - but I confess I couldn't understand
> it properly. It's related to spaces/tabs, right? What you suggest me to
> change in this formatting? Just by looking the email I can't parse.
> 
> Cheers,
> 
> 
> Guilherme

Complete lack of attention from me, apologies!
The suggestions was s/was/is - already fixed for V2, thanks Randy.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-04-27 22:49 ` [PATCH 24/30] panic: Refactor the panic path Guilherme G. Piccoli
  2022-04-28  0:28   ` Randy Dunlap
  2022-04-29 17:53   ` Michael Kelley (LINUX)
@ 2022-05-09 15:16   ` d.hatayama
  2022-05-09 16:39     ` Guilherme G. Piccoli
  2022-05-12 14:03   ` Petr Mladek
  2022-05-24 14:44   ` Eric W. Biederman
  4 siblings, 1 reply; 183+ messages in thread
From: d.hatayama @ 2022-05-09 15:16 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, dave.hansen, dyoung,
	feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	d.hatayama

Sorry for the delayed response. Unfortunately, I had 10 days holidays
until yesterday...

>  .../admin-guide/kernel-parameters.txt         |  42 ++-
>  include/linux/panic_notifier.h                |   1 +
>  kernel/kexec_core.c                           |   8 +-
>  kernel/panic.c                                | 292 +++++++++++++-----
>  .../selftests/pstore/pstore_crash_test        |   5 +-
>  5 files changed, 252 insertions(+), 96 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 3f1cc5e317ed..8d3524060ce3 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
...snip...
> @@ -3784,6 +3791,33 @@
>                         timeout < 0: reboot immediately
>                         Format: <timeout>
> 
> +       panic_notifiers_level=
> +                       [KNL] Set the panic notifiers execution order.
> +                       Format: <unsigned int>
> +                       We currently have 4 lists of panic notifiers; based
> +                       on the functionality and risk (for panic success) the
> +                       callbacks are added in a given list. The lists are:
> +                       - hypervisor/FW notification list (low risk);
> +                       - informational list (low/medium risk);
> +                       - pre_reboot list (higher risk);
> +                       - post_reboot list (only run late in panic and after
> +                       kdump, not configurable for now).
> +                       This parameter defines the ordering of the first 3
> +                       lists with regards to kdump; the levels determine
> +                       which set of notifiers execute before kdump. The
> +                       accepted levels are:
> +                       0: kdump is the first thing to run, NO list is
> +                       executed before kdump.
> +                       1: only the hypervisor list is executed before kdump.
> +                       2 (default level): the hypervisor list and (*if*

Hmmm, why are you trying to change default setting?

Based on the current design of kdump, it's natural to put what the
handlers for these level 1 and level 2 handlers do in
machine_crash_shutdown(), as these are necessary by default, right?

Or have you already tried that and figured out it's difficult in some
reason and reached the current design? If so, why is that difficult?
Could you point to if there is already such discussion online?

kdump is designed to perform as little things as possible before
transferring the execution to the 2nd kernel in order to increase
reliability. Just detour to panic() increases risks of kdump failure
in the sense of increasing the executed codes in the abnormal
situation, which is very the note in the explanation of
crash_kexec_post_notifiers.

Also, the current implementation of crash_kexec_post_notifiers uses
the panic notifier, but this is not from the technical
reason. Ideally, it should have been implemented in the context of
crash_kexec() independently of panic().

That is, it looks to me that, in addition to changing design of panic
notifier, you are trying to integrate shutdown code of the crash kexec
and the panic paths. If so, this is a big design change for kdump.
I'm concerned about increase of reliability. I'd like you to discuss
them carefully.

Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart
  2022-04-27 22:48 ` [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart Guilherme G. Piccoli
  2022-05-09 12:32   ` Guilherme G. Piccoli
@ 2022-05-09 15:52   ` Sean Christopherson
  2022-05-10 20:11     ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Sean Christopherson @ 2022-05-09 15:52 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, David P . Reed, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 5687 bytes --]

I find the shortlog to be very confusing, the bug has nothing to do with disabling
VMX and I distinctly remember wrapping VMXOFF with exception fixup to prevent doom
if VMX is already disabled :-).  The issue is really that nmi_shootdown_cpus() doesn't
play nice with being called twice.

On Wed, Apr 27, 2022, Guilherme G. Piccoli wrote:
> In the panic path we have a list of functions to be called, the panic
> notifiers - such callbacks perform various actions in the machine's
> last breath, and sometimes users want them to run before kdump. We
> have the parameter "crash_kexec_post_notifiers" for that. When such
> parameter is used, the function "crash_smp_send_stop()" is executed
> to poweroff all secondary CPUs through the NMI-shootdown mechanism;
> part of this process involves disabling virtualization features in
> all CPUs (except the main one).
> 
> Now, in the emergency restart procedure we have also a way of
> disabling VMX in all CPUs, using the same NMI-shootdown mechanism;
> what happens though is that in case we already NMI-disabled all CPUs,
> the emergency restart fails due to a second addition of the same items
> in the NMI list, as per the following log output:
> 
> sysrq: Trigger a crash
> Kernel panic - not syncing: sysrq triggered crash
> [...]
> Rebooting in 2 seconds..
> list_add double add: new=<addr1>, prev=<addr2>, next=<addr1>.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:29!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI

Call stacks for the two callers would be very, very helpful.

> In order to reproduce the problem, users just need to set the kernel
> parameter "crash_kexec_post_notifiers" *without* kdump set in any
> system with the VMX feature present.
> 
> Since there is no benefit in re-disabling VMX in all CPUs in case
> it was already done, this patch prevents that by guarding the restart
> routine against doubly issuing NMIs unnecessarily. Notice we still
> need to disable VMX locally in the emergency restart.
> 
> Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported)
> Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path")
> Cc: David P. Reed <dpreed@deepplum.com>
> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  arch/x86/include/asm/cpu.h |  1 +
>  arch/x86/kernel/crash.c    |  8 ++++----
>  arch/x86/kernel/reboot.c   | 14 ++++++++++++--
>  3 files changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
> index 86e5e4e26fcb..b6a9062d387f 100644
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -36,6 +36,7 @@ extern int _debug_hotplug_cpu(int cpu, int action);
>  #endif
>  #endif
>  
> +extern bool crash_cpus_stopped;
>  int mwait_usable(const struct cpuinfo_x86 *);
>  
>  unsigned int x86_family(unsigned int sig);
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index e8326a8d1c5d..71dd1a990e8d 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -42,6 +42,8 @@
>  #include <asm/crash.h>
>  #include <asm/cmdline.h>
>  
> +bool crash_cpus_stopped;
> +
>  /* Used while preparing memory map entries for second kernel */
>  struct crash_memmap_data {
>  	struct boot_params *params;
> @@ -108,9 +110,7 @@ void kdump_nmi_shootdown_cpus(void)
>  /* Override the weak function in kernel/panic.c */
>  void crash_smp_send_stop(void)
>  {
> -	static int cpus_stopped;
> -
> -	if (cpus_stopped)
> +	if (crash_cpus_stopped)
>  		return;
>  
>  	if (smp_ops.crash_stop_other_cpus)
> @@ -118,7 +118,7 @@ void crash_smp_send_stop(void)
>  	else
>  		smp_send_stop();
>  
> -	cpus_stopped = 1;
> +	crash_cpus_stopped = true;

This feels like were just adding more duct tape to the mess.  nmi_shootdown() is
still unsafe for more than one caller, and it takes a _lot_ of staring and searching
to understand that crash_smp_send_stop() is invoked iff CONFIG_KEXEC_CORE=y, i.e.
that it will call smp_ops.crash_stop_other_cpus() and not just smp_send_stop().

Rather than shared a flag between two relatively unrelated functions, what if we
instead disabling virtualization in crash_nmi_callback() and then turn the reboot
call into a nop if an NMI shootdown has already occurred?  That will also add a
bit of documentation about multiple shootdowns not working.

And I believe there's also a lurking bug in native_machine_emergency_restart() that
can be fixed with cleanup.  SVM can also block INIT and so should be disabled during
an emergency reboot.

The attached patches are compile tested only.  If they seem sane, I'll post an
official mini series.

>  }
>  
>  #else
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index fa700b46588e..2fc42b8402ac 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -589,8 +589,18 @@ static void native_machine_emergency_restart(void)
>  	int orig_reboot_type = reboot_type;
>  	unsigned short mode;
>  
> -	if (reboot_emergency)
> -		emergency_vmx_disable_all();
> +	/*
> +	 * We can reach this point in the end of panic path, having
> +	 * NMI-disabled all secondary CPUs. This process involves
> +	 * disabling the CPU virtualization technologies, so if that
> +	 * is the case, we only miss disabling the local CPU VMX...
> +	 */
> +	if (reboot_emergency) {
> +		if (!crash_cpus_stopped)
> +			emergency_vmx_disable_all();
> +		else
> +			cpu_emergency_vmxoff();
> +	}
>  
>  	tboot_shutdown(TB_SHUTDOWN_REBOOT);
>  
> -- 
> 2.36.0
> 

[-- Attachment #2: 0001-x86-crash-Disable-virt-in-core-NMI-crash-handler-to-.patch --]
[-- Type: text/x-diff, Size: 6625 bytes --]

From 8a4573b7cf3a3e49b409ba3a504934de181c259d Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 9 May 2022 07:36:34 -0700
Subject: [PATCH 1/2] x86/crash: Disable virt in core NMI crash handler to
 avoid double list_add

Disable virtualization in crash_nmi_callback() and skip the requested NMI
shootdown if a shootdown has already occurred, i.e. a callback has been
registered.  The NMI crash shootdown path doesn't play nice with multiple
invocations, e.g. attempting to register the NMI handler multiple times
will trigger a double list_add() and hang the sytem (in addition to
multiple other issues).  If "crash_kexec_post_notifiers" is specified on
the kernel command line, panic() will invoke crash_smp_send_stop() and
result in a second call to nmi_shootdown_cpus() during
native_machine_emergency_restart().

Invoke the callback _before_ disabling virtualization, as the current
VMCS needs to be cleared before doing VMXOFF.  Note, this results in a
subtle change in ordering between disabling virtualization and stopping
Intel PT on the responding CPUs.  While VMX and Intel PT do interact,
VMXOFF and writes to MSR_IA32_RTIT_CTL do not induce faults between one
another, which is all that matters when panicking.

WARN if nmi_shootdown_cpus() is called a second time with anything other
than the reboot path's "nop" handler, as bailing means the requested
isn't being invoked.  Punt true handling of multiple shootdown callbacks
until there's an actual use case for doing so (beyond disabling
virtualization).

Extract the disabling logic to a common helper to deduplicate code, and
to prepare for doing the shootdown in the emergency reboot path if SVM
is supported.

Note, prior to commit ed72736183c4 ("x86/reboot: Force all cpus to exit
VMX root if VMX is supported), nmi_shootdown_cpus() was subtly protected
against a second invocation by a cpu_vmx_enabled() check as the kdump
handler would disable VMX if it ran first.

Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported)
Cc: stable@vger.kernel.org
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/reboot.h |  1 +
 arch/x86/kernel/crash.c       | 16 +--------------
 arch/x86/kernel/reboot.c      | 38 ++++++++++++++++++++++++++++++++---
 3 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 04c17be9b5fd..8f2da36435a6 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,6 +25,7 @@ void __noreturn machine_real_restart(unsigned int type);
 #define MRR_BIOS	0
 #define MRR_APM		1
 
+void cpu_crash_disable_virtualization(void);
 typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_panic_self_stop(struct pt_regs *regs);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e8326a8d1c5d..fe0cf83843ba 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -81,15 +81,6 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 	 */
 	cpu_crash_vmclear_loaded_vmcss();
 
-	/* Disable VMX or SVM if needed.
-	 *
-	 * We need to disable virtualization on all CPUs.
-	 * Having VMX or SVM enabled on any CPU may break rebooting
-	 * after the kdump kernel has finished its task.
-	 */
-	cpu_emergency_vmxoff();
-	cpu_emergency_svm_disable();
-
 	/*
 	 * Disable Intel PT to stop its logging
 	 */
@@ -148,12 +139,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	 */
 	cpu_crash_vmclear_loaded_vmcss();
 
-	/* Booting kdump kernel with VMX or SVM enabled won't work,
-	 * because (among other limitations) we can't disable paging
-	 * with the virt flags.
-	 */
-	cpu_emergency_vmxoff();
-	cpu_emergency_svm_disable();
+	cpu_crash_disable_virtualization();
 
 	/*
 	 * Disable Intel PT to stop its logging
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index fa700b46588e..f9543a4e9b09 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -528,9 +528,9 @@ static inline void kb_wait(void)
 	}
 }
 
-static void vmxoff_nmi(int cpu, struct pt_regs *regs)
+static void nmi_shootdown_nop(int cpu, struct pt_regs *regs)
 {
-	cpu_emergency_vmxoff();
+	/* Nothing to do, the NMI shootdown handler disables virtualization. */
 }
 
 /* Use NMIs as IPIs to tell all CPUs to disable virtualization */
@@ -554,7 +554,7 @@ static void emergency_vmx_disable_all(void)
 		__cpu_emergency_vmxoff();
 
 		/* Halt and exit VMX root operation on the other CPUs. */
-		nmi_shootdown_cpus(vmxoff_nmi);
+		nmi_shootdown_cpus(nmi_shootdown_nop);
 	}
 }
 
@@ -802,6 +802,18 @@ static nmi_shootdown_cb shootdown_callback;
 static atomic_t waiting_for_crash_ipi;
 static int crash_ipi_issued;
 
+void cpu_crash_disable_virtualization(void)
+{
+	/*
+	 * Disable virtualization, i.e. VMX or SVM, so that INIT is recognized
+	 * during reboot.  VMX blocks INIT if the CPU is post-VMXON, and SVM
+	 * blocks INIT if GIF=0.  Note, CLGI #UDs if SVM isn't enabled, so it's
+	 * easier to just disable SVM unconditionally.
+	 */
+	cpu_emergency_vmxoff();
+	cpu_emergency_svm_disable();
+}
+
 static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 {
 	int cpu;
@@ -819,6 +831,12 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 
 	shootdown_callback(cpu, regs);
 
+	/*
+	 * Prepare the CPU for reboot _after_ invoking the callback so that the
+	 * callback can safely use virtualization instructions, e.g. VMCLEAR.
+	 */
+	cpu_crash_disable_virtualization();
+
 	atomic_dec(&waiting_for_crash_ipi);
 	/* Assume hlt works */
 	halt();
@@ -840,6 +858,20 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	unsigned long msecs;
 	local_irq_disable();
 
+	/*
+	 * Invoking multiple callbacks is not currently supported, registering
+	 * the NMI handler twice will cause a list_add() double add BUG().
+	 * The exception is the "nop" handler in the emergency reboot path,
+	 * which can run after e.g. kdump's shootdown.  Do nothing if the crash
+	 * handler has already run, i.e. has already prepared other CPUs, the
+	 * reboot path doesn't have any work of its to do, it just needs to
+	 * ensure all CPUs have prepared for reboot.
+	 */
+	if (shootdown_callback) {
+		WARN_ON_ONCE(callback != nmi_shootdown_nop);
+		return;
+	}
+
 	/* Make a note of crashing cpu. Will be used in NMI callback. */
 	crashing_cpu = safe_smp_processor_id();
 

base-commit: 2764011106d0436cb44702cfb0981339d68c3509
-- 
2.36.0.512.ge40c2bad7a-goog


[-- Attachment #3: 0002-x86-reboot-Disable-virtualization-in-an-emergency-if.patch --]
[-- Type: text/x-diff, Size: 2720 bytes --]

From ce4b8fb50962c00a9bb29663e96501e90d68bd8b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 9 May 2022 08:28:14 -0700
Subject: [PATCH 2/2] x86/reboot: Disable virtualization in an emergency if SVM
 is supported

Disable SVM on all CPUs via NMI shootdown during an emergency reboot.
Like VMX, SVM can block INIT and thus prevent bringing up other CPUs via
INIT-SIPI-SIPI.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/reboot.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f9543a4e9b09..33c1f4883b27 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -533,27 +533,29 @@ static void nmi_shootdown_nop(int cpu, struct pt_regs *regs)
 	/* Nothing to do, the NMI shootdown handler disables virtualization. */
 }
 
-/* Use NMIs as IPIs to tell all CPUs to disable virtualization */
-static void emergency_vmx_disable_all(void)
+static void emergency_reboot_disable_virtualization(void)
 {
 	/* Just make sure we won't change CPUs while doing this */
 	local_irq_disable();
 
 	/*
-	 * Disable VMX on all CPUs before rebooting, otherwise we risk hanging
-	 * the machine, because the CPU blocks INIT when it's in VMX root.
+	 * Disable virtualization on all CPUs before rebooting to avoid hanging
+	 * the system, as VMX and SVM block INIT when running in the host
 	 *
 	 * We can't take any locks and we may be on an inconsistent state, so
-	 * use NMIs as IPIs to tell the other CPUs to exit VMX root and halt.
+	 * use NMIs as IPIs to tell the other CPUs to disable VMX/SVM and halt.
 	 *
-	 * Do the NMI shootdown even if VMX if off on _this_ CPU, as that
-	 * doesn't prevent a different CPU from being in VMX root operation.
+	 * Do the NMI shootdown even if virtualization is off on _this_ CPU, as
+	 * other CPUs may have virtualization enabled.
 	 */
-	if (cpu_has_vmx()) {
-		/* Safely force _this_ CPU out of VMX root operation. */
-		__cpu_emergency_vmxoff();
+	if (cpu_has_vmx() || cpu_has_svm(NULL)) {
+		/* Safely force _this_ CPU out of VMX/SVM operation. */
+		if (cpu_has_vmx())
+			__cpu_emergency_vmxoff();
+		else
+			cpu_emergency_svm_disable();
 
-		/* Halt and exit VMX root operation on the other CPUs. */
+		/* Disable VMX/SVM and halt on other CPUs. */
 		nmi_shootdown_cpus(nmi_shootdown_nop);
 	}
 }
@@ -590,7 +592,7 @@ static void native_machine_emergency_restart(void)
 	unsigned short mode;
 
 	if (reboot_emergency)
-		emergency_vmx_disable_all();
+		emergency_reboot_disable_virtualization();
 
 	tboot_shutdown(TB_SHUTDOWN_REBOOT);
 
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
  2022-05-09 13:09     ` Guilherme G. Piccoli
@ 2022-05-09 16:14       ` Suzuki K Poulose
  2022-05-09 16:26         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Suzuki K Poulose @ 2022-05-09 16:14 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Leo Yan, Mathieu Poirier, Mike Leach

Hi

On 09/05/2022 14:09, Guilherme G. Piccoli wrote:
> On 28/04/2022 05:11, Suzuki K Poulose wrote:
>> Hi Guilherme,
>>
>> On 27/04/2022 23:49, Guilherme G. Piccoli wrote:
>>> The panic notifier infrastructure executes registered callbacks when
>>> a panic event happens - such callbacks are executed in atomic context,
>>> with interrupts and preemption disabled in the running CPU and all other
>>> CPUs disabled. That said, mutexes in such context are not a good idea.
>>>
>>> This patch replaces a regular mutex with a mutex_trylock safer approach;
>>> given the nature of the mutex used in the driver, it should be pretty
>>> uncommon being unable to acquire such mutex in the panic path, hence
>>> no functional change should be observed (and if it is, that would be
>>> likely a deadlock with the regular mutex).
>>>
>>> Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
>>> Cc: Leo Yan <leo.yan@linaro.org>
>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>> Cc: Mike Leach <mike.leach@linaro.org>
>>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
>>
>> How would you like to proceed with queuing this ? I am happy
>> either way. In case you plan to push this as part of this
>> series (I don't see any potential conflicts) :
>>
>> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> Hi Suzuki, some other maintainers are taking the patches to their next
> branches for example. I'm working on V2, and I guess in the end would be
> nice to reduce the size of the series a bit.
> 
> So, do you think you could pick this one for your coresight/next branch
> (or even for rc cycle, your call - this is really a fix)?
> This way, I won't re-submit this one in V2, since it's gonna be merged
> already in your branch.

I have queued this to coresight/next.

Thanks
Suzuki

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
  2022-05-09 16:14       ` Suzuki K Poulose
@ 2022-05-09 16:26         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-09 16:26 UTC (permalink / raw)
  To: Suzuki K Poulose, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, bcm-kernel-feedback-list, coresight, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Leo Yan, Mathieu Poirier, Mike Leach

On 09/05/2022 13:14, Suzuki K Poulose wrote:
> [...]> 
> I have queued this to coresight/next.
> 
> Thanks
> Suzuki


Thanks a lot Suzuki!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-09 15:16   ` d.hatayama
@ 2022-05-09 16:39     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-09 16:39 UTC (permalink / raw)
  To: d.hatayama, akpm, bhe, pmladek, kexec, Mark Rutland,
	Marc Zyngier, mikelley, Hari Bathini
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, dave.hansen, dyoung,
	feng.tang, gregkh, hidehiro.kawai.ez, jgross, john.ogness,
	keescook, luto, mhiramat, mingo, paulmck, peterz, rostedt,
	senozhatsky, stern, tglx, vgoyal, vkuznets, will

Hey Hatayma, thanks for your great analysis and no need for apologies!

I'll comment/respond properly inline below, just noticing here that I've
CCed Mark and Marc (from the ARM64 perspective), Michael (Hyper-V
perspective) and Hari (PowerPC perspective), besides the usual suspects
as Petr, Baoquan, etc.

On 09/05/2022 12:16, d.hatayama@fujitsu.com wrote:
> Sorry for the delayed response. Unfortunately, I had 10 days holidays
> until yesterday...
> [...] 
>> +                       We currently have 4 lists of panic notifiers; based
>> +                       on the functionality and risk (for panic success) the
>> +                       callbacks are added in a given list. The lists are:
>> +                       - hypervisor/FW notification list (low risk);
>> +                       - informational list (low/medium risk);
>> +                       - pre_reboot list (higher risk);
>> +                       - post_reboot list (only run late in panic and after
>> +                       kdump, not configurable for now).
>> +                       This parameter defines the ordering of the first 3
>> +                       lists with regards to kdump; the levels determine
>> +                       which set of notifiers execute before kdump. The
>> +                       accepted levels are:
>> +                       0: kdump is the first thing to run, NO list is
>> +                       executed before kdump.
>> +                       1: only the hypervisor list is executed before kdump.
>> +                       2 (default level): the hypervisor list and (*if*
> 
> Hmmm, why are you trying to change default setting?
> 
> Based on the current design of kdump, it's natural to put what the
> handlers for these level 1 and level 2 handlers do in
> machine_crash_shutdown(), as these are necessary by default, right?
> 
> Or have you already tried that and figured out it's difficult in some
> reason and reached the current design? If so, why is that difficult?
> Could you point to if there is already such discussion online?
> 
> kdump is designed to perform as little things as possible before
> transferring the execution to the 2nd kernel in order to increase
> reliability. Just detour to panic() increases risks of kdump failure
> in the sense of increasing the executed codes in the abnormal
> situation, which is very the note in the explanation of
> crash_kexec_post_notifiers.
> 
> Also, the current implementation of crash_kexec_post_notifiers uses
> the panic notifier, but this is not from the technical
> reason. Ideally, it should have been implemented in the context of
> crash_kexec() independently of panic().
> 
> That is, it looks to me that, in addition to changing design of panic
> notifier, you are trying to integrate shutdown code of the crash kexec
> and the panic paths. If so, this is a big design change for kdump.
> I'm concerned about increase of reliability. I'd like you to discuss
> them carefully.

From my understanding (specially based on both these threads [0] and
[1]), 3 facts are clear and quite complex in nature:

(a) Currently, the panic notifier list is a "no man's land" - it's a
mess, all sort of callbacks are added there, some of them are extremely
risk for breaking kdump, others are quite safe (like setting a
variable). Petr's details in thread [0] are really clear and express in
great way how confusing and conflicting the panic notifiers goals are.

(b) In order to "address" problems in the antagonistic goals of
notifiers (see point (a) above and thread [0]), we have this
quirk/workaround called "crash_kexec_post_notifiers". This is useful,
but (almost as for attesting how this is working as band-aid over
complex and fundamental issues) see the following commits:

a11589563e96 ("x86/Hyper-V: Report crash register data or kmsg before
running crash kernel")

06e629c25daa ("powerpc/fadump: Fix inaccurate CPU state info in vmcore
generated with panic")

They hardcode such workaround, because they *need* some notifiers'
callbacks. But notice they *don't need all of them*, only some important
ones (that usually are good considering the risk, it's a good
cost/benefit). Since we currently have an all-or-nothing behavior for
the panic notifiers, both PowerPC and Hyper-V end-up bringing all of
them to run before kdump due to the lack of flexibility, increasing a
lot the risk of failure for kdump.

(c) To add on top of all such complexity, we don't have a custom
machine_crash_shutdown() handler for some architectures like ARM64, and
the feeling is that's not right to add a bunch of callbacks / complexity
in such architecture code, specially since we have the notifiers
infrastructure in the kernel. I've recently started a discussion about
that with ARM64 community, please take a look in [1].

With that said, we can use (a) + (b) + (c) to justify our argument here:
the panic notifiers should be refactored! We need to try to encompass
the antagonistic goals of kdump (wants to be the first thing to run,
early as possible) VS. the notifiers that are necessary, even before
kdump (like Hyper-V / PowerPC fadump ones, but there are more of these,
the FW/hypervisors notifiers).

I guarantee to you we cannot make 100% of people happy - the
panic-related areas like kdump, etc are full of trade-offs. We improve
something, other stuff breaks. This series attempts to clearly
distribute the notifiers in 3 buckets, and introduces a level setting
that tunes such buckets to run before or after the kdump in a highly
flexible way, trying to make the most users happy and capable of tuning
their systems.

I understand that likely setting the notifiers to 0 would make kdump
maintainers happy, because we'd keep the same behavior as before.
But..then we make PPC / Hyper-V and some other users unsatisfied. Level
2 seems to me a good compromise. I'm willing to add a KConfig option to
allow distros to hard set their defaults - that might make sense to
legacy distros as older RHEL / CentOS or server-only distros.

Finally, you mention about integrating the crash shutdown and the panic
paths - I'm not officially doing that, but I see that they are very
related and all cases I've ever seen in the last 3-4 years I've been
working with kdump, it was triggered from panic, from settings "panic_on_X".

I understand we have the regular kexec path (!kdump), some notifiers
might make sense in both paths, and in this case we could duplicate them
(one callback, 2 notifier lists) and control against double execution. I
tried doing that in patch 16 of this series (Hyper-V stuff), but due to
the lack of ARM64 custom crash shutdown handler, I couldn't. We can
think in some alternatives, and improve these (naturally)
connected/related paths, but this is not the goal of this specific
series. I'm hereby trying to improve/clarify the panic (and kdump) path,
not the pure kexec path.

Let me know if I can clarify more specific points you have or if there
are flaws in my logic - I appreciate the discussions/reviews.
Cheers,


Guilherme



[0] "notifier/panic: Introduce panic_notifier_filter" -
https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/

[1] "Should arm64 have a custom crash shutdown handler?" -
https://lore.kernel.org/lkml/427a8277-49f0-4317-d6c3-4a15d7070e55@igalia.com/

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
  2022-05-03 19:12     ` Guilherme G. Piccoli
  2022-05-03 21:56       ` Evan Green
@ 2022-05-10 11:38       ` Petr Mladek
  2022-05-10 13:04         ` Guilherme G. Piccoli
  2022-05-10 17:20         ` Steven Rostedt
  1 sibling, 2 replies; 183+ messages in thread
From: Petr Mladek @ 2022-05-10 11:38 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Evan Green, Andrew Morton, bhe, kexec, LKML,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, Linux PM,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, Andy Shevchenko, Arnd Bergmann,
	Borislav Petkov, Jonathan Corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, Greg Kroah-Hartman, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, Kees Cook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Ard Biesheuvel, David Gow, Julius Werner

On Tue 2022-05-03 16:12:09, Guilherme G. Piccoli wrote:
> On 03/05/2022 15:03, Evan Green wrote:
> > [...]
> > gsmi_shutdown_reason() is a common function called in other scenarios
> > as well, like reboot and thermal trip, where it may still make sense
> > to wait to acquire a spinlock. Maybe we should add a parameter to
> > gsmi_shutdown_reason() so that you can get your change on panic, but
> > we don't convert other callbacks into try-fail scenarios causing us to
> > miss logs.
> > 
> 
> Hi Evan, thanks for your feedback, much appreciated!
> What I've done in other cases like this was to have a helper checking
> the spinlock in the panic notifier - if we can acquire that, go ahead
> but if not, bail out. For a proper example of an implementation, check
> patch 13 of the series:
> https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli@igalia.com/ .
> 
> Do you agree with that, or prefer really a parameter in
> gsmi_shutdown_reason() ? I'll follow your choice =)

I see two more alternative solutions:

1st variant is a trick already used in console write() callbacks.
They do trylock() when oops_in_progress is set. They remember
the result to prevent double unlock when printing Oops messages and
the system will try to continue working. For example:

pl011_console_write(struct console *co, const char *s, unsigned int count)
{
[...]
	int locked = 1;
[...]
	if (uap->port.sysrq)
		locked = 0;
	else if (oops_in_progress)
		locked = spin_trylock(&uap->port.lock);
	else
		spin_lock(&uap->port.lock);

[...]

	if (locked)
		spin_unlock(&uap->port.lock);
}


2nd variant is to check panic_cpu variable. It is used in printk.c.
We might move the function to panic.h:

static bool panic_in_progress(void)
{
	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
}

and then do:

	if (panic_in_progress()) {
		...


> > Though thinking more about it, is this really a Good Change (TM)? The
> > spinlock itself already disables interrupts, meaning the only case
> > where this change makes a difference is if the panic happens from
> > within the function that grabbed the spinlock (in which case the
> > callback is also likely to panic), or in an NMI that panics within
> > that window.

As already mentioned in the other reply, panic() sometimes stops
the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().

Another situation is when the CPU using the lock ends in some
infinite loop because something went wrong. The system is in
an unpredictable state during panic().

I am not sure if this is possible with the code under gsmi_dev.lock
but such things really happen during panic() in other subsystems.
Using trylock in the panic() code path is a good practice.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path
  2022-04-27 22:48 ` [PATCH 05/30] misc/pvpanic: " Guilherme G. Piccoli
@ 2022-05-10 12:14   ` Petr Mladek
  2022-05-10 13:00     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-10 12:14 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Christophe JAILLET, Mihai Carabas,
	Shile Zhang, Wang ShaoBo, zhenwei pi

On Wed 2022-04-27 19:48:59, Guilherme G. Piccoli wrote:
> The pvpanic driver relies on panic notifiers to execute a callback
> on panic event. Such function is executed in atomic context - the
> panic function disables local IRQs, preemption and all other CPUs
> that aren't running the panic code.
> 
> With that said, it's dangerous to use regular spinlocks in such path,
> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple instances").
> This patch fixes that by replacing regular spinlocks with the trylock
> safer approach.

It seems that the lock is used just to manipulating a list. A super
safe solution would be to use the rcu API: rcu_add_rcu() and
list_del_rcu() under rcu_read_lock(). The spin lock will not be
needed and the list will always be valid.

The advantage would be that it will always call members that
were successfully added earlier. That said, I am not familiar
with pvpanic and am not sure if it is worth it.

> It also fixes an old comment (about a long gone framebuffer code) and
> the notifier priority - we should execute hypervisor notifiers early,
> deferring this way the panic action to the hypervisor, as expected by
> the users that are setting up pvpanic.

This should be done in a separate patch. It changes the behavior.
Also there might be a discussion whether it really should be
the maximal priority.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path
  2022-05-10 12:14   ` Petr Mladek
@ 2022-05-10 13:00     ` Guilherme G. Piccoli
  2022-05-17 10:58       ` Petr Mladek
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-10 13:00 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Christophe JAILLET, Mihai Carabas,
	Shile Zhang, Wang ShaoBo, zhenwei pi

On 10/05/2022 09:14, Petr Mladek wrote:
> [...]
>> With that said, it's dangerous to use regular spinlocks in such path,
>> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple instances").
>> This patch fixes that by replacing regular spinlocks with the trylock
>> safer approach.
> 
> It seems that the lock is used just to manipulating a list. A super
> safe solution would be to use the rcu API: rcu_add_rcu() and
> list_del_rcu() under rcu_read_lock(). The spin lock will not be
> needed and the list will always be valid.
> 
> The advantage would be that it will always call members that
> were successfully added earlier. That said, I am not familiar
> with pvpanic and am not sure if it is worth it.
> 
>> It also fixes an old comment (about a long gone framebuffer code) and
>> the notifier priority - we should execute hypervisor notifiers early,
>> deferring this way the panic action to the hypervisor, as expected by
>> the users that are setting up pvpanic.
> 
> This should be done in a separate patch. It changes the behavior.
> Also there might be a discussion whether it really should be
> the maximal priority.
> 
> Best Regards,
> Petr

Thanks for the review Petr. Patch was already merged - my goal was to be
concise, i.e., a patch per driver / module, so the patch kinda fixes
whatever I think is wrong with the driver with regards panic handling.

Do you think it worth to remove this patch from Greg's branch just to
split it in 2? Personally I think it's not worth, but opinions are welcome.

About the RCU part, this one really could be a new patch, a good
improvement patch - it makes sense to me, we can think about that after
the fixes I guess.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
  2022-05-10 11:38       ` Petr Mladek
@ 2022-05-10 13:04         ` Guilherme G. Piccoli
  2022-05-10 17:20         ` Steven Rostedt
  1 sibling, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-10 13:04 UTC (permalink / raw)
  To: Petr Mladek, Evan Green
  Cc: Andrew Morton, bhe, kexec, LKML, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, Linux PM, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, Andy Shevchenko, Arnd Bergmann,
	Borislav Petkov, Jonathan Corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, Greg Kroah-Hartman, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, Kees Cook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Ard Biesheuvel, David Gow, Julius Werner

On 10/05/2022 08:38, Petr Mladek wrote:
> [...]
> I see two more alternative solutions:
> 
> 1st variant is a trick already used in console write() callbacks.
> They do trylock() when oops_in_progress is set. They remember
> the result to prevent double unlock when printing Oops messages and
> the system will try to continue working. For example:
> 
> pl011_console_write(struct console *co, const char *s, unsigned int count)
> {
> [...]
> 	int locked = 1;
> [...]
> 	if (uap->port.sysrq)
> 		locked = 0;
> 	else if (oops_in_progress)
> 		locked = spin_trylock(&uap->port.lock);
> 	else
> 		spin_lock(&uap->port.lock);
> 
> [...]
> 
> 	if (locked)
> 		spin_unlock(&uap->port.lock);
> }
> 
> 
> 2nd variant is to check panic_cpu variable. It is used in printk.c.
> We might move the function to panic.h:
> 
> static bool panic_in_progress(void)
> {
> 	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
> }
> 
> and then do:
> 
> 	if (panic_in_progress()) {
> 		...

Thanks for the review Petr! I feel alternative two is way better, it
checks for panic - the oops_in_progress isn't really enough, since we
can call panic() directly, not necessarily through an oops path, correct?

For me, we could stick with the lock check, but I'll defer to Evan - I
didn't work the V2 patch yet, what do you prefer Evan?


> [...]
> As already mentioned in the other reply, panic() sometimes stops
> the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
> 
> Another situation is when the CPU using the lock ends in some
> infinite loop because something went wrong. The system is in
> an unpredictable state during panic().
> 
> I am not sure if this is possible with the code under gsmi_dev.lock
> but such things really happen during panic() in other subsystems.
> Using trylock in the panic() code path is a good practice.
> 
> Best Regards,
> Petr

Makes total sense, thanks for confirming!
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers
  2022-05-09 12:50     ` Guilherme G. Piccoli
@ 2022-05-10 13:53       ` Michael Ellerman
  2022-05-10 14:10         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Michael Ellerman @ 2022-05-10 13:53 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Hari Bathini
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, pmladek, kexec, bhe,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Benjamin Herrenschmidt, Nicholas Piggin, Paul Mackerras, akpm

"Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:
> On 05/05/2022 15:55, Hari Bathini wrote:
>> [...] 
>> The change looks good. I have tested it on an LPAR (ppc64).
>> 
>> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
>> 
>
> Hi Michael. do you think it's possible to add this one to powerpc/next
> (or something like that), or do you prefer a V2 with his tag?

Ah sorry, I assumed it was going in as part of the whole series. I guess
I misread the cover letter.

So you want me to take this patch on its own via the powerpc tree?

cheers

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers
  2022-05-10 13:53       ` Michael Ellerman
@ 2022-05-10 14:10         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-10 14:10 UTC (permalink / raw)
  To: Michael Ellerman, Hari Bathini
  Cc: linux-kernel, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, pmladek, kexec, bhe,
	linux-leds, linux-mips, linux-parisc, linux-pm, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Benjamin Herrenschmidt, Nicholas Piggin, Paul Mackerras, akpm

On 10/05/2022 10:53, Michael Ellerman wrote:
> "Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:
>> On 05/05/2022 15:55, Hari Bathini wrote:
>>> [...] 
>>> The change looks good. I have tested it on an LPAR (ppc64).
>>>
>>> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
>>>
>>
>> Hi Michael. do you think it's possible to add this one to powerpc/next
>> (or something like that), or do you prefer a V2 with his tag?
> 
> Ah sorry, I assumed it was going in as part of the whole series. I guess
> I misread the cover letter.
> 
> So you want me to take this patch on its own via the powerpc tree?
> 
> cheers

Hi Michael, thanks for the prompt response!

You didn't misread, that was the plan heh
But some maintainers start to take patches and merge in their trees, and
in the end, it seems to make sense - almost half of this series are
fixes or clean-ups, that are not really necessary to get merged altogether.

So, if you can take this one, I'd appreciate - it'll make V2 a bit
smaller =)

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 10/30] alpha: Clean-up the panic notifier code
  2022-05-09 14:13   ` Guilherme G. Piccoli
@ 2022-05-10 14:16     ` Petr Mladek
  2022-05-11 20:10       ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-10 14:16 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Ivan Kokshaysky, Matt Turner, rth, akpm, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, bhe, kexec,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On Mon 2022-05-09 11:13:17, Guilherme G. Piccoli wrote:
> On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> > The alpha panic notifier has some code issues, not following
> > the conventions of other notifiers. Also, it might halt the
> > machine but still it is set to run as early as possible, which
> > doesn't seem to be a good idea.

Yeah, it is pretty strange behavior.

I looked into the history. This notifier was added into the alpha code
in 2.4.0-test2pre2. In this historic code, the default panic() code
either rebooted after a timeout or ended in a infinite loop. There
was not crasdump at that times.

The notifier allowed to change the behavior. There were 3 notifiers:

   + mips and mips64 ended with blinking in panic()
   + alpha did __halt() in this srm case

They both still do this. I guess that it is some historic behavior
that people using these architectures are used to.

Anyway, it makes sense to do this as the last notifier after
dumping other information.

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering
  2022-04-27 22:49 ` [PATCH 11/30] um: Improve panic notifiers consistency and ordering Guilherme G. Piccoli
  2022-04-28  8:30   ` Johannes Berg
@ 2022-05-10 14:28   ` Petr Mladek
  2022-05-11 20:22     ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-10 14:28 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Anton Ivanov, Johannes Berg,
	Richard Weinberger

On Wed 2022-04-27 19:49:05, Guilherme G. Piccoli wrote:
> Currently the panic notifiers from user mode linux don't follow
> the convention for most of the other notifiers present in the
> kernel (indentation, priority setting, numeric return).
> More important, the priorities could be improved, since it's a
> special case (userspace), hence we could run the notifiers earlier;
> user mode linux shouldn't care much with other panic notifiers but
> the ordering among the mconsole and arch notifier is important,
> given that the arch one effectively triggers a core dump.

It is not clear to me why user mode linux should not care about
the other notifiers. It might be because I do not know much
about the user mode linux.

Is the because they always create core dump or are never running
in a hypervisor or ...?

AFAIK, the notifiers do many different things. For example, there
is a notifier that disables RCU watchdog, print some extra
information. Why none of them make sense here?

> This patch fixes that by running the mconsole notifier as the first
> panic notifier, followed by the architecture one (that coredumps).
> Also, we remove a useless header inclusion.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks
  2022-04-27 22:49 ` [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks Guilherme G. Piccoli
@ 2022-05-10 15:16   ` Petr Mladek
  2022-05-10 16:16     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-10 15:16 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On Wed 2022-04-27 19:49:08, Guilherme G. Piccoli wrote:
> The notifiers infrastructure provides a way to pass an "id" to the
> callbacks to determine what kind of event happened, i.e., what is
> the reason behind they getting called.
> 
> The panic notifier currently pass 0, but this is soon to be
> used in a multi-targeted notifier, so let's pass a meaningful
> "id" over there.
>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  include/linux/panic_notifier.h | 5 +++++
>  kernel/panic.c                 | 2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
> index 41e32483d7a7..07dced83a783 100644
> --- a/include/linux/panic_notifier.h
> +++ b/include/linux/panic_notifier.h
> @@ -9,4 +9,9 @@ extern struct atomic_notifier_head panic_notifier_list;
>  
>  extern bool crash_kexec_post_notifiers;
>  
> +enum panic_notifier_val {
> +	PANIC_UNUSED,
> +	PANIC_NOTIFIER = 0xDEAD,
> +};

Hmm, this looks like a hack. PANIC_UNUSED will never be used.
All notifiers will be always called with PANIC_NOTIFIER.

The @val parameter is normally used when the same notifier_list
is used in different situations.

But you are going to use it when the same notifier is used
in more lists. This is normally distinguished by the @nh
(atomic_notifier_head) parameter.

IMHO, it is a bad idea. First, it would confuse people because
it does not follow the original design of the parameters.
Second, the related code must be touched anyway when
the notifier is moved into another list so it does not
help much.

Or do I miss anything, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers
  2022-04-27 22:49 ` [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers Guilherme G. Piccoli
  2022-05-02 15:38   ` Florian Fainelli
@ 2022-05-10 15:28   ` Petr Mladek
  2022-05-17 15:32     ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-10 15:28 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Brian Norris, Florian Fainelli

On Wed 2022-04-27 19:49:09, Guilherme G. Piccoli wrote:
> This patch improves the panic/die notifiers in this driver by
> making use of a passed "id" instead of comparing pointer
> address; also, it removes an useless prototype declaration
> and unnecessary header inclusion.
> 
> This is part of a panic notifiers refactor - this notifier in
> the future will be moved to a new list, that encompass the
> information notifiers only.
> 
> --- a/drivers/bus/brcmstb_gisb.c
> +++ b/drivers/bus/brcmstb_gisb.c
> @@ -347,25 +346,14 @@ static irqreturn_t brcmstb_gisb_bp_handler(int irq, void *dev_id)
>  /*
>   * Dump out gisb errors on die or panic.
>   */
> -static int dump_gisb_error(struct notifier_block *self, unsigned long v,
> -			   void *p);
> -
> -static struct notifier_block gisb_die_notifier = {
> -	.notifier_call = dump_gisb_error,
> -};
> -
> -static struct notifier_block gisb_panic_notifier = {
> -	.notifier_call = dump_gisb_error,
> -};
> -
>  static int dump_gisb_error(struct notifier_block *self, unsigned long v,
>  			   void *p)
>  {
>  	struct brcmstb_gisb_arb_device *gdev;
> -	const char *reason = "panic";
> +	const char *reason = "die";
>  
> -	if (self == &gisb_die_notifier)
> -		reason = "die";
> +	if (v == PANIC_NOTIFIER)
> +		reason = "panic";

IMHO, the check of the @self parameter was the proper solution.

"gisb_die_notifier" list uses @val from enum die_val.
"gisb_panic_notifier" list uses @val from enum panic_notifier_val.

These are unrelated types. It might easily break when
someone defines the same constant also in enum die_val.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks
  2022-05-10 15:16   ` Petr Mladek
@ 2022-05-10 16:16     ` Guilherme G. Piccoli
  2022-05-17 13:11       ` Petr Mladek
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-10 16:16 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

On 10/05/2022 12:16, Petr Mladek wrote:
> [...]
> Hmm, this looks like a hack. PANIC_UNUSED will never be used.
> All notifiers will be always called with PANIC_NOTIFIER.
> 
> The @val parameter is normally used when the same notifier_list
> is used in different situations.
> 
> But you are going to use it when the same notifier is used
> in more lists. This is normally distinguished by the @nh
> (atomic_notifier_head) parameter.
> 
> IMHO, it is a bad idea. First, it would confuse people because
> it does not follow the original design of the parameters.
> Second, the related code must be touched anyway when
> the notifier is moved into another list so it does not
> help much.
> 
> Or do I miss anything, please?
> 
> Best Regards,
> Petr

Hi Petr, thanks for the review.

I'm not strong attached to this patch, so we could drop it and refactor
the code of next patches to use the @nh as identification - but
personally, I feel this parameter could be used to identify the list
that called such function, in other words, what is the event that
triggered the callback. Some notifiers are even declared with this
parameter called "ev", like the event that triggers the notifier.


You mentioned 2 cases:

(a) Same notifier_list used in different situations;

(b) Same *notifier callback* used in different lists;

Mine is case (b), right? Can you show me an example of case (a)? You can
see in the following patches (or grep the kernel) that people are using
this identification parameter to determine which kind of OOPS trigger
the callback to condition the execution of the function to specific
cases. IIUIC, this is more or less what I'm doing, but extending the
idea for panic notifiers.

Again, as a personal preference, it makes sense to me using id's VS
comparing pointers to differentiate events/callers.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
  2022-05-10 11:38       ` Petr Mladek
  2022-05-10 13:04         ` Guilherme G. Piccoli
@ 2022-05-10 17:20         ` Steven Rostedt
  2022-05-10 19:40           ` John Ogness
  1 sibling, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-05-10 17:20 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Guilherme G. Piccoli, Evan Green, Andrew Morton, bhe, kexec,
	LKML, bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	Linux PM, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, Andy Shevchenko, Arnd Bergmann,
	Borislav Petkov, Jonathan Corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, Greg Kroah-Hartman, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, Kees Cook, luto,
	mhiramat, mingo, paulmck, peterz, senozhatsky, Alan Stern,
	Thomas Gleixner, vgoyal, vkuznets, Will Deacon, Ard Biesheuvel,
	David Gow, Julius Werner

On Tue, 10 May 2022 13:38:39 +0200
Petr Mladek <pmladek@suse.com> wrote:

> As already mentioned in the other reply, panic() sometimes stops
> the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
> 
> Another situation is when the CPU using the lock ends in some
> infinite loop because something went wrong. The system is in
> an unpredictable state during panic().
> 
> I am not sure if this is possible with the code under gsmi_dev.lock
> but such things really happen during panic() in other subsystems.
> Using trylock in the panic() code path is a good practice.

I believe that Peter Zijlstra had a special spin lock for NMIs or early
printk, where it would not block if the lock was held on the same CPU. That
is, if an NMI happened and paniced while this lock was held on the same
CPU, it would not deadlock. But it would block if the lock was held on
another CPU.

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
  2022-04-28  1:01   ` Xiaoming Ni
  2022-04-29 19:38     ` Guilherme G. Piccoli
@ 2022-05-10 17:29     ` Steven Rostedt
  2022-05-16 16:14       ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-05-10 17:29 UTC (permalink / raw)
  To: Xiaoming Ni
  Cc: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Arjan van de Ven, Cong Wang,
	Sebastian Andrzej Siewior, Valentin Schneider

On Thu, 28 Apr 2022 09:01:13 +0800
Xiaoming Ni <nixiaoming@huawei.com> wrote:

> > +#ifdef CONFIG_DEBUG_NOTIFIERS
> > +	{
> > +		char sym_name[KSYM_NAME_LEN];
> > +
> > +		pr_info("notifiers: registered %s()\n",
> > +			notifier_name(n, sym_name));
> > +	}  
> 
> Duplicate Code.
> 
> Is it better to use __func__ and %pS?
> 
> pr_info("%s: %pS\n", __func__, n->notifier_call);
> 
> 
> > +#endif

Also, don't sprinkle #ifdef in C code. Instead:

	if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS))
		pr_info("notifers: regsiter %ps()\n",
			n->notifer_call);


Or define a print macro at the start of the C file that is a nop if it is
not defined, and use the macro.

-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers
  2022-04-27 22:49 ` [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers Guilherme G. Piccoli
@ 2022-05-10 17:40   ` Steven Rostedt
  2022-05-11 20:03     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2022-05-10 17:40 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On Wed, 27 Apr 2022 19:49:17 -0300
"Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:

> Currently we don't have a way to check if there are dumpers set,
> except counting the list members maybe. This patch introduces a very
> simple helper to provide this information, by just keeping track of
> registered/unregistered kmsg dumpers. It's going to be used on the
> panic path in the subsequent patch.

FYI, it is considered "bad form" to reference in the change log "this
patch". We know this is a patch. The change log should just talk about what
is being done. So can you reword your change logs (you do this is almost
every patch). Here's what I would reword the above to be:

 Currently we don't have a way to check if there are dumpers set, except
 perhaps by counting the list members. Introduce a very simple helper to
 provide this information, by just keeping track of registered/unregistered
 kmsg dumpers. This will simplify the refactoring of the panic path.


-- Steve

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
  2022-05-10 17:20         ` Steven Rostedt
@ 2022-05-10 19:40           ` John Ogness
  2022-05-11 11:13             ` Petr Mladek
  0 siblings, 1 reply; 183+ messages in thread
From: John Ogness @ 2022-05-10 19:40 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Guilherme G. Piccoli, Evan Green, Andrew Morton, bhe, kexec,
	LKML, bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	Linux PM, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, Andy Shevchenko, Arnd Bergmann,
	Borislav Petkov, Jonathan Corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, Greg Kroah-Hartman, mikelley,
	hidehiro.kawai.ez, jgross, Kees Cook, luto, mhiramat, mingo,
	paulmck, peterz, senozhatsky, Alan Stern, Thomas Gleixner,
	vgoyal, vkuznets, Will Deacon, Ard Biesheuvel, David Gow,
	Julius Werner

On 2022-05-10, Steven Rostedt <rostedt@goodmis.org> wrote:
>> As already mentioned in the other reply, panic() sometimes stops the
>> other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
>> 
>> Another situation is when the CPU using the lock ends in some
>> infinite loop because something went wrong. The system is in
>> an unpredictable state during panic().
>> 
>> I am not sure if this is possible with the code under gsmi_dev.lock
>> but such things really happen during panic() in other subsystems.
>> Using trylock in the panic() code path is a good practice.
>
> I believe that Peter Zijlstra had a special spin lock for NMIs or
> early printk, where it would not block if the lock was held on the
> same CPU. That is, if an NMI happened and paniced while this lock was
> held on the same CPU, it would not deadlock. But it would block if the
> lock was held on another CPU.

Yes. And starting with 5.19 it will be carrying the name that _you_ came
up with (cpu_sync):

printk_cpu_sync_get_irqsave()
printk_cpu_sync_put_irqrestore()

John

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart
  2022-05-09 15:52   ` Sean Christopherson
@ 2022-05-10 20:11     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-10 20:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, David P . Reed, Paolo Bonzini

On 09/05/2022 12:52, Sean Christopherson wrote:
> I find the shortlog to be very confusing, the bug has nothing to do with disabling
> VMX and I distinctly remember wrapping VMXOFF with exception fixup to prevent doom
> if VMX is already disabled :-).  The issue is really that nmi_shootdown_cpus() doesn't
> play nice with being called twice.
> 

Hey Sean, OK - I agree with you, the issue is really about the double
list addition.

> [...]
> 
> Call stacks for the two callers would be very, very helpful.
> [...]

> This feels like were just adding more duct tape to the mess.  nmi_shootdown() is
> still unsafe for more than one caller, and it takes a _lot_ of staring and searching
> to understand that crash_smp_send_stop() is invoked iff CONFIG_KEXEC_CORE=y, i.e.
> that it will call smp_ops.crash_stop_other_cpus() and not just smp_send_stop().
> 
> Rather than shared a flag between two relatively unrelated functions, what if we
> instead disabling virtualization in crash_nmi_callback() and then turn the reboot
> call into a nop if an NMI shootdown has already occurred?  That will also add a
> bit of documentation about multiple shootdowns not working.
> 
> And I believe there's also a lurking bug in native_machine_emergency_restart() that
> can be fixed with cleanup.  SVM can also block INIT and so should be disabled during
> an emergency reboot.
> 
> The attached patches are compile tested only.  If they seem sane, I'll post an
> official mini series.

Thanks Sean, it makes sense - my patch is more a "band-aid" whereas
yours fixes it in a more generic way. Confess I found the logic of your
patch complex, but as you said, it requires a *lot* of code analysis to
understand these multiple shutdown patches, the problem is complicated
by nature heh

I've tested your patch 0001 and it works well for all cases [0], so go
ahead and submit the miniseries, feel free to add:

Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>


I've read patch 0002 and it makes sense to me as well, a good proactive
bug fix =)

With that said, I'll of course drop this one from V2 of this series.
Cheers,


Guilherme




[0]
A summary of my tests and the code paths that the panic shutdown take
depending on some conditions:

New function that disables VMX/SVM: cpu_crash_disable_virtualization()
[should be executed in every online CPU on shutdown)

The panic path triggers the following call stacks depending on kdump and
post_notifiers:


(1) kexec/kdump + !crash_kexec_post_notifiers
->machine_crash_shutdown()
----.crash_shutdown() <custom handler>
------native_machine_crash_shutdown() [all custom handlers except Xen PV
call the native generic function]
--------crash_smp_send_stop()
----------kdump_nmi_shootdown_cpus()
------------nmi_shootdown_cpus(kdump_nmi_callback)
--------------crash_nmi_callback()
----------------kdump_nmi_callback()
------------------cpu_crash_disable_virtualization()


(2) kexec/kdump + crash_kexec_post_notifiers
->crash_smp_send_stop()
----kdump_nmi_shootdown_cpus()
------nmi_shootdown_cpus(kdump_nmi_callback)
--------crash_nmi_callback()
----------kdump_nmi_callback()
------------cpu_crash_disable_virtualization()

After this path, will execute machine_crash_shutdown() but
crash_smp_send_stop()
is guarded against double execution. Also, emergency restart calls
emergency_vmx_disable_all() .


(3) !kexec/kdump + crash_kexec_post_notifiers

Same as (2)


(4) !kexec/kdump + !crash_kexec_post_notifiers
-> smp_send_stop()
----native_stop_other_cpus()
------apic_send_IPI_allbutself(REBOOT_VECTOR)
--------sysvec_reboot
----------cpu_emergency_vmxoff() <if the IPI approach succeeded, CPU
stopped here>

If not:
------register_stop_handler()
--------apic_send_IPI_allbutself(NMI_VECTOR)
----------smp_stop_nmi_callback()
------------cpu_emergency_vmxoff()

After that, emergency_vmx_disable_all() gets called in the emergency
restart path as well.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
  2022-05-10 19:40           ` John Ogness
@ 2022-05-11 11:13             ` Petr Mladek
  0 siblings, 0 replies; 183+ messages in thread
From: Petr Mladek @ 2022-05-11 11:13 UTC (permalink / raw)
  To: John Ogness
  Cc: Steven Rostedt, Guilherme G. Piccoli, Evan Green, Andrew Morton,
	bhe, kexec, LKML, bcm-kernel-feedback-list, linuxppc-dev,
	linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips,
	linux-parisc, Linux PM, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, Andy Shevchenko, Arnd Bergmann,
	Borislav Petkov, Jonathan Corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, Greg Kroah-Hartman, mikelley,
	hidehiro.kawai.ez, jgross, Kees Cook, luto, mhiramat, mingo,
	paulmck, peterz, senozhatsky, Alan Stern, Thomas Gleixner,
	vgoyal, vkuznets, Will Deacon, Ard Biesheuvel, David Gow,
	Julius Werner

On Tue 2022-05-10 21:46:38, John Ogness wrote:
> On 2022-05-10, Steven Rostedt <rostedt@goodmis.org> wrote:
> >> As already mentioned in the other reply, panic() sometimes stops the
> >> other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
> >> 
> >> Another situation is when the CPU using the lock ends in some
> >> infinite loop because something went wrong. The system is in
> >> an unpredictable state during panic().
> >> 
> >> I am not sure if this is possible with the code under gsmi_dev.lock
> >> but such things really happen during panic() in other subsystems.
> >> Using trylock in the panic() code path is a good practice.
> >
> > I believe that Peter Zijlstra had a special spin lock for NMIs or
> > early printk, where it would not block if the lock was held on the
> > same CPU. That is, if an NMI happened and paniced while this lock was
> > held on the same CPU, it would not deadlock. But it would block if the
> > lock was held on another CPU.
> 
> Yes. And starting with 5.19 it will be carrying the name that _you_ came
> up with (cpu_sync):
> 
> printk_cpu_sync_get_irqsave()
> printk_cpu_sync_put_irqrestore()

There is a risk that this lock might become a big kernel lock.

This special lock would need to be used even during normal
system operation. It does not make sense to suddenly start using
another lock during panic.

So I think that we should think twice before using it.
I would prefer using trylock of the original lock when
possible during panic.

It is possible that I miss something.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 17/30] tracing: Improve panic/die notifiers
  2022-04-27 22:49 ` [PATCH 17/30] tracing: Improve panic/die notifiers Guilherme G. Piccoli
  2022-04-29  9:22   ` Sergei Shtylyov
@ 2022-05-11 11:45   ` Petr Mladek
  2022-05-17 15:33     ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-11 11:45 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On Wed 2022-04-27 19:49:11, Guilherme G. Piccoli wrote:
> Currently the tracing dump_on_oops feature is implemented
> through separate notifiers, one for die/oops and the other
> for panic. With the addition of panic notifier "id", this
> patch makes use of such "id" to unify both functions.
> 
> It also comments the function and changes the priority of the
> notifier blocks, in order they run early compared to other
> notifiers, to prevent useless trace data (like the callback
> names for the other notifiers). Finally, we also removed an
> unnecessary header inclusion.
> 
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -9767,38 +9766,46 @@ static __init int tracer_init_tracefs(void)
>  
>  fs_initcall(tracer_init_tracefs);
>  
> -static int trace_panic_handler(struct notifier_block *this,
> -			       unsigned long event, void *unused)
> +/*
> + * The idea is to execute the following die/panic callback early, in order
> + * to avoid showing irrelevant information in the trace (like other panic
> + * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
> + * warnings get disabled (to prevent potential log flooding).
> + */
> +static int trace_die_panic_handler(struct notifier_block *self,
> +				unsigned long ev, void *unused)
>  {
> -	if (ftrace_dump_on_oops)
> +	int do_dump;
> +
> +	if (!ftrace_dump_on_oops)
> +		return NOTIFY_DONE;
> +
> +	switch (ev) {
> +	case DIE_OOPS:
> +		do_dump = 1;
> +		break;
> +	case PANIC_NOTIFIER:
> +		do_dump = 1;
> +		break;

DIE_OOPS and PANIC_NOTIFIER are from different enum.
It feels like comparing apples with oranges here.

IMHO, the proper way to unify the two notifiers is
a check of the @self parameter. Something like:

static int trace_die_panic_handler(struct notifier_block *self,
				unsigned long ev, void *unused)
{
	if (self == trace_die_notifier && val != DIE_OOPS)
		goto out;

	ftrace_dump(ftrace_dump_on_oops);
out:
	return NOTIFY_DONE;
}

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list
  2022-05-09 14:16   ` Guilherme G. Piccoli
@ 2022-05-11 16:45     ` Heiko Carstens
  2022-05-11 19:58       ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Heiko Carstens @ 2022-05-11 16:45 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Alexander Gordeev, Christian Borntraeger, David S. Miller,
	Sven Schnelle, Vasily Gorbik, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, pmladek, bhe, akpm, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390, kexec,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

On Mon, May 09, 2022 at 11:16:10AM -0300, Guilherme G. Piccoli wrote:
> On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> > Currently we have 3 notifier lists in the panic path, which will
> > be wired in a way to allow the notifier callbacks to run in
> > different moments at panic time, in a subsequent patch.
> > 
> > But there is also an odd set of architecture calls hardcoded in
> > the end of panic path, after the restart machinery. They're
> > responsible for late time tunings / events, like enabling a stop
> > button (Sparc) or effectively stopping the machine (s390).
> > 
> > This patch introduces yet another notifier list to offer the
> > architectures a way to add callbacks in such late moment on
> > panic path without the need of ifdefs / hardcoded approaches.
> > 
> > Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> > Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Heiko Carstens <hca@linux.ibm.com>
> > Cc: Sven Schnelle <svens@linux.ibm.com>
> > Cc: Vasily Gorbik <gor@linux.ibm.com>
> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> Hey S390/SPARC folks, sorry for the ping!
> 
> Any reviews on this V1 would be greatly appreciated, I'm working on V2
> and seeking feedback in the non-reviewed patches.

Sorry, missed that this is quite s390 specific. So, yes, this looks
good to me and nice to see that one of the remaining CONFIG_S390 in
common code will be removed!

For the s390 bits:
Acked-by: Heiko Carstens <hca@linux.ibm.com>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list
  2022-05-11 16:45     ` Heiko Carstens
@ 2022-05-11 19:58       ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-11 19:58 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Alexander Gordeev, Christian Borntraeger, David S. Miller,
	Sven Schnelle, Vasily Gorbik, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, pmladek, bhe, akpm, linux-mips,
	linux-parisc, linux-pm, linux-remoteproc, linux-s390, kexec,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

On 11/05/2022 13:45, Heiko Carstens wrote:
> [...]
>>
>> Hey S390/SPARC folks, sorry for the ping!
>>
>> Any reviews on this V1 would be greatly appreciated, I'm working on V2
>> and seeking feedback in the non-reviewed patches.
> 
> Sorry, missed that this is quite s390 specific. So, yes, this looks
> good to me and nice to see that one of the remaining CONFIG_S390 in
> common code will be removed!
> 
> For the s390 bits:
> Acked-by: Heiko Carstens <hca@linux.ibm.com>

No need for apologies, I really appreciate your review!
Will add your Ack for V2 =)

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers
  2022-05-10 17:40   ` Steven Rostedt
@ 2022-05-11 20:03     ` Guilherme G. Piccoli
  2022-05-16 14:50       ` Petr Mladek
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-11 20:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, senozhatsky, stern, tglx, vgoyal,
	vkuznets, will

On 10/05/2022 14:40, Steven Rostedt wrote:
> On Wed, 27 Apr 2022 19:49:17 -0300
> "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
> 
>> Currently we don't have a way to check if there are dumpers set,
>> except counting the list members maybe. This patch introduces a very
>> simple helper to provide this information, by just keeping track of
>> registered/unregistered kmsg dumpers. It's going to be used on the
>> panic path in the subsequent patch.
> 
> FYI, it is considered "bad form" to reference in the change log "this
> patch". We know this is a patch. The change log should just talk about what
> is being done. So can you reword your change logs (you do this is almost
> every patch). Here's what I would reword the above to be:
> 
>  Currently we don't have a way to check if there are dumpers set, except
>  perhaps by counting the list members. Introduce a very simple helper to
>  provide this information, by just keeping track of registered/unregistered
>  kmsg dumpers. This will simplify the refactoring of the panic path.

Thanks for the hint, you're right - it's almost in all of my patches.
I'll reword all of them (except the ones already merged) to remove this
"bad form".

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 10/30] alpha: Clean-up the panic notifier code
  2022-05-10 14:16     ` Petr Mladek
@ 2022-05-11 20:10       ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-11 20:10 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Ivan Kokshaysky, Matt Turner, rth, akpm, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, bhe, kexec,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On 10/05/2022 11:16, Petr Mladek wrote:
> [...]
> Yeah, it is pretty strange behavior.
> 
> I looked into the history. This notifier was added into the alpha code
> in 2.4.0-test2pre2. In this historic code, the default panic() code
> either rebooted after a timeout or ended in a infinite loop. There
> was not crasdump at that times.
> 
> The notifier allowed to change the behavior. There were 3 notifiers:
> 
>    + mips and mips64 ended with blinking in panic()
>    + alpha did __halt() in this srm case
> 
> They both still do this. I guess that it is some historic behavior
> that people using these architectures are used to.
> 
> Anyway, it makes sense to do this as the last notifier after
> dumping other information.
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>
> 
> Best Regards,
> Petr

Thanks a bunch for the review - added your tag for V2 =)

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering
  2022-05-10 14:28   ` Petr Mladek
@ 2022-05-11 20:22     ` Guilherme G. Piccoli
  2022-05-13 14:44       ` Johannes Berg
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-11 20:22 UTC (permalink / raw)
  To: Petr Mladek, Anton Ivanov, Johannes Berg, Richard Weinberger
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

On 10/05/2022 11:28, Petr Mladek wrote:
> [...]
> It is not clear to me why user mode linux should not care about
> the other notifiers. It might be because I do not know much
> about the user mode linux.
> 
> Is the because they always create core dump or are never running
> in a hypervisor or ...?
> 
> AFAIK, the notifiers do many different things. For example, there
> is a notifier that disables RCU watchdog, print some extra
> information. Why none of them make sense here?
>

Hi Petr, my understanding is that UML is a form of running Linux as a
regular userspace process for testing purposes. With that said, as soon
as we exit in the error path, less "pollution" would happen, so users
can use GDB to debug the core dump for example.

In later patches of this series (when we split the panic notifiers in 3
lists) these UML notifiers run in the pre-reboot list, so they run after
the informational notifiers for example (in the default level).
But without the list split we cannot order properly, so my gut feeling
is that makes sense to run them rather earlier than later in the panic
process...

Maybe Anton / Johannes / Richard could give their opinions - appreciate
that, I'm not attached to the priority here, it's more about users'
common usage of UML I can think of...

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-04-27 22:49 ` [PATCH 24/30] panic: Refactor the panic path Guilherme G. Piccoli
                     ` (2 preceding siblings ...)
  2022-05-09 15:16   ` d.hatayama
@ 2022-05-12 14:03   ` Petr Mladek
  2022-05-15 22:47     ` Guilherme G. Piccoli
  2022-05-24 14:44   ` Eric W. Biederman
  4 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-12 14:03 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

Hello,

first, I am sorry for stepping into the discussion so late.
I was busy with some other stuff and this patchset is far
from trivial.

Second, thanks a lot for putting so much effort into it.
Most of the changes look pretty good, especially all
the fixes of particular notifiers and split into
four lists.

Though this patch will need some more love. See below
for more details.


On Wed 2022-04-27 19:49:18, Guilherme G. Piccoli wrote:
> The panic() function is somewhat convoluted - a lot of changes were
> made over the years, adding comments that might be misleading/outdated
> now, it has a code structure that is a bit complex to follow, with
> lots of conditionals, for example. The panic notifier list is something
> else - a single list, with multiple callbacks of different purposes,
> that run in a non-deterministic order and may affect hardly kdump
> reliability - see the "crash_kexec_post_notifiers" workaround-ish flag.
> 
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3784,6 +3791,33 @@
>  			timeout < 0: reboot immediately
>  			Format: <timeout>
>  
> +	panic_notifiers_level=
> +			[KNL] Set the panic notifiers execution order.
> +			Format: <unsigned int>
> +			We currently have 4 lists of panic notifiers; based
> +			on the functionality and risk (for panic success) the
> +			callbacks are added in a given list. The lists are:
> +			- hypervisor/FW notification list (low risk);
> +			- informational list (low/medium risk);
> +			- pre_reboot list (higher risk);
> +			- post_reboot list (only run late in panic and after
> +			kdump, not configurable for now).
> +			This parameter defines the ordering of the first 3
> +			lists with regards to kdump; the levels determine
> +			which set of notifiers execute before kdump. The
> +			accepted levels are:

This talks only about kdump. The reality is much more complicated.
The level affect the order of:

    + notifiers vs. kdump
    + notifiers vs. crash_dump
    + crash_dump vs. kdump

There might theoretically many variants of the ordering of kdump,
crash_dump, and the 4 notifier list. Some variants do not make
much sense. You choose 5 variants and tried to select them by
a level number.

The question is if we really could easily describe the meaning this
way. It is not only about a "level" of notifiers before kdump. It is
also about the ordering of crash_dump vs. kdump. IMHO, "level"
semantic does not fit there.

Maybe more parameters might be easier to understand the effect.
Anyway, we first need to agree on the chosen variants.
I am going to discuss it more in the code, see below.



> +			0: kdump is the first thing to run, NO list is
> +			executed before kdump.
> +			1: only the hypervisor list is executed before kdump.
> +			2 (default level): the hypervisor list and (*if*
> +			there's any kmsg_dumper defined) the informational
> +			list are executed before kdump.
> +			3: both the hypervisor and the informational lists
> +			(always) execute before kdump.
> +			4: the 3 lists (hypervisor, info and pre_reboot)
> +			execute before kdump - this behavior is analog to the
> +			deprecated parameter "crash_kexec_post_notifiers".
> +
>  	panic_print=	Bitmask for printing system info when panic happens.
>  			User can chose combination of the following bits:
>  			bit 0: print all tasks info
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -183,6 +195,112 @@ static void panic_print_sys_info(bool console_flush)
>  		ftrace_dump(DUMP_ALL);
>  }
>  
> +/*
> + * Helper that accumulates all console flushing routines executed on panic.
> + */
> +static void console_flushing(void)
> +{
> +#ifdef CONFIG_VT
> +	unblank_screen();
> +#endif
> +	console_unblank();
> +
> +	/*
> +	 * In this point, we may have disabled other CPUs, hence stopping the
> +	 * CPU holding the lock while still having some valuable data in the
> +	 * console buffer.
> +	 *
> +	 * Try to acquire the lock then release it regardless of the result.
> +	 * The release will also print the buffers out. Locks debug should
> +	 * be disabled to avoid reporting bad unlock balance when panic()
> +	 * is not being called from OOPS.
> +	 */
> +	debug_locks_off();
> +	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +
> +	panic_print_sys_info(true);
> +}
> +
> +#define PN_HYPERVISOR_BIT	0
> +#define PN_INFO_BIT		1
> +#define PN_PRE_REBOOT_BIT	2
> +#define PN_POST_REBOOT_BIT	3
> +
> +/*
> + * Determine the order of panic notifiers with regards to kdump.
> + *
> + * This function relies in the "panic_notifiers_level" kernel parameter
> + * to determine how to order the notifiers with regards to kdump. We
> + * have currently 5 levels. For details, please check the kernel docs for
> + * "panic_notifiers_level" at Documentation/admin-guide/kernel-parameters.txt.
> + *
> + * Default level is 2, which means the panic hypervisor and informational
> + * (unless we don't have any kmsg_dumper) lists will execute before kdump.
> + */
> +static void order_panic_notifiers_and_kdump(void)
> +{
> +	/*
> +	 * The parameter "crash_kexec_post_notifiers" is deprecated, but
> +	 * valid. Users that set it want really all panic notifiers to
> +	 * execute before kdump, so it's effectively the same as setting
> +	 * the panic notifiers level to 4.
> +	 */
> +	if (panic_notifiers_level >= 4 || crash_kexec_post_notifiers)
> +		return;
> +
> +	/*
> +	 * Based on the level configured (smaller than 4), we clear the
> +	 * proper bits in "panic_notifiers_bits". Notice that this bitfield
> +	 * is initialized with all notifiers set.
> +	 */
> +	switch (panic_notifiers_level) {
> +	case 3:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +		break;
> +	case 2:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +
> +		if (!kmsg_has_dumpers())
> +			clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> +		break;
> +	case 1:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> +		break;
> +	case 0:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> +		clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);
> +		break;
> +	}
> +}
>
> +/*
> + * Set of helpers to execute the panic notifiers only once.
> + * Just the informational notifier cares about the return.
> + */
> +static inline bool notifier_run_once(struct atomic_notifier_head head,
> +				     char *buf, long bit)
> +{
> +	if (test_and_change_bit(bit, &panic_notifiers_bits)) {
> +		atomic_notifier_call_chain(&head, PANIC_NOTIFIER, buf);
> +		return true;
> +	}
> +	return false;
> +}

Here is the code using the above functions. It helps to discuss
the design and logic.

<kernel/panic.c>
	order_panic_notifiers_and_kdump();

	/* If no level, we should kdump ASAP. */
	if (!panic_notifiers_level)
		__crash_kexec(NULL);

	crash_smp_send_stop();
	panic_notifier_hypervisor_once(buf);

	if (panic_notifier_info_once(buf))
		kmsg_dump(KMSG_DUMP_PANIC);

	panic_notifier_pre_reboot_once(buf);

	__crash_kexec(NULL);

	panic_notifier_hypervisor_once(buf);

	if (panic_notifier_info_once(buf))
		kmsg_dump(KMSG_DUMP_PANIC);

	panic_notifier_pre_reboot_once(buf);
</kernel/panic.c>

I have to say that the logic is very unclear. Almost all
functions are called twice:

   + __crash_kexec()
   + kmsg_dump()
   + panic_notifier_hypervisor_once()
   + panic_notifier_pre_reboot_once()
   + panic_notifier_info_once()

It is pretty hard to find what functions are always called in the same
order and where the order can be inverted.

The really used code path is defined by order_panic_notifiers_and_kdump()
that encodes "level" into "bits". The bits are then flipped in
panic_notifier_*_once() calls that either do something or not.
kmsg_dump() is called according to the bit flip.

It is an interesting approach. I guess that you wanted to avoid too
many if/then/else levels in panic(). But honestly, it looks like
a black magic to me.

IMHO, it is always easier to follow if/then/else logic than using
a translation table that requires additional bit flips when
a value is used more times.

Also I guess that it is good proof that "level" abstraction does
not fit here. Normal levels would not need this kind of magic.


OK, the question is how to make it better. Let's start with
a clear picture of the problem:

1. panic() has basically two funtions:

      + show/store debug information (optional ways and amount)
      + do something with the system (reboot, stay hanged)


2. There are 4 ways how to show/store the information:

      + tell hypervisor to store what it is interested about
      + crash_dump
      + kmsg_dump()
      + consoles

  , where crash_dump and consoles are special:

     + crash_dump does not return. Instead it ends up with reboot.

     + Consoles work transparently. They just need an extra flush
       before reboot or staying hanged.


3. The various notifiers do things like:

     + tell hypervisor about the crash
     + print more information (also stop watchdogs)
     + prepare system for reboot (touch some interfaces)
     + prepare system for staying hanged (blinking)

   Note that it pretty nicely matches the 4 notifier lists.


Now, we need to decide about the ordering. The main area is how
to store the debug information. Consoles are transparent so
the quesition is about:

     + hypervisor
     + crash_dump
     + kmsg_dump

Some people need none and some people want all. There is a
risk that system might hung at any stage. This why people want to
make the order configurable.

But crash_dump() does not return when it succeeds. And kmsg_dump()
users havn't complained about hypervisor problems yet. So, that
two variants might be enough:

    + crash_dump (hypervisor, kmsg_dump as fallback)
    + hypervisor, kmsg_dump, crash_dump

One option "panic_prefer_crash_dump" should be enough.
And the code might look like:

void panic()
{
[...]
	dump_stack();
	kgdb_panic(buf);

	< ---  here starts the reworked code --- >

	/* crash dump is enough when enabled and preferred. */
	if (panic_prefer_crash_dump)
		__crash_kexec(NULL);

	/* Stop other CPUs and focus on handling the panic state. */
	if (has_kexec_crash_image)
		crash_smp_send_stop();
	else
		smp_send_stop()

	/* Notify hypervisor about the system panic. */
	atomic_notifier_call_chain(&panic_hypervisor_list, 0, NULL);

	/*
	 * No need to risk extra info when there is no kmsg dumper
	 * registered.
	 */
	if (!has_kmsg_dumper())
		__crash_kexec(NULL);

	/* Add extra info from different subsystems. */
	atomic_notifier_call_chain(&panic_info_list, 0, NULL);

	kmsg_dump(KMSG_DUMP_PANIC);
	__crash_kexec(NULL);

	/* Flush console */
	unblank_screen();
	console_unblank();
	debug_locks_off();
	console_flush_on_panic(CONSOLE_FLUSH_PENDING);

	if (panic_timeout > 0) {
		delay()
	}

	/*
	 * Prepare system for eventual reboot and allow custom
	 * reboot handling.
	 */
	atomic_notifier_call_chain(&panic_reboot_list, 0, NULL);

	if (panic_timeout != 0) {
		reboot();
	}

	/*
	 * Prepare system for the infinite waiting, for example,
	 * setup blinking.
	 */
	atomic_notifier_call_chain(&panic_loop_list, 0, NULL);

	infinite_loop();
}


__crash_kexec() is there 3 times but otherwise the code looks
quite straight forward.

Note 1: I renamed the two last notifier list. The name 'post-reboot'
	did sound strange from the logical POV ;-)

Note 2: We have to avoid the possibility to call "reboot" list
	before kmsg_dump(). All callbacks providing info
	have to be in the info list. It a callback combines
	info and reboot functionality then it should be split.

	There must be another way to calm down problematic
	info callbacks. And it has to be solved when such
	a problem is reported. Is there any known issue, please?

It is possible that I have missed something important.
But I would really like to make the logic as simple as possible.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering
  2022-05-11 20:22     ` Guilherme G. Piccoli
@ 2022-05-13 14:44       ` Johannes Berg
  2022-05-15 22:12         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Johannes Berg @ 2022-05-13 14:44 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Petr Mladek, Anton Ivanov, Richard Weinberger
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

On Wed, 2022-05-11 at 17:22 -0300, Guilherme G. Piccoli wrote:
> On 10/05/2022 11:28, Petr Mladek wrote:
> > [...]
> > It is not clear to me why user mode linux should not care about
> > the other notifiers. It might be because I do not know much
> > about the user mode linux.
> > 
> > Is the because they always create core dump or are never running
> > in a hypervisor or ...?
> > 
> > AFAIK, the notifiers do many different things. For example, there
> > is a notifier that disables RCU watchdog, print some extra
> > information. Why none of them make sense here?
> > 
> 
> Hi Petr, my understanding is that UML is a form of running Linux as a
> regular userspace process for testing purposes.

Correct.

> With that said, as soon
> as we exit in the error path, less "pollution" would happen, so users
> can use GDB to debug the core dump for example.
> 
> In later patches of this series (when we split the panic notifiers in 3
> lists) these UML notifiers run in the pre-reboot list, so they run after
> the informational notifiers for example (in the default level).
> But without the list split we cannot order properly, so my gut feeling
> is that makes sense to run them rather earlier than later in the panic
> process...
> 
> Maybe Anton / Johannes / Richard could give their opinions - appreciate
> that, I'm not attached to the priority here, it's more about users'
> common usage of UML I can think of...

It's hard to say ... In a sense I'm not sure it matters?

OTOH something like the ftrace dump notifier (kernel/trace/trace.c)
might still be useful to run before the mconsole and coredump ones, even
if you could probably use gdb to figure out the information.

Personally, I don't have a scenario where I'd care about the trace
buffers though, and most of the others I found would seem irrelevant
(drivers that aren't even compiled, hung tasks won't really happen since
we exit immediately, and similar.)

johannes

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering
  2022-05-13 14:44       ` Johannes Berg
@ 2022-05-15 22:12         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-15 22:12 UTC (permalink / raw)
  To: Johannes Berg, Petr Mladek, Anton Ivanov, Richard Weinberger
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

On 13/05/2022 11:44, Johannes Berg wrote:
> [...]
>> Maybe Anton / Johannes / Richard could give their opinions - appreciate
>> that, I'm not attached to the priority here, it's more about users'
>> common usage of UML I can think of...
> 
> It's hard to say ... In a sense I'm not sure it matters?
> 
> OTOH something like the ftrace dump notifier (kernel/trace/trace.c)
> might still be useful to run before the mconsole and coredump ones, even
> if you could probably use gdb to figure out the information.
> 
> Personally, I don't have a scenario where I'd care about the trace
> buffers though, and most of the others I found would seem irrelevant
> (drivers that aren't even compiled, hung tasks won't really happen since
> we exit immediately, and similar.)
> 
> johannes

Thanks Johannes, I agree with you.

We don't have great ordering now, one thing we need to enforce is the
order between the 2 UML notifiers, and this patch is doing that..trying
to order against other callbacks like the ftrace dumper is messy in the
current code.

OTOH if this patch set is accepted at some point, we'll likely have 3
lists, and with that we can improve ordering a lot - this notifier for
instance would run in the pre-reboot list, *after* the ftrace dumper (if
a kmsg dumper is set).

So, my intention is to keep this patch as is for V2 (with some changes
Johannes suggested before), unless Petr or the other maintainers want
something different.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-12 14:03   ` Petr Mladek
@ 2022-05-15 22:47     ` Guilherme G. Piccoli
  2022-05-16 10:21       ` Petr Mladek
  2022-05-19 23:45       ` Baoquan He
  0 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-15 22:47 UTC (permalink / raw)
  To: Petr Mladek, michael Kelley (LINUX), Baoquan He, Dave Young, d.hatayama
  Cc: akpm, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-arm-kernel, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	dave.hansen, feng.tang, gregkh, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

On 12/05/2022 11:03, Petr Mladek wrote:
> Hello,
> 
> first, I am sorry for stepping into the discussion so late.
> I was busy with some other stuff and this patchset is far
> from trivial.
> 
> Second, thanks a lot for putting so much effort into it.
> Most of the changes look pretty good, especially all
> the fixes of particular notifiers and split into
> four lists.
> 
> Though this patch will need some more love. See below
> for more details.

Thanks a lot for your review Petr, it is much appreciated! No need for
apologies, there is no urgency here =)


> [...] 
> This talks only about kdump. The reality is much more complicated.
> The level affect the order of:
> 
>     + notifiers vs. kdump
>     + notifiers vs. crash_dump
>     + crash_dump vs. kdump

First of all, I'd like to ask you please to clarify to me *exactly* what
are the differences between "crash_dump" and "kdump". I'm sorry if
that's a silly question, I need to be 100% sure I understand the
concepts the same way you do.


> There might theoretically many variants of the ordering of kdump,
> crash_dump, and the 4 notifier list. Some variants do not make
> much sense. You choose 5 variants and tried to select them by
> a level number.
> 
> The question is if we really could easily describe the meaning this
> way. It is not only about a "level" of notifiers before kdump. It is
> also about the ordering of crash_dump vs. kdump. IMHO, "level"
> semantic does not fit there.
> 
> Maybe more parameters might be easier to understand the effect.
> Anyway, we first need to agree on the chosen variants.
> I am going to discuss it more in the code, see below.
> 
> 
> [...] 
> Here is the code using the above functions. It helps to discuss
> the design and logic.
> 
> <kernel/panic.c>
> 	order_panic_notifiers_and_kdump();
> 
> 	/* If no level, we should kdump ASAP. */
> 	if (!panic_notifiers_level)
> 		__crash_kexec(NULL);
> 
> 	crash_smp_send_stop();
> 	panic_notifier_hypervisor_once(buf);
> 
> 	if (panic_notifier_info_once(buf))
> 		kmsg_dump(KMSG_DUMP_PANIC);
> 
> 	panic_notifier_pre_reboot_once(buf);
> 
> 	__crash_kexec(NULL);
> 
> 	panic_notifier_hypervisor_once(buf);
> 
> 	if (panic_notifier_info_once(buf))
> 		kmsg_dump(KMSG_DUMP_PANIC);
> 
> 	panic_notifier_pre_reboot_once(buf);
> </kernel/panic.c>
> 
> I have to say that the logic is very unclear. Almost all
> functions are called twice:
> 
>    + __crash_kexec()
>    + kmsg_dump()
>    + panic_notifier_hypervisor_once()
>    + panic_notifier_pre_reboot_once()
>    + panic_notifier_info_once()
> 
> It is pretty hard to find what functions are always called in the same
> order and where the order can be inverted.
> 
> The really used code path is defined by order_panic_notifiers_and_kdump()
> that encodes "level" into "bits". The bits are then flipped in
> panic_notifier_*_once() calls that either do something or not.
> kmsg_dump() is called according to the bit flip.
> 
> It is an interesting approach. I guess that you wanted to avoid too
> many if/then/else levels in panic(). But honestly, it looks like
> a black magic to me.
> 
> IMHO, it is always easier to follow if/then/else logic than using
> a translation table that requires additional bit flips when
> a value is used more times.
> 
> Also I guess that it is good proof that "level" abstraction does
> not fit here. Normal levels would not need this kind of magic.

Heheh OK, I appreciate your opinion, but I guess we'll need to agree in
disagree here - I'm much more fond to this kind of code than a bunch of
if/else blocks that almost give headaches. Encoding such "level" logic
in the if/else scheme is very convoluted, generates a very big code. And
the functions aren't so black magic - they map a level in bits, and the
functions _once() are called...once! Although we switch the position in
the code, so there are 2 calls, one of them is called and the other not.

But that's totally fine to change - especially if we're moving away from
the "level" logic. I see below you propose a much simpler approach - if
we follow that, definitely we won't need the "black magic" approach heheh


> 
> OK, the question is how to make it better. Let's start with
> a clear picture of the problem:
> 
> 1. panic() has basically two funtions:
> 
>       + show/store debug information (optional ways and amount)
>       + do something with the system (reboot, stay hanged)
> 
> 
> 2. There are 4 ways how to show/store the information:
> 
>       + tell hypervisor to store what it is interested about
>       + crash_dump
>       + kmsg_dump()
>       + consoles
> 
>   , where crash_dump and consoles are special:
> 
>      + crash_dump does not return. Instead it ends up with reboot.
> 
>      + Consoles work transparently. They just need an extra flush
>        before reboot or staying hanged.
> 
> 
> 3. The various notifiers do things like:
> 
>      + tell hypervisor about the crash
>      + print more information (also stop watchdogs)
>      + prepare system for reboot (touch some interfaces)
>      + prepare system for staying hanged (blinking)
> 
>    Note that it pretty nicely matches the 4 notifier lists.
> 

I really appreciate the summary skill you have, to convert complex
problems in very clear and concise ideas. Thanks for that, very useful!
I agree with what was summarized above.


> Now, we need to decide about the ordering. The main area is how
> to store the debug information. Consoles are transparent so
> the quesition is about:
> 
>      + hypervisor
>      + crash_dump
>      + kmsg_dump
> 
> Some people need none and some people want all. There is a
> risk that system might hung at any stage. This why people want to
> make the order configurable.
> 
> But crash_dump() does not return when it succeeds. And kmsg_dump()
> users havn't complained about hypervisor problems yet. So, that
> two variants might be enough:
> 
>     + crash_dump (hypervisor, kmsg_dump as fallback)
>     + hypervisor, kmsg_dump, crash_dump
> 
> One option "panic_prefer_crash_dump" should be enough.
> And the code might look like:
> 
> void panic()
> {
> [...]
> 	dump_stack();
> 	kgdb_panic(buf);
> 
> 	< ---  here starts the reworked code --- >
> 
> 	/* crash dump is enough when enabled and preferred. */
> 	if (panic_prefer_crash_dump)
> 		__crash_kexec(NULL);
> 
> 	/* Stop other CPUs and focus on handling the panic state. */
> 	if (has_kexec_crash_image)
> 		crash_smp_send_stop();
> 	else
> 		smp_send_stop()
> 

Here we have a very important point. Why do we need 2 variants of SMP
CPU stopping functions? I disagree with that - my understanding of this
after some study in architectures is that the crash_() variant is
"stronger", should work in all cases and if not, we should fix that -
that'd be a bug.

Such variant either maps to smp_send_stop() (in various architectures,
including XEN/x86) or overrides the basic function with more proper
handling for panic() case...I don't see why we still need such
distinction, if you / others have some insight about that, I'd like to
hear =)


> 	/* Notify hypervisor about the system panic. */
> 	atomic_notifier_call_chain(&panic_hypervisor_list, 0, NULL);
> 
> 	/*
> 	 * No need to risk extra info when there is no kmsg dumper
> 	 * registered.
> 	 */
> 	if (!has_kmsg_dumper())
> 		__crash_kexec(NULL);
> 
> 	/* Add extra info from different subsystems. */
> 	atomic_notifier_call_chain(&panic_info_list, 0, NULL);
> 
> 	kmsg_dump(KMSG_DUMP_PANIC);
> 	__crash_kexec(NULL);
> 
> 	/* Flush console */
> 	unblank_screen();
> 	console_unblank();
> 	debug_locks_off();
> 	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> 
> 	if (panic_timeout > 0) {
> 		delay()
> 	}
> 
> 	/*
> 	 * Prepare system for eventual reboot and allow custom
> 	 * reboot handling.
> 	 */
> 	atomic_notifier_call_chain(&panic_reboot_list, 0, NULL);

You had the order of panic_reboot_list VS. consoles flushing inverted.
It might make sense, although I didn't do that in V1...
Are you OK in having a helper for console flushing, as I did in V1? It
makes code of panic() a bit less polluted / more focused I feel.


> 
> 	if (panic_timeout != 0) {
> 		reboot();
> 	}
> 
> 	/*
> 	 * Prepare system for the infinite waiting, for example,
> 	 * setup blinking.
> 	 */
> 	atomic_notifier_call_chain(&panic_loop_list, 0, NULL);
> 
> 	infinite_loop();
> }
> 
> 
> __crash_kexec() is there 3 times but otherwise the code looks
> quite straight forward.
> 
> Note 1: I renamed the two last notifier list. The name 'post-reboot'
> 	did sound strange from the logical POV ;-)
> 
> Note 2: We have to avoid the possibility to call "reboot" list
> 	before kmsg_dump(). All callbacks providing info
> 	have to be in the info list. It a callback combines
> 	info and reboot functionality then it should be split.
> 
> 	There must be another way to calm down problematic
> 	info callbacks. And it has to be solved when such
> 	a problem is reported. Is there any known issue, please?
> 
> It is possible that I have missed something important.
> But I would really like to make the logic as simple as possible.

OK, I agree with you! It's indeed simpler and if others agree, I can
happily change the logic to what you proposed. Although...currently the
"crash_kexec_post_notifiers" allows to call _all_ panic_reboot_list
callbacks _before kdump_.

We need to mention this change in the commit messages, but I really
would like to hear the opinions of heavy users of notifiers (as
Michael/Hyper-V) and the kdump interested parties (like Baoquan / Dave
Young / Hayatama). If we all agree on such approach, will change that
for V2 =)

Thanks again Petr, for the time spent in such detailed review!
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-15 22:47     ` Guilherme G. Piccoli
@ 2022-05-16 10:21       ` Petr Mladek
  2022-05-16 16:32         ` Guilherme G. Piccoli
  2022-05-19 23:45       ` Baoquan He
  1 sibling, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-16 10:21 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: michael Kelley (LINUX),
	Baoquan He, Dave Young, d.hatayama, akpm, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, dave.hansen, feng.tang, gregkh, hidehiro.kawai.ez,
	jgross, john.ogness, keescook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets,
	will

On Sun 2022-05-15 19:47:39, Guilherme G. Piccoli wrote:
> On 12/05/2022 11:03, Petr Mladek wrote:
> > This talks only about kdump. The reality is much more complicated.
> > The level affect the order of:
> > 
> >     + notifiers vs. kdump
> >     + notifiers vs. crash_dump
> >     + crash_dump vs. kdump
> 
> First of all, I'd like to ask you please to clarify to me *exactly* what
> are the differences between "crash_dump" and "kdump". I'm sorry if
> that's a silly question, I need to be 100% sure I understand the
> concepts the same way you do.

Ah, it should have been:

     + notifiers vs. kmsg_dump
     + notifiers vs. crash_dump
     + crash_dump vs. kmsg_dump

I am sorry for the confusion. Even "crash_dump" is slightly
misleading because there is no function with this name.
But it seems to be easier to understand than __crash_kexec().


> > There might theoretically many variants of the ordering of kdump,
> > crash_dump, and the 4 notifier list. Some variants do not make
> > much sense. You choose 5 variants and tried to select them by
> > a level number.
> > 
> > The question is if we really could easily describe the meaning this
> > way. It is not only about a "level" of notifiers before kdump. It is
> > also about the ordering of crash_dump vs. kdump. IMHO, "level"
> > semantic does not fit there.
> > 
> > Maybe more parameters might be easier to understand the effect.
> > Anyway, we first need to agree on the chosen variants.
> > I am going to discuss it more in the code, see below.
> > 
> > 
> > [...] 
> > Here is the code using the above functions. It helps to discuss
> > the design and logic.
> > 
> > I have to say that the logic is very unclear. Almost all
> > functions are called twice:
> > 
> > The really used code path is defined by order_panic_notifiers_and_kdump()
> > that encodes "level" into "bits". The bits are then flipped in
> > panic_notifier_*_once() calls that either do something or not.
> > kmsg_dump() is called according to the bit flip.
> > 
> > Also I guess that it is good proof that "level" abstraction does
> > not fit here. Normal levels would not need this kind of magic.
> 
> Heheh OK, I appreciate your opinion, but I guess we'll need to agree in
> disagree here - I'm much more fond to this kind of code than a bunch of
> if/else blocks that almost give headaches. Encoding such "level" logic
> in the if/else scheme is very convoluted, generates a very big code. And
> the functions aren't so black magic - they map a level in bits, and the
> functions _once() are called...once! Although we switch the position in
> the code, so there are 2 calls, one of them is called and the other not.

I see. Well, I would consider this as a warning that the approach is
too complex. If the code, using if/then/else, would cause headaches
then also understanding of the behavior would cause headaches for
both users and programmers.


> But that's totally fine to change - especially if we're moving away from
> the "level" logic. I see below you propose a much simpler approach - if
> we follow that, definitely we won't need the "black magic" approach heheh

I do not say that my proposal is fully correct. But we really need
this kind of simpler approach.


> > OK, the question is how to make it better.

> > One option "panic_prefer_crash_dump" should be enough.
> > And the code might look like:
> > 
> > void panic()
> > {
> > [...]
> > 	dump_stack();
> > 	kgdb_panic(buf);
> > 
> > 	< ---  here starts the reworked code --- >
> > 
> > 	/* crash dump is enough when enabled and preferred. */
> > 	if (panic_prefer_crash_dump)
> > 		__crash_kexec(NULL);
> > 
> > 	/* Stop other CPUs and focus on handling the panic state. */
> > 	if (has_kexec_crash_image)
> > 		crash_smp_send_stop();
> > 	else
> > 		smp_send_stop()
> > 
> 
> Here we have a very important point. Why do we need 2 variants of SMP
> CPU stopping functions? I disagree with that - my understanding of this
> after some study in architectures is that the crash_() variant is
> "stronger", should work in all cases and if not, we should fix that -
> that'd be a bug.
> 
> Such variant either maps to smp_send_stop() (in various architectures,
> including XEN/x86) or overrides the basic function with more proper
> handling for panic() case...I don't see why we still need such
> distinction, if you / others have some insight about that, I'd like to
> hear =)

The two variants were introduced by the commit 0ee59413c967c35a6dd
("x86/panic: replace smp_send_stop() with kdump friendly version in
panic path")

It points to https://lkml.org/lkml/2015/6/24/44 that talks about
still running watchdogs.

It is possible that the problem could be fixed another way. It is
even possible that it has already been fixed by the notifiers
that disable the watchdogs.

Anyway, any change of the smp_send_stop() behavior should be done
in a separate patch. It will help with bisection of possible
regression. Also it would require a good explanation in
the commit message. I would personally do it in a separate
patch(set).


> > 	/* Notify hypervisor about the system panic. */
> > 	atomic_notifier_call_chain(&panic_hypervisor_list, 0, NULL);
> > 
> > 	/*
> > 	 * No need to risk extra info when there is no kmsg dumper
> > 	 * registered.
> > 	 */
> > 	if (!has_kmsg_dumper())
> > 		__crash_kexec(NULL);
> > 
> > 	/* Add extra info from different subsystems. */
> > 	atomic_notifier_call_chain(&panic_info_list, 0, NULL);
> > 
> > 	kmsg_dump(KMSG_DUMP_PANIC);
> > 	__crash_kexec(NULL);
> > 
> > 	/* Flush console */
> > 	unblank_screen();
> > 	console_unblank();
> > 	debug_locks_off();
> > 	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > 
> > 	if (panic_timeout > 0) {
> > 		delay()
> > 	}
> > 
> > 	/*
> > 	 * Prepare system for eventual reboot and allow custom
> > 	 * reboot handling.
> > 	 */
> > 	atomic_notifier_call_chain(&panic_reboot_list, 0, NULL);
> 
> You had the order of panic_reboot_list VS. consoles flushing inverted.
> It might make sense, although I didn't do that in V1...

IMHO, it makes sense:

  1. panic_reboot_list contains notifiers that do the reboot
     immediately, for example, xen_panic_event, alpha_panic_event.
     The consoles have to be flushed earlier.

  2. console_flush_on_panic() ignores the result of console_trylock()
     and always calls console_unlock(). As a result the lock should
     be unlocked at the end. And any further printk() should be able
     to printk the messages to the console immediately. It means
     that any messages printed by the reboot notifiers should appear
     on the console as well.

> Are you OK in having a helper for console flushing, as I did in V1? It
> makes code of panic() a bit less polluted / more focused I feel.

Yes, it makes sense. Well, it would better to do it in a separate
patch. The patch patch reworking the logic should be as small
as possible. It will simplify the review.


> > 	if (panic_timeout != 0) {
> > 		reboot();
> > 	}
> > 
> > 	/*
> > 	 * Prepare system for the infinite waiting, for example,
> > 	 * setup blinking.
> > 	 */
> > 	atomic_notifier_call_chain(&panic_loop_list, 0, NULL);
> > 
> > 	infinite_loop();
> > }
> > 
> > 
> > __crash_kexec() is there 3 times but otherwise the code looks
> > quite straight forward.
> > 
> > Note 1: I renamed the two last notifier list. The name 'post-reboot'
> > 	did sound strange from the logical POV ;-)
> > 
> > Note 2: We have to avoid the possibility to call "reboot" list
> > 	before kmsg_dump(). All callbacks providing info
> > 	have to be in the info list. It a callback combines
> > 	info and reboot functionality then it should be split.
> > 
> > 	There must be another way to calm down problematic
> > 	info callbacks. And it has to be solved when such
> > 	a problem is reported. Is there any known issue, please?
> > 
> > It is possible that I have missed something important.
> > But I would really like to make the logic as simple as possible.
> 
> OK, I agree with you! It's indeed simpler and if others agree, I can
> happily change the logic to what you proposed. Although...currently the
> "crash_kexec_post_notifiers" allows to call _all_ panic_reboot_list
> callbacks _before kdump_.
>
> We need to mention this change in the commit messages, but I really
> would like to hear the opinions of heavy users of notifiers (as
> Michael/Hyper-V) and the kdump interested parties (like Baoquan / Dave
> Young / Hayatama). If we all agree on such approach, will change that
> for V2 =)

Sure, we need to make sure that we call everything that is needed.
And it should be documented.

I believe that this is the right way because:

  + It was actually the motivation for this patchset. We split
    the notifiers into separate lists because we want to call
    only the really needed ones before kmsg_dump and crash_dump.

  + If anything is needed for crash_dump that it should be called
    even when crash_dump is called first. It should be either
    hardcoded into crash_dump() or we would need another notifier
    list that will be always called before crash_dump.


Thanks a lot for working on this.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-04-27 22:49 ` [PATCH 19/30] panic: Add the panic hypervisor notifier list Guilherme G. Piccoli
  2022-04-29 17:30   ` Michael Kelley (LINUX)
@ 2022-05-16 14:01   ` Petr Mladek
  2022-05-16 15:06     ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-16 14:01 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Alexander Gordeev, Andrea Parri,
	Ard Biesheuvel, Benjamin Herrenschmidt, Brian Norris,
	Christian Borntraeger, Christophe JAILLET, David Gow,
	David S. Miller, Dexuan Cui, Doug Berger, Evan Green,
	Florian Fainelli, Haiyang Zhang, Hari Bathini, Heiko Carstens,
	Julius Werner, Justin Chen, K. Y. Srinivasan, Lee Jones,
	Markus Mayer, Michael Ellerman, Mihai Carabas, Nicholas Piggin,
	Paul Mackerras, Pavel Machek, Scott Branden, Sebastian Reichel,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On Wed 2022-04-27 19:49:13, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to register
> callbacks to run very early in the panic path. This aims hypervisor/FW
> notification mechanisms as well as simple LED functions, and any other
> simple and safe mechanism that should run early in the panic path; more
> dangerous callbacks should execute later.
> 
> For now, the patch is almost a no-op (although it changes a bit the
> ordering in which some panic notifiers are executed). In a subsequent
> patch, the panic path will be refactored, then the panic hypervisor
> notifiers will effectively run very early in the panic path.
> 
> We also defer documenting it all properly in the subsequent refactor
> patch. While at it, we removed some useless header inclusions and
> fixed some notifiers return too (by using the standard NOTIFY_DONE).

> --- a/arch/mips/sgi-ip22/ip22-reset.c
> +++ b/arch/mips/sgi-ip22/ip22-reset.c
> @@ -195,7 +195,7 @@ static int __init reboot_setup(void)
>  	}
>  
>  	timer_setup(&blink_timer, blink_timeout, 0);
> -	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> +	atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);

This notifier enables blinking. It is not much safe. It calls
mod_timer() that takes a lock internally.

This kind of functionality should go into the last list called
before panic() enters the infinite loop. IMHO, all the blinking
stuff should go there.

>  
>  	return 0;
>  }
> diff --git a/arch/mips/sgi-ip32/ip32-reset.c b/arch/mips/sgi-ip32/ip32-reset.c
> index 18d1c115cd53..9ee1302c9d13 100644
> --- a/arch/mips/sgi-ip32/ip32-reset.c
> +++ b/arch/mips/sgi-ip32/ip32-reset.c
> @@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
>  	pm_power_off = ip32_machine_halt;
>  
>  	timer_setup(&blink_timer, blink_timeout, 0);
> -	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> +	atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);

Same here. Should be done only before the "loop".

>  
>  	return 0;
>  }
> --- a/drivers/firmware/google/gsmi.c
> +++ b/drivers/firmware/google/gsmi.c
> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
>  
>  	register_reboot_notifier(&gsmi_reboot_notifier);
>  	register_die_notifier(&gsmi_die_notifier);
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_hypervisor_list,
>  				       &gsmi_panic_notifier);

I am not sure about this one. It looks like some logging or
pre_reboot stuff.


>  
>  	printk(KERN_INFO "gsmi version " DRIVER_VERSION " loaded\n");
> --- a/drivers/leds/trigger/ledtrig-activity.c
> +++ b/drivers/leds/trigger/ledtrig-activity.c
> @@ -247,7 +247,7 @@ static int __init activity_init(void)
>  	int rc = led_trigger_register(&activity_led_trigger);
>  
>  	if (!rc) {
> -		atomic_notifier_chain_register(&panic_notifier_list,
> +		atomic_notifier_chain_register(&panic_hypervisor_list,
>  					       &activity_panic_nb);

The notifier is trivial. It just sets a variable.

But still, it is about blinking and should be done
in the last "loop" list.


>  		register_reboot_notifier(&activity_reboot_nb);
>  	}
> --- a/drivers/leds/trigger/ledtrig-heartbeat.c
> +++ b/drivers/leds/trigger/ledtrig-heartbeat.c
> @@ -190,7 +190,7 @@ static int __init heartbeat_trig_init(void)
>  	int rc = led_trigger_register(&heartbeat_led_trigger);
>  
>  	if (!rc) {
> -		atomic_notifier_chain_register(&panic_notifier_list,
> +		atomic_notifier_chain_register(&panic_hypervisor_list,
>  					       &heartbeat_panic_nb);

Same here. Blinking => loop list.

>  		register_reboot_notifier(&heartbeat_reboot_nb);
>  	}
> diff --git a/drivers/misc/bcm-vk/bcm_vk_dev.c b/drivers/misc/bcm-vk/bcm_vk_dev.c
> index a16b99bdaa13..d9d5199cdb2b 100644
> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  
>  	/* register for panic notifier */
>  	vk->panic_nb.notifier_call = bcm_vk_on_panic;
> -	err = atomic_notifier_chain_register(&panic_notifier_list,
> +	err = atomic_notifier_chain_register(&panic_hypervisor_list,
>  					     &vk->panic_nb);

It seems to reset some hardware or so. IMHO, it should go into the
pre-reboot list.


>  	if (err) {
>  		dev_err(dev, "Fail to register panic notifier\n");
> --- a/drivers/power/reset/ltc2952-poweroff.c
> +++ b/drivers/power/reset/ltc2952-poweroff.c
> @@ -279,7 +279,7 @@ static int ltc2952_poweroff_probe(struct platform_device *pdev)
>  	pm_power_off = ltc2952_poweroff_kill;
>  
>  	data->panic_notifier.notifier_call = ltc2952_poweroff_notify_panic;
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_hypervisor_list,
>  				       &data->panic_notifier);

I looks like this somehow triggers the reboot. IMHO, it should go
into the pre_reboot list.

>  	dev_info(&pdev->dev, "probe successful\n");
>  
> --- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> +++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> @@ -814,7 +814,7 @@ static int brcmstb_pm_probe(struct platform_device *pdev)
>  		goto out;
>  	}
>  
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_hypervisor_list,
>  				       &brcmstb_pm_panic_nb);

I am not sure about this one. It instruct some HW to preserve DRAM.
IMHO, it better fits into pre_reboot category but I do not have
strong opinion.

>  
>  	pm_power_off = brcmstb_pm_poweroff;

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 20/30] panic: Add the panic informational notifier list
  2022-04-27 22:49 ` [PATCH 20/30] panic: Add the panic informational " Guilherme G. Piccoli
  2022-04-27 23:49   ` Paul E. McKenney
  2022-04-28  8:14   ` Suzuki K Poulose
@ 2022-05-16 14:11   ` Petr Mladek
  2022-05-16 14:28     ` Guilherme G. Piccoli
  2 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-16 14:11 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Benjamin Herrenschmidt, Catalin Marinas,
	Florian Fainelli, Frederic Weisbecker, H. Peter Anvin,
	Hari Bathini, Joel Fernandes, Jonathan Hunter, Josh Triplett,
	Lai Jiangshan, Leo Yan, Mathieu Desnoyers, Mathieu Poirier,
	Michael Ellerman, Mike Leach, Mikko Perttunen, Neeraj Upadhyay,
	Nicholas Piggin, Paul Mackerras, Suzuki K Poulose,
	Thierry Reding, Thomas Bogendoerfer

On Wed 2022-04-27 19:49:14, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to
> register callbacks to run earlier in the panic path than they
> currently do. This aims at informational mechanisms, like dumping
> kernel offsets and showing device error data (in case it's simple
> registers reading, for example) as well as mechanisms to disable
> log flooding (like hung_task detector / RCU warnings) and the
> tracing dump_on_oops (when enabled).
> 
> Any (non-invasive) information that should be provided before
> kmsg_dump() as well as log flooding preventing code should fit
> here, as long it offers relatively low risk for kdump.
> 
> For now, the patch is almost a no-op, although it changes a bit
> the ordering in which some panic notifiers are executed - specially
> affected by this are the notifiers responsible for disabling the
> hung_task detector / RCU warnings, which now run first. In a
> subsequent patch, the panic path will be refactored, then the
> panic informational notifiers will effectively run earlier,
> before ksmg_dump() (and usually before kdump as well).
> 
> We also defer documenting it all properly in the subsequent
> refactor patch. Finally, while at it, we removed some useless
> header inclusions too.
> 
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

All notifiers moved in this patch seems to fit well the "info"
notifier list. The patch looks good from this POV.

I still have to review the rest of the patches to see if it
is complete.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 20/30] panic: Add the panic informational notifier list
  2022-05-16 14:11   ` Petr Mladek
@ 2022-05-16 14:28     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-16 14:28 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Benjamin Herrenschmidt,
	Catalin Marinas, Florian Fainelli, Frederic Weisbecker,
	H. Peter Anvin, Hari Bathini, Joel Fernandes, Jonathan Hunter,
	Josh Triplett, Lai Jiangshan, Leo Yan, Mathieu Desnoyers,
	Mathieu Poirier, Michael Ellerman, Mike Leach, Mikko Perttunen,
	Neeraj Upadhyay, Nicholas Piggin, Paul Mackerras,
	Suzuki K Poulose, Thierry Reding, Thomas Bogendoerfer

On 16/05/2022 11:11, Petr Mladek wrote:
> [...]
> 
> All notifiers moved in this patch seems to fit well the "info"
> notifier list. The patch looks good from this POV.
> 
> I still have to review the rest of the patches to see if it
> is complete.
> 
> Best Regards,
> Petr

Thanks a bunch for the review =)

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-04-27 22:49 ` [PATCH 21/30] panic: Introduce the panic pre-reboot " Guilherme G. Piccoli
                     ` (2 preceding siblings ...)
  2022-04-29 16:04   ` Max Filippov
@ 2022-05-16 14:33   ` Petr Mladek
  2022-05-16 16:05     ` Guilherme G. Piccoli
  3 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-16 14:33 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Alex Elder, Alexander Gordeev,
	Anton Ivanov, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Chris Zankel, Christian Borntraeger,
	Corey Minyard, Dexuan Cui, H. Peter Anvin, Haiyang Zhang,
	Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle, Tony Luck,
	Vasily Gorbik, Wei Liu

On Wed 2022-04-27 19:49:15, Guilherme G. Piccoli wrote:
> This patch renames the panic_notifier_list to panic_pre_reboot_list;
> the idea is that a subsequent patch will refactor the panic path
> in order to better split the notifiers, running some of them very
> early, some of them not so early [but still before kmsg_dump()] and
> finally, the rest should execute late, after kdump. The latter ones
> are now in the panic pre-reboot list - the name comes from the idea
> that these notifiers execute before panic() attempts rebooting the
> machine (if that option is set).
> 
> We also took the opportunity to clean-up useless header inclusions,
> improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
> and more important, change some priorities - we hereby set 2 notifiers
> to run late in the list [iss_panic_event() and the IPMI panic_event()]
> due to the risks they offer (may not return, for example).
> Proper documentation is going to be provided in a subsequent patch,
> that effectively refactors the panic path.
> 
> --- a/drivers/edac/altera_edac.c
> +++ b/drivers/edac/altera_edac.c
> @@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
>  		int dberror, err_addr;
>  
>  		edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
> -		atomic_notifier_chain_register(&panic_notifier_list,
> +		atomic_notifier_chain_register(&panic_pre_reboot_list,

My understanding is that this notifier first prints info about ECC
errors and then triggers reboot. It might make sense to split it
into two notifiers.


>  					       &edac->panic_notifier);
>  
>  		/* Printout a message if uncorrectable error previously. */
> --- a/drivers/leds/trigger/ledtrig-panic.c
> +++ b/drivers/leds/trigger/ledtrig-panic.c
> @@ -64,7 +63,7 @@ static long led_panic_blink(int state)
>  
>  static int __init ledtrig_panic_init(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list,
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
>  				       &led_trigger_panic_nb);

Blinking => should go to the last "post_reboot/loop" list.


>  
>  	led_trigger_register_simple("panic", &trigger);
> --- a/drivers/misc/ibmasm/heartbeat.c
> +++ b/drivers/misc/ibmasm/heartbeat.c
> @@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
>  static int panic_happened(struct notifier_block *n, unsigned long val, void *v)
>  {
>  	suspend_heartbeats = 1;
> -	return 0;
> +	return NOTIFY_DONE;
>  }
>  
> -static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
> +static struct notifier_block panic_notifier = {
> +	.notifier_call = panic_happened,
> +};
>  
>  void ibmasm_register_panic_notifier(void)
>  {
> -	atomic_notifier_chain_register(&panic_notifier_list, &panic_notifier);
> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
> +					&panic_notifier);

Same here. Blinking => should go to the last "post_reboot/loop" list.


>  }
>  
>  void ibmasm_unregister_panic_notifier(void)
>  {
> -	atomic_notifier_chain_unregister(&panic_notifier_list,
> -			&panic_notifier);
> +	atomic_notifier_chain_unregister(&panic_pre_reboot_list,
> +					&panic_notifier);
>  }


The rest of the moved notifiers seem to fit well this "pre_reboot"
list.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list
  2022-04-27 22:49 ` [PATCH 22/30] panic: Introduce the panic post-reboot " Guilherme G. Piccoli
  2022-05-09 14:16   ` Guilherme G. Piccoli
@ 2022-05-16 14:45   ` Petr Mladek
  2022-05-16 16:08     ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-16 14:45 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Alexander Gordeev, Christian Borntraeger,
	David S. Miller, Heiko Carstens, Sven Schnelle, Vasily Gorbik

On Wed 2022-04-27 19:49:16, Guilherme G. Piccoli wrote:
> Currently we have 3 notifier lists in the panic path, which will
> be wired in a way to allow the notifier callbacks to run in
> different moments at panic time, in a subsequent patch.
> 
> But there is also an odd set of architecture calls hardcoded in
> the end of panic path, after the restart machinery. They're
> responsible for late time tunings / events, like enabling a stop
> button (Sparc) or effectively stopping the machine (s390).
> 
> This patch introduces yet another notifier list to offer the
> architectures a way to add callbacks in such late moment on
> panic path without the need of ifdefs / hardcoded approaches.

The patch looks good to me. I would just suggest two changes.

1. I would rename the list to "panic_loop_list" instead of
   "panic_post_reboot_list".

   It will be more clear that it includes things that are
   needed before panic() enters the infinite loop.


2. I would move all the notifiers that enable blinking here.

   The blinking should be done only during the infinite
   loop when there is nothing else to do. If we enable
   earlier then it might disturb/break more important
   functionality (dumping information, reboot).

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers
  2022-05-11 20:03     ` Guilherme G. Piccoli
@ 2022-05-16 14:50       ` Petr Mladek
  2022-05-16 16:09         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-16 14:50 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Steven Rostedt, akpm, bhe, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, senozhatsky, stern, tglx, vgoyal,
	vkuznets, will

On Wed 2022-05-11 17:03:51, Guilherme G. Piccoli wrote:
> On 10/05/2022 14:40, Steven Rostedt wrote:
> > On Wed, 27 Apr 2022 19:49:17 -0300
> > "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
> > 
> >> Currently we don't have a way to check if there are dumpers set,
> >> except counting the list members maybe. This patch introduces a very
> >> simple helper to provide this information, by just keeping track of
> >> registered/unregistered kmsg dumpers. It's going to be used on the
> >> panic path in the subsequent patch.
> > 
> > FYI, it is considered "bad form" to reference in the change log "this
> > patch". We know this is a patch. The change log should just talk about what
> > is being done. So can you reword your change logs (you do this is almost
> > every patch). Here's what I would reword the above to be:
> > 
> >  Currently we don't have a way to check if there are dumpers set, except
> >  perhaps by counting the list members. Introduce a very simple helper to
> >  provide this information, by just keeping track of registered/unregistered
> >  kmsg dumpers. This will simplify the refactoring of the panic path.
> 
> Thanks for the hint, you're right - it's almost in all of my patches.
> I'll reword all of them (except the ones already merged) to remove this
> "bad form".

Shame on me that I do not care that much about the style of the commit
message :-)

Anyway, the code looks good to me. With the better commit message:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier
  2022-04-27 22:49 ` [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier Guilherme G. Piccoli
@ 2022-05-16 14:56   ` Petr Mladek
  2022-05-16 16:11     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-16 14:56 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On Wed 2022-04-27 19:49:19, Guilherme G. Piccoli wrote:
> Currently the parameter "panic_print" relies in a function called
> directly on panic path; one of the flags the users can set for
> panic_print triggers a console replay mechanism, to show the
> entire kernel log buffer (from the beginning) in a panic event.
> 
> Two problems with that: the dual nature of the panic_print
> isn't really appropriate, the function was originally meant
> to allow users dumping system information on panic events,
> and was "overridden" to also force a console flush of the full
> kernel log buffer. It also turns the code a bit more complex
> and duplicate than it needs to be.
> 
> This patch proposes 2 changes: first, we decouple panic_print
> from the console flushing mechanism, in the form of a new kernel
> core parameter (panic_console_replay); we kept the functionality
> on panic_print to avoid userspace breakage, although we comment
> in both code and documentation that this panic_print usage is
> deprecated.
> 
> We converted panic_print function to a panic notifier too, adding
> it on the panic informational notifier list, executed as the final
> callback. This allows a more clear code and makes sense, as
> panic_print_sys_info() is really a panic-time only function.
> We also moved its code to kernel/printk.c, it seems to make more
> sense given it's related to printing stuff.

I really like both changes. Just please split it them into two
patchset. I mean one patch for the new "panic_console_replay"
parameter and 2nd that moves "printk_info" into the notifier.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-16 14:01   ` Petr Mladek
@ 2022-05-16 15:06     ` Guilherme G. Piccoli
  2022-05-16 16:02       ` Evan Green
  2022-05-17 13:57       ` Petr Mladek
  0 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-16 15:06 UTC (permalink / raw)
  To: Petr Mladek, David Gow, Evan Green, Julius Werner, Scott Branden,
	bcm-kernel-feedback-list, Sebastian Reichel, linux-pm,
	Florian Fainelli
  Cc: akpm, bhe, kexec, linux-kernel, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alexander Gordeev, Andrea Parri,
	Ard Biesheuvel, Benjamin Herrenschmidt, Brian Norris,
	Christian Borntraeger, Christophe JAILLET, David S. Miller,
	Dexuan Cui, Doug Berger, Haiyang Zhang, Hari Bathini,
	Heiko Carstens, Justin Chen, K. Y. Srinivasan, Lee Jones,
	Markus Mayer, Michael Ellerman, Mihai Carabas, Nicholas Piggin,
	Paul Mackerras, Pavel Machek, Shile Zhang, Stephen Hemminger,
	Sven Schnelle, Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik,
	Wang ShaoBo, Wei Liu, zhenwei pi

Thanks for the review!

I agree with the blinking stuff, I can rework and add all LED/blinking
stuff into the loop list, it does make sense. I'll comment a bit in the
others below...

On 16/05/2022 11:01, Petr Mladek wrote:
> [...]
>> --- a/arch/mips/sgi-ip22/ip22-reset.c
>> +++ b/arch/mips/sgi-ip22/ip22-reset.c
>> @@ -195,7 +195,7 @@ static int __init reboot_setup(void)
>>  	}
>>  
>>  	timer_setup(&blink_timer, blink_timeout, 0);
>> -	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
>> +	atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
> 
> This notifier enables blinking. It is not much safe. It calls
> mod_timer() that takes a lock internally.
> 
> This kind of functionality should go into the last list called
> before panic() enters the infinite loop. IMHO, all the blinking
> stuff should go there.
> [...] 
>> --- a/arch/mips/sgi-ip32/ip32-reset.c
>> +++ b/arch/mips/sgi-ip32/ip32-reset.c
>> @@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
>>  	pm_power_off = ip32_machine_halt;
>>  
>>  	timer_setup(&blink_timer, blink_timeout, 0);
>> -	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
>> +	atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
> 
> Same here. Should be done only before the "loop".
> [...] 

Ack.


>> --- a/drivers/firmware/google/gsmi.c
>> +++ b/drivers/firmware/google/gsmi.c
>> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
>>  
>>  	register_reboot_notifier(&gsmi_reboot_notifier);
>>  	register_die_notifier(&gsmi_die_notifier);
>> -	atomic_notifier_chain_register(&panic_notifier_list,
>> +	atomic_notifier_chain_register(&panic_hypervisor_list,
>>  				       &gsmi_panic_notifier);
> 
> I am not sure about this one. It looks like some logging or
> pre_reboot stuff.
> 

Disagree here. I'm looping Google maintainers, so they can comment.
(CCed Evan, David, Julius)

This notifier is clearly a hypervisor notification mechanism. I've fixed
a locking stuff there (in previous patch), I feel it's low-risk but even
if it's mid-risk, the class of such callback remains a perfect fit with
the hypervisor list IMHO.


> [...] 
>> --- a/drivers/leds/trigger/ledtrig-activity.c
>> +++ b/drivers/leds/trigger/ledtrig-activity.c
>> @@ -247,7 +247,7 @@ static int __init activity_init(void)
>>  	int rc = led_trigger_register(&activity_led_trigger);
>>  
>>  	if (!rc) {
>> -		atomic_notifier_chain_register(&panic_notifier_list,
>> +		atomic_notifier_chain_register(&panic_hypervisor_list,
>>  					       &activity_panic_nb);
> 
> The notifier is trivial. It just sets a variable.
> 
> But still, it is about blinking and should be done
> in the last "loop" list.
> 
> 
>>  		register_reboot_notifier(&activity_reboot_nb);
>>  	}
>> --- a/drivers/leds/trigger/ledtrig-heartbeat.c
>> +++ b/drivers/leds/trigger/ledtrig-heartbeat.c
>> @@ -190,7 +190,7 @@ static int __init heartbeat_trig_init(void)
>>  	int rc = led_trigger_register(&heartbeat_led_trigger);
>>  
>>  	if (!rc) {
>> -		atomic_notifier_chain_register(&panic_notifier_list,
>> +		atomic_notifier_chain_register(&panic_hypervisor_list,
>>  					       &heartbeat_panic_nb);
> 
> Same here. Blinking => loop list.

Ack.


>> [...]
>> diff --git a/drivers/misc/bcm-vk/bcm_vk_dev.c b/drivers/misc/bcm-vk/bcm_vk_dev.c
>> index a16b99bdaa13..d9d5199cdb2b 100644
>> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
>> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
>> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>>  
>>  	/* register for panic notifier */
>>  	vk->panic_nb.notifier_call = bcm_vk_on_panic;
>> -	err = atomic_notifier_chain_register(&panic_notifier_list,
>> +	err = atomic_notifier_chain_register(&panic_hypervisor_list,
>>  					     &vk->panic_nb);
> 
> It seems to reset some hardware or so. IMHO, it should go into the
> pre-reboot list.

Mixed feelings here, I'm looping Broadcom maintainers to comment.
(CC Scott and Broadcom list)

I'm afraid it breaks kdump if this device is not reset beforehand - it's
a doorbell write, so not high risk I think...

But in case the not-reset device can be probed normally in kdump kernel,
then I'm fine in moving this to the reboot list! I don't have the HW to
test myself.


> [...]
>> --- a/drivers/power/reset/ltc2952-poweroff.c
>> +++ b/drivers/power/reset/ltc2952-poweroff.c
>> @@ -279,7 +279,7 @@ static int ltc2952_poweroff_probe(struct platform_device *pdev)
>>  	pm_power_off = ltc2952_poweroff_kill;
>>  
>>  	data->panic_notifier.notifier_call = ltc2952_poweroff_notify_panic;
>> -	atomic_notifier_chain_register(&panic_notifier_list,
>> +	atomic_notifier_chain_register(&panic_hypervisor_list,
>>  				       &data->panic_notifier);
> 
> I looks like this somehow triggers the reboot. IMHO, it should go
> into the pre_reboot list.

Mixed feeling again here - CCing the maintainers for comments (Sebastian
/ PM folks).

This is setting a variable only, and once it's set (data->kernel_panic
is the bool's name), it just bails out the IRQ handler and a timer
setting - this timer seems kinda tricky, so bailing out ASAP makes sense
IMHO.

But my mixed feeling comes from the fact this notifier really is not a
fit to any list - it's just a "watchdog"/device quiesce in some form.
Since it's very low-risk (IIUC), I've put it here.


> [...]
>> --- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
>> +++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
>> @@ -814,7 +814,7 @@ static int brcmstb_pm_probe(struct platform_device *pdev)
>>  		goto out;
>>  	}
>>  
>> -	atomic_notifier_chain_register(&panic_notifier_list,
>> +	atomic_notifier_chain_register(&panic_hypervisor_list,
>>  				       &brcmstb_pm_panic_nb);
> 
> I am not sure about this one. It instruct some HW to preserve DRAM.
> IMHO, it better fits into pre_reboot category but I do not have
> strong opinion.

Disagree here, I'm CCing Florian for information.

This notifier preserves RAM so it's *very interesting* if we have
kmsg_dump() for example, but maybe might be also relevant in case kdump
kernel is configured to store something in a persistent RAM (then,
without this notifier, after kdump reboots the system data would be lost).

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-16 15:06     ` Guilherme G. Piccoli
@ 2022-05-16 16:02       ` Evan Green
  2022-05-17 13:28         ` Petr Mladek
  2022-05-17 13:57       ` Petr Mladek
  1 sibling, 1 reply; 183+ messages in thread
From: Evan Green @ 2022-05-16 16:02 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Petr Mladek, David Gow, Julius Werner, Scott Branden,
	bcm-kernel-feedback-list, Sebastian Reichel, Linux PM,
	Florian Fainelli, Andrew Morton, bhe, kexec, LKML, linuxppc-dev,
	linux-alpha, linux-arm Mailing List, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	Andy Shevchenko, Arnd Bergmann, Borislav Petkov, Jonathan Corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, Greg Kroah-Hartman,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, Kees Cook,
	luto, mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi, Stephen Boyd

On Mon, May 16, 2022 at 8:07 AM Guilherme G. Piccoli
<gpiccoli@igalia.com> wrote:
>
> Thanks for the review!
>
> I agree with the blinking stuff, I can rework and add all LED/blinking
> stuff into the loop list, it does make sense. I'll comment a bit in the
> others below...
>
> On 16/05/2022 11:01, Petr Mladek wrote:
> > [...]
> >> --- a/arch/mips/sgi-ip22/ip22-reset.c
> >> +++ b/arch/mips/sgi-ip22/ip22-reset.c
> >> @@ -195,7 +195,7 @@ static int __init reboot_setup(void)
> >>      }
> >>
> >>      timer_setup(&blink_timer, blink_timeout, 0);
> >> -    atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> >> +    atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
> >
> > This notifier enables blinking. It is not much safe. It calls
> > mod_timer() that takes a lock internally.
> >
> > This kind of functionality should go into the last list called
> > before panic() enters the infinite loop. IMHO, all the blinking
> > stuff should go there.
> > [...]
> >> --- a/arch/mips/sgi-ip32/ip32-reset.c
> >> +++ b/arch/mips/sgi-ip32/ip32-reset.c
> >> @@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
> >>      pm_power_off = ip32_machine_halt;
> >>
> >>      timer_setup(&blink_timer, blink_timeout, 0);
> >> -    atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> >> +    atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
> >
> > Same here. Should be done only before the "loop".
> > [...]
>
> Ack.
>
>
> >> --- a/drivers/firmware/google/gsmi.c
> >> +++ b/drivers/firmware/google/gsmi.c
> >> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
> >>
> >>      register_reboot_notifier(&gsmi_reboot_notifier);
> >>      register_die_notifier(&gsmi_die_notifier);
> >> -    atomic_notifier_chain_register(&panic_notifier_list,
> >> +    atomic_notifier_chain_register(&panic_hypervisor_list,
> >>                                     &gsmi_panic_notifier);
> >
> > I am not sure about this one. It looks like some logging or
> > pre_reboot stuff.
> >
>
> Disagree here. I'm looping Google maintainers, so they can comment.
> (CCed Evan, David, Julius)
>
> This notifier is clearly a hypervisor notification mechanism. I've fixed
> a locking stuff there (in previous patch), I feel it's low-risk but even
> if it's mid-risk, the class of such callback remains a perfect fit with
> the hypervisor list IMHO.

This logs a panic to our "eventlog", a tiny logging area in SPI flash
for critical and power-related events. In some cases this ends up
being the only clue we get in a Chromebook feedback report that a
panic occurred, so from my perspective moving it to the front of the
line seems like a good idea.

-Evan

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-05-16 14:33   ` Petr Mladek
@ 2022-05-16 16:05     ` Guilherme G. Piccoli
  2022-05-16 16:18       ` Luck, Tony
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-16 16:05 UTC (permalink / raw)
  To: Petr Mladek, Tony Luck, Dinh Nguyen
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alex Elder, Alexander Gordeev,
	Anton Ivanov, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Chris Zankel, Christian Borntraeger,
	Corey Minyard, Dexuan Cui, H. Peter Anvin, Haiyang Zhang,
	Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle,
	Vasily Gorbik, Wei Liu

Thanks again for the review! Comments inline below:


On 16/05/2022 11:33, Petr Mladek wrote:
> [...]
>> --- a/drivers/edac/altera_edac.c
>> +++ b/drivers/edac/altera_edac.c
>> @@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
>>  		int dberror, err_addr;
>>  
>>  		edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
>> -		atomic_notifier_chain_register(&panic_notifier_list,
>> +		atomic_notifier_chain_register(&panic_pre_reboot_list,
> 
> My understanding is that this notifier first prints info about ECC
> errors and then triggers reboot. It might make sense to split it
> into two notifiers.

I disagree here - looping the maintainers for comments (CCing Dinh /
Tony). BTW, sorry for not having you on CC already Dinh, it was my mistake.

So, my reasoning here is: this notifier should fit the info list,
definitely! But...it's very high risk for kdump. It deep dives into the
regmap API (there are locks in such code) plus there is an (MM)IO write
to the device and an ARM firmware call. So, despite the nature of this
notifier _fits the informational list_, the _code is risky_ so we should
avoid running it before a kdump.

Now, we indeed have a chicken/egg problem: want to avoid it before
kdump, BUT in case kdump is not set, kmsg_dump() (and console flushing,
after your suggestion Petr) will run before it and not save collected
information from EDAC PoV.

My idea: I could call a second kmsg_dump() or at least a panic console
flush for within such notifier. Let me know what you think Petr (also
Dinh / Tony and all interested parties).


> [...] 
>> --- a/drivers/leds/trigger/ledtrig-panic.c
>> +++ b/drivers/leds/trigger/ledtrig-panic.c
>> @@ -64,7 +63,7 @@ static long led_panic_blink(int state)
>>  
>>  static int __init ledtrig_panic_init(void)
>>  {
>> -	atomic_notifier_chain_register(&panic_notifier_list,
>> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
>>  				       &led_trigger_panic_nb);
> 
> Blinking => should go to the last "post_reboot/loop" list.
> [...] 
>> --- a/drivers/misc/ibmasm/heartbeat.c
>> +++ b/drivers/misc/ibmasm/heartbeat.c
>> @@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
>>  static int panic_happened(struct notifier_block *n, unsigned long val, void *v)
>>  {
>>  	suspend_heartbeats = 1;
>> -	return 0;
>> +	return NOTIFY_DONE;
>>  }
>>  
>> -static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
>> +static struct notifier_block panic_notifier = {
>> +	.notifier_call = panic_happened,
>> +};
>>  
>>  void ibmasm_register_panic_notifier(void)
>>  {
>> -	atomic_notifier_chain_register(&panic_notifier_list, &panic_notifier);
>> +	atomic_notifier_chain_register(&panic_pre_reboot_list,
>> +					&panic_notifier);
> 
> Same here. Blinking => should go to the last "post_reboot/loop" list.

Ack on both.

IBMasm is not blinking IIUC, but still fits properly the loop list. This
notifier would make a heartbeat mechanism stop, and once it's stopped,
service processor is allowed to reboot - that's my understanding.


Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list
  2022-05-16 14:45   ` Petr Mladek
@ 2022-05-16 16:08     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-16 16:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alexander Gordeev,
	Christian Borntraeger, David S. Miller, Heiko Carstens,
	Sven Schnelle, Vasily Gorbik

On 16/05/2022 11:45, Petr Mladek wrote:
> [...]
> 
> The patch looks good to me. I would just suggest two changes.
> 
> 1. I would rename the list to "panic_loop_list" instead of
>    "panic_post_reboot_list".
> 
>    It will be more clear that it includes things that are
>    needed before panic() enters the infinite loop.
> 
> 
> 2. I would move all the notifiers that enable blinking here.
> 
>    The blinking should be done only during the infinite
>    loop when there is nothing else to do. If we enable
>    earlier then it might disturb/break more important
>    functionality (dumping information, reboot).
> 

Perfect, I agree with you. I'll change both points in V2 =)

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers
  2022-05-16 14:50       ` Petr Mladek
@ 2022-05-16 16:09         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-16 16:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, akpm, bhe, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, senozhatsky, stern, tglx, vgoyal,
	vkuznets, will

On 16/05/2022 11:50, Petr Mladek wrote:
> [...]
> Shame on me that I do not care that much about the style of the commit
> message :-)
> 
> Anyway, the code looks good to me. With the better commit message:
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>
> 

Heheh, cool - I'll add your tag and improve the message in V2.
Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier
  2022-05-16 14:56   ` Petr Mladek
@ 2022-05-16 16:11     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-16 16:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

On 16/05/2022 11:56, Petr Mladek wrote:
> [...]
> I really like both changes. Just please split it them into two
> patchset. I mean one patch for the new "panic_console_replay"
> parameter and 2nd that moves "printk_info" into the notifier.
> 

OK sure, will do that in V2.
Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
  2022-05-10 17:29     ` Steven Rostedt
@ 2022-05-16 16:14       ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-16 16:14 UTC (permalink / raw)
  To: Steven Rostedt, Xiaoming Ni
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Arjan van de Ven, Cong Wang,
	Sebastian Andrzej Siewior, Valentin Schneider

On 10/05/2022 14:29, Steven Rostedt wrote:
> [...]
> Also, don't sprinkle #ifdef in C code. Instead:
> 
> 	if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS))
> 		pr_info("notifers: regsiter %ps()\n",
> 			n->notifer_call);
> 
> 
> Or define a print macro at the start of the C file that is a nop if it is
> not defined, and use the macro.

Thanks, I'll go with the IS_ENABLED() idea in V2 - appreciate the hint.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-05-16 16:05     ` Guilherme G. Piccoli
@ 2022-05-16 16:18       ` Luck, Tony
  2022-05-16 16:33         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Luck, Tony @ 2022-05-16 16:18 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Petr Mladek, Dinh Nguyen
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, Tang, Feng, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alex Elder, Alexander Gordeev,
	Anton Ivanov, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Chris Zankel, Christian Borntraeger,
	Corey Minyard, Dexuan Cui, H. Peter Anvin, Haiyang Zhang,
	Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle,
	Vasily Gorbik, Wei Liu

> So, my reasoning here is: this notifier should fit the info list,
> definitely! But...it's very high risk for kdump. It deep dives into the
> regmap API (there are locks in such code) plus there is an (MM)IO write
> to the device and an ARM firmware call. So, despite the nature of this
> notifier _fits the informational list_, the _code is risky_ so we should
> avoid running it before a kdump.
>
> Now, we indeed have a chicken/egg problem: want to avoid it before
> kdump, BUT in case kdump is not set, kmsg_dump() (and console flushing,
> after your suggestion Petr) will run before it and not save collected
> information from EDAC PoV.

Would it be possible to have some global "kdump is configured + enabled" flag?

Then notifiers could make an informed choice on whether to deep dive to
get all the possible details (when there is no kdump) or just skim the high
level stuff (to maximize chance of getting a successful kdump).

-Tony

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-16 10:21       ` Petr Mladek
@ 2022-05-16 16:32         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-16 16:32 UTC (permalink / raw)
  To: Petr Mladek
  Cc: michael Kelley (LINUX),
	Baoquan He, Dave Young, d.hatayama, akpm, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, dave.hansen, feng.tang, gregkh, hidehiro.kawai.ez,
	jgross, john.ogness, keescook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets,
	will, Marc Zyngier, Mark Rutland, linux-arm-kernel

On 16/05/2022 07:21, Petr Mladek wrote:
> [...]
> Ah, it should have been:
> 
>      + notifiers vs. kmsg_dump
>      + notifiers vs. crash_dump
>      + crash_dump vs. kmsg_dump
> 
> I am sorry for the confusion. Even "crash_dump" is slightly
> misleading because there is no function with this name.
> But it seems to be easier to understand than __crash_kexec().

Cool, thanks! Now it's totally clear for me =)
I feel crash dump is the proper term, but I personally prefer kdump to
avoid mess-up with user space "core dump" concept heheh
Also, KDUMP is an entry on MAINTAINERS file.


> [...]
>> Heheh OK, I appreciate your opinion, but I guess we'll need to agree in
>> disagree here - I'm much more fond to this kind of code than a bunch of
>> if/else blocks that almost give headaches. Encoding such "level" logic
>> in the if/else scheme is very convoluted, generates a very big code. And
>> the functions aren't so black magic - they map a level in bits, and the
>> functions _once() are called...once! Although we switch the position in
>> the code, so there are 2 calls, one of them is called and the other not.
> 
> I see. Well, I would consider this as a warning that the approach is
> too complex. If the code, using if/then/else, would cause headaches
> then also understanding of the behavior would cause headaches for
> both users and programmers.
> 
> 
>> But that's totally fine to change - especially if we're moving away from
>> the "level" logic. I see below you propose a much simpler approach - if
>> we follow that, definitely we won't need the "black magic" approach heheh
> 
> I do not say that my proposal is fully correct. But we really need
> this kind of simpler approach.

It's cool, I agree that your idea is much simpler and makes sense - mine
seems to be an over-engineering effort. Let's see the opinions of the
interested parties, I'm curious to see if everybody agrees here, that'd
would be ideal (and kind of "wishful thinking" I guess heheh - panic
path is polemic).


> [...] 
>> Here we have a very important point. Why do we need 2 variants of SMP
>> CPU stopping functions? I disagree with that - my understanding of this
>> after some study in architectures is that the crash_() variant is
>> "stronger", should work in all cases and if not, we should fix that -
>> that'd be a bug.
>>
>> Such variant either maps to smp_send_stop() (in various architectures,
>> including XEN/x86) or overrides the basic function with more proper
>> handling for panic() case...I don't see why we still need such
>> distinction, if you / others have some insight about that, I'd like to
>> hear =)
> 
> The two variants were introduced by the commit 0ee59413c967c35a6dd
> ("x86/panic: replace smp_send_stop() with kdump friendly version in
> panic path")
> 
> It points to https://lkml.org/lkml/2015/6/24/44 that talks about
> still running watchdogs.
> 
> It is possible that the problem could be fixed another way. It is
> even possible that it has already been fixed by the notifiers
> that disable the watchdogs.
> 
> Anyway, any change of the smp_send_stop() behavior should be done
> in a separate patch. It will help with bisection of possible
> regression. Also it would require a good explanation in
> the commit message. I would personally do it in a separate
> patch(set).

Thanks for the archeology and interesting findings. I agree that is
better to split in smaller patches. I'm planning a split in 3 patches
for V2: clean-up (comment, console flushing idea, useless header), the
refactor itself and finally, this SMP change.


> [...] 
>> You had the order of panic_reboot_list VS. consoles flushing inverted.
>> It might make sense, although I didn't do that in V1...
> 
> IMHO, it makes sense:
> 
>   1. panic_reboot_list contains notifiers that do the reboot
>      immediately, for example, xen_panic_event, alpha_panic_event.
>      The consoles have to be flushed earlier.
> 
>   2. console_flush_on_panic() ignores the result of console_trylock()
>      and always calls console_unlock(). As a result the lock should
>      be unlocked at the end. And any further printk() should be able
>      to printk the messages to the console immediately. It means
>      that any messages printed by the reboot notifiers should appear
>      on the console as well.
> [...] 
>> OK, I agree with you! It's indeed simpler and if others agree, I can
>> happily change the logic to what you proposed. Although...currently the
>> "crash_kexec_post_notifiers" allows to call _all_ panic_reboot_list
>> callbacks _before kdump_.
>>
>> We need to mention this change in the commit messages, but I really
>> would like to hear the opinions of heavy users of notifiers (as
>> Michael/Hyper-V) and the kdump interested parties (like Baoquan / Dave
>> Young / Hayatama). If we all agree on such approach, will change that
>> for V2 =)
> 
> Sure, we need to make sure that we call everything that is needed.
> And it should be documented.
> 
> I believe that this is the right way because:
> 
>   + It was actually the motivation for this patchset. We split
>     the notifiers into separate lists because we want to call
>     only the really needed ones before kmsg_dump and crash_dump.
> 
>   + If anything is needed for crash_dump that it should be called
>     even when crash_dump is called first. It should be either
>     hardcoded into crash_dump() or we would need another notifier
>     list that will be always called before crash_dump.

Ack, makes sense! Will do that in V2 =)

For the "hardcoded" part, we have the custom machine_crash_kexec() in
some archs (like x86), unfortunately not in all of them - this path is
ideally for mandatory code that is required for a successful crash_kexec().

The problem is the "same old, same old" - architecture folks push that
to panic notifiers; notifiers folks push it to the arch custom shutdown
handler (see [0] heheh).
(CCed Marc / Mark in case they want to chime-in here...)


>[...] 
> Thanks a lot for working on this.
> 
> Best Regards,
> Petr


You're welcome, _thank you_ for the great and detailed reviews!
Cheers,


Guilherme


[0]
https://lore.kernel.org/lkml/427a8277-49f0-4317-d6c3-4a15d7070e55@igalia.com/

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-05-16 16:18       ` Luck, Tony
@ 2022-05-16 16:33         ` Guilherme G. Piccoli
  2022-05-17 14:11           ` Petr Mladek
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-16 16:33 UTC (permalink / raw)
  To: Luck, Tony, Petr Mladek, Dinh Nguyen
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, Tang, Feng, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alex Elder, Alexander Gordeev,
	Anton Ivanov, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Chris Zankel, Christian Borntraeger,
	Corey Minyard, Dexuan Cui, H. Peter Anvin, Haiyang Zhang,
	Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle,
	Vasily Gorbik, Wei Liu

On 16/05/2022 13:18, Luck, Tony wrote:
>> [...]
> Would it be possible to have some global "kdump is configured + enabled" flag?
> 
> Then notifiers could make an informed choice on whether to deep dive to
> get all the possible details (when there is no kdump) or just skim the high
> level stuff (to maximize chance of getting a successful kdump).
> 
> -Tony

Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?

With that, are you/Petr/Dinh OK in moving it for the info list?
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path
  2022-05-10 13:00     ` Guilherme G. Piccoli
@ 2022-05-17 10:58       ` Petr Mladek
  2022-05-17 13:03         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-17 10:58 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
	linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Christophe JAILLET, Mihai Carabas,
	Shile Zhang, Wang ShaoBo, zhenwei pi

On Tue 2022-05-10 10:00:58, Guilherme G. Piccoli wrote:
> On 10/05/2022 09:14, Petr Mladek wrote:
> > [...]
> >> With that said, it's dangerous to use regular spinlocks in such path,
> >> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple instances").
> >> This patch fixes that by replacing regular spinlocks with the trylock
> >> safer approach.
> > 
> > It seems that the lock is used just to manipulating a list. A super
> > safe solution would be to use the rcu API: rcu_add_rcu() and
> > list_del_rcu() under rcu_read_lock(). The spin lock will not be
> > needed and the list will always be valid.
> > 
> > The advantage would be that it will always call members that
> > were successfully added earlier. That said, I am not familiar
> > with pvpanic and am not sure if it is worth it.
> > 
> >> It also fixes an old comment (about a long gone framebuffer code) and
> >> the notifier priority - we should execute hypervisor notifiers early,
> >> deferring this way the panic action to the hypervisor, as expected by
> >> the users that are setting up pvpanic.
> > 
> > This should be done in a separate patch. It changes the behavior.
> > Also there might be a discussion whether it really should be
> > the maximal priority.
> > 
> > Best Regards,
> > Petr
> 
> Thanks for the review Petr. Patch was already merged - my goal was to be
> concise, i.e., a patch per driver / module, so the patch kinda fixes
> whatever I think is wrong with the driver with regards panic handling.
> 
> Do you think it worth to remove this patch from Greg's branch just to
> split it in 2? Personally I think it's not worth, but opinions are welcome.

No problem. It is not worth the effort.


> About the RCU part, this one really could be a new patch, a good
> improvement patch - it makes sense to me, we can think about that after
> the fixes I guess.

Yup.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path
  2022-05-17 10:58       ` Petr Mladek
@ 2022-05-17 13:03         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-17 13:03 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-arm-kernel, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Christophe JAILLET, Mihai Carabas,
	Shile Zhang, Wang ShaoBo, zhenwei pi

On 17/05/2022 07:58, Petr Mladek wrote:
> [...]
>> Thanks for the review Petr. Patch was already merged - my goal was to be
>> concise, i.e., a patch per driver / module, so the patch kinda fixes
>> whatever I think is wrong with the driver with regards panic handling.
>>
>> Do you think it worth to remove this patch from Greg's branch just to
>> split it in 2? Personally I think it's not worth, but opinions are welcome.
> 
> No problem. It is not worth the effort.
> 

OK, perfect!

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks
  2022-05-10 16:16     ` Guilherme G. Piccoli
@ 2022-05-17 13:11       ` Petr Mladek
  2022-05-17 15:19         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-17 13:11 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

On Tue 2022-05-10 13:16:54, Guilherme G. Piccoli wrote:
> On 10/05/2022 12:16, Petr Mladek wrote:
> > [...]
> > Hmm, this looks like a hack. PANIC_UNUSED will never be used.
> > All notifiers will be always called with PANIC_NOTIFIER.
> > 
> > The @val parameter is normally used when the same notifier_list
> > is used in different situations.
> > 
> > But you are going to use it when the same notifier is used
> > in more lists. This is normally distinguished by the @nh
> > (atomic_notifier_head) parameter.
> > 
> > IMHO, it is a bad idea. First, it would confuse people because
> > it does not follow the original design of the parameters.
> > Second, the related code must be touched anyway when
> > the notifier is moved into another list so it does not
> > help much.
> > 
> > Or do I miss anything, please?
> > 
> > Best Regards,
> > Petr
> 
> Hi Petr, thanks for the review.
> 
> I'm not strong attached to this patch, so we could drop it and refactor
> the code of next patches to use the @nh as identification - but
> personally, I feel this parameter could be used to identify the list
> that called such function, in other words, what is the event that
> triggered the callback. Some notifiers are even declared with this
> parameter called "ev", like the event that triggers the notifier.
> 
> 
> You mentioned 2 cases:
> 
> (a) Same notifier_list used in different situations;
> 
> (b) Same *notifier callback* used in different lists;
> 
> Mine is case (b), right? Can you show me an example of case (a)?

There are many examples of case (a):

   + module_notify_list:
	MODULE_STATE_LIVE, 	/* Normal state. */
	MODULE_STATE_COMING,	/* Full formed, running module_init. */
	MODULE_STATE_GOING,	/* Going away. */
	MODULE_STATE_UNFORMED,	/* Still setting it up. */


   + netdev_chain:

	NETDEV_UP	= 1,	/* For now you can't veto a device up/down */
	NETDEV_DOWN,
	NETDEV_REBOOT,		/* Tell a protocol stack a network interface
				   detected a hardware crash and restarted
				   - we can use this eg to kick tcp sessions
				   once done */
	NETDEV_CHANGE,		/* Notify device state change */
	NETDEV_REGISTER,
	NETDEV_UNREGISTER,
	NETDEV_CHANGEMTU,	/* notify after mtu change happened */
	NETDEV_CHANGEADDR,	/* notify after the address change */
	NETDEV_PRE_CHANGEADDR,	/* notify before the address change */
	NETDEV_GOING_DOWN,
	...

    + vt_notifier_list:

	#define VT_ALLOCATE		0x0001 /* Console got allocated */
	#define VT_DEALLOCATE		0x0002 /* Console will be deallocated */
	#define VT_WRITE		0x0003 /* A char got output */
	#define VT_UPDATE		0x0004 /* A bigger update occurred */
	#define VT_PREWRITE		0x0005 /* A char is about to be written to the console */

    + die_chain:

	DIE_OOPS = 1,
	DIE_INT3,
	DIE_DEBUG,
	DIE_PANIC,
	DIE_NMI,
	DIE_DIE,
	DIE_KERNELDEBUG,
	...

These all call the same list/chain in different situations.
The situation is distinguished by @val.


> You can see in the following patches (or grep the kernel) that people are using
> this identification parameter to determine which kind of OOPS trigger
> the callback to condition the execution of the function to specific
> cases.

Could you please show me some existing code for case (b)?
I am not able to find any except in your patches.

Anyway, the solution in 16th patch is bad, definitely.
hv_die_panic_notify_crash() uses "val" to disinguish
both:

     + "panic_notifier_list" vs "die_chain"
     + die_val when callen via "die_chain"

The API around "die_chain" API is not aware of enum panic_notifier_val
and the API using "panic_notifier_list" is not aware of enum die_val.
As I said, it is mixing apples and oranges and it is error prone.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-16 16:02       ` Evan Green
@ 2022-05-17 13:28         ` Petr Mladek
  2022-05-17 16:37           ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-17 13:28 UTC (permalink / raw)
  To: Evan Green
  Cc: Guilherme G. Piccoli, David Gow, Julius Werner, Scott Branden,
	bcm-kernel-feedback-list, Sebastian Reichel, Linux PM,
	Florian Fainelli, Andrew Morton, bhe, kexec, LKML, linuxppc-dev,
	linux-alpha, linux-arm Mailing List, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	Andy Shevchenko, Arnd Bergmann, Borislav Petkov, Jonathan Corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, Greg Kroah-Hartman,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, Kees Cook,
	luto, mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi, Stephen Boyd

On Mon 2022-05-16 09:02:10, Evan Green wrote:
> On Mon, May 16, 2022 at 8:07 AM Guilherme G. Piccoli
> <gpiccoli@igalia.com> wrote:
> >
> > Thanks for the review!
> >
> > I agree with the blinking stuff, I can rework and add all LED/blinking
> > stuff into the loop list, it does make sense. I'll comment a bit in the
> > others below...
> >
> > On 16/05/2022 11:01, Petr Mladek wrote:
> > > [...]
> > >> --- a/arch/mips/sgi-ip22/ip22-reset.c
> > >> +++ b/arch/mips/sgi-ip22/ip22-reset.c
> > >> @@ -195,7 +195,7 @@ static int __init reboot_setup(void)
> > >>      }
> > >>
> > >>      timer_setup(&blink_timer, blink_timeout, 0);
> > >> -    atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> > >> +    atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
> > >
> > > This notifier enables blinking. It is not much safe. It calls
> > > mod_timer() that takes a lock internally.
> > >
> > > This kind of functionality should go into the last list called
> > > before panic() enters the infinite loop. IMHO, all the blinking
> > > stuff should go there.
> > > [...]
> > >> --- a/arch/mips/sgi-ip32/ip32-reset.c
> > >> +++ b/arch/mips/sgi-ip32/ip32-reset.c
> > >> @@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
> > >>      pm_power_off = ip32_machine_halt;
> > >>
> > >>      timer_setup(&blink_timer, blink_timeout, 0);
> > >> -    atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> > >> +    atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
> > >
> > > Same here. Should be done only before the "loop".
> > > [...]
> >
> > Ack.
> >
> >
> > >> --- a/drivers/firmware/google/gsmi.c
> > >> +++ b/drivers/firmware/google/gsmi.c
> > >> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
> > >>
> > >>      register_reboot_notifier(&gsmi_reboot_notifier);
> > >>      register_die_notifier(&gsmi_die_notifier);
> > >> -    atomic_notifier_chain_register(&panic_notifier_list,
> > >> +    atomic_notifier_chain_register(&panic_hypervisor_list,
> > >>                                     &gsmi_panic_notifier);
> > >
> > > I am not sure about this one. It looks like some logging or
> > > pre_reboot stuff.
> > >
> >
> > Disagree here. I'm looping Google maintainers, so they can comment.
> > (CCed Evan, David, Julius)
> >
> > This notifier is clearly a hypervisor notification mechanism. I've fixed
> > a locking stuff there (in previous patch), I feel it's low-risk but even
> > if it's mid-risk, the class of such callback remains a perfect fit with
> > the hypervisor list IMHO.
>
> This logs a panic to our "eventlog", a tiny logging area in SPI flash
> for critical and power-related events. In some cases this ends up
> being the only clue we get in a Chromebook feedback report that a
> panic occurred, so from my perspective moving it to the front of the
> line seems like a good idea.

IMHO, this would really better fit into the pre-reboot notifier list:

   + the callback stores the log so it is similar to kmsg_dump()
     or console_flush_on_panic()

   + the callback should be proceed after "info" notifiers
     that might add some other useful information.

Honestly, I am not sure what exactly hypervisor callbacks do. But I
think that they do not try to extract the kernel log because they
would need to handle the internal format.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-16 15:06     ` Guilherme G. Piccoli
  2022-05-16 16:02       ` Evan Green
@ 2022-05-17 13:57       ` Petr Mladek
  2022-05-17 16:42         ` Guilherme G. Piccoli
  2022-05-18  7:58         ` Petr Mladek
  1 sibling, 2 replies; 183+ messages in thread
From: Petr Mladek @ 2022-05-17 13:57 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: David Gow, Evan Green, Julius Werner, Scott Branden,
	bcm-kernel-feedback-list, Sebastian Reichel, linux-pm,
	Florian Fainelli, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On Mon 2022-05-16 12:06:17, Guilherme G. Piccoli wrote:
> Thanks for the review!
> 
> I agree with the blinking stuff, I can rework and add all LED/blinking
> stuff into the loop list, it does make sense. I'll comment a bit in the
> others below...
> 
> On 16/05/2022 11:01, Petr Mladek wrote:
> >> --- a/drivers/firmware/google/gsmi.c
> >> +++ b/drivers/firmware/google/gsmi.c
> >> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
> >>  
> >>  	register_reboot_notifier(&gsmi_reboot_notifier);
> >>  	register_die_notifier(&gsmi_die_notifier);
> >> -	atomic_notifier_chain_register(&panic_notifier_list,
> >> +	atomic_notifier_chain_register(&panic_hypervisor_list,
> >>  				       &gsmi_panic_notifier);
> > 
> > I am not sure about this one. It looks like some logging or
> > pre_reboot stuff.
> > 
> 
> Disagree here. I'm looping Google maintainers, so they can comment.
> (CCed Evan, David, Julius)
> 
> This notifier is clearly a hypervisor notification mechanism. I've fixed
> a locking stuff there (in previous patch), I feel it's low-risk but even
> if it's mid-risk, the class of such callback remains a perfect fit with
> the hypervisor list IMHO.

It is similar to drivers/soc/bcm/brcmstb/pm/pm-arm.c.
See below for another idea.

> >> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
> >> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
> >> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> >>  
> >>  	/* register for panic notifier */
> >>  	vk->panic_nb.notifier_call = bcm_vk_on_panic;
> >> -	err = atomic_notifier_chain_register(&panic_notifier_list,
> >> +	err = atomic_notifier_chain_register(&panic_hypervisor_list,
> >>  					     &vk->panic_nb);
> > 
> > It seems to reset some hardware or so. IMHO, it should go into the
> > pre-reboot list.
> 
> Mixed feelings here, I'm looping Broadcom maintainers to comment.
> (CC Scott and Broadcom list)
> 
> I'm afraid it breaks kdump if this device is not reset beforehand - it's
> a doorbell write, so not high risk I think...
> 
> But in case the not-reset device can be probed normally in kdump kernel,
> then I'm fine in moving this to the reboot list! I don't have the HW to
> test myself.

Good question. Well, it if has to be called before kdump then
even "hypervisor" list is a wrong place because is not always
called before kdump.


> >> --- a/drivers/power/reset/ltc2952-poweroff.c
> >> +++ b/drivers/power/reset/ltc2952-poweroff.c
> >> @@ -279,7 +279,7 @@ static int ltc2952_poweroff_probe(struct platform_device *pdev)
> >>  	pm_power_off = ltc2952_poweroff_kill;
> >>  
> >>  	data->panic_notifier.notifier_call = ltc2952_poweroff_notify_panic;
> >> -	atomic_notifier_chain_register(&panic_notifier_list,
> >> +	atomic_notifier_chain_register(&panic_hypervisor_list,
> >>  				       &data->panic_notifier);
> > 
> > I looks like this somehow triggers the reboot. IMHO, it should go
> > into the pre_reboot list.
> 
> Mixed feeling again here - CCing the maintainers for comments (Sebastian
> / PM folks).
> 
> This is setting a variable only, and once it's set (data->kernel_panic
> is the bool's name), it just bails out the IRQ handler and a timer
> setting - this timer seems kinda tricky, so bailing out ASAP makes sense
> IMHO.

IMHO, the timer informs the hardware that the system is still alive
in the middle of panic(). If the timer is not working then the
hardware (chip) will think that the system frozen in panic()
and will power off the system. See the comments in
drivers/power/reset/ltc2952-poweroff.c:

 * The following GPIOs are used:
 * - trigger (input)
 *     A level change indicates the shut-down trigger. If it's state reverts
 *     within the time-out defined by trigger_delay, the shut down is not
 *     executed. If no pin is assigned to this input, the driver will start the
 *     watchdog toggle immediately. The chip will only power off the system if
 *     it is requested to do so through the kill line.
 *
 * - watchdog (output)
 *     Once a shut down is triggered, the driver will toggle this signal,
 *     with an internal (wde_interval) to stall the hardware shut down.

IMHO, we really have to keep it alive until we reach the reboot stage.

Another question is how it actually works when the interrupts are
disabled during panic() and the timer callbacks are not handled.


> > [...]
> >> --- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> >> +++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> >> @@ -814,7 +814,7 @@ static int brcmstb_pm_probe(struct platform_device *pdev)
> >>  		goto out;
> >>  	}
> >>  
> >> -	atomic_notifier_chain_register(&panic_notifier_list,
> >> +	atomic_notifier_chain_register(&panic_hypervisor_list,
> >>  				       &brcmstb_pm_panic_nb);
> > 
> > I am not sure about this one. It instruct some HW to preserve DRAM.
> > IMHO, it better fits into pre_reboot category but I do not have
> > strong opinion.
> 
> Disagree here, I'm CCing Florian for information.
> 
> This notifier preserves RAM so it's *very interesting* if we have
> kmsg_dump() for example, but maybe might be also relevant in case kdump
> kernel is configured to store something in a persistent RAM (then,
> without this notifier, after kdump reboots the system data would be lost).

I see. It is actually similar problem as with
drivers/firmware/google/gsmi.c.

I does similar things like kmsg_dump() so it should be called in
the same location (after info notifier list and before kdump).

A solution might be to put it at these notifiers at the very
end of the "info" list or make extra "dump" notifier list.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-05-16 16:33         ` Guilherme G. Piccoli
@ 2022-05-17 14:11           ` Petr Mladek
  2022-05-17 16:45             ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-17 14:11 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Luck, Tony, Dinh Nguyen, akpm, bhe, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, Tang, Feng, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Alex Elder, Alexander Gordeev,
	Anton Ivanov, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Chris Zankel, Christian Borntraeger,
	Corey Minyard, Dexuan Cui, H. Peter Anvin, Haiyang Zhang,
	Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle,
	Vasily Gorbik, Wei Liu

On Mon 2022-05-16 13:33:51, Guilherme G. Piccoli wrote:
> On 16/05/2022 13:18, Luck, Tony wrote:
> >> [...]
> > Would it be possible to have some global "kdump is configured + enabled" flag?
> > 
> > Then notifiers could make an informed choice on whether to deep dive to
> > get all the possible details (when there is no kdump) or just skim the high
> > level stuff (to maximize chance of getting a successful kdump).
> > 
> > -Tony
> 
> Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?

I like this idea.

One small problem is that kexec_crash_loaded() has valid result
only under kexec_mutex. On the other hand, it should stay true
once loaded so that the small race window should be innocent.

> With that, are you/Petr/Dinh OK in moving it for the info list?

Sounds good to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks
  2022-05-17 13:11       ` Petr Mladek
@ 2022-05-17 15:19         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-17 15:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

On 17/05/2022 10:11, Petr Mladek wrote:
> [...]
>> You mentioned 2 cases:
>>
>> (a) Same notifier_list used in different situations;
>>
>> (b) Same *notifier callback* used in different lists;
>>
>> Mine is case (b), right? Can you show me an example of case (a)?
> 
> There are many examples of case (a):
> 
> [... snip ...] 
> These all call the same list/chain in different situations.
> The situation is distinguished by @val.
> 
> 
>> You can see in the following patches (or grep the kernel) that people are using
>> this identification parameter to determine which kind of OOPS trigger
>> the callback to condition the execution of the function to specific
>> cases.
> 
> Could you please show me some existing code for case (b)?
> I am not able to find any except in your patches.
> 

Hi Petr, thanks for the examples - I agree with you. In the end, seems
I'm kind of abusing the API. This id is used to distinguish different
situations in which the callback is called, but in the same
"realm"/notifier list.

In my case I have different list calling the same callback and
(ab-)using the id to make distinction. I can rework the patches using
pointer comparison, it's fine =)

So, I'll drop this patch in V2.

> Anyway, the solution in 16th patch is bad, definitely.
> hv_die_panic_notify_crash() uses "val" to disinguish
> both:
> 
>      + "panic_notifier_list" vs "die_chain"
>      + die_val when callen via "die_chain"
> 
> The API around "die_chain" API is not aware of enum panic_notifier_val
> and the API using "panic_notifier_list" is not aware of enum die_val.
> As I said, it is mixing apples and oranges and it is error prone.
> 

OK, I'll re-work that patch - there's more there to be changed, that one
is complex heheh

Cheers!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers
  2022-05-10 15:28   ` Petr Mladek
@ 2022-05-17 15:32     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-17 15:32 UTC (permalink / raw)
  To: Petr Mladek, Florian Fainelli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-arm-kernel, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, Brian Norris

On 10/05/2022 12:28, Petr Mladek wrote:
> [...]
> IMHO, the check of the @self parameter was the proper solution.
> 
> "gisb_die_notifier" list uses @val from enum die_val.
> "gisb_panic_notifier" list uses @val from enum panic_notifier_val.
> 
> These are unrelated types. It might easily break when
> someone defines the same constant also in enum die_val.
> 
> Best Regards,
> Petr

OK Petr, I'll drop this idea for V2 - will just remove the useless
header / prototype then. (CC Florian)


Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 17/30] tracing: Improve panic/die notifiers
  2022-05-11 11:45   ` Petr Mladek
@ 2022-05-17 15:33     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-17 15:33 UTC (permalink / raw)
  To: Petr Mladek, rostedt
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will

On 11/05/2022 08:45, Petr Mladek wrote:
> [...]
> DIE_OOPS and PANIC_NOTIFIER are from different enum.
> It feels like comparing apples with oranges here.
> 
> IMHO, the proper way to unify the two notifiers is
> a check of the @self parameter. Something like:
> 
> static int trace_die_panic_handler(struct notifier_block *self,
> 				unsigned long ev, void *unused)
> {
> 	if (self == trace_die_notifier && val != DIE_OOPS)
> 		goto out;
> 
> 	ftrace_dump(ftrace_dump_on_oops);
> out:
> 	return NOTIFY_DONE;
> }
> 
> Best Regards,
> Petr

OK Petr, thanks - will implement your suggestion in V2 (CC Steven)

Cheers!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-17 13:28         ` Petr Mladek
@ 2022-05-17 16:37           ` Guilherme G. Piccoli
  2022-05-18  7:33             ` Petr Mladek
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-17 16:37 UTC (permalink / raw)
  To: Petr Mladek, Evan Green, David Gow, Julius Werner
  Cc: Scott Branden, bcm-kernel-feedback-list, Sebastian Reichel,
	Linux PM, Florian Fainelli, Andrew Morton, bhe, kexec, LKML,
	linuxppc-dev, linux-alpha, linux-arm Mailing List, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, Andy Shevchenko, Arnd Bergmann,
	Borislav Petkov, Jonathan Corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, Greg Kroah-Hartman, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, Kees Cook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi, Stephen Boyd

On 17/05/2022 10:28, Petr Mladek wrote:
> [...]
>>> Disagree here. I'm looping Google maintainers, so they can comment.
>>> (CCed Evan, David, Julius)
>>>
>>> This notifier is clearly a hypervisor notification mechanism. I've fixed
>>> a locking stuff there (in previous patch), I feel it's low-risk but even
>>> if it's mid-risk, the class of such callback remains a perfect fit with
>>> the hypervisor list IMHO.
>>
>> This logs a panic to our "eventlog", a tiny logging area in SPI flash
>> for critical and power-related events. In some cases this ends up
>> being the only clue we get in a Chromebook feedback report that a
>> panic occurred, so from my perspective moving it to the front of the
>> line seems like a good idea.
> 
> IMHO, this would really better fit into the pre-reboot notifier list:
> 
>    + the callback stores the log so it is similar to kmsg_dump()
>      or console_flush_on_panic()
> 
>    + the callback should be proceed after "info" notifiers
>      that might add some other useful information.
> 
> Honestly, I am not sure what exactly hypervisor callbacks do. But I
> think that they do not try to extract the kernel log because they
> would need to handle the internal format.
> 

I guess the main point in your response is : "I am not sure what exactly
hypervisor callbacks do". We need to be sure about the semantics of such
list, and agree on that.

So, my opinion about this first list, that we call "hypervisor list",
is: it contains callbacks that

(1) should run early, preferably before kdump (or even if kdump isn't
set, should run ASAP);

(2) these callbacks perform some communication with an abstraction that
runs "below" the kernel, like a firmware or hypervisor. Classic example:
pvpanic, that communicates with VMM (usually qemu) and allow such VMM to
snapshot the full guest memory, for example.

(3) Should be low-risk. What defines risk is the level of reliability of
subsequent operations - if the callback have 50% of chance of "bricking"
the system totally and prevent kdump / kmsg_dump() / reboot , this is
high risk one for example.

Some good fits IMO: pvpanic, sstate_panic_event() [sparc], fadump in
powerpc, etc.

So, this is a good case for the Google notifier as well - it's not
collecting data like the dmesg (hence your second bullet seems to not
apply here, info notifiers won't add info to be collected by gsmi). It
is a firmware/hypervisor/whatever-gsmi-is notification mechanism, that
tells such "lower" abstraction a panic occurred. It seems low risk and
we want it to run ASAP, if possible.

So, I'd like to keep it here, unless gsmi maintainers disagree or I'm
perhaps misunderstanding the meaning of this first list.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-17 13:57       ` Petr Mladek
@ 2022-05-17 16:42         ` Guilherme G. Piccoli
  2022-05-18  7:38           ` Petr Mladek
  2022-05-18 22:17           ` Scott Branden
  2022-05-18  7:58         ` Petr Mladek
  1 sibling, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-17 16:42 UTC (permalink / raw)
  To: Petr Mladek, Scott Branden, Sebastian Reichel, Florian Fainelli
  Cc: David Gow, Evan Green, Julius Werner, bcm-kernel-feedback-list,
	linux-pm, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On 17/05/2022 10:57, Petr Mladek wrote:
> [...]
>>>> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
>>>> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
>>>> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>>> [... snip ...]
>>> It seems to reset some hardware or so. IMHO, it should go into the
>>> pre-reboot list.
>>
>> Mixed feelings here, I'm looping Broadcom maintainers to comment.
>> (CC Scott and Broadcom list)
>>
>> I'm afraid it breaks kdump if this device is not reset beforehand - it's
>> a doorbell write, so not high risk I think...
>>
>> But in case the not-reset device can be probed normally in kdump kernel,
>> then I'm fine in moving this to the reboot list! I don't have the HW to
>> test myself.
> 
> Good question. Well, it if has to be called before kdump then
> even "hypervisor" list is a wrong place because is not always
> called before kdump.

Agreed! I'll defer that to Scott and Broadcom folks to comment.
If it's not strictly necessary, I'll happily move it to the reboot list.

If necessary, we could use the machine_crash_kexec() approach, but we'll
fall into the case arm64 doesn't support it and I'm not sure if this
device is available for arm - again a question for the maintainers.


>  [...]
>>>> --- a/drivers/power/reset/ltc2952-poweroff.c
>>>> +++ b/drivers/power/reset/ltc2952-poweroff.c
>> [...]
>> This is setting a variable only, and once it's set (data->kernel_panic
>> is the bool's name), it just bails out the IRQ handler and a timer
>> setting - this timer seems kinda tricky, so bailing out ASAP makes sense
>> IMHO.
> 
> IMHO, the timer informs the hardware that the system is still alive
> in the middle of panic(). If the timer is not working then the
> hardware (chip) will think that the system frozen in panic()
> and will power off the system. See the comments in
> drivers/power/reset/ltc2952-poweroff.c:
> [.... snip ...]
> IMHO, we really have to keep it alive until we reach the reboot stage.
> 
> Another question is how it actually works when the interrupts are
> disabled during panic() and the timer callbacks are not handled.

Agreed here! Guess I can move this one the reboot list, fine by me.
Unless PM folks think otherwise.


> [...]
>> Disagree here, I'm CCing Florian for information.
>>
>> This notifier preserves RAM so it's *very interesting* if we have
>> kmsg_dump() for example, but maybe might be also relevant in case kdump
>> kernel is configured to store something in a persistent RAM (then,
>> without this notifier, after kdump reboots the system data would be lost).
> 
> I see. It is actually similar problem as with
> drivers/firmware/google/gsmi.c.
> 
> I does similar things like kmsg_dump() so it should be called in
> the same location (after info notifier list and before kdump).
> 
> A solution might be to put it at these notifiers at the very
> end of the "info" list or make extra "dump" notifier list.

Here I still disagree. I've commented in the other response thread
(about Google gsmi) about the semantics of the hypervisor list, but
again: this list should contain callbacks that

(a) Should run early, _by default_ before a kdump;
(b) Communicate with the firmware/hypervisor in a "low-risk" way;

Imagine a scenario where users configure kdump kernel to save something
in a persistent form in DRAM - it'd be like a late pstore, in the next
kernel. This callback enables that, it's meant to inform FW "hey, panic
happened, please from now on don't clear the RAM in the next FW-reboot".
I don't see a reason to postpone that - let's see if the maintainers
have an opinion.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-05-17 14:11           ` Petr Mladek
@ 2022-05-17 16:45             ` Guilherme G. Piccoli
  2022-05-17 17:02               ` Luck, Tony
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-17 16:45 UTC (permalink / raw)
  To: Petr Mladek, Luck, Tony, Dinh Nguyen
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, Tang, Feng, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alex Elder, Alexander Gordeev,
	Anton Ivanov, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Chris Zankel, Christian Borntraeger,
	Corey Minyard, Dexuan Cui, H. Peter Anvin, Haiyang Zhang,
	Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle,
	Vasily Gorbik, Wei Liu

On 17/05/2022 11:11, Petr Mladek wrote:
> [...]
>>> Then notifiers could make an informed choice on whether to deep dive to
>>> get all the possible details (when there is no kdump) or just skim the high
>>> level stuff (to maximize chance of getting a successful kdump).
>>>
>>> -Tony
>>
>> Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?
> 
> I like this idea.
> 
> One small problem is that kexec_crash_loaded() has valid result
> only under kexec_mutex. On the other hand, it should stay true
> once loaded so that the small race window should be innocent.
> 
>> With that, are you/Petr/Dinh OK in moving it for the info list?
> 
> Sounds good to me.
> 
> Best Regards,
> Petr

Perfect, I'll do that for V2 then =)

Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
we run the code as-is? Does that make sense to you?

I'll postpone it to run almost in the end of info list (last position is
for panic_print).

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-05-17 16:45             ` Guilherme G. Piccoli
@ 2022-05-17 17:02               ` Luck, Tony
  2022-05-17 18:12                 ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Luck, Tony @ 2022-05-17 17:02 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Petr Mladek, Dinh Nguyen
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, Tang, Feng, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alex Elder, Alexander Gordeev,
	Anton Ivanov, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Chris Zankel, Christian Borntraeger,
	Corey Minyard, Dexuan Cui, H. Peter Anvin, Haiyang Zhang,
	Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle,
	Vasily Gorbik, Wei Liu

> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
> we run the code as-is? Does that make sense to you?

The "skip" option sounds like it needs some special flag associated with
an entry on the notifier chain. But there are other notifier chains ... so that
sounds messy to me.

Just all the notifiers in priority order. If any want to take different actions
based on kdump status, change the code. That seems more flexible than
an "all or nothing" approach by skipping.

-Tony

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-05-17 17:02               ` Luck, Tony
@ 2022-05-17 18:12                 ` Guilherme G. Piccoli
  2022-05-17 19:07                   ` Luck, Tony
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-17 18:12 UTC (permalink / raw)
  To: Luck, Tony, Petr Mladek, Dinh Nguyen
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, Tang, Feng, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alex Elder, Alexander Gordeev,
	Anton Ivanov, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Chris Zankel, Christian Borntraeger,
	Corey Minyard, Dexuan Cui, H. Peter Anvin, Haiyang Zhang,
	Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle,
	Vasily Gorbik, Wei Liu

On 17/05/2022 14:02, Luck, Tony wrote:
>> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
>> we run the code as-is? Does that make sense to you?
> 
> The "skip" option sounds like it needs some special flag associated with
> an entry on the notifier chain. But there are other notifier chains ... so that
> sounds messy to me.
> 
> Just all the notifiers in priority order. If any want to take different actions
> based on kdump status, change the code. That seems more flexible than
> an "all or nothing" approach by skipping.
> 
> -Tony

I guess I've expressed myself in a poor way - sorry!

What I'm planning to do in the altera_edac notifier is:

if (kdump_is_set)
 return;

/* regular code */

In other words: if the kdump is set, this notifier will be effectively a
nop (although it's gonna be called).

Lemme know your thoughts Tony, if that makes sense.
Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
  2022-05-17 18:12                 ` Guilherme G. Piccoli
@ 2022-05-17 19:07                   ` Luck, Tony
  0 siblings, 0 replies; 183+ messages in thread
From: Luck, Tony @ 2022-05-17 19:07 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Petr Mladek, Dinh Nguyen
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, Tang, Feng, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alex Elder, Alexander Gordeev,
	Anton Ivanov, Benjamin Herrenschmidt, Bjorn Andersson,
	Boris Ostrovsky, Chris Zankel, Christian Borntraeger,
	Corey Minyard, Dexuan Cui, H. Peter Anvin, Haiyang Zhang,
	Heiko Carstens, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle,
	Vasily Gorbik, Wei Liu

> What I'm planning to do in the altera_edac notifier is:
>
> if (kdump_is_set)
>   return;

Yes. That's what I think should happen.

-Tony

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-17 16:37           ` Guilherme G. Piccoli
@ 2022-05-18  7:33             ` Petr Mladek
  2022-05-18 13:24               ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-18  7:33 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Evan Green, David Gow, Julius Werner, Scott Branden,
	bcm-kernel-feedback-list, Sebastian Reichel, Linux PM,
	Florian Fainelli, Andrew Morton, bhe, kexec, LKML, linuxppc-dev,
	linux-alpha, linux-arm Mailing List, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	Andy Shevchenko, Arnd Bergmann, Borislav Petkov, Jonathan Corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, Greg Kroah-Hartman,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, Kees Cook,
	luto, mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi, Stephen Boyd

On Tue 2022-05-17 13:37:58, Guilherme G. Piccoli wrote:
> On 17/05/2022 10:28, Petr Mladek wrote:
> > [...]
> >>> Disagree here. I'm looping Google maintainers, so they can comment.
> >>> (CCed Evan, David, Julius)
> >>>
> >>> This notifier is clearly a hypervisor notification mechanism. I've fixed
> >>> a locking stuff there (in previous patch), I feel it's low-risk but even
> >>> if it's mid-risk, the class of such callback remains a perfect fit with
> >>> the hypervisor list IMHO.
> >>
> >> This logs a panic to our "eventlog", a tiny logging area in SPI flash
> >> for critical and power-related events. In some cases this ends up
 > >> being the only clue we get in a Chromebook feedback report that a
> >> panic occurred, so from my perspective moving it to the front of the
> >> line seems like a good idea.
> > 
> > IMHO, this would really better fit into the pre-reboot notifier list:
> > 
> >    + the callback stores the log so it is similar to kmsg_dump()
> >      or console_flush_on_panic()
> > 
> >    + the callback should be proceed after "info" notifiers
> >      that might add some other useful information.
> > 
> > Honestly, I am not sure what exactly hypervisor callbacks do. But I
> > think that they do not try to extract the kernel log because they
> > would need to handle the internal format.
> > 
> 
> I guess the main point in your response is : "I am not sure what exactly
> hypervisor callbacks do". We need to be sure about the semantics of such
> list, and agree on that.
> 
> So, my opinion about this first list, that we call "hypervisor list",
> is: it contains callbacks that
> 
> (1) should run early, preferably before kdump (or even if kdump isn't
> set, should run ASAP);
> 
> (2) these callbacks perform some communication with an abstraction that
> runs "below" the kernel, like a firmware or hypervisor. Classic example:
> pvpanic, that communicates with VMM (usually qemu) and allow such VMM to
> snapshot the full guest memory, for example.
> 
> (3) Should be low-risk. What defines risk is the level of reliability of
> subsequent operations - if the callback have 50% of chance of "bricking"
> the system totally and prevent kdump / kmsg_dump() / reboot , this is
> high risk one for example.
> 
> Some good fits IMO: pvpanic, sstate_panic_event() [sparc], fadump in
> powerpc, etc.
> 
> So, this is a good case for the Google notifier as well - it's not
> collecting data like the dmesg (hence your second bullet seems to not
> apply here, info notifiers won't add info to be collected by gsmi). It
> is a firmware/hypervisor/whatever-gsmi-is notification mechanism, that
> tells such "lower" abstraction a panic occurred. It seems low risk and
> we want it to run ASAP, if possible.

" 
> >> This logs a panic to our "eventlog", a tiny logging area in SPI flash
> >> for critical and power-related events. In some cases this ends up

I see. I somehow assumed that it was about the kernel log because
Evans wrote:

  "This logs a panic to our "eventlog", a tiny logging area in SPI flash
   for critical and power-related events. In some cases this ends up"


Anyway, I would distinguish it the following way.

  + If the notifier is preserving kernel log then it should be ideally
    treated as kmsg_dump().

  + It the notifier is saving another debugging data then it better
    fits into the "hypervisor" notifier list.


Regarding the reliability. From my POV, any panic notifier enabled
in a generic kernel should be reliable with more than 99,9%.
Otherwise, they should not be in the notifier list at all.

An exception would be a platform-specific notifier that is
called only on some specific platform and developers maintaining
this platform agree on this.

The value "99,9%" is arbitrary. I am not sure if it is realistic
even in the other code, for example, console_flush_on_panic()
or emergency_restart(). I just want to point out that the border
should be rather high. Otherwise we would back in the situation
where people would want to disable particular notifiers.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-17 16:42         ` Guilherme G. Piccoli
@ 2022-05-18  7:38           ` Petr Mladek
  2022-05-18 13:09             ` Guilherme G. Piccoli
  2022-05-18 22:17           ` Scott Branden
  1 sibling, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-18  7:38 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Scott Branden, Sebastian Reichel, Florian Fainelli, David Gow,
	Evan Green, Julius Werner, bcm-kernel-feedback-list, linux-pm,
	akpm, bhe, kexec, linux-kernel, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alexander Gordeev, Andrea Parri,
	Ard Biesheuvel, Benjamin Herrenschmidt, Brian Norris,
	Christian Borntraeger, Christophe JAILLET, David S. Miller,
	Dexuan Cui, Doug Berger, Haiyang Zhang, Hari Bathini,
	Heiko Carstens, Justin Chen, K. Y. Srinivasan, Lee Jones,
	Markus Mayer, Michael Ellerman, Mihai Carabas, Nicholas Piggin,
	Paul Mackerras, Pavel Machek, Shile Zhang, Stephen Hemminger,
	Sven Schnelle, Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik,
	Wang ShaoBo, Wei Liu, zhenwei pi

On Tue 2022-05-17 13:42:06, Guilherme G. Piccoli wrote:
> On 17/05/2022 10:57, Petr Mladek wrote:
> >> Disagree here, I'm CCing Florian for information.
> >>
> >> This notifier preserves RAM so it's *very interesting* if we have
> >> kmsg_dump() for example, but maybe might be also relevant in case kdump
> >> kernel is configured to store something in a persistent RAM (then,
> >> without this notifier, after kdump reboots the system data would be lost).
> > 
> > I see. It is actually similar problem as with
> > drivers/firmware/google/gsmi.c.
> > 
> > I does similar things like kmsg_dump() so it should be called in
> > the same location (after info notifier list and before kdump).
> > 
> > A solution might be to put it at these notifiers at the very
> > end of the "info" list or make extra "dump" notifier list.
> 
> Here I still disagree. I've commented in the other response thread
> (about Google gsmi) about the semantics of the hypervisor list, but
> again: this list should contain callbacks that
> 
> (a) Should run early, _by default_ before a kdump;
> (b) Communicate with the firmware/hypervisor in a "low-risk" way;
> 
> Imagine a scenario where users configure kdump kernel to save something
> in a persistent form in DRAM - it'd be like a late pstore, in the next
> kernel. This callback enables that, it's meant to inform FW "hey, panic
> happened, please from now on don't clear the RAM in the next FW-reboot".
> I don't see a reason to postpone that - let's see if the maintainers
> have an opinion.

I have answered this in more detail in the other reply, see
https://lore.kernel.org/r/YoShZVYNAdvvjb7z@alley

I agree that both notifiers in

    drivers/soc/bcm/brcmstb/pm/pm-arm.c
    drivers/firmware/google/gsmi.c

better fit into the hypervisor list after all.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-17 13:57       ` Petr Mladek
  2022-05-17 16:42         ` Guilherme G. Piccoli
@ 2022-05-18  7:58         ` Petr Mladek
  2022-05-18 13:16           ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-18  7:58 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: David Gow, Evan Green, Julius Werner, Scott Branden,
	bcm-kernel-feedback-list, Sebastian Reichel, linux-pm,
	Florian Fainelli, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On Tue 2022-05-17 15:57:34, Petr Mladek wrote:
> On Mon 2022-05-16 12:06:17, Guilherme G. Piccoli wrote:
> > >> --- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> > >> +++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> > >> @@ -814,7 +814,7 @@ static int brcmstb_pm_probe(struct platform_device *pdev)
> > >>  		goto out;
> > >>  	}
> > >>  
> > >> -	atomic_notifier_chain_register(&panic_notifier_list,
> > >> +	atomic_notifier_chain_register(&panic_hypervisor_list,
> > >>  				       &brcmstb_pm_panic_nb);
> > > 
> > > I am not sure about this one. It instruct some HW to preserve DRAM.
> > > IMHO, it better fits into pre_reboot category but I do not have
> > > strong opinion.
> > 
> > Disagree here, I'm CCing Florian for information.
> > 
> > This notifier preserves RAM so it's *very interesting* if we have
> > kmsg_dump() for example, but maybe might be also relevant in case kdump
> > kernel is configured to store something in a persistent RAM (then,
> > without this notifier, after kdump reboots the system data would be lost).
> 
> I see. It is actually similar problem as with
> drivers/firmware/google/gsmi.c.

As discussed in the other other reply, it seems that both affected
notifiers do not store kernel logs and should stay in the "hypervisor".

> I does similar things like kmsg_dump() so it should be called in
> the same location (after info notifier list and before kdump).
>
> A solution might be to put it at these notifiers at the very
> end of the "info" list or make extra "dump" notifier list.

I just want to point out that the above idea has problems.
Notifiers storing kernel log need to be treated as kmsg_dump().
In particular, we would  need to know if there are any.
We do not need to call "info" notifier list before kdump
when there is no kernel log dumper registered.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-18  7:38           ` Petr Mladek
@ 2022-05-18 13:09             ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-18 13:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Scott Branden, Sebastian Reichel, Florian Fainelli, David Gow,
	Evan Green, Julius Werner, bcm-kernel-feedback-list, linux-pm,
	akpm, bhe, kexec, linux-kernel, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, Alexander Gordeev, Andrea Parri,
	Ard Biesheuvel, Benjamin Herrenschmidt, Brian Norris,
	Christian Borntraeger, Christophe JAILLET, David S. Miller,
	Dexuan Cui, Doug Berger, Haiyang Zhang, Hari Bathini,
	Heiko Carstens, Justin Chen, K. Y. Srinivasan, Lee Jones,
	Markus Mayer, Michael Ellerman, Mihai Carabas, Nicholas Piggin,
	Paul Mackerras, Pavel Machek, Shile Zhang, Stephen Hemminger,
	Sven Schnelle, Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik,
	Wang ShaoBo, Wei Liu, zhenwei pi

On 18/05/2022 04:38, Petr Mladek wrote:
> [...]
> I have answered this in more detail in the other reply, see
> https://lore.kernel.org/r/YoShZVYNAdvvjb7z@alley
> 
> I agree that both notifiers in
> 
>     drivers/soc/bcm/brcmstb/pm/pm-arm.c
>     drivers/firmware/google/gsmi.c
> 
> better fit into the hypervisor list after all.
> 
> Best Regards,
> Petr

Perfect, thanks - will keep both in such list for V2.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-18  7:58         ` Petr Mladek
@ 2022-05-18 13:16           ` Guilherme G. Piccoli
  2022-05-19  7:03             ` Petr Mladek
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-18 13:16 UTC (permalink / raw)
  To: Petr Mladek
  Cc: David Gow, Evan Green, Julius Werner, Scott Branden,
	bcm-kernel-feedback-list, Sebastian Reichel, linux-pm,
	Florian Fainelli, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On 18/05/2022 04:58, Petr Mladek wrote:
> [...]
>> I does similar things like kmsg_dump() so it should be called in
>> the same location (after info notifier list and before kdump).
>>
>> A solution might be to put it at these notifiers at the very
>> end of the "info" list or make extra "dump" notifier list.
> 
> I just want to point out that the above idea has problems.
> Notifiers storing kernel log need to be treated as kmsg_dump().
> In particular, we would  need to know if there are any.
> We do not need to call "info" notifier list before kdump
> when there is no kernel log dumper registered.
> 

Notifiers respect the priority concept, which is just a number that
orders the list addition (and the list is called in order).

I've used the last position to panic_print() [in patch 25] - one idea
here is to "reserve" the last position (represented by INT_MIN) for
notifiers that act like kmsg_dump(). I couldn't find any IIRC, but that
doesn't prevent us to save this position and comment about that.

Makes sense to you ?
Cheers!

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-18  7:33             ` Petr Mladek
@ 2022-05-18 13:24               ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-18 13:24 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Evan Green, David Gow, Julius Werner, Scott Branden,
	bcm-kernel-feedback-list, Sebastian Reichel, Linux PM,
	Florian Fainelli, Andrew Morton, bhe, kexec, LKML, linuxppc-dev,
	linux-alpha, linux-arm Mailing List, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	Andy Shevchenko, Arnd Bergmann, Borislav Petkov, Jonathan Corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, Greg Kroah-Hartman,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, Kees Cook,
	luto, mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi, Stephen Boyd

On 18/05/2022 04:33, Petr Mladek wrote:
> [...]
> Anyway, I would distinguish it the following way.
> 
>   + If the notifier is preserving kernel log then it should be ideally
>     treated as kmsg_dump().
> 
>   + It the notifier is saving another debugging data then it better
>     fits into the "hypervisor" notifier list.
> 
>

Definitely, I agree - it's logical, since we want more info in the logs,
and happens some notifiers running in the informational list do that,
like ftrace_on_oops for example.


> Regarding the reliability. From my POV, any panic notifier enabled
> in a generic kernel should be reliable with more than 99,9%.
> Otherwise, they should not be in the notifier list at all.
> 
> An exception would be a platform-specific notifier that is
> called only on some specific platform and developers maintaining
> this platform agree on this.
> 
> The value "99,9%" is arbitrary. I am not sure if it is realistic
> even in the other code, for example, console_flush_on_panic()
> or emergency_restart(). I just want to point out that the border
> should be rather high. Otherwise we would back in the situation
> where people would want to disable particular notifiers.
> 

Totally agree, these percentages are just an example, 50% is ridiculous
low reliability in my example heheh

But some notifiers deep dive in abstraction layers (like regmap or GPIO
stuff) and it's hard to determine the probability of a lock issue (take
a spinlock already taken inside regmap code and live-lock forever, for
example). These are better to run, if possible, later than kdump or even
info list.

Thanks again for the good analysis Petr!
Cheers,


Guilherme



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-17 16:42         ` Guilherme G. Piccoli
  2022-05-18  7:38           ` Petr Mladek
@ 2022-05-18 22:17           ` Scott Branden
  2022-05-19 12:19             ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Scott Branden @ 2022-05-18 22:17 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Petr Mladek, Sebastian Reichel,
	Florian Fainelli, Desmond yan
  Cc: David Gow, Evan Green, Julius Werner, bcm-kernel-feedback-list,
	linux-pm, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

[-- Attachment #1: Type: text/plain, Size: 4387 bytes --]

Hi Guilherme,

+Desmond

On 2022-05-17 09:42, Guilherme G. Piccoli wrote:
> On 17/05/2022 10:57, Petr Mladek wrote:
>> [...]
>>>>> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
>>>>> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
>>>>> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>>>> [... snip ...]
>>>> It seems to reset some hardware or so. IMHO, it should go into the
>>>> pre-reboot list.
>>>
>>> Mixed feelings here, I'm looping Broadcom maintainers to comment.
>>> (CC Scott and Broadcom list)
>>>
>>> I'm afraid it breaks kdump if this device is not reset beforehand - it's
>>> a doorbell write, so not high risk I think...
>>>
>>> But in case the not-reset device can be probed normally in kdump kernel,
>>> then I'm fine in moving this to the reboot list! I don't have the HW to
>>> test myself.
>>
>> Good question. Well, it if has to be called before kdump then
>> even "hypervisor" list is a wrong place because is not always
>> called before kdump.
> 
> Agreed! I'll defer that to Scott and Broadcom folks to comment.
> If it's not strictly necessary, I'll happily move it to the reboot list.
> 
> If necessary, we could use the machine_crash_kexec() approach, but we'll
> fall into the case arm64 doesn't support it and I'm not sure if this
> device is available for arm - again a question for the maintainers.
We register to the panic notifier so that we can kill the VK card ASAP
to stop DMAing things over to the host side.  If it is not notified then
memory may not be frozen when kdump is occurring.
Notifying the card on panic is also needed to allow for any type of 
reset to occur.

So, the only thing preventing moving the notifier later is the chance
that memory is modified while kdump is occurring.  Or, if DMA is 
disabled before kdump already then this wouldn't be an issue and the 
notification to the card (to allow for clean resets) can be done later.
> 
> 
>>   [...]
>>>>> --- a/drivers/power/reset/ltc2952-poweroff.c
>>>>> +++ b/drivers/power/reset/ltc2952-poweroff.c
>>> [...]
>>> This is setting a variable only, and once it's set (data->kernel_panic
>>> is the bool's name), it just bails out the IRQ handler and a timer
>>> setting - this timer seems kinda tricky, so bailing out ASAP makes sense
>>> IMHO.
>>
>> IMHO, the timer informs the hardware that the system is still alive
>> in the middle of panic(). If the timer is not working then the
>> hardware (chip) will think that the system frozen in panic()
>> and will power off the system. See the comments in
>> drivers/power/reset/ltc2952-poweroff.c:
>> [.... snip ...]
>> IMHO, we really have to keep it alive until we reach the reboot stage.
>>
>> Another question is how it actually works when the interrupts are
>> disabled during panic() and the timer callbacks are not handled.
> 
> Agreed here! Guess I can move this one the reboot list, fine by me.
> Unless PM folks think otherwise.
> 
> 
>> [...]
>>> Disagree here, I'm CCing Florian for information.
>>>
>>> This notifier preserves RAM so it's *very interesting* if we have
>>> kmsg_dump() for example, but maybe might be also relevant in case kdump
>>> kernel is configured to store something in a persistent RAM (then,
>>> without this notifier, after kdump reboots the system data would be lost).
>>
>> I see. It is actually similar problem as with
>> drivers/firmware/google/gsmi.c.
>>
>> I does similar things like kmsg_dump() so it should be called in
>> the same location (after info notifier list and before kdump).
>>
>> A solution might be to put it at these notifiers at the very
>> end of the "info" list or make extra "dump" notifier list.
> 
> Here I still disagree. I've commented in the other response thread
> (about Google gsmi) about the semantics of the hypervisor list, but
> again: this list should contain callbacks that
> 
> (a) Should run early, _by default_ before a kdump;
> (b) Communicate with the firmware/hypervisor in a "low-risk" way;
> 
> Imagine a scenario where users configure kdump kernel to save something
> in a persistent form in DRAM - it'd be like a late pstore, in the next
> kernel. This callback enables that, it's meant to inform FW "hey, panic
> happened, please from now on don't clear the RAM in the next FW-reboot".
> I don't see a reason to postpone that - let's see if the maintainers
> have an opinion.
> 
> Cheers,
> 
> 
> Guilherme

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4212 bytes --]

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-18 13:16           ` Guilherme G. Piccoli
@ 2022-05-19  7:03             ` Petr Mladek
  2022-05-19 12:07               ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-19  7:03 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: David Gow, Evan Green, Julius Werner, Scott Branden,
	bcm-kernel-feedback-list, Sebastian Reichel, linux-pm,
	Florian Fainelli, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On Wed 2022-05-18 10:16:20, Guilherme G. Piccoli wrote:
> On 18/05/2022 04:58, Petr Mladek wrote:
> > [...]
> >> I does similar things like kmsg_dump() so it should be called in
> >> the same location (after info notifier list and before kdump).
> >>
> >> A solution might be to put it at these notifiers at the very
> >> end of the "info" list or make extra "dump" notifier list.
> > 
> > I just want to point out that the above idea has problems.
> > Notifiers storing kernel log need to be treated as kmsg_dump().
> > In particular, we would  need to know if there are any.
> > We do not need to call "info" notifier list before kdump
> > when there is no kernel log dumper registered.
> > 
> 
> Notifiers respect the priority concept, which is just a number that
> orders the list addition (and the list is called in order).
> 
> I've used the last position to panic_print() [in patch 25] - one idea
> here is to "reserve" the last position (represented by INT_MIN) for
> notifiers that act like kmsg_dump(). I couldn't find any IIRC, but that
> doesn't prevent us to save this position and comment about that.

I would ignore it for now. If anyone would want to safe the log
then they would need to read it. They will most likely use
the existing kmsg_dump() infastructure. In fact, they should
use it to avoid a code duplication.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-19  7:03             ` Petr Mladek
@ 2022-05-19 12:07               ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-19 12:07 UTC (permalink / raw)
  To: Petr Mladek
  Cc: David Gow, Evan Green, Julius Werner, Scott Branden,
	bcm-kernel-feedback-list, Sebastian Reichel, linux-pm,
	Florian Fainelli, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On 19/05/2022 04:03, Petr Mladek wrote:
> [...]
> I would ignore it for now. If anyone would want to safe the log
> then they would need to read it. They will most likely use
> the existing kmsg_dump() infastructure. In fact, they should
> use it to avoid a code duplication.
> 
> Best Regards,
> Petr

Cool, thanks! I agree, let's expect people use kmsg_dump() as they should =)

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-18 22:17           ` Scott Branden
@ 2022-05-19 12:19             ` Guilherme G. Piccoli
  2022-05-19 19:20               ` Scott Branden
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-19 12:19 UTC (permalink / raw)
  To: Scott Branden, Petr Mladek, Sebastian Reichel, Florian Fainelli,
	Desmond yan
  Cc: David Gow, Evan Green, Julius Werner, bcm-kernel-feedback-list,
	linux-pm, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On 18/05/2022 19:17, Scott Branden wrote:
> Hi Guilherme,
> 
> +Desmond
> [...] 
>>>> I'm afraid it breaks kdump if this device is not reset beforehand - it's
>>>> a doorbell write, so not high risk I think...
>>>>
>>>> But in case the not-reset device can be probed normally in kdump kernel,
>>>> then I'm fine in moving this to the reboot list! I don't have the HW to
>>>> test myself.
>>>
>>> Good question. Well, it if has to be called before kdump then
>>> even "hypervisor" list is a wrong place because is not always
>>> called before kdump.
>> [...]
> We register to the panic notifier so that we can kill the VK card ASAP
> to stop DMAing things over to the host side.  If it is not notified then
> memory may not be frozen when kdump is occurring.
> Notifying the card on panic is also needed to allow for any type of 
> reset to occur.
> 
> So, the only thing preventing moving the notifier later is the chance
> that memory is modified while kdump is occurring.  Or, if DMA is 
> disabled before kdump already then this wouldn't be an issue and the 
> notification to the card (to allow for clean resets) can be done later.

Hi Scott / Desmond, thanks for the detailed answer! Is this adapter
designed to run in x86 only or you have other architectures' use cases?

I'm not expert on that, but I guess whether DMA is "kept" or not depends
a bit if IOMMU is used. IIRC, there was a copy of the DMAR table in
kdump (at least for Intel IOMMU). Also, devices are not properly
quiesced on kdump IIUC, we don't call shutdown/reset handlers, they're
skip due to the crash nature - so there is a risk of devices doing bad
things in the new kernel.

With that said, and given this is a lightweight notifier that ideally
should run ASAP, I'd keep this one in the hypervisor list. We can
"adjust" the semantic of this list to include lightweight notifiers that
reset adapters.

With that said, Petr has a point - not always such list is going to be
called before kdump. So, that makes me think in another idea: what if we
have another list, but not on panic path, but instead in the custom
crash_shutdown()? Drivers could add callbacks there that must execute
before kexec/kdump, no matter what.

Let me know your thoughts Scott / Desmond / Petr and all interested parties.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-19 12:19             ` Guilherme G. Piccoli
@ 2022-05-19 19:20               ` Scott Branden
  2022-05-23 14:56                 ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Scott Branden @ 2022-05-19 19:20 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Petr Mladek, Sebastian Reichel,
	Florian Fainelli, Desmond yan
  Cc: David Gow, Evan Green, Julius Werner, bcm-kernel-feedback-list,
	linux-pm, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

[-- Attachment #1: Type: text/plain, Size: 2895 bytes --]



On 2022-05-19 05:19, Guilherme G. Piccoli wrote:
> On 18/05/2022 19:17, Scott Branden wrote:
>> Hi Guilherme,
>>
>> +Desmond
>> [...]
>>>>> I'm afraid it breaks kdump if this device is not reset beforehand - it's
>>>>> a doorbell write, so not high risk I think...
>>>>>
>>>>> But in case the not-reset device can be probed normally in kdump kernel,
>>>>> then I'm fine in moving this to the reboot list! I don't have the HW to
>>>>> test myself.
>>>>
>>>> Good question. Well, it if has to be called before kdump then
>>>> even "hypervisor" list is a wrong place because is not always
>>>> called before kdump.
>>> [...]
>> We register to the panic notifier so that we can kill the VK card ASAP
>> to stop DMAing things over to the host side.  If it is not notified then
>> memory may not be frozen when kdump is occurring.
>> Notifying the card on panic is also needed to allow for any type of
>> reset to occur.
>>
>> So, the only thing preventing moving the notifier later is the chance
>> that memory is modified while kdump is occurring.  Or, if DMA is
>> disabled before kdump already then this wouldn't be an issue and the
>> notification to the card (to allow for clean resets) can be done later.
> 
> Hi Scott / Desmond, thanks for the detailed answer! Is this adapter
> designed to run in x86 only or you have other architectures' use cases?
The adapter may be used in any PCIe design that supports DMA.
So it may be possible to run in arm64 servers.
> 
> I'm not expert on that, but I guess whether DMA is "kept" or not depends
> a bit if IOMMU is used. IIRC, there was a copy of the DMAR table in
> kdump (at least for Intel IOMMU). Also, devices are not properly
> quiesced on kdump IIUC, we don't call shutdown/reset handlers, they're
> skip due to the crash nature - so there is a risk of devices doing bad
> things in the new kernel.
> 
> With that said, and given this is a lightweight notifier that ideally
> should run ASAP, I'd keep this one in the hypervisor list. We can
> "adjust" the semantic of this list to include lightweight notifiers that
> reset adapters.
Sounds the best to keep system operating as tested today.
> 
> With that said, Petr has a point - not always such list is going to be
> called before kdump. So, that makes me think in another idea: what if we
> have another list, but not on panic path, but instead in the custom
> crash_shutdown()? Drivers could add callbacks there that must execute
> before kexec/kdump, no matter what.
It may be beneficial for some other drivers but for our use we would 
then need to register for the panic path and the crash_shutdown path. 
We notify the VK card for 2 purposes: one to stop DMA so memory stop 
changing during a kdump.  And also to get the card into a good state so 
resets happen cleanly.
> 
> Let me know your thoughts Scott / Desmond / Petr and all interested parties.
> Cheers,
> 
> 
> Guilherme

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4212 bytes --]

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-15 22:47     ` Guilherme G. Piccoli
  2022-05-16 10:21       ` Petr Mladek
@ 2022-05-19 23:45       ` Baoquan He
  2022-05-20 11:23         ` Guilherme G. Piccoli
  1 sibling, 1 reply; 183+ messages in thread
From: Baoquan He @ 2022-05-19 23:45 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Petr Mladek
  Cc: michael Kelley (LINUX),
	Dave Young, d.hatayama, akpm, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, dave.hansen, feng.tang, gregkh, hidehiro.kawai.ez,
	jgross, john.ogness, keescook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets,
	will

On 05/15/22 at 07:47pm, Guilherme G. Piccoli wrote:
> On 12/05/2022 11:03, Petr Mladek wrote:
...... 
> > OK, the question is how to make it better. Let's start with
> > a clear picture of the problem:
> > 
> > 1. panic() has basically two funtions:
> > 
> >       + show/store debug information (optional ways and amount)
> >       + do something with the system (reboot, stay hanged)
> > 
> > 
> > 2. There are 4 ways how to show/store the information:
> > 
> >       + tell hypervisor to store what it is interested about
> >       + crash_dump
> >       + kmsg_dump()
> >       + consoles
> > 
> >   , where crash_dump and consoles are special:
> > 
> >      + crash_dump does not return. Instead it ends up with reboot.
> > 
> >      + Consoles work transparently. They just need an extra flush
> >        before reboot or staying hanged.
> > 
> > 
> > 3. The various notifiers do things like:
> > 
> >      + tell hypervisor about the crash
> >      + print more information (also stop watchdogs)
> >      + prepare system for reboot (touch some interfaces)
> >      + prepare system for staying hanged (blinking)
> > 
> >    Note that it pretty nicely matches the 4 notifier lists.
> > 
> 
> I really appreciate the summary skill you have, to convert complex
> problems in very clear and concise ideas. Thanks for that, very useful!
> I agree with what was summarized above.

I want to say the similar words to Petr's reviewing comment when I went
through the patches and traced each reviewing sub-thread to try to
catch up. Petr has reivewed this series so carefully and given many
comments I want to ack immediately.

I agree with most of the suggestions from Petr to this patch, except of
one tiny concern, please see below inline comment.

> 
> 
> > Now, we need to decide about the ordering. The main area is how
> > to store the debug information. Consoles are transparent so
> > the quesition is about:
> > 
> >      + hypervisor
> >      + crash_dump
> >      + kmsg_dump
> > 
> > Some people need none and some people want all. There is a
> > risk that system might hung at any stage. This why people want to
> > make the order configurable.
> > 
> > But crash_dump() does not return when it succeeds. And kmsg_dump()
> > users havn't complained about hypervisor problems yet. So, that
> > two variants might be enough:
> > 
> >     + crash_dump (hypervisor, kmsg_dump as fallback)
> >     + hypervisor, kmsg_dump, crash_dump
> > 
> > One option "panic_prefer_crash_dump" should be enough.
> > And the code might look like:
> > 
> > void panic()
> > {
> > [...]
> > 	dump_stack();
> > 	kgdb_panic(buf);
> > 
> > 	< ---  here starts the reworked code --- >
> > 
> > 	/* crash dump is enough when enabled and preferred. */
> > 	if (panic_prefer_crash_dump)
> > 		__crash_kexec(NULL);

I like the proposed skeleton of panic() and code style suggested by
Petr very much. About panic_prefer_crash_dump which might need be added,
I hope it has a default value true. This makes crash_dump execute at
first by default just as before, unless people specify
panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add
panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump,
this is inconsistent with the old behaviour.

> > 
> > 	/* Stop other CPUs and focus on handling the panic state. */
> > 	if (has_kexec_crash_image)
> > 		crash_smp_send_stop();
> > 	else
> > 		smp_send_stop()
> > 
> 
> Here we have a very important point. Why do we need 2 variants of SMP
> CPU stopping functions? I disagree with that - my understanding of this
> after some study in architectures is that the crash_() variant is
> "stronger", should work in all cases and if not, we should fix that -
> that'd be a bug.
> 
> Such variant either maps to smp_send_stop() (in various architectures,
> including XEN/x86) or overrides the basic function with more proper
> handling for panic() case...I don't see why we still need such
> distinction, if you / others have some insight about that, I'd like to
> hear =)
> 
> 
> > 	/* Notify hypervisor about the system panic. */
> > 	atomic_notifier_call_chain(&panic_hypervisor_list, 0, NULL);
> > 
> > 	/*
> > 	 * No need to risk extra info when there is no kmsg dumper
> > 	 * registered.
> > 	 */
> > 	if (!has_kmsg_dumper())
> > 		__crash_kexec(NULL);
> > 
> > 	/* Add extra info from different subsystems. */
> > 	atomic_notifier_call_chain(&panic_info_list, 0, NULL);
> > 
> > 	kmsg_dump(KMSG_DUMP_PANIC);
> > 	__crash_kexec(NULL);
> > 
> > 	/* Flush console */
> > 	unblank_screen();
> > 	console_unblank();
> > 	debug_locks_off();
> > 	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > 
> > 	if (panic_timeout > 0) {
> > 		delay()
> > 	}
> > 
> > 	/*
> > 	 * Prepare system for eventual reboot and allow custom
> > 	 * reboot handling.
> > 	 */
> > 	atomic_notifier_call_chain(&panic_reboot_list, 0, NULL);
> 
> You had the order of panic_reboot_list VS. consoles flushing inverted.
> It might make sense, although I didn't do that in V1...
> Are you OK in having a helper for console flushing, as I did in V1? It
> makes code of panic() a bit less polluted / more focused I feel.
> 
> 
> > 
> > 	if (panic_timeout != 0) {
> > 		reboot();
> > 	}
> > 
> > 	/*
> > 	 * Prepare system for the infinite waiting, for example,
> > 	 * setup blinking.
> > 	 */
> > 	atomic_notifier_call_chain(&panic_loop_list, 0, NULL);
> > 
> > 	infinite_loop();
> > }
> > 
> > 
> > __crash_kexec() is there 3 times but otherwise the code looks
> > quite straight forward.
> > 
> > Note 1: I renamed the two last notifier list. The name 'post-reboot'
> > 	did sound strange from the logical POV ;-)
> > 
> > Note 2: We have to avoid the possibility to call "reboot" list
> > 	before kmsg_dump(). All callbacks providing info
> > 	have to be in the info list. It a callback combines
> > 	info and reboot functionality then it should be split.
> > 
> > 	There must be another way to calm down problematic
> > 	info callbacks. And it has to be solved when such
> > 	a problem is reported. Is there any known issue, please?
> > 
> > It is possible that I have missed something important.
> > But I would really like to make the logic as simple as possible.
> 
> OK, I agree with you! It's indeed simpler and if others agree, I can
> happily change the logic to what you proposed. Although...currently the
> "crash_kexec_post_notifiers" allows to call _all_ panic_reboot_list
> callbacks _before kdump_.
> 
> We need to mention this change in the commit messages, but I really
> would like to hear the opinions of heavy users of notifiers (as
> Michael/Hyper-V) and the kdump interested parties (like Baoquan / Dave
> Young / Hayatama). If we all agree on such approach, will change that
> for V2 =)
> 
> Thanks again Petr, for the time spent in such detailed review!
> Cheers,
> 
> 
> Guilherme
> 


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-19 23:45       ` Baoquan He
@ 2022-05-20 11:23         ` Guilherme G. Piccoli
  2022-05-24  8:01           ` Petr Mladek
  2022-05-24  8:32           ` Baoquan He
  0 siblings, 2 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-20 11:23 UTC (permalink / raw)
  To: Baoquan He, Petr Mladek
  Cc: michael Kelley (LINUX),
	Dave Young, d.hatayama, akpm, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, dave.hansen, feng.tang, gregkh, hidehiro.kawai.ez,
	jgross, john.ogness, keescook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets,
	will

On 19/05/2022 20:45, Baoquan He wrote:
> [...]
>> I really appreciate the summary skill you have, to convert complex
>> problems in very clear and concise ideas. Thanks for that, very useful!
>> I agree with what was summarized above.
> 
> I want to say the similar words to Petr's reviewing comment when I went
> through the patches and traced each reviewing sub-thread to try to
> catch up. Petr has reivewed this series so carefully and given many
> comments I want to ack immediately.
> 
> I agree with most of the suggestions from Petr to this patch, except of
> one tiny concern, please see below inline comment.

Hi Baoquan, thanks! I'm glad you're also reviewing that =)


> [...]
> 
> I like the proposed skeleton of panic() and code style suggested by
> Petr very much. About panic_prefer_crash_dump which might need be added,
> I hope it has a default value true. This makes crash_dump execute at
> first by default just as before, unless people specify
> panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add
> panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump,
> this is inconsistent with the old behaviour.

I'd like to understand better why the crash_kexec() must always be the
first thing in your use case. If we keep that behavior, we'll see all
sorts of workarounds - see the last patches of this series, Hyper-V and
PowerPC folks hardcoded "crash_kexec_post_notifiers" in order to force
execution of their relevant notifiers (like the vmbus disconnect,
specially in arm64 that has no custom machine_crash_shutdown, or the
fadump case in ppc). This led to more risk in kdump.

The thing is: with the notifiers' split, we tried to keep only the most
relevant/necessary stuff in this first list, things that ultimately
should improve kdump reliability or if not, at least not break it. My
feeling is that, with this series, we should change the idea/concept
that kdump must run first nevertheless, not matter what. We're here
trying to accommodate the antagonistic goals of hypervisors that need
some clean-up (even for kdump to work) VS. kdump users, that wish a
"pristine" system reboot ASAP after the crash.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-19 19:20               ` Scott Branden
@ 2022-05-23 14:56                 ` Guilherme G. Piccoli
  2022-05-24  8:04                   ` Petr Mladek
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-23 14:56 UTC (permalink / raw)
  To: Scott Branden, Petr Mladek, Sebastian Reichel, Florian Fainelli,
	Desmond yan
  Cc: David Gow, Evan Green, Julius Werner, bcm-kernel-feedback-list,
	linux-pm, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On 19/05/2022 16:20, Scott Branden wrote:
> [...] 
>> Hi Scott / Desmond, thanks for the detailed answer! Is this adapter
>> designed to run in x86 only or you have other architectures' use cases?
> The adapter may be used in any PCIe design that supports DMA.
> So it may be possible to run in arm64 servers.
>>
>> [...]
>> With that said, and given this is a lightweight notifier that ideally
>> should run ASAP, I'd keep this one in the hypervisor list. We can
>> "adjust" the semantic of this list to include lightweight notifiers that
>> reset adapters.
> Sounds the best to keep system operating as tested today.
>>
>> With that said, Petr has a point - not always such list is going to be
>> called before kdump. So, that makes me think in another idea: what if we
>> have another list, but not on panic path, but instead in the custom
>> crash_shutdown()? Drivers could add callbacks there that must execute
>> before kexec/kdump, no matter what.
> It may be beneficial for some other drivers but for our use we would 
> then need to register for the panic path and the crash_shutdown path. 
> We notify the VK card for 2 purposes: one to stop DMA so memory stop 
> changing during a kdump.  And also to get the card into a good state so 
> resets happen cleanly.

Thanks Scott! With that, I guess it's really better to keep this
notifier in this hypervisor/early list - I'm planning to do that for V2.
Unless Petr or somebody has strong feelings against that, of course.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path
  2022-04-28 16:55   ` Helge Deller
  2022-04-29 14:34     ` Guilherme G. Piccoli
@ 2022-05-23 20:40     ` Guilherme G. Piccoli
  2022-05-23 21:31       ` Helge Deller
  1 sibling, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-23 20:40 UTC (permalink / raw)
  To: Helge Deller
  Cc: linux-kernel, kexec, pmladek, bhe, akpm,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, James E.J. Bottomley

On 28/04/2022 13:55, Helge Deller wrote:
> [...]
> You may add:
> Acked-by: Helge Deller <deller@gmx.de> # parisc
> 
> Helge

Hi Helge, do you think would be possible to still pick this one for
v5.19 or do you prefer to hold for the next release?

I'm working on V2, so if it's merged for 5.19 I won't send it again.
Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path
  2022-05-23 20:40     ` Guilherme G. Piccoli
@ 2022-05-23 21:31       ` Helge Deller
  2022-05-23 21:55         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Helge Deller @ 2022-05-23 21:31 UTC (permalink / raw)
  To: Guilherme G. Piccoli; +Cc: linux-kernel, linux-parisc

Hello Guilherme,

On 5/23/22 22:40, Guilherme G. Piccoli wrote:
> On 28/04/2022 13:55, Helge Deller wrote:
>> [...]
>> You may add:
>> Acked-by: Helge Deller <deller@gmx.de> # parisc
>
> Hi Helge, do you think would be possible to still pick this one for
> v5.19 or do you prefer to hold for the next release?

Actually, I'd prefer if you push this patch together with the whole
series upstream at once. The patch itself makes not much sense without
your series...

> I'm working on V2, so if it's merged for 5.19 I won't send it again.

Helge

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path
  2022-05-23 21:31       ` Helge Deller
@ 2022-05-23 21:55         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-23 21:55 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-kernel, linux-parisc

On 23/05/2022 18:31, Helge Deller wrote:
> Hello Guilherme,
> 
> On 5/23/22 22:40, Guilherme G. Piccoli wrote:
>> On 28/04/2022 13:55, Helge Deller wrote:
>>> [...]
>>> You may add:
>>> Acked-by: Helge Deller <deller@gmx.de> # parisc
>>
>> Hi Helge, do you think would be possible to still pick this one for
>> v5.19 or do you prefer to hold for the next release?
> 
> Actually, I'd prefer if you push this patch together with the whole
> series upstream at once. The patch itself makes not much sense without
> your series...
> 
>> I'm working on V2, so if it's merged for 5.19 I won't send it again.
> 
> Helge

Sure Helge, I guess I can do that - will resubmit for V2.

But notice the patch is self-contained, as it fixes a current issue in
the code - the risk for a lockup due to spinlock taking on atomic
context. It doesn't require the panic refactor to be merged in order to
achieve its goal...

I agree that such issue is rare to trigger though, so definitely no
hurry is needed =)

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-20 11:23         ` Guilherme G. Piccoli
@ 2022-05-24  8:01           ` Petr Mladek
  2022-05-24 10:18             ` Baoquan He
  2022-05-24  8:32           ` Baoquan He
  1 sibling, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-05-24  8:01 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Baoquan He, michael Kelley (LINUX),
	Dave Young, d.hatayama, akpm, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, dave.hansen, feng.tang, gregkh, hidehiro.kawai.ez,
	jgross, john.ogness, keescook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets,
	will

On Fri 2022-05-20 08:23:33, Guilherme G. Piccoli wrote:
> On 19/05/2022 20:45, Baoquan He wrote:
> > [...]
> >> I really appreciate the summary skill you have, to convert complex
> >> problems in very clear and concise ideas. Thanks for that, very useful!
> >> I agree with what was summarized above.
> > 
> > I want to say the similar words to Petr's reviewing comment when I went
> > through the patches and traced each reviewing sub-thread to try to
> > catch up. Petr has reivewed this series so carefully and given many
> > comments I want to ack immediately.
> > 
> > I agree with most of the suggestions from Petr to this patch, except of
> > one tiny concern, please see below inline comment.
> 
> Hi Baoquan, thanks! I'm glad you're also reviewing that =)
> 
> 
> > [...]
> > 
> > I like the proposed skeleton of panic() and code style suggested by
> > Petr very much. About panic_prefer_crash_dump which might need be added,
> > I hope it has a default value true. This makes crash_dump execute at
> > first by default just as before, unless people specify
> > panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add
> > panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump,
> > this is inconsistent with the old behaviour.
> 
> I'd like to understand better why the crash_kexec() must always be the
> first thing in your use case. If we keep that behavior, we'll see all
> sorts of workarounds - see the last patches of this series, Hyper-V and
> PowerPC folks hardcoded "crash_kexec_post_notifiers" in order to force
> execution of their relevant notifiers (like the vmbus disconnect,
> specially in arm64 that has no custom machine_crash_shutdown, or the
> fadump case in ppc). This led to more risk in kdump.
> 
> The thing is: with the notifiers' split, we tried to keep only the most
> relevant/necessary stuff in this first list, things that ultimately
> should improve kdump reliability or if not, at least not break it. My
> feeling is that, with this series, we should change the idea/concept
> that kdump must run first nevertheless, not matter what. We're here
> trying to accommodate the antagonistic goals of hypervisors that need
> some clean-up (even for kdump to work) VS. kdump users, that wish a
> "pristine" system reboot ASAP after the crash.

Good question. I wonder if Baoquan knows about problems caused by the
particular notifiers that will end up in the hypervisor list. Note
that there will be some shuffles and the list will be slightly
different in V2.

Anyway, I see four possible solutions:

  1. The most conservative approach is to keep the current behavior
     and call kdump first by default.

  2. A medium conservative approach to change the default default
     behavior and call hypervisor and eventually the info notifiers
     before kdump. There still would be the possibility to call kdump
     first by the command line parameter.

  3. Remove the possibility to call kdump first completely. It would
     assume that all the notifiers in the info list are super safe
     or that they make kdump actually more safe.

  4. Create one more notifier list for operations that always should
     be called before crash_dump.

Regarding the extra notifier list (4th solution). It is not clear to
me whether it would be always called even before hypervisor list or
when kdump is not enabled. We must not over-engineer it.

2nd proposal looks like a good compromise. But maybe we could do
this change few releases later. The notifiers split is a big
change on its own.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
  2022-05-23 14:56                 ` Guilherme G. Piccoli
@ 2022-05-24  8:04                   ` Petr Mladek
  0 siblings, 0 replies; 183+ messages in thread
From: Petr Mladek @ 2022-05-24  8:04 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Scott Branden, Sebastian Reichel, Florian Fainelli, Desmond yan,
	David Gow, Evan Green, Julius Werner, bcm-kernel-feedback-list,
	linux-pm, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi

On Mon 2022-05-23 11:56:12, Guilherme G. Piccoli wrote:
> On 19/05/2022 16:20, Scott Branden wrote:
> > [...] 
> >> Hi Scott / Desmond, thanks for the detailed answer! Is this adapter
> >> designed to run in x86 only or you have other architectures' use cases?
> > The adapter may be used in any PCIe design that supports DMA.
> > So it may be possible to run in arm64 servers.
> >>
> >> [...]
> >> With that said, and given this is a lightweight notifier that ideally
> >> should run ASAP, I'd keep this one in the hypervisor list. We can
> >> "adjust" the semantic of this list to include lightweight notifiers that
> >> reset adapters.
> > Sounds the best to keep system operating as tested today.
> >>
> >> With that said, Petr has a point - not always such list is going to be
> >> called before kdump. So, that makes me think in another idea: what if we
> >> have another list, but not on panic path, but instead in the custom
> >> crash_shutdown()? Drivers could add callbacks there that must execute
> >> before kexec/kdump, no matter what.
> > It may be beneficial for some other drivers but for our use we would 
> > then need to register for the panic path and the crash_shutdown path. 
> > We notify the VK card for 2 purposes: one to stop DMA so memory stop 
> > changing during a kdump.  And also to get the card into a good state so 
> > resets happen cleanly.
> 
> Thanks Scott! With that, I guess it's really better to keep this
> notifier in this hypervisor/early list - I'm planning to do that for V2.
> Unless Petr or somebody has strong feelings against that, of course.

I am fine with it because we do not have a better solution at the
moment.

It might be a good candidate for the 5th notifier list mentioned
in the thread https://lore.kernel.org/r/YoyQyHHfhIIXSX0U@alley .
But I am not sure if the 5th list is worth the complexity.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-20 11:23         ` Guilherme G. Piccoli
  2022-05-24  8:01           ` Petr Mladek
@ 2022-05-24  8:32           ` Baoquan He
  1 sibling, 0 replies; 183+ messages in thread
From: Baoquan He @ 2022-05-24  8:32 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Petr Mladek, michael Kelley (LINUX),
	Dave Young, d.hatayama, akpm, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, dave.hansen, feng.tang, gregkh, hidehiro.kawai.ez,
	jgross, john.ogness, keescook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets,
	will

On 05/20/22 at 08:23am, Guilherme G. Piccoli wrote:
> On 19/05/2022 20:45, Baoquan He wrote:
> > [...]
> >> I really appreciate the summary skill you have, to convert complex
> >> problems in very clear and concise ideas. Thanks for that, very useful!
> >> I agree with what was summarized above.
> > 
> > I want to say the similar words to Petr's reviewing comment when I went
> > through the patches and traced each reviewing sub-thread to try to
> > catch up. Petr has reivewed this series so carefully and given many
> > comments I want to ack immediately.
> > 
> > I agree with most of the suggestions from Petr to this patch, except of
> > one tiny concern, please see below inline comment.
> 
> Hi Baoquan, thanks! I'm glad you're also reviewing that =)
> 
> 
> > [...]
> > 
> > I like the proposed skeleton of panic() and code style suggested by
> > Petr very much. About panic_prefer_crash_dump which might need be added,
> > I hope it has a default value true. This makes crash_dump execute at
> > first by default just as before, unless people specify
> > panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add
> > panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump,
> > this is inconsistent with the old behaviour.
> 
> I'd like to understand better why the crash_kexec() must always be the
> first thing in your use case. If we keep that behavior, we'll see all
> sorts of workarounds - see the last patches of this series, Hyper-V and
> PowerPC folks hardcoded "crash_kexec_post_notifiers" in order to force
> execution of their relevant notifiers (like the vmbus disconnect,
> specially in arm64 that has no custom machine_crash_shutdown, or the
> fadump case in ppc). This led to more risk in kdump.

Firstly, kdump is not always the first thing. In any use case, if kdump
kernel is not loaded, it's not the first thing at all. Not to mention
if crash_kexec_post_notifiers is specified.

if kdump kernel is loaded, kdump has been executing firslty, since it
was added into kenrel/panic(); Until 2014, Masa added crash_kexec_post_notifiers
kernel parameter to make panic notifiers be able to execute before kdump
if specified.

	commit dc009d92435f99498cbc579ce76bf28e837e2c14
	Author: Eric W. Biederman <ebiederm@xmission.com>
	Date:   Sat Jun 25 14:57:52 2005 -0700
	
	    [PATCH] kexec: add kexec syscalls
 
	commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45
	Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
	Date:   Fri Jun 6 14:37:07 2014 -0700
	
	    kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers

Changing this will cause regression. During these years, nobody ever doubt
kdump should execute firstly if crashkernel is reserved and kdump kernel is
loaded. That's not saying we can't change
this, but need a convincing justification.

Secondly, even with the notifiers' split, we can't guarantee people will
absolutely add notifiers into right list in the future. Letting kdump
execute behind lists by default will put kdump into risk.

For example, you replied to Hatamata saying you have been working with
kdump in the last 3, 4 years, and you have have been working on these
panic notifiers refactoring issue in the recent months. However, in your
refactoring patches of introducing hypervisor/info/pre-reboot, I noticed
you acked the suggestion from Petr that several notifiers need be moved to
correct position. So even you can't make sure these, how can other people
be able to recognize which list should be 100% appropriate when they try
to register one notifier for their sub-component?

At last, I am wondering why fadump matters. I don't know in which case
people wants to load kdump kernel, but expect to trigger crash fadump.
Power people need consider this carefully and makes some change. Fadump
just borrows the crashkernel reservation mechanism. If fadump would rather
take risk to run all panic notifiers, whether fadump really needs them
or not, then execute crash_fadump(), that's powerpc's business.

As for Hyper-V, if it enforces to terminate VMbus connection, no matter
it's kdump or not, why not taking it out of panic notifiers list and
execute it before kdump unconditionally. Below is abstracted from
Michael's words.

https://lore.kernel.org/all/MWHPR21MB15933573F5C81C5250BF6A1CD75E9@MWHPR21MB1593.namprd21.prod.outlook.com/T/#u
=======
I looked at the code again, and should revise my previous comments
somewhat.   The Hyper-V resets that I described indeed must be done
prior to kexec'ing the kdump kernel.   Most such resets are actually
done via __crash_kexec() -> machine_crash_shutdown(), not via the
panic notifier. However, the Hyper-V panic notifier must terminate the
VMbus connection, because that must be done even if kdump is not
being invoked.  See commit 74347a99e73.
=======

> 
> The thing is: with the notifiers' split, we tried to keep only the most
> relevant/necessary stuff in this first list, things that ultimately
> should improve kdump reliability or if not, at least not break it. My
> feeling is that, with this series, we should change the idea/concept
> that kdump must run first nevertheless, not matter what. We're here
> trying to accommodate the antagonistic goals of hypervisors that need
> some clean-up (even for kdump to work) VS. kdump users, that wish a
> "pristine" system reboot ASAP after the crash.
> 
> Cheers,
> 
> 
> Guilherme
> 


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-24  8:01           ` Petr Mladek
@ 2022-05-24 10:18             ` Baoquan He
  0 siblings, 0 replies; 183+ messages in thread
From: Baoquan He @ 2022-05-24 10:18 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Guilherme G. Piccoli, michael Kelley (LINUX),
	Dave Young, d.hatayama, akpm, kexec, linux-kernel,
	bcm-kernel-feedback-list, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, dave.hansen, feng.tang, gregkh, hidehiro.kawai.ez,
	jgross, john.ogness, keescook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets,
	will

On 05/24/22 at 10:01am, Petr Mladek wrote:
> On Fri 2022-05-20 08:23:33, Guilherme G. Piccoli wrote:
> > On 19/05/2022 20:45, Baoquan He wrote:
> > > [...]
> > >> I really appreciate the summary skill you have, to convert complex
> > >> problems in very clear and concise ideas. Thanks for that, very useful!
> > >> I agree with what was summarized above.
> > > 
> > > I want to say the similar words to Petr's reviewing comment when I went
> > > through the patches and traced each reviewing sub-thread to try to
> > > catch up. Petr has reivewed this series so carefully and given many
> > > comments I want to ack immediately.
> > > 
> > > I agree with most of the suggestions from Petr to this patch, except of
> > > one tiny concern, please see below inline comment.
> > 
> > Hi Baoquan, thanks! I'm glad you're also reviewing that =)
> > 
> > 
> > > [...]
> > > 
> > > I like the proposed skeleton of panic() and code style suggested by
> > > Petr very much. About panic_prefer_crash_dump which might need be added,
> > > I hope it has a default value true. This makes crash_dump execute at
> > > first by default just as before, unless people specify
> > > panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add
> > > panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump,
> > > this is inconsistent with the old behaviour.
> > 
> > I'd like to understand better why the crash_kexec() must always be the
> > first thing in your use case. If we keep that behavior, we'll see all
> > sorts of workarounds - see the last patches of this series, Hyper-V and
> > PowerPC folks hardcoded "crash_kexec_post_notifiers" in order to force
> > execution of their relevant notifiers (like the vmbus disconnect,
> > specially in arm64 that has no custom machine_crash_shutdown, or the
> > fadump case in ppc). This led to more risk in kdump.
> > 
> > The thing is: with the notifiers' split, we tried to keep only the most
> > relevant/necessary stuff in this first list, things that ultimately
> > should improve kdump reliability or if not, at least not break it. My
> > feeling is that, with this series, we should change the idea/concept
> > that kdump must run first nevertheless, not matter what. We're here
> > trying to accommodate the antagonistic goals of hypervisors that need
> > some clean-up (even for kdump to work) VS. kdump users, that wish a
> > "pristine" system reboot ASAP after the crash.
> 
> Good question. I wonder if Baoquan knows about problems caused by the
> particular notifiers that will end up in the hypervisor list. Note
> that there will be some shuffles and the list will be slightly
> different in V2.

Yes, I knew some of them. Please check my response to Guilherme.

We have bug to track the issue on Hyper-V in which failure happened
during panic notifiers running, haven't come to kdump. Seems both of
us sent mail replying to Guilherme at the same time. 

> 
> Anyway, I see four possible solutions:
> 
>   1. The most conservative approach is to keep the current behavior
>      and call kdump first by default.
> 
>   2. A medium conservative approach to change the default default
>      behavior and call hypervisor and eventually the info notifiers
>      before kdump. There still would be the possibility to call kdump
>      first by the command line parameter.
> 
>   3. Remove the possibility to call kdump first completely. It would
>      assume that all the notifiers in the info list are super safe
>      or that they make kdump actually more safe.
> 
>   4. Create one more notifier list for operations that always should
>      be called before crash_dump.

I would vote for 1 or 4 without any hesitation, and prefer 4. I ever
suggest the variant of solution 4 in v1 reviewing. That's taking those
notifiers out of list and enforcing to execute them before kdump. E.g
the one on HyperV to terminate VMbus connection. Maybe solution 4 is
better to provide a determinate way for people to add necessary code
at the earliest part.

> 
> Regarding the extra notifier list (4th solution). It is not clear to
> me whether it would be always called even before hypervisor list or
> when kdump is not enabled. We must not over-engineer it.

One thing I would like to notice is, no matter how perfect we split the
lists this time, we can't gurantee people will add notifiers reasonablly
in the future. And people from different sub-component may not do
sufficient investigation and add them to fulfil their local purpose.

The current panic notifers list is the best example. Hyper-V actually
wants to run some necessary code before kdump, but not all of them, they
just add it, ignoring the original purpose of
crash_kexec_post_notifiers. I guess they do like this just because it's
easy to do, no need to bother changing code in generic place.

Solution 4 can make this no doubt, that's why I like it better.

> 
> 2nd proposal looks like a good compromise. But maybe we could do
> this change few releases later. The notifiers split is a big
> change on its own.

As I replied to Guilherme, solution 2 will cause regression if not
calling kdump firstly. Solution 3 leaves people space to make mistake,
they could add nontifier into wrong list.

I would like to note again that the panic notifiers are optional to run,
while kdump is expectd once loaded, from the original purpose. I guess
people I know will still have this thought, e.g Hatayama, Masa, they are
truly often use panic notifiers like this on their company's system.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-04-27 22:49 ` [PATCH 24/30] panic: Refactor the panic path Guilherme G. Piccoli
                     ` (3 preceding siblings ...)
  2022-05-12 14:03   ` Petr Mladek
@ 2022-05-24 14:44   ` Eric W. Biederman
  2022-05-26 16:25     ` Guilherme G. Piccoli
  4 siblings, 1 reply; 183+ messages in thread
From: Eric W. Biederman @ 2022-05-24 14:44 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel,
	bcm-kernel-feedback-list, coresight, linuxppc-dev, linux-alpha,
	linux-arm-kernel, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will

"Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:

> The panic() function is somewhat convoluted - a lot of changes were
> made over the years, adding comments that might be misleading/outdated
> now, it has a code structure that is a bit complex to follow, with
> lots of conditionals, for example. The panic notifier list is something
> else - a single list, with multiple callbacks of different purposes,
> that run in a non-deterministic order and may affect hardly kdump
> reliability - see the "crash_kexec_post_notifiers" workaround-ish flag.
>
> This patch proposes a major refactor on the panic path based on Petr's
> idea [0] - basically we split the notifiers list in three, having a set
> of different call points in the panic path. Below a list of changes
> proposed in this patch, culminating in the panic notifiers level
> concept:
>
> (a) First of all, we improved comments all over the function
> and removed useless variables / includes. Also, as part of this
> clean-up we concentrate the console flushing functions in a helper.
>
> (b) As mentioned before, there is a split of the panic notifier list
> in three, based on the purpose of the callback. The code contains
> good documentation in form of comments, but a summary of the three
> lists follows:
>
> - the hypervisor list aims low-risk procedures to inform hypervisors
> or firmware about the panic event, also includes LED-related functions;
>
> - the informational list contains callbacks that provide more details,
> like kernel offset or trace dump (if enabled) and also includes the
> callbacks aimed at reducing log pollution or warns, like the RCU and
> hung task disable callbacks;
>
> - finally, the pre_reboot list is the old notifier list renamed,
> containing the more risky callbacks that didn't fit the previous
> lists. There is also a 4th list (the post_reboot one), but it's not
> related with the original list - it contains late time architecture
> callbacks aimed at stopping the machine, for example.
>
> The 3 notifiers lists execute in different moments, hypervisor being
> the first, followed by informational and finally the pre_reboot list.
>
> (c) But then, there is the ordering problem of the notifiers against
> the crash_kernel() call - kdump must be as reliable as possible.
> For that, a simple binary "switch" as "crash_kexec_post_notifiers"
> is not enough, hence we introduce here concept of panic notifier
> levels: there are 5 levels, from 0 (no notifier executes before
> kdump) until 4 (all notifiers run before kdump); the default level
> is 2, in which the hypervisor and (iff we have any kmsg dumper)
> the informational notifiers execute before kdump.
>
> The detailed documentation of the levels is present in code comments
> and in the kernel-parameters.txt file; as an analogy with the previous
> panic() implementation, the level 0 is exactly the same as the old
> behavior of notifiers, running all after kdump, and the level 4 is
> the same as "crash_kexec_post_notifiers=Y" (we kept this parameter as
> a deprecated one).
>
> (d) Finally, an important change made here: we now use only the
> function "crash_smp_send_stop()" to shut all the secondary CPUs
> in the panic path. Before, there was a case of using the regular
> "smp_send_stop()", but the better approach is to simplify the
> code and try to use the function which was created exclusively
> for the panic path. Experiments showed that it works fine, and
> code was very simplified with that.
>
> Functional change is expected from this refactor, since now we
> call some notifiers by default before kdump, but the goal here
> besides code clean-up is to have a better panic path, more
> reliable and deterministic, but also very customizable.
>
> [0] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/

I am late to this discussion.  My apologies.

Unfortunately I am also very grouchy.

Notifiers before kexec on panic are fundamentally broken.  So please
just remove crash_kexec_post notifiers and be done with it.  Part of the
deep issue is that firmware always has a common and broken
implementation for anything that is not mission critical to
motherboards.

Notifiers in any sense on these paths are just bollocks.  Any kind of
notifier list is fundamentally fragile in the face of memory corruption
and very very difficult to review.

So I am going to refresh my ancient NACK on this.

I can certainly appreciate that there are pieces of the reboot paths
that can be improved.  I don't think making anything more feature full
or flexible is any kind of real improvement.

Eric



>
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>
> Special thanks to Petr and Baoquan for the suggestion and feedback in a previous
> email thread. There's some important design decisions that worth mentioning and
> discussing:
>
> * The default panic notifiers level is 2, based on Petr Mladek's suggestion,
> which makes a lot of sense. Of course, this is customizable through the
> parameter, but would be something worthwhile to have a KConfig option to set
> the default level? It would help distros that want the old behavior
> (no notifiers before kdump) as default.
>
> * The implementation choice was to _avoid_ intricate if conditionals in the
> panic path, which would _definitely_ be present with the panic notifiers levels
> idea; so, instead of lots of if conditionals, the set/clear bits approach with
> functions called in 2 points (but executing only in one of them) is much easier
> to follow an was used here; the ordering helper function and the comments also
> help a lot to avoid confusion (hopefully).
>
> * Choice was to *always* use crash_smp_send_stop() instead of sometimes making
> use of the regular smp_send_stop(); for most architectures they are the same,
> including Xen (on x86). For the ones that override it, all should work fine,
> in the powerpc case it's even more correct (see the subsequent patch
> "powerpc: Do not force all panic notifiers to execute before kdump")
>
> There seems to be 2 cases that requires some plumbing to work 100% right:
> - ARM doesn't disable local interrupts / FIQs in the crash version of
> send_stop(); we patched that early in this series;
> - x86 could face an issue if we have VMX and do use crash_smp_send_stop()
> _without_ kdump, but this is fixed in the first patch of the series (and
> it's a bug present even before this refactor).
>
> * Notice we didn't add a sysrq for panic notifiers level - should have it?
> Alejandro proposed recently to add a sysrq for "crash_kexec_post_notifiers",
> let me know if you feel the need here Alejandro, since the core parameters are
> present in /sys, I didn't consider much gain in having a sysrq, but of course
> I'm open to suggestions!
>
> Thanks advance for the review!
>
>  .../admin-guide/kernel-parameters.txt         |  42 ++-
>  include/linux/panic_notifier.h                |   1 +
>  kernel/kexec_core.c                           |   8 +-
>  kernel/panic.c                                | 292 +++++++++++++-----
>  .../selftests/pstore/pstore_crash_test        |   5 +-
>  5 files changed, 252 insertions(+), 96 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 3f1cc5e317ed..8d3524060ce3 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -829,6 +829,13 @@
>  			It will be ignored when crashkernel=X,high is not used
>  			or memory reserved is below 4G.
>  
> +	crash_kexec_post_notifiers
> +			This was DEPRECATED - users should always prefer the
> +			parameter "panic_notifiers_level" - check its entry
> +			in this documentation for details on how it works.
> +			Setting this parameter is exactly the same as setting
> +			"panic_notifiers_level=4".
> +
>  	cryptomgr.notests
>  			[KNL] Disable crypto self-tests
>  
> @@ -3784,6 +3791,33 @@
>  			timeout < 0: reboot immediately
>  			Format: <timeout>
>  
> +	panic_notifiers_level=
> +			[KNL] Set the panic notifiers execution order.
> +			Format: <unsigned int>
> +			We currently have 4 lists of panic notifiers; based
> +			on the functionality and risk (for panic success) the
> +			callbacks are added in a given list. The lists are:
> +			- hypervisor/FW notification list (low risk);
> +			- informational list (low/medium risk);
> +			- pre_reboot list (higher risk);
> +			- post_reboot list (only run late in panic and after
> +			kdump, not configurable for now).
> +			This parameter defines the ordering of the first 3
> +			lists with regards to kdump; the levels determine
> +			which set of notifiers execute before kdump. The
> +			accepted levels are:
> +			0: kdump is the first thing to run, NO list is
> +			executed before kdump.
> +			1: only the hypervisor list is executed before kdump.
> +			2 (default level): the hypervisor list and (*if*
> +			there's any kmsg_dumper defined) the informational
> +			list are executed before kdump.
> +			3: both the hypervisor and the informational lists
> +			(always) execute before kdump.
> +			4: the 3 lists (hypervisor, info and pre_reboot)
> +			execute before kdump - this behavior is analog to the
> +			deprecated parameter "crash_kexec_post_notifiers".
> +
>  	panic_print=	Bitmask for printing system info when panic happens.
>  			User can chose combination of the following bits:
>  			bit 0: print all tasks info
> @@ -3814,14 +3848,6 @@
>  	panic_on_warn	panic() instead of WARN().  Useful to cause kdump
>  			on a WARN().
>  
> -	crash_kexec_post_notifiers
> -			Run kdump after running panic-notifiers and dumping
> -			kmsg. This only for the users who doubt kdump always
> -			succeeds in any situation.
> -			Note that this also increases risks of kdump failure,
> -			because some panic notifiers can make the crashed
> -			kernel more unstable.
> -
>  	parkbd.port=	[HW] Parallel port number the keyboard adapter is
>  			connected to, default is 0.
>  			Format: <parport#>
> diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
> index bcf6a5ea9d7f..b5041132321d 100644
> --- a/include/linux/panic_notifier.h
> +++ b/include/linux/panic_notifier.h
> @@ -10,6 +10,7 @@ extern struct atomic_notifier_head panic_info_list;
>  extern struct atomic_notifier_head panic_pre_reboot_list;
>  extern struct atomic_notifier_head panic_post_reboot_list;
>  
> +bool panic_notifiers_before_kdump(void);
>  extern bool crash_kexec_post_notifiers;
>  
>  enum panic_notifier_val {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 68480f731192..f8906db8ca72 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -74,11 +74,11 @@ struct resource crashk_low_res = {
>  int kexec_should_crash(struct task_struct *p)
>  {
>  	/*
> -	 * If crash_kexec_post_notifiers is enabled, don't run
> -	 * crash_kexec() here yet, which must be run after panic
> -	 * notifiers in panic().
> +	 * In case any panic notifiers are set to execute before kexec,
> +	 * don't run crash_kexec() here yet, which must run after these
> +	 * panic notifiers are executed, in the panic() path.
>  	 */
> -	if (crash_kexec_post_notifiers)
> +	if (panic_notifiers_before_kdump())
>  		return 0;
>  	/*
>  	 * There are 4 panic() calls in make_task_dead() path, each of which
> diff --git a/kernel/panic.c b/kernel/panic.c
> index bf792102b43e..b7c055d4f87f 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -15,7 +15,6 @@
>  #include <linux/kgdb.h>
>  #include <linux/kmsg_dump.h>
>  #include <linux/kallsyms.h>
> -#include <linux/notifier.h>
>  #include <linux/vt_kern.h>
>  #include <linux/module.h>
>  #include <linux/random.h>
> @@ -52,14 +51,23 @@ static unsigned long tainted_mask =
>  static int pause_on_oops;
>  static int pause_on_oops_flag;
>  static DEFINE_SPINLOCK(pause_on_oops_lock);
> -bool crash_kexec_post_notifiers;
> +
>  int panic_on_warn __read_mostly;
> +bool panic_on_taint_nousertaint;
>  unsigned long panic_on_taint;
> -bool panic_on_taint_nousertaint = false;
>  
>  int panic_timeout = CONFIG_PANIC_TIMEOUT;
>  EXPORT_SYMBOL_GPL(panic_timeout);
>  
> +/* Initialized with all notifiers set to run before kdump */
> +static unsigned long panic_notifiers_bits = 15;
> +
> +/* Default level is 2, see kernel-parameters.txt */
> +unsigned int panic_notifiers_level = 2;
> +
> +/* DEPRECATED in favor of panic_notifiers_level */
> +bool crash_kexec_post_notifiers;
> +
>  #define PANIC_PRINT_TASK_INFO		0x00000001
>  #define PANIC_PRINT_MEM_INFO		0x00000002
>  #define PANIC_PRINT_TIMER_INFO		0x00000004
> @@ -109,10 +117,14 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs)
>  }
>  
>  /*
> - * Stop other CPUs in panic.  Architecture dependent code may override this
> - * with more suitable version.  For example, if the architecture supports
> - * crash dump, it should save registers of each stopped CPU and disable
> - * per-CPU features such as virtualization extensions.
> + * Stop other CPUs in panic context.
> + *
> + * Architecture dependent code may override this with more suitable version.
> + * For example, if the architecture supports crash dump, it should save the
> + * registers of each stopped CPU and disable per-CPU features such as
> + * virtualization extensions. When not overridden in arch code (and for
> + * x86/xen), this is exactly the same as execute smp_send_stop(), but
> + * guarded against duplicate execution.
>   */
>  void __weak crash_smp_send_stop(void)
>  {
> @@ -183,6 +195,112 @@ static void panic_print_sys_info(bool console_flush)
>  		ftrace_dump(DUMP_ALL);
>  }
>  
> +/*
> + * Helper that accumulates all console flushing routines executed on panic.
> + */
> +static void console_flushing(void)
> +{
> +#ifdef CONFIG_VT
> +	unblank_screen();
> +#endif
> +	console_unblank();
> +
> +	/*
> +	 * In this point, we may have disabled other CPUs, hence stopping the
> +	 * CPU holding the lock while still having some valuable data in the
> +	 * console buffer.
> +	 *
> +	 * Try to acquire the lock then release it regardless of the result.
> +	 * The release will also print the buffers out. Locks debug should
> +	 * be disabled to avoid reporting bad unlock balance when panic()
> +	 * is not being called from OOPS.
> +	 */
> +	debug_locks_off();
> +	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +
> +	panic_print_sys_info(true);
> +}
> +
> +#define PN_HYPERVISOR_BIT	0
> +#define PN_INFO_BIT		1
> +#define PN_PRE_REBOOT_BIT	2
> +#define PN_POST_REBOOT_BIT	3
> +
> +/*
> + * Determine the order of panic notifiers with regards to kdump.
> + *
> + * This function relies in the "panic_notifiers_level" kernel parameter
> + * to determine how to order the notifiers with regards to kdump. We
> + * have currently 5 levels. For details, please check the kernel docs for
> + * "panic_notifiers_level" at Documentation/admin-guide/kernel-parameters.txt.
> + *
> + * Default level is 2, which means the panic hypervisor and informational
> + * (unless we don't have any kmsg_dumper) lists will execute before kdump.
> + */
> +static void order_panic_notifiers_and_kdump(void)
> +{
> +	/*
> +	 * The parameter "crash_kexec_post_notifiers" is deprecated, but
> +	 * valid. Users that set it want really all panic notifiers to
> +	 * execute before kdump, so it's effectively the same as setting
> +	 * the panic notifiers level to 4.
> +	 */
> +	if (panic_notifiers_level >= 4 || crash_kexec_post_notifiers)
> +		return;
> +
> +	/*
> +	 * Based on the level configured (smaller than 4), we clear the
> +	 * proper bits in "panic_notifiers_bits". Notice that this bitfield
> +	 * is initialized with all notifiers set.
> +	 */
> +	switch (panic_notifiers_level) {
> +	case 3:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +		break;
> +	case 2:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +
> +		if (!kmsg_has_dumpers())
> +			clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> +		break;
> +	case 1:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> +		break;
> +	case 0:
> +		clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +		clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> +		clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);
> +		break;
> +	}
> +}
> +
> +/*
> + * Set of helpers to execute the panic notifiers only once.
> + * Just the informational notifier cares about the return.
> + */
> +static inline bool notifier_run_once(struct atomic_notifier_head head,
> +				     char *buf, long bit)
> +{
> +	if (test_and_change_bit(bit, &panic_notifiers_bits)) {
> +		atomic_notifier_call_chain(&head, PANIC_NOTIFIER, buf);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +#define panic_notifier_hypervisor_once(buf)\
> +	notifier_run_once(panic_hypervisor_list, buf, PN_HYPERVISOR_BIT)
> +
> +#define panic_notifier_info_once(buf)\
> +	notifier_run_once(panic_info_list, buf, PN_INFO_BIT)
> +
> +#define panic_notifier_pre_reboot_once(buf)\
> +	notifier_run_once(panic_pre_reboot_list, buf, PN_PRE_REBOOT_BIT)
> +
> +#define panic_notifier_post_reboot_once(buf)\
> +	notifier_run_once(panic_post_reboot_list, buf, PN_POST_REBOOT_BIT)
> +
>  /**
>   *	panic - halt the system
>   *	@fmt: The text string to print
> @@ -198,32 +316,29 @@ void panic(const char *fmt, ...)
>  	long i, i_next = 0, len;
>  	int state = 0;
>  	int old_cpu, this_cpu;
> -	bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
>  
> -	if (panic_on_warn) {
> -		/*
> -		 * This thread may hit another WARN() in the panic path.
> -		 * Resetting this prevents additional WARN() from panicking the
> -		 * system on this thread.  Other threads are blocked by the
> -		 * panic_mutex in panic().
> -		 */
> -		panic_on_warn = 0;
> -	}
> +	/*
> +	 * This thread may hit another WARN() in the panic path, so
> +	 * resetting this option prevents additional WARN() from
> +	 * re-panicking the system here.
> +	 */
> +	panic_on_warn = 0;
>  
>  	/*
>  	 * Disable local interrupts. This will prevent panic_smp_self_stop
> -	 * from deadlocking the first cpu that invokes the panic, since
> -	 * there is nothing to prevent an interrupt handler (that runs
> -	 * after setting panic_cpu) from invoking panic() again.
> +	 * from deadlocking the first cpu that invokes the panic, since there
> +	 * is nothing to prevent an interrupt handler (that runs after setting
> +	 * panic_cpu) from invoking panic() again. Also disables preemption
> +	 * here - notice it's not safe to rely on interrupt disabling to avoid
> +	 * preemption, since any cond_resched() or cond_resched_lock() might
> +	 * trigger a reschedule if the preempt count is 0 (for reference, see
> +	 * Documentation/locking/preempt-locking.rst). Some functions called
> +	 * from here want preempt disabled, so no point enabling it later.
>  	 */
>  	local_irq_disable();
>  	preempt_disable_notrace();
>  
>  	/*
> -	 * It's possible to come here directly from a panic-assertion and
> -	 * not have preempt disabled. Some functions called from here want
> -	 * preempt to be disabled. No point enabling it later though...
> -	 *
>  	 * Only one CPU is allowed to execute the panic code from here. For
>  	 * multiple parallel invocations of panic, all other CPUs either
>  	 * stop themself or will wait until they are stopped by the 1st CPU
> @@ -266,73 +381,75 @@ void panic(const char *fmt, ...)
>  	kgdb_panic(buf);
>  
>  	/*
> -	 * If we have crashed and we have a crash kernel loaded let it handle
> -	 * everything else.
> -	 * If we want to run this after calling panic_notifiers, pass
> -	 * the "crash_kexec_post_notifiers" option to the kernel.
> +	 * Here lies one of the most subtle parts of the panic path,
> +	 * the panic notifiers and their order with regards to kdump.
> +	 * We currently have 4 sets of notifiers:
>  	 *
> -	 * Bypass the panic_cpu check and call __crash_kexec directly.
> +	 *  - the hypervisor list is composed by callbacks that are related
> +	 *  to warn the FW / hypervisor about panic, or non-invasive LED
> +	 *  controlling functions - (hopefully) low-risk for kdump, should
> +	 *  run early if possible.
> +	 *
> +	 *  - the informational list is composed by functions dumping data
> +	 *  like kernel offsets, device error registers or tracing buffer;
> +	 *  also log flooding prevention callbacks fit in this list. It is
> +	 *  relatively safe to run before kdump.
> +	 *
> +	 *  - the pre_reboot list basically is everything else, all the
> +	 *  callbacks that don't fit in the 2 previous lists. It should
> +	 *  run *after* kdump if possible, as it contains high-risk
> +	 *  functions that may break kdump.
> +	 *
> +	 *  - we also have a 4th list of notifiers, the post_reboot
> +	 *  callbacks. This is not strongly related to kdump since it's
> +	 *  always executed late in the panic path, after the restart
> +	 *  mechanism (if set); its goal is to provide a way for
> +	 *  architecture code effectively power-off/disable the system.
> +	 *
> +	 *  The kernel provides the "panic_notifiers_level" parameter
> +	 *  to adjust the ordering in which these notifiers should run
> +	 *  with regards to kdump - the default level is 2, so both the
> +	 *  hypervisor and informational notifiers should execute before
> +	 *  the __crash_kexec(); the info notifier won't run by default
> +	 *  unless there's some kmsg_dumper() registered. For details
> +	 *  about it, check Documentation/admin-guide/kernel-parameters.txt.
> +	 *
> +	 *  Notice that the code relies in bits set/clear operations to
> +	 *  determine the ordering, functions *_once() execute only one
> +	 *  time, as their name implies. The goal is to prevent too much
> +	 *  if conditionals and more confusion. Finally, regarding CPUs
> +	 *  disabling: unless NO panic notifier executes before kdump,
> +	 *  we always disable secondary CPUs before __crash_kexec() and
> +	 *  the notifiers execute.
>  	 */
> -	if (!_crash_kexec_post_notifiers) {
> +	order_panic_notifiers_and_kdump();
> +
> +	/* If no level, we should kdump ASAP. */
> +	if (!panic_notifiers_level)
>  		__crash_kexec(NULL);
>  
> -		/*
> -		 * Note smp_send_stop is the usual smp shutdown function, which
> -		 * unfortunately means it may not be hardened to work in a
> -		 * panic situation.
> -		 */
> -		smp_send_stop();
> -	} else {
> -		/*
> -		 * If we want to do crash dump after notifier calls and
> -		 * kmsg_dump, we will need architecture dependent extra
> -		 * works in addition to stopping other CPUs.
> -		 */
> -		crash_smp_send_stop();
> -	}
> +	crash_smp_send_stop();
> +	panic_notifier_hypervisor_once(buf);
>  
> -	/*
> -	 * Run any panic handlers, including those that might need to
> -	 * add information to the kmsg dump output.
> -	 */
> -	atomic_notifier_call_chain(&panic_hypervisor_list, PANIC_NOTIFIER, buf);
> -	atomic_notifier_call_chain(&panic_info_list, PANIC_NOTIFIER, buf);
> -	atomic_notifier_call_chain(&panic_pre_reboot_list, PANIC_NOTIFIER, buf);
> +	if (panic_notifier_info_once(buf)) {
> +		panic_print_sys_info(false);
> +		kmsg_dump(KMSG_DUMP_PANIC);
> +	}
>  
> -	panic_print_sys_info(false);
> +	panic_notifier_pre_reboot_once(buf);
>  
> -	kmsg_dump(KMSG_DUMP_PANIC);
> +	__crash_kexec(NULL);
>  
> -	/*
> -	 * If you doubt kdump always works fine in any situation,
> -	 * "crash_kexec_post_notifiers" offers you a chance to run
> -	 * panic_notifiers and dumping kmsg before kdump.
> -	 * Note: since some panic_notifiers can make crashed kernel
> -	 * more unstable, it can increase risks of the kdump failure too.
> -	 *
> -	 * Bypass the panic_cpu check and call __crash_kexec directly.
> -	 */
> -	if (_crash_kexec_post_notifiers)
> -		__crash_kexec(NULL);
> +	panic_notifier_hypervisor_once(buf);
>  
> -#ifdef CONFIG_VT
> -	unblank_screen();
> -#endif
> -	console_unblank();
> -
> -	/*
> -	 * We may have ended up stopping the CPU holding the lock (in
> -	 * smp_send_stop()) while still having some valuable data in the console
> -	 * buffer.  Try to acquire the lock then release it regardless of the
> -	 * result.  The release will also print the buffers out.  Locks debug
> -	 * should be disabled to avoid reporting bad unlock balance when
> -	 * panic() is not being callled from OOPS.
> -	 */
> -	debug_locks_off();
> -	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +	if (panic_notifier_info_once(buf)) {
> +		panic_print_sys_info(false);
> +		kmsg_dump(KMSG_DUMP_PANIC);
> +	}
>  
> -	panic_print_sys_info(true);
> +	panic_notifier_pre_reboot_once(buf);
>  
> +	console_flushing();
>  	if (!panic_blink)
>  		panic_blink = no_blink;
>  
> @@ -363,8 +480,7 @@ void panic(const char *fmt, ...)
>  		emergency_restart();
>  	}
>  
> -	atomic_notifier_call_chain(&panic_post_reboot_list,
> -				   PANIC_NOTIFIER, buf);
> +	panic_notifier_post_reboot_once(buf);
>  
>  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
>  
> @@ -383,6 +499,15 @@ void panic(const char *fmt, ...)
>  
>  EXPORT_SYMBOL(panic);
>  
> +/*
> + * Helper used in the kexec code, to validate if any
> + * panic notifier is set to execute early, before kdump.
> + */
> +inline bool panic_notifiers_before_kdump(void)
> +{
> +	return panic_notifiers_level || crash_kexec_post_notifiers;
> +}
> +
>  /*
>   * TAINT_FORCED_RMMOD could be a per-module flag but the module
>   * is being removed anyway.
> @@ -692,6 +817,9 @@ core_param(panic, panic_timeout, int, 0644);
>  core_param(panic_print, panic_print, ulong, 0644);
>  core_param(pause_on_oops, pause_on_oops, int, 0644);
>  core_param(panic_on_warn, panic_on_warn, int, 0644);
> +core_param(panic_notifiers_level, panic_notifiers_level, uint, 0644);
> +
> +/* DEPRECATED in favor of panic_notifiers_level */
>  core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0644);
>  
>  static int __init oops_setup(char *s)
> diff --git a/tools/testing/selftests/pstore/pstore_crash_test b/tools/testing/selftests/pstore/pstore_crash_test
> index 2a329bbb4aca..1e60ce4501aa 100755
> --- a/tools/testing/selftests/pstore/pstore_crash_test
> +++ b/tools/testing/selftests/pstore/pstore_crash_test
> @@ -25,6 +25,7 @@ touch $REBOOT_FLAG
>  sync
>  
>  # cause crash
> -# Note: If you use kdump and want to see kmesg-* files after reboot, you should
> -#       specify 'crash_kexec_post_notifiers' in 1st kernel's cmdline.
> +# Note: If you use kdump and want to see kmsg-* files after reboot, you should
> +#       be sure that the parameter "panic_notifiers_level" is more than '2' (the
> +#       default value for this parameter is '2') in the first kernel's cmdline.
>  echo c > /proc/sysrq-trigger

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-24 14:44   ` Eric W. Biederman
@ 2022-05-26 16:25     ` Guilherme G. Piccoli
  2022-06-14 14:36       ` Petr Mladek
  0 siblings, 1 reply; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-26 16:25 UTC (permalink / raw)
  To: bhe, d.hatayama, Eric W. Biederman, Mark Rutland, mikelley,
	pmladek, vkuznets
  Cc: akpm, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-arm-kernel, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	dave.hansen, dyoung, feng.tang, gregkh, hidehiro.kawai.ez,
	jgross, john.ogness, keescook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, stern, tglx, vgoyal, will

Hey folks, first of all thanks a lot for the reviews / opinions about
this. I imagined that such change would be polemic, and I see I was
right heh


I'll try to "mix" all the relevant opinions in a single email, since
they happened in different responses and even different mail threads.

I've looped here the most interested parties based on the feedback
received, such as Baoquan (kdump), Hatayama (kdump), Eric (kexec), Mark
(arm64), Michael (Hyper-V), Petr (console/printk) and Vitaly (hyper-v /
kvm). I hope we can discuss and try to reach some consensus - my
apologies in advance for this long message!

So, here goes some feedback we received about this change and correlated
feedback from arm64 community - my apologies if I missed something
important, I've tried to collect the most relevant portions, while
keeping the summary "as short" as possible. I'll respond to such
feedback below, after the quotes.


On 24/05/2022 05:32, Baoquan He wrote:
>> [...] 
>> Firstly, kdump is not always the first thing. In any use case, if kdump
>> kernel is not loaded, it's not the first thing at all. Not to mention
>> if crash_kexec_post_notifiers is specified.
>> [...]
>> Changing this will cause regression. During these years, nobody ever doubt
>> kdump should execute firstly if crashkernel is reserved and kdump kernel is
>> loaded. That's not saying we can't change
>> this, but need a convincing justification.
>> [...] 
>> Secondly, even with the notifiers' split, we can't guarantee people will
>> absolutely add notifiers into right list in the future. Letting kdump
>> execute behind lists by default will put kdump into risk.
>> [...] 
>> As for Hyper-V, if it enforces to terminate VMbus connection, no matter
>> it's kdump or not, why not taking it out of panic notifiers list and
>> execute it before kdump unconditionally.


On 24/05/2022 05:01, Petr Mladek wrote:
>> [...]
>> Anyway, I see four possible solutions:
>> 
>>   1. The most conservative approach is to keep the current behavior
>>      and call kdump first by default.
>> 
>>   2. A medium conservative approach to change the default default
>>      behavior and call hypervisor and eventually the info notifiers
>>      before kdump. There still would be the possibility to call kdump
>>      first by the command line parameter.
>> 
>>   3. Remove the possibility to call kdump first completely. It would
>>      assume that all the notifiers in the info list are super safe
>>      or that they make kdump actually more safe.
>> 
>>   4. Create one more notifier list for operations that always should
>>      be called before crash_dump.
>> 
>> Regarding the extra notifier list (4th solution). It is not clear to
>> me whether it would be always called even before hypervisor list or
>> when kdump is not enabled. We must not over-engineer it.
>> 
>> 2nd proposal looks like a good compromise. But maybe we could do
>> this change few releases later. The notifiers split is a big
>> change on its own.


On 24/05/2022 07:18, Baoquan He wrote:
>>[...]
>> I would vote for 1 or 4 without any hesitation, and prefer 4. I ever
>> suggest the variant of solution 4 in v1 reviewing. That's taking those
>> notifiers out of list and enforcing to execute them before kdump. E.g
>> the one on HyperV to terminate VMbus connection. Maybe solution 4 is
>> better to provide a determinate way for people to add necessary code
>> at the earliest part.
>> [...] 
>>>
>>> Regarding the extra notifier list (4th solution). It is not clear to
>>> me whether it would be always called even before hypervisor list or
>>> when kdump is not enabled. We must not over-engineer it.
>> 
>> One thing I would like to notice is, no matter how perfect we split the
>> lists this time, we can't gurantee people will add notifiers reasonablly
>> in the future. And people from different sub-component may not do
>> sufficient investigation and add them to fulfil their local purpose.
>> 
>> The current panic notifers list is the best example. Hyper-V actually
>> wants to run some necessary code before kdump, but not all of them, they
>> just add it, ignoring the original purpose of
>> crash_kexec_post_notifiers. I guess they do like this just because it's
>> easy to do, no need to bother changing code in generic place.
>> 
>> Solution 4 can make this no doubt, that's why I like it better.
>> [...] 
>> As I replied to Guilherme, solution 2 will cause regression if not
>> calling kdump firstly. Solution 3 leaves people space to make mistake,
>> they could add nontifier into wrong list.
>> 
>> I would like to note again that the panic notifiers are optional to run,
>> while kdump is expectd once loaded, from the original purpose. I guess
>> people I know will still have this thought, e.g Hatayama, Masa, they are
>> truly often use panic notifiers like this on their company's system.


On 24/05/2022 11:44, Eric W. Biederman wrote:
> [...]
> Unfortunately I am also very grouchy.
> 
> Notifiers before kexec on panic are fundamentally broken.  So please
> just remove crash_kexec_post notifiers and be done with it.  Part of the
> deep issue is that firmware always has a common and broken
> implementation for anything that is not mission critical to
> motherboards.
> 
> Notifiers in any sense on these paths are just bollocks.  Any kind of
> notifier list is fundamentally fragile in the face of memory corruption
> and very very difficult to review.
> 
> So I am going to refresh my ancient NACK on this.
> 
> I can certainly appreciate that there are pieces of the reboot paths
> that can be improved.  I don't think making anything more feature full
> or flexible is any kind of real improvement.


Now, from the thread "Should arm64 have a custom crash shutdown
handler?" (link:
https://lore.kernel.org/lkml/427a8277-49f0-4317-d6c3-4a15d7070e55@igalia.com/),
we have:

On 05/05/2022 08:10, Mark Rutland wrote:
>> On Wed, May 04, 2022 at 05:00:42PM -0300, Guilherme G. Piccoli wrote:
>>> [...]
>>> Currently, when we kexec in arm64, the function machine_crash_shutdown()
>>> is called as a handler to disable CPUs and (potentially) do extra
>>> quiesce work. In the aforementioned architectures, there's a way to
>>> override this function, if for example an hypervisor wish to have its
>>> guests running their own custom shutdown machinery.
>> 
>> What exactly do you need to do in this custom shutdown machinery?
>> 
>> The general expectation for arm64 is that any hypervisor can implement PSCI,
>> and as long as you have that, CPUs (and the VM as a whole) can be shutdown in a
>> standard way.
>> 
>> I suspect what you're actually after is a mechanism to notify the hypervisor
>> when the guest crashes, rather than changing the way the shutdown itself
>> occurs? If so, we already have panic notifiers, and QEMU has a "pvpanic"
>> device using that. See drivers/misc/pvpanic/.


OK, so it seems we have some points in which agreement exists, and some
points that there is no agreement and instead, we have antagonistic /
opposite views and needs. Let's start with the easier part heh


It seems everybody agrees that *we shouldn't over-engineer things*, and
as per Eric good words: making the panic path more feature-full or
increasing flexibility isn't a good idea. So, as a "corollary": the
panic level approach I'm proposing is not a good fit, I'll drop it and
let's go with something simpler.

Another point of agreement seems to be that _notifier lists in the panic
path are dangerous_, for *2 different reasons*:

(a) We cannot guarantee that people won't add crazy callbacks there, we
can plan and document things the best as possible - it'll never be
enough, somebody eventually would slip a nonsense callback that would
break things and defeat the planned purpose of such a list;

(b) As per Eric point, in a panic/crash situation we might have memory
corruption exactly in the list code / pointers, etc, so the notifier
lists are, by nature, a bit fragile. But I think we shouldn't consider
it completely "bollocks", since this approach has been used for a while
with a good success rate. So, lists aren't perfect at all, but at the
same time, they aren't completely useless.


Now, to the points in which there are conflicting / antagonistic
needs/views:

(I) Kdump should be the first thing to run, as it's been like that since
forever. But...notice that "crash_kexec_post_notifiers" was created
exactly as a way to circumvent that, so we can see this is not an
absolute truth. Some users really *require to execute* some special code
*before kdump*.
Worth noticing here that regular kexec invokes the drivers .shutdown()
handlers, while kdump [aka crash_kexec()] does not, so we must have a
way to run code before kdump in a crash situation.

(II) If *we need* to have some code always running before kdump/reboot
on panic path (like the Hyper-V vmbus connection unload), *where to add
such code*? Again, conflicting views. Some would say we should hardcode
this in the panic() function. Others, that we should use the custom
machine_crash_shutdown() infrastructure - but notice that this isn't
available in all architectures, like arm64. Finally, others suggest
to...use notifier lists! Which was more or less the approach we took in
this patch.

How can we reach consensus on this? Not everybody will be 100% happy,
that's for sure. Also, I'd risk to say keep things as-is now or even
getting rid of "crash_kexec_post_notifiers" won't work at all, we have
users with legitimate needs of running code before a kdump/reboot when
crash happens. The *main goal* should be to have a *simple solution*
that doesn't require users to abuse parameters, like it's been done with
"crash_kexec_post_notifiers" (Hyper-V and PowerPC currently force this
parameter to be enabled, for example).


To avoid using a 4th list, especially given the list nature is a bit
fragile, I'd suggest one of the 3 following approaches - I *really
appreciate feedbacks* on that so I can implement the best solution and
avoid wasting time in some poor/disliked solution:

(1) We could have a helper function in the "beginning" of panic path,
i.e., after the logic to disable preemption/IRQs/secondary CPUs, but
*before* kdump. Then, users like Hyper-V that require to execute stuff
regardless of kdump or not, would run their callbacks from there,
directly, no lists involved.

- pros: simple, doesn't mess with arch code or involve lists.
- cons: header issues - will need to "export" such function from driver
code, for example, to run in core code. Also, some code might only be
required to run in some architectures, or only if kdump is set, etc.,
making the callbacks more complex / full of if conditionals.


(2) Similarly to previous solution, we could have a helper in the kexec
core code, not in the panic path. This way, users that require running
stuff *before a kdump* would add direct calls there; if kdump isn't
configured, and if such users also require that this code execute in
panic nevertheless, they'd need to also add a callback to some notifier
list.

- pros: also simple / doesn't mess with arch code or involve lists;
restricts the callbacks to kdump case.
- cons: also header issues, but might cause code duplicity too, since
some users would require both to run their code before a kdump and in
some panic notifier list.


(3) Have a way in arm64 (and all archs maybe) to run code before a kdump
- this is analog to the custom machine_crash_shutdown() we have nowadays
in some archs.

- pros: decouple kdump-required callbacks from panic notifiers, doesn't
involve lists, friendly to arch-dependent callbacks.
- cons: also header issues, might cause code duplicity (since some users
would also require to run their code in panic path regardless of kdump)
and involve changing arch code (some maintainers like Mark aren't fond
about that, with good reasons!).


So, hopefully we can converge to some direction even if not 100% of
users are happy - this problem is naturally full of trade-offs.
Thanks again for the reviews and the time you're spending reading these
long threads.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-05-26 16:25     ` Guilherme G. Piccoli
@ 2022-06-14 14:36       ` Petr Mladek
  2022-06-15  9:36         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 183+ messages in thread
From: Petr Mladek @ 2022-06-14 14:36 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: bhe, d.hatayama, Eric W. Biederman, Mark Rutland, mikelley,
	vkuznets, akpm, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-arm-kernel, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	dave.hansen, dyoung, feng.tang, gregkh, hidehiro.kawai.ez,
	jgross, john.ogness, keescook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, stern, tglx, vgoyal, will

On Thu 2022-05-26 13:25:57, Guilherme G. Piccoli wrote:
> OK, so it seems we have some points in which agreement exists, and some
> points that there is no agreement and instead, we have antagonistic /
> opposite views and needs. Let's start with the easier part heh
>
> It seems everybody agrees that *we shouldn't over-engineer things*, and
> as per Eric good words: making the panic path more feature-full or
> increasing flexibility isn't a good idea. So, as a "corollary": the
> panic level approach I'm proposing is not a good fit, I'll drop it and
> let's go with something simpler.

Makes sense.

> Another point of agreement seems to be that _notifier lists in the panic
> path are dangerous_, for *2 different reasons*:
> 
> (a) We cannot guarantee that people won't add crazy callbacks there, we
> can plan and document things the best as possible - it'll never be
> enough, somebody eventually would slip a nonsense callback that would
> break things and defeat the planned purpose of such a list;

It is true that notifier lists might allow to add crazy stuff
without proper review more easily. Things added into the core
code would most likely get better review.

But nothing is error-proof. And bugs will happen with any approach.


> (b) As per Eric point, in a panic/crash situation we might have memory
> corruption exactly in the list code / pointers, etc, so the notifier
> lists are, by nature, a bit fragile. But I think we shouldn't consider
> it completely "bollocks", since this approach has been used for a while
> with a good success rate. So, lists aren't perfect at all, but at the
> same time, they aren't completely useless.

I am not able to judge this. Of course, any extra step increases
the risk. I am just not sure how much more complicated it would
be to hardcode the calls. Most of them are architecture
and/or feature specific. And such code is often hard to
review and maintain.

> To avoid using a 4th list,

4th or 5th? We already have "hypervisor", "info", "pre-reboot", and "pre-loop".
The 5th might be pre-crash-exec.

> especially given the list nature is a bit
> fragile, I'd suggest one of the 3 following approaches - I *really
> appreciate feedbacks* on that so I can implement the best solution and
> avoid wasting time in some poor/disliked solution:

Honestly, I am not able to decide what might be better without seeing
the code.

Most things fits pretty well into the 4 proposed lists:
"hypervisor", "info", "pre-reboot", and "pre-loop". IMHO, the
only question is the code that needs to be always called
even before crash_dump.

I suggest that you solve the crash_dump callbacks the way that
looks best to you. Ideally do it in a separate patch so it can be
reviewed and reworked more easily.

I believe that a fresh code with an updated split and simplified
logic would help us to move forward.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 24/30] panic: Refactor the panic path
  2022-06-14 14:36       ` Petr Mladek
@ 2022-06-15  9:36         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 183+ messages in thread
From: Guilherme G. Piccoli @ 2022-06-15  9:36 UTC (permalink / raw)
  To: Petr Mladek
  Cc: bhe, d.hatayama, Eric W. Biederman, Mark Rutland, mikelley,
	vkuznets, akpm, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-arm-kernel, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um,
	linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
	xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
	dave.hansen, dyoung, feng.tang, gregkh, hidehiro.kawai.ez,
	jgross, john.ogness, keescook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, stern, tglx, vgoyal, will

Perfect Petr, thanks for your feedback!

I'll be out for some weeks, but after that what I'm doing is to split
the series in 2 parts:

(a) The general fixes, which should be reviewed by subsystem maintainers
and even merged individually by them.

(b) The proper panic refactor, which includes the notifiers list split,
etc. I'll think about what I consider the best solution for the
crash_dump required ones, and will try to split in very simple patches
to make it easier to review.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 183+ messages in thread

end of thread, other threads:[~2022-06-15 14:38 UTC | newest]

Thread overview: 183+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-27 22:48 [PATCH 00/30] The panic notifiers refactor Guilherme G. Piccoli
2022-04-27 22:48 ` [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart Guilherme G. Piccoli
2022-05-09 12:32   ` Guilherme G. Piccoli
2022-05-09 15:52   ` Sean Christopherson
2022-05-10 20:11     ` Guilherme G. Piccoli
2022-04-27 22:48 ` [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path Guilherme G. Piccoli
2022-04-29 16:26   ` Michael Kelley (LINUX)
2022-04-29 18:20   ` Marc Zyngier
2022-04-29 21:38     ` Guilherme G. Piccoli
2022-04-29 21:45       ` Russell King (Oracle)
2022-04-29 21:56         ` Guilherme G. Piccoli
2022-04-29 22:00         ` Marc Zyngier
2022-04-27 22:48 ` [PATCH 03/30] notifier: Add panic notifiers info and purge trailing whitespaces Guilherme G. Piccoli
2022-04-27 22:48 ` [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path Guilherme G. Piccoli
2022-05-03 18:03   ` Evan Green
2022-05-03 19:12     ` Guilherme G. Piccoli
2022-05-03 21:56       ` Evan Green
2022-05-04 12:45         ` Guilherme G. Piccoli
2022-05-10 11:38       ` Petr Mladek
2022-05-10 13:04         ` Guilherme G. Piccoli
2022-05-10 17:20         ` Steven Rostedt
2022-05-10 19:40           ` John Ogness
2022-05-11 11:13             ` Petr Mladek
2022-04-27 22:48 ` [PATCH 05/30] misc/pvpanic: " Guilherme G. Piccoli
2022-05-10 12:14   ` Petr Mladek
2022-05-10 13:00     ` Guilherme G. Piccoli
2022-05-17 10:58       ` Petr Mladek
2022-05-17 13:03         ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header Guilherme G. Piccoli
2022-05-02 15:38   ` Florian Fainelli
2022-05-02 15:47     ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 07/30] mips: ip22: Reword PANICED to PANICKED " Guilherme G. Piccoli
2022-05-04 20:32   ` Thomas Bogendoerfer
2022-05-04 21:26     ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers Guilherme G. Piccoli
2022-05-05 18:55   ` Hari Bathini
2022-05-05 19:28     ` Guilherme G. Piccoli
2022-05-09 12:50     ` Guilherme G. Piccoli
2022-05-10 13:53       ` Michael Ellerman
2022-05-10 14:10         ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier Guilherme G. Piccoli
2022-04-28  8:11   ` Suzuki K Poulose
2022-04-29 14:01     ` Guilherme G. Piccoli
2022-05-09 13:09     ` Guilherme G. Piccoli
2022-05-09 16:14       ` Suzuki K Poulose
2022-05-09 16:26         ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 10/30] alpha: Clean-up the panic notifier code Guilherme G. Piccoli
2022-05-09 14:13   ` Guilherme G. Piccoli
2022-05-10 14:16     ` Petr Mladek
2022-05-11 20:10       ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 11/30] um: Improve panic notifiers consistency and ordering Guilherme G. Piccoli
2022-04-28  8:30   ` Johannes Berg
2022-04-29 15:46     ` Guilherme G. Piccoli
2022-05-10 14:28   ` Petr Mladek
2022-05-11 20:22     ` Guilherme G. Piccoli
2022-05-13 14:44       ` Johannes Berg
2022-05-15 22:12         ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path Guilherme G. Piccoli
2022-04-28 16:55   ` Helge Deller
2022-04-29 14:34     ` Guilherme G. Piccoli
2022-05-23 20:40     ` Guilherme G. Piccoli
2022-05-23 21:31       ` Helge Deller
2022-05-23 21:55         ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 13/30] s390/consoles: Improve panic notifiers reliability Guilherme G. Piccoli
2022-04-29 18:46   ` Heiko Carstens
2022-04-29 19:31     ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks Guilherme G. Piccoli
2022-05-10 15:16   ` Petr Mladek
2022-05-10 16:16     ` Guilherme G. Piccoli
2022-05-17 13:11       ` Petr Mladek
2022-05-17 15:19         ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers Guilherme G. Piccoli
2022-05-02 15:38   ` Florian Fainelli
2022-05-02 15:50     ` Guilherme G. Piccoli
2022-05-10 15:28   ` Petr Mladek
2022-05-17 15:32     ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers Guilherme G. Piccoli
2022-04-29 17:16   ` Michael Kelley (LINUX)
2022-04-29 22:35     ` Guilherme G. Piccoli
2022-05-03 18:13       ` Michael Kelley (LINUX)
2022-05-03 18:57         ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 17/30] tracing: Improve panic/die notifiers Guilherme G. Piccoli
2022-04-29  9:22   ` Sergei Shtylyov
2022-04-29 13:23     ` Steven Rostedt
2022-04-29 13:46       ` Guilherme G. Piccoli
2022-04-29 13:56         ` Steven Rostedt
2022-04-29 14:44           ` Guilherme G. Piccoli
2022-05-11 11:45   ` Petr Mladek
2022-05-17 15:33     ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set Guilherme G. Piccoli
2022-04-28  1:01   ` Xiaoming Ni
2022-04-29 19:38     ` Guilherme G. Piccoli
2022-05-10 17:29     ` Steven Rostedt
2022-05-16 16:14       ` Guilherme G. Piccoli
2022-04-29 16:27   ` Michael Kelley (LINUX)
2022-04-27 22:49 ` [PATCH 19/30] panic: Add the panic hypervisor notifier list Guilherme G. Piccoli
2022-04-29 17:30   ` Michael Kelley (LINUX)
2022-04-29 18:04     ` Guilherme G. Piccoli
2022-05-03 17:44       ` Michael Kelley (LINUX)
2022-05-03 17:56         ` Guilherme G. Piccoli
2022-05-16 14:01   ` Petr Mladek
2022-05-16 15:06     ` Guilherme G. Piccoli
2022-05-16 16:02       ` Evan Green
2022-05-17 13:28         ` Petr Mladek
2022-05-17 16:37           ` Guilherme G. Piccoli
2022-05-18  7:33             ` Petr Mladek
2022-05-18 13:24               ` Guilherme G. Piccoli
2022-05-17 13:57       ` Petr Mladek
2022-05-17 16:42         ` Guilherme G. Piccoli
2022-05-18  7:38           ` Petr Mladek
2022-05-18 13:09             ` Guilherme G. Piccoli
2022-05-18 22:17           ` Scott Branden
2022-05-19 12:19             ` Guilherme G. Piccoli
2022-05-19 19:20               ` Scott Branden
2022-05-23 14:56                 ` Guilherme G. Piccoli
2022-05-24  8:04                   ` Petr Mladek
2022-05-18  7:58         ` Petr Mladek
2022-05-18 13:16           ` Guilherme G. Piccoli
2022-05-19  7:03             ` Petr Mladek
2022-05-19 12:07               ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 20/30] panic: Add the panic informational " Guilherme G. Piccoli
2022-04-27 23:49   ` Paul E. McKenney
2022-04-28  8:14   ` Suzuki K Poulose
2022-04-29 14:50     ` Guilherme G. Piccoli
2022-05-16 14:11   ` Petr Mladek
2022-05-16 14:28     ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 21/30] panic: Introduce the panic pre-reboot " Guilherme G. Piccoli
2022-04-28 14:13   ` Alex Elder
2022-04-28 16:26   ` Corey Minyard
2022-04-29 15:18     ` Guilherme G. Piccoli
2022-04-29 16:04   ` Max Filippov
2022-04-29 19:34     ` Guilherme G. Piccoli
2022-05-16 14:33   ` Petr Mladek
2022-05-16 16:05     ` Guilherme G. Piccoli
2022-05-16 16:18       ` Luck, Tony
2022-05-16 16:33         ` Guilherme G. Piccoli
2022-05-17 14:11           ` Petr Mladek
2022-05-17 16:45             ` Guilherme G. Piccoli
2022-05-17 17:02               ` Luck, Tony
2022-05-17 18:12                 ` Guilherme G. Piccoli
2022-05-17 19:07                   ` Luck, Tony
2022-04-27 22:49 ` [PATCH 22/30] panic: Introduce the panic post-reboot " Guilherme G. Piccoli
2022-05-09 14:16   ` Guilherme G. Piccoli
2022-05-11 16:45     ` Heiko Carstens
2022-05-11 19:58       ` Guilherme G. Piccoli
2022-05-16 14:45   ` Petr Mladek
2022-05-16 16:08     ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers Guilherme G. Piccoli
2022-05-10 17:40   ` Steven Rostedt
2022-05-11 20:03     ` Guilherme G. Piccoli
2022-05-16 14:50       ` Petr Mladek
2022-05-16 16:09         ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 24/30] panic: Refactor the panic path Guilherme G. Piccoli
2022-04-28  0:28   ` Randy Dunlap
2022-04-29 16:04     ` Guilherme G. Piccoli
2022-05-09 14:25       ` Guilherme G. Piccoli
2022-04-29 17:53   ` Michael Kelley (LINUX)
2022-04-29 20:38     ` Guilherme G. Piccoli
2022-05-03 17:31       ` Michael Kelley (LINUX)
2022-05-03 18:06         ` Guilherme G. Piccoli
2022-05-09 15:16   ` d.hatayama
2022-05-09 16:39     ` Guilherme G. Piccoli
2022-05-12 14:03   ` Petr Mladek
2022-05-15 22:47     ` Guilherme G. Piccoli
2022-05-16 10:21       ` Petr Mladek
2022-05-16 16:32         ` Guilherme G. Piccoli
2022-05-19 23:45       ` Baoquan He
2022-05-20 11:23         ` Guilherme G. Piccoli
2022-05-24  8:01           ` Petr Mladek
2022-05-24 10:18             ` Baoquan He
2022-05-24  8:32           ` Baoquan He
2022-05-24 14:44   ` Eric W. Biederman
2022-05-26 16:25     ` Guilherme G. Piccoli
2022-06-14 14:36       ` Petr Mladek
2022-06-15  9:36         ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier Guilherme G. Piccoli
2022-05-16 14:56   ` Petr Mladek
2022-05-16 16:11     ` Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 26/30] Drivers: hv: Do not force all panic notifiers to execute before kdump Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 27/30] powerpc: " Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 28/30] panic: Unexport crash_kexec_post_notifiers Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 29/30] powerpc: ps3, pseries: Avoid duplicate call to kmsg_dump() on panic Guilherme G. Piccoli
2022-04-27 22:49 ` [PATCH 30/30] um: Avoid duplicate call to kmsg_dump() Guilherme G. Piccoli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).