linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/13] The panic notifiers refactor strikes back - fixes/clean-ups
@ 2022-07-19 19:53 Guilherme G. Piccoli
  2022-07-19 19:53 ` [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded Guilherme G. Piccoli
  0 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2022-07-19 19:53 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, linux-hyperv, netdev, x86, kernel-dev, kernel,
	halves, fabiomirmar, alejandro.j.jimenez, andriy.shevchenko,
	arnd, bp, corbet, d.hatayama, dave.hansen, dyoung, feng.tang,
	gregkh, mikelley, hidehiro.kawai.ez, jgross, john.ogness,
	keescook, luto, mhiramat, mingo, paulmck, peterz, rostedt,
	senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Guilherme G. Piccoli, bcm-kernel-feedback-list, linux-alpha,
	linux-arm-kernel, linux-edac, linux-efi, linux-parisc, linux-um

Hi folks, this the second iteration of the panic notifiers refactor work,
but limited to the fixes/clean-ups in the first moment. The (full) V1 is
available at:
https://lore.kernel.org/lkml/20220427224924.592546-1-gpiccoli@igalia.com/

The idea of splitting the series is that, originally we had a bunch of fixes
followed by the notifiers refactor, but this second part (the effective
refactor) is a bit "polemic", with reviews having antagonistic goals and some
complexities  - it might be hard to achieve consensus.
For the curious, here is a good summary of the conflicting views and some
strategies we might take in the refactor V2:
https://lore.kernel.org/lkml/0d084eed-4781-c815-29c7-ac62c498e216@igalia.com/

So splitting and sending only the simple fixes/clean-ups in a first moment
makes sense, this way we don't prevent them to be discussed/merged/reworked
while the more complex part is subject to scrutiny in a different (future)
email thread.


I've tried to test this series building for all affected architecture/drivers
and also through some boot/runtime tests; below the test "matrix" used:

Build tests (using cross-compilers): alpha, arm, arm64, parisc, um, x86_64.
Boot/Runtime tests: x86_64 (Hyper-V and QEMU guests).

Here is the link with the .config files used:
https://people.igalia.com/gpiccoli/panic_notifiers_configs/5.19-rc7/
(tried my best to build all the affected code).


The series is based on 5.19-rc7; I'd like to ask that, if possible, maintainers
take the patches here in their trees, since there is no need to merge the series
as whole, patches are independent from each other.

Regarding the CC strategy, I've tried to reduce a bit the list of CCed emails,
given that it was huge in the first iteration. Hopefully I didn't forget
anybody interested in the topic (my apologies if so).

As usual, reviews / comments are always welcome, thanks in advance for them!
Cheers,


Guilherme




Guilherme G. Piccoli (13):
  ARM: Disable FIQs (but not IRQs) on CPUs shutdown paths
  notifier: Add panic notifiers info and purge trailing whitespaces
  firmware: google: Test spinlock on panic path to avoid lockups
  soc: bcm: brcmstb: Document panic notifier action and remove useless header
  alpha: Clean-up the panic notifier code
  um: Improve panic notifiers consistency and ordering
  parisc: Replace regular spinlock with spin_trylock on panic path
  tracing: Improve panic/die notifiers
  notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
  EDAC/altera: Skip the panic notifier if kdump is loaded
  video/hyperv_fb: Avoid taking busy spinlock on panic path
  drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers
  panic: Fixes the panic_print NMI backtrace setting

 arch/alpha/kernel/setup.c           |  36 ++++-----
 arch/arm/kernel/machine_kexec.c     |   2 +
 arch/arm/kernel/smp.c               |   5 +-
 arch/parisc/include/asm/pdc.h       |   1 +
 arch/parisc/kernel/firmware.c       |  27 ++++++-
 arch/um/drivers/mconsole_kern.c     |   7 +-
 arch/um/kernel/um_arch.c            |   8 +-
 drivers/edac/altera_edac.c          |  16 +++-
 drivers/firmware/google/gsmi.c      |   8 ++
 drivers/hv/ring_buffer.c            |  16 ++++
 drivers/hv/vmbus_drv.c              | 109 +++++++++++++++++-----------
 drivers/parisc/power.c              |  17 +++--
 drivers/soc/bcm/brcmstb/pm/pm-arm.c |  16 +++-
 drivers/video/fbdev/hyperv_fb.c     |  16 +++-
 include/linux/hyperv.h              |   2 +
 include/linux/notifier.h            |   8 +-
 kernel/notifier.c                   |  22 ++++--
 kernel/panic.c                      |  47 +++++++-----
 kernel/trace/trace.c                |  55 +++++++-------
 19 files changed, 268 insertions(+), 150 deletions(-)

-- 
2.37.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-07-19 19:53 [PATCH v2 00/13] The panic notifiers refactor strikes back - fixes/clean-ups Guilherme G. Piccoli
@ 2022-07-19 19:53 ` Guilherme G. Piccoli
  2022-08-07 15:48   ` Guilherme G. Piccoli
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Guilherme G. Piccoli @ 2022-07-19 19:53 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, linux-hyperv, netdev, x86, kernel-dev, kernel,
	halves, fabiomirmar, alejandro.j.jimenez, andriy.shevchenko,
	arnd, bp, corbet, d.hatayama, dave.hansen, dyoung, feng.tang,
	gregkh, mikelley, hidehiro.kawai.ez, jgross, john.ogness,
	keescook, luto, mhiramat, mingo, paulmck, peterz, rostedt,
	senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Guilherme G. Piccoli, linux-edac, Dinh Nguyen, Tony Luck

The altera_edac panic notifier performs some data collection with
regards errors detected; such code relies in the regmap layer to
perform reads/writes, so the code is abstracted and there is some
risk level to execute that, since the panic path runs in atomic
context, with interrupts/preemption and secondary CPUs disabled.

Users want the information collected in this panic notifier though,
so in order to balance the risk/benefit, let's skip the altera panic
notifier if kdump is loaded. While at it, remove a useless header
and encompass a macro inside the sole ifdef block it is used.

Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

---

V2:
- new patch, based on the discussion in [0].
[0] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/

 drivers/edac/altera_edac.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
index e7e8e624a436..741fe5539154 100644
--- a/drivers/edac/altera_edac.c
+++ b/drivers/edac/altera_edac.c
@@ -16,7 +16,6 @@
 #include <linux/kernel.h>
 #include <linux/mfd/altera-sysmgr.h>
 #include <linux/mfd/syscon.h>
-#include <linux/notifier.h>
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
 #include <linux/of_platform.h>
@@ -24,6 +23,7 @@
 #include <linux/platform_device.h>
 #include <linux/regmap.h>
 #include <linux/types.h>
+#include <linux/kexec.h>
 #include <linux/uaccess.h>
 
 #include "altera_edac.h"
@@ -2063,22 +2063,30 @@ static const struct irq_domain_ops a10_eccmgr_ic_ops = {
 };
 
 /************** Stratix 10 EDAC Double Bit Error Handler ************/
-#define to_a10edac(p, m) container_of(p, struct altr_arria10_edac, m)
-
 #ifdef CONFIG_64BIT
 /* panic routine issues reboot on non-zero panic_timeout */
 extern int panic_timeout;
 
+#define to_a10edac(p, m) container_of(p, struct altr_arria10_edac, m)
+
 /*
  * The double bit error is handled through SError which is fatal. This is
  * called as a panic notifier to printout ECC error info as part of the panic.
+ *
+ * Notice that if kdump is set, we take the risk avoidance approach and
+ * skip the notifier, given that users are expected to have access to a
+ * full vmcore.
  */
 static int s10_edac_dberr_handler(struct notifier_block *this,
 				  unsigned long event, void *ptr)
 {
-	struct altr_arria10_edac *edac = to_a10edac(this, panic_notifier);
+	struct altr_arria10_edac *edac;
 	int err_addr, dberror;
 
+	if (kexec_crash_loaded())
+		return NOTIFY_DONE;
+
+	edac = to_a10edac(this, panic_notifier);
 	regmap_read(edac->ecc_mgr_map, S10_SYSMGR_ECC_INTSTAT_DERR_OFST,
 		    &dberror);
 	regmap_write(edac->ecc_mgr_map, S10_SYSMGR_UE_VAL_OFST, dberror);
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-07-19 19:53 ` [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded Guilherme G. Piccoli
@ 2022-08-07 15:48   ` Guilherme G. Piccoli
  2022-08-16 18:44   ` Dinh Nguyen
  2022-08-17 17:31   ` Borislav Petkov
  2 siblings, 0 replies; 17+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-07 15:48 UTC (permalink / raw)
  To: kexec, Tony Luck, Dinh Nguyen, linux-edac
  Cc: akpm, bhe, linux-kernel, linux-hyperv, netdev, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	pmladek

On 19/07/2022 16:53, Guilherme G. Piccoli wrote:
> The altera_edac panic notifier performs some data collection with
> regards errors detected; such code relies in the regmap layer to
> perform reads/writes, so the code is abstracted and there is some
> risk level to execute that, since the panic path runs in atomic
> context, with interrupts/preemption and secondary CPUs disabled.
> 
> Users want the information collected in this panic notifier though,
> so in order to balance the risk/benefit, let's skip the altera panic
> notifier if kdump is loaded. While at it, remove a useless header
> and encompass a macro inside the sole ifdef block it is used.
> 
> Cc: Dinh Nguyen <dinguyen@kernel.org>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> ---
> 
> V2:
> - new patch, based on the discussion in [0].
> [0] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/
> 
>  drivers/edac/altera_edac.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> [...]

Hey Tony / Dinh, do you think this patch is good, based on the
discussion we had [0] last iteration?

Thanks in advance,


Guilherme


[0]
https://lore.kernel.org/lkml/599b72f6-76a4-8e6d-5432-56fb1ffd7e0b@igalia.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-07-19 19:53 ` [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded Guilherme G. Piccoli
  2022-08-07 15:48   ` Guilherme G. Piccoli
@ 2022-08-16 18:44   ` Dinh Nguyen
  2022-08-16 20:16     ` Guilherme G. Piccoli
  2022-08-17 17:31   ` Borislav Petkov
  2 siblings, 1 reply; 17+ messages in thread
From: Dinh Nguyen @ 2022-08-16 18:44 UTC (permalink / raw)
  To: Guilherme G. Piccoli, akpm, bhe, pmladek, kexec
  Cc: linux-kernel, linux-hyperv, netdev, x86, kernel-dev, kernel,
	halves, fabiomirmar, alejandro.j.jimenez, andriy.shevchenko,
	arnd, bp, corbet, d.hatayama, dave.hansen, dyoung, feng.tang,
	gregkh, mikelley, hidehiro.kawai.ez, jgross, john.ogness,
	keescook, luto, mhiramat, mingo, paulmck, peterz, rostedt,
	senozhatsky, stern, tglx, vgoyal, vkuznets, will, linux-edac,
	Tony Luck



On 7/19/22 14:53, Guilherme G. Piccoli wrote:
> The altera_edac panic notifier performs some data collection with
> regards errors detected; such code relies in the regmap layer to
> perform reads/writes, so the code is abstracted and there is some
> risk level to execute that, since the panic path runs in atomic
> context, with interrupts/preemption and secondary CPUs disabled.
> 
> Users want the information collected in this panic notifier though,
> so in order to balance the risk/benefit, let's skip the altera panic
> notifier if kdump is loaded. While at it, remove a useless header
> and encompass a macro inside the sole ifdef block it is used.
> 
> Cc: Dinh Nguyen <dinguyen@kernel.org>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> ---
> 
> V2:
> - new patch, based on the discussion in [0].
> [0] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/
> 

Acked-by: Dinh Nguyen <dinguyen@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-16 18:44   ` Dinh Nguyen
@ 2022-08-16 20:16     ` Guilherme G. Piccoli
  0 siblings, 0 replies; 17+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-16 20:16 UTC (permalink / raw)
  To: Dinh Nguyen, kexec, linux-edac, Tony Luck, bp
  Cc: akpm, pmladek, bhe, linux-kernel, linux-hyperv, netdev, x86,
	kernel-dev, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, corbet, d.hatayama, dave.hansen, dyoung,
	feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will

On 16/08/2022 15:44, Dinh Nguyen wrote:
> [...] 
>> V2:
>> - new patch, based on the discussion in [0].
>> [0] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/
>>
> 
> Acked-by: Dinh Nguyen <dinguyen@kernel.org>

Thanks a lot Dinh!

There is something I'm asking for maintainers on patches in this series,
so here it goes for you (CC Borislav / Tony): do you think it's possible
to pick this one in your tree directly, from this series, or do you
prefer I re-submit as a standalone patch, on linux-edac maybe?

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-07-19 19:53 ` [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded Guilherme G. Piccoli
  2022-08-07 15:48   ` Guilherme G. Piccoli
  2022-08-16 18:44   ` Dinh Nguyen
@ 2022-08-17 17:31   ` Borislav Petkov
  2022-08-17 18:45     ` Guilherme G. Piccoli
  2 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2022-08-17 17:31 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel, linux-hyperv, netdev,
	x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, corbet, d.hatayama,
	dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, linux-edac, Dinh Nguyen, Tony Luck

On Tue, Jul 19, 2022 at 04:53:23PM -0300, Guilherme G. Piccoli wrote:
> The altera_edac panic notifier performs some data collection with
> regards errors detected; such code relies in the regmap layer to
> perform reads/writes, so the code is abstracted and there is some
> risk level to execute that, since the panic path runs in atomic
> context, with interrupts/preemption and secondary CPUs disabled.
> 
> Users want the information collected in this panic notifier though,
> so in order to balance the risk/benefit, let's skip the altera panic
> notifier if kdump is loaded. While at it, remove a useless header
> and encompass a macro inside the sole ifdef block it is used.

How does the fact that kdump is loaded, obviate the need to print
information about the errors?

Are you suggesting that people who have the whole vmcore would be able
to piece together the error information?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 17:31   ` Borislav Petkov
@ 2022-08-17 18:45     ` Guilherme G. Piccoli
  2022-08-17 19:34       ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-17 18:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, bhe, pmladek, kexec, linux-kernel, linux-hyperv, netdev,
	x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, corbet, d.hatayama,
	dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, linux-edac, Dinh Nguyen, Tony Luck

On 17/08/2022 14:31, Borislav Petkov wrote:
> [...]
> 
> How does the fact that kdump is loaded, obviate the need to print
> information about the errors?
> 
> Are you suggesting that people who have the whole vmcore would be able
> to piece together the error information?
> 

Hi Boris, thanks for chiming in.

So, this is part of an effort to clean-up the panic path. Currently, if
a kdump happens, *all* the panic notifiers are skipped by default,
including this one. In this scenario, this patch seems like a no-op.

But happens that in the refactor we are proposing [0], some notifiers
should run before the kdump. We are basically putting some ordering in
the way notifiers are executed, while documenting this properly and with
the goal of not increasing the failure risk for kdump.

This patch is useful so we can bring the altera EDAC notifier to run
earlier while not increasing the risk on kdump - this operation is a bit
"delicate" to happen in the panic scenario. The origin of this patch was
a discussion with Tony/Peter [1], guess we can call it a "compromise
solution".

Let me know if you disagree or have more questions, and in case you'd
like to check/participate in the whole panic notifiers refactor
discussion, it would be great =)
Cheers,


Guilherme


[0]
https://lore.kernel.org/lkml/20220427224924.592546-1-gpiccoli@igalia.com/

[1]
https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 18:45     ` Guilherme G. Piccoli
@ 2022-08-17 19:34       ` Borislav Petkov
  2022-08-17 20:28         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2022-08-17 19:34 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel, linux-hyperv, netdev,
	x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, corbet, d.hatayama,
	dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, linux-edac, Dinh Nguyen, Tony Luck

On Wed, Aug 17, 2022 at 03:45:30PM -0300, Guilherme G. Piccoli wrote:
> But happens that in the refactor we are proposing [0], some notifiers
> should run before the kdump. We are basically putting some ordering in
> the way notifiers are executed, while documenting this properly and with
> the goal of not increasing the failure risk for kdump.

What is "the failure risk for kdump"?

Some of the notifiers which run before kdump might fail and thus prevent
the machine from kdumping?

> This patch is useful so we can bring the altera EDAC notifier to run
> earlier while not increasing the risk on kdump - this operation is a bit
> "delicate" to happen in the panic scenario. The origin of this patch was
> a discussion with Tony/Peter [1], guess we can call it a "compromise
> solution".

My question stands: if kdump is loaded and the s10_edac_dberr_handler()
does not read the the fatal errors and they don't get shown in dmesg
before the machine panics, how do you intend to show that information to
the user?

Because fatal errors are something you absolutely wanna show, at least,
in dmesg!

I don't think you can "read" the errors from vmcore - they need to be
read from the hw registers before the machine dies.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 19:34       ` Borislav Petkov
@ 2022-08-17 20:28         ` Guilherme G. Piccoli
  2022-08-17 21:02           ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-17 20:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, bhe, pmladek, kexec, linux-kernel, linux-hyperv, netdev,
	x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, corbet, d.hatayama,
	dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, linux-edac, Dinh Nguyen, Tony Luck

On 17/08/2022 16:34, Borislav Petkov wrote:
> [...]
> 
> What is "the failure risk for kdump"?
> 
> Some of the notifiers which run before kdump might fail and thus prevent
> the machine from kdumping?
>

Exactly; some notifiers could break the machine and prevent a successful
kdump. The EDAC one is consider medium risk, due to invasive operations
(register readings on panic situation).


> [...] 
> My question stands: if kdump is loaded and the s10_edac_dberr_handler()
> does not read the the fatal errors and they don't get shown in dmesg
> before the machine panics, how do you intend to show that information to
> the user?
> 
> Because fatal errors are something you absolutely wanna show, at least,
> in dmesg!
> 
> I don't think you can "read" the errors from vmcore - they need to be
> read from the hw registers before the machine dies.
> 

My understanding is the same as yours, i.e., this is not possible to
collect from vmcore, it requires register reading. But again: if you
kdump your machine today, you won't collect this information, patch
changed nothing in that regard.

The one thing it changes is that you'd skip the altera register dump if
kdump is set AND you managed to also set "crash_kexec_post_notifiers".

In case you / Dinh / Tony disagrees with the patch, it's fine and we can
discard it, but then this notifier couldn't run early in the refactor we
are doing, it'd postponed to run later. This are is full of trade-offs,
we just need to choose what compromise solution is preferred by the
majority of developers =)

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 20:28         ` Guilherme G. Piccoli
@ 2022-08-17 21:02           ` Borislav Petkov
  2022-08-17 21:39             ` Guilherme G. Piccoli
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2022-08-17 21:02 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, pmladek, kexec, linux-kernel, linux-hyperv, netdev,
	x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, corbet, d.hatayama,
	dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, linux-edac, Dinh Nguyen, Tony Luck

On Wed, Aug 17, 2022 at 05:28:34PM -0300, Guilherme G. Piccoli wrote:
> My understanding is the same as yours, i.e., this is not possible to
> collect from vmcore, it requires register reading. But again: if you
> kdump your machine today, you won't collect this information, patch
> changed nothing in that regard.

Why won't you be able to collect it? You can certainly access dmesg in
the vmcore and see those errors logged there.

> The one thing it changes is that you'd skip the altera register dump if
> kdump is set AND you managed to also set "crash_kexec_post_notifiers".

What your patch changes is, it prevents s10_edac_dberr_handler() from
logging potentially important fatal hw errors when kdump is loaded.

If Dinh is fine with that, I'll take the patch. But it looks like a bad
idea to me.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 21:02           ` Borislav Petkov
@ 2022-08-17 21:39             ` Guilherme G. Piccoli
  2022-08-17 21:46               ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-17 21:39 UTC (permalink / raw)
  To: Borislav Petkov, pmladek, Dinh Nguyen, Tony Luck
  Cc: akpm, bhe, kexec, linux-kernel, linux-hyperv, netdev, x86,
	kernel-dev, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, corbet, d.hatayama, dave.hansen, dyoung,
	feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	linux-edac

On 17/08/2022 18:02, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 05:28:34PM -0300, Guilherme G. Piccoli wrote:
>> My understanding is the same as yours, i.e., this is not possible to
>> collect from vmcore, it requires register reading. But again: if you
>> kdump your machine today, you won't collect this information, patch
>> changed nothing in that regard.
> 
> Why won't you be able to collect it? You can certainly access dmesg in
> the vmcore and see those errors logged there.

Sorry for the confusion, let me try to be a bit more clear:

(1) if we kdump but we *didn't run* s10_edac_dberr_handler() before
kdump, the information is lost, since s10_edac_dberr_handler() performs
register readings. That information is not contained inside the vmcore.

(2) If for some reason the function s10_edac_dberr_handler() *was
executed prior to kdump*, of course the registers information would be
on dmesg, easy to collect in the vmcore.

Makes sense?


> 
>> The one thing it changes is that you'd skip the altera register dump if
>> kdump is set AND you managed to also set "crash_kexec_post_notifiers".
> 
> What your patch changes is, it prevents s10_edac_dberr_handler() from
> logging potentially important fatal hw errors when kdump is loaded.

Agreed. If kdump is loaded, we cannot log that information (modulo that
we do not collect it today by default on kdump as well).
The other part of story (the reason of the patch) is that we plan to
start running this panic notifier a bit earlier, being able to collect
such edac logs with pstore, for example.


> 
> If Dinh is fine with that, I'll take the patch. But it looks like a bad
> idea to me.
> 

I think we should seek what the majority of the folks consider the best,
in order to converge to some well-accepted solution. I'm completely OK
in dropping this one and rework with some other idea, or we can leave
this panic notifier as is, continue running that a bit later.

Tony / Petr (when back), suggestions are welcome =)
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 21:39             ` Guilherme G. Piccoli
@ 2022-08-17 21:46               ` Borislav Petkov
  2022-08-17 21:56                 ` Guilherme G. Piccoli
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2022-08-17 21:46 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: pmladek, Dinh Nguyen, Tony Luck, akpm, bhe, kexec, linux-kernel,
	linux-hyperv, netdev, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, linux-edac

On Wed, Aug 17, 2022 at 06:39:07PM -0300, Guilherme G. Piccoli wrote:
> Sorry for the confusion, let me try to be a bit more clear:

I think you're missing the point. Lemme try again:

You *absolutely* must log those errors because they're important. It
doesn't matter if this is done in a panic notifier and you're changing
that whole shebang or through some other magic.

If you do, then this driver needs to *still* *log* those fatal errors -
regardless through a panic notifier or some novel contraption it wants
to use.

So if you want to change the panic notifiers, you *must* make sure those
errors still do get logged.

Better?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 21:46               ` Borislav Petkov
@ 2022-08-17 21:56                 ` Guilherme G. Piccoli
  2022-08-17 22:00                   ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-17 21:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: pmladek, Dinh Nguyen, Tony Luck, akpm, bhe, kexec, linux-kernel,
	linux-hyperv, netdev, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, linux-edac

On 17/08/2022 18:46, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 06:39:07PM -0300, Guilherme G. Piccoli wrote:
>> Sorry for the confusion, let me try to be a bit more clear:
> 
> I think you're missing the point. Lemme try again:
> 
> You *absolutely* must log those errors because they're important. It
> doesn't matter if this is done in a panic notifier and you're changing
> that whole shebang or through some other magic.
> 
> If you do, then this driver needs to *still* *log* those fatal errors -
> regardless through a panic notifier or some novel contraption it wants
> to use.
> 
> So if you want to change the panic notifiers, you *must* make sure those
> errors still do get logged.
> 
> Better?
> 

Boris, I understand the importance of the logs, for sure!

But do you agree that currently, in case of a kdump, that information
*is not collected*, with our without my patch?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 21:56                 ` Guilherme G. Piccoli
@ 2022-08-17 22:00                   ` Borislav Petkov
  2022-08-17 22:09                     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2022-08-17 22:00 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: pmladek, Dinh Nguyen, Tony Luck, akpm, bhe, kexec, linux-kernel,
	linux-hyperv, netdev, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, linux-edac

On Wed, Aug 17, 2022 at 06:56:11PM -0300, Guilherme G. Piccoli wrote:
> But do you agree that currently, in case of a kdump, that information
> *is not collected*, with our without my patch?

If for some reason that panic notifier does not get run before kdump,
then that's a bug and that driver should not use a panic notifier to log
hw errors in the first place.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 22:00                   ` Borislav Petkov
@ 2022-08-17 22:09                     ` Guilherme G. Piccoli
  2022-08-17 22:19                       ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-17 22:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: pmladek, Dinh Nguyen, Tony Luck, akpm, bhe, kexec, linux-kernel,
	linux-hyperv, netdev, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, linux-edac

On 17/08/2022 19:00, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 06:56:11PM -0300, Guilherme G. Piccoli wrote:
>> But do you agree that currently, in case of a kdump, that information
>> *is not collected*, with our without my patch?
> 
> If for some reason that panic notifier does not get run before kdump,
> then that's a bug and that driver should not use a panic notifier to log
> hw errors in the first place.
> 

Indeed, the notifiers don't run before kdump by default, in a conscious
decision of the kdump maintainers.

You might be right, in the sense that maybe the edac error handler
shouldn't run as a panic notifier. Let's see if Tony / Dinh can chime in
on that discussion - we could move it to run in the kexec event as well,
so it'd always run before a kdump, but maybe the risk it offers during
panic time is not worth.

Again - a matter of a trade-off, a good compromise must be agreed by all
parties (kdump maintainers are usually extremely afraid of taking risks
to not break kdump).

Cheers!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 22:09                     ` Guilherme G. Piccoli
@ 2022-08-17 22:19                       ` Borislav Petkov
  2022-08-17 22:49                         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2022-08-17 22:19 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: pmladek, Dinh Nguyen, Tony Luck, akpm, bhe, kexec, linux-kernel,
	linux-hyperv, netdev, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, linux-edac

On Wed, Aug 17, 2022 at 07:09:26PM -0300, Guilherme G. Piccoli wrote:
> Again - a matter of a trade-off, a good compromise must be agreed by all
> parties (kdump maintainers are usually extremely afraid of taking risks
> to not break kdump).

Right, and logging hw errors from a panic notifier feels simply weird.

x86 has its own, special notifier exactly for that and it is independent
from the panic path but it gets run right after the exception raised due
to the hw error is done.

Dunno if ARM has such facilities tho...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-17 22:19                       ` Borislav Petkov
@ 2022-08-17 22:49                         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 17+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-17 22:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: pmladek, Dinh Nguyen, Tony Luck, akpm, bhe, kexec, linux-kernel,
	linux-hyperv, netdev, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will, linux-edac

On 17/08/2022 19:19, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 07:09:26PM -0300, Guilherme G. Piccoli wrote:
>> Again - a matter of a trade-off, a good compromise must be agreed by all
>> parties (kdump maintainers are usually extremely afraid of taking risks
>> to not break kdump).
> 
> Right, and logging hw errors from a panic notifier feels simply weird.
> 
> x86 has its own, special notifier exactly for that and it is independent
> from the panic path but it gets run right after the exception raised due
> to the hw error is done.
> 
> Dunno if ARM has such facilities tho...
> 
> Thx.
> 

Yeah, MCE stuff right? Not sure for ARM and other architectures as well.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-08-17 22:50 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-19 19:53 [PATCH v2 00/13] The panic notifiers refactor strikes back - fixes/clean-ups Guilherme G. Piccoli
2022-07-19 19:53 ` [PATCH v2 10/13] EDAC/altera: Skip the panic notifier if kdump is loaded Guilherme G. Piccoli
2022-08-07 15:48   ` Guilherme G. Piccoli
2022-08-16 18:44   ` Dinh Nguyen
2022-08-16 20:16     ` Guilherme G. Piccoli
2022-08-17 17:31   ` Borislav Petkov
2022-08-17 18:45     ` Guilherme G. Piccoli
2022-08-17 19:34       ` Borislav Petkov
2022-08-17 20:28         ` Guilherme G. Piccoli
2022-08-17 21:02           ` Borislav Petkov
2022-08-17 21:39             ` Guilherme G. Piccoli
2022-08-17 21:46               ` Borislav Petkov
2022-08-17 21:56                 ` Guilherme G. Piccoli
2022-08-17 22:00                   ` Borislav Petkov
2022-08-17 22:09                     ` Guilherme G. Piccoli
2022-08-17 22:19                       ` Borislav Petkov
2022-08-17 22:49                         ` Guilherme G. Piccoli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).