linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 00/11] The panic notifiers refactor - fixes/clean-ups (V3)
@ 2022-08-19 22:17 Guilherme G. Piccoli
  2022-08-19 22:17 ` [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded Guilherme G. Piccoli
  0 siblings, 1 reply; 8+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-19 22:17 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, linux-hyperv, netdev, x86, kernel-dev, kernel,
	halves, fabiomirmar, alejandro.j.jimenez, andriy.shevchenko,
	arnd, bp, corbet, d.hatayama, dave.hansen, dyoung, feng.tang,
	gregkh, mikelley, hidehiro.kawai.ez, jgross, john.ogness,
	keescook, luto, mhiramat, mingo, paulmck, peterz, rostedt,
	senozhatsky, stern, tglx, vgoyal, vkuznets, will, xuqiang36,
	Guilherme G. Piccoli, bcm-kernel-feedback-list, linux-alpha,
	linux-arm-kernel, linux-edac, linux-efi, linux-parisc, linux-um

Hey everybody, this the third iteration of the panic notifiers
fixes/clean-ups;

V2 available at:
https://lore.kernel.org/lkml/20220719195325.402745-1-gpiccoli@igalia.com/

V1 (including the refactor) available at:
https://lore.kernel.org/lkml/20220427224924.592546-1-gpiccoli@igalia.com/


There wasn't much change here compared to V2 (the specifics are in the
patches), but a global change is that I've rebased against 6.0-rc1.
One patch got merged in -next, another one was re-submit in a standalone
format (requested by maintainer), so both of these are not here anymore.


As usual, tested this series building for all affected architecture/drivers
and also through some boot/runtime tests; below the test "matrix" used:

Build tests (using cross-compilers): alpha, arm, arm64, parisc, um, x86_64.
Boot/Runtime tests: x86_64 (QEMU guests and Steam Deck).

Here is the link with the .config files used:
https://people.igalia.com/gpiccoli/panic_notifiers_configs/6.0-rc1/


About the merge strategy: I've noticed there is a difference in maintainers
preferences (and my preference as well), so I see 3 strategies for merge:

(a) Maintainers pick patches that are good from the series and merge in
their trees;

(b) Some maintainer would pick the whole series and merge, at once, given
that everything is fine/ack/reviewed.

(c) I must re-send patches individually once they are reviewed/acked, as
standalone patches to the relevant maintainers, so they can merge it in
their trees.

I'm willing to do what's best for everybody - (a) is my choice when possible,
(b) seems to stall things and potentially cause conflicts, (c) seems to be
the compromise. I'll do that as per preference of the respective maintainers.


As usual, reviews / comments are always welcome, thanks in advance for them!
Cheers,

Guilherme


Guilherme G. Piccoli (11):
  ARM: Disable FIQs (but not IRQs) on CPUs shutdown paths
  notifier: Add panic notifiers info and purge trailing whitespaces
  alpha: Clean-up the panic notifier code
  um: Improve panic notifiers consistency and ordering
  parisc: Replace regular spinlock with spin_trylock on panic path
  tracing: Improve panic/die notifiers
  notifiers: Add tracepoints to the notifiers infrastructure
  EDAC/altera: Skip the panic notifier if kdump is loaded
  video/hyperv_fb: Avoid taking busy spinlock on panic path
  drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers
  panic: Fixes the panic_print NMI backtrace setting

 arch/alpha/kernel/setup.c        |  36 +++++-----
 arch/arm/kernel/machine_kexec.c  |   2 +
 arch/arm/kernel/smp.c            |   5 +-
 arch/parisc/include/asm/pdc.h    |   1 +
 arch/parisc/kernel/firmware.c    |  27 ++++++--
 arch/um/drivers/mconsole_kern.c  |   7 +-
 arch/um/kernel/um_arch.c         |   8 +--
 drivers/edac/altera_edac.c       |  16 +++--
 drivers/hv/ring_buffer.c         |  13 ++++
 drivers/hv/vmbus_drv.c           | 109 +++++++++++++++++++------------
 drivers/parisc/power.c           |  17 +++--
 drivers/video/fbdev/hyperv_fb.c  |  16 ++++-
 include/linux/hyperv.h           |   2 +
 include/linux/notifier.h         |   8 ++-
 include/trace/events/notifiers.h |  69 +++++++++++++++++++
 kernel/notifier.c                |   6 ++
 kernel/panic.c                   |  47 +++++++------
 kernel/trace/trace.c             |  55 ++++++++--------
 18 files changed, 302 insertions(+), 142 deletions(-)
 create mode 100644 include/trace/events/notifiers.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-19 22:17 [PATCH V3 00/11] The panic notifiers refactor - fixes/clean-ups (V3) Guilherme G. Piccoli
@ 2022-08-19 22:17 ` Guilherme G. Piccoli
  2022-09-18 14:10   ` Guilherme G. Piccoli
  2022-12-09 16:03   ` Petr Mladek
  0 siblings, 2 replies; 8+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-19 22:17 UTC (permalink / raw)
  To: akpm, bhe, pmladek, kexec
  Cc: linux-kernel, linux-hyperv, netdev, x86, kernel-dev, kernel,
	halves, fabiomirmar, alejandro.j.jimenez, andriy.shevchenko,
	arnd, bp, corbet, d.hatayama, dave.hansen, dyoung, feng.tang,
	gregkh, mikelley, hidehiro.kawai.ez, jgross, john.ogness,
	keescook, luto, mhiramat, mingo, paulmck, peterz, rostedt,
	senozhatsky, stern, tglx, vgoyal, vkuznets, will, xuqiang36,
	Guilherme G. Piccoli, linux-edac, Tony Luck, Dinh Nguyen

The altera_edac panic notifier performs some data collection with
regards errors detected; such code relies in the regmap layer to
perform reads/writes, so the code is abstracted and there is some
risk level to execute that, since the panic path runs in atomic
context, with interrupts/preemption and secondary CPUs disabled.

Users want the information collected in this panic notifier though,
so in order to balance the risk/benefit, let's skip the altera panic
notifier if kdump is loaded. While at it, remove a useless header
and encompass a macro inside the sole ifdef block it is used.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

---

V3:
- added the ack tag from Dinh - thanks!
- had a good discussion with Boris about that in V2 [0],
hopefully we can continue and reach a consensus in this V3.
[0] https://lore.kernel.org/lkml/46137c67-25b4-6657-33b7-cffdc7afc0d7@igalia.com/

V2:
- new patch, based on the discussion in [1].
[1] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/


 drivers/edac/altera_edac.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
index e7e8e624a436..741fe5539154 100644
--- a/drivers/edac/altera_edac.c
+++ b/drivers/edac/altera_edac.c
@@ -16,7 +16,6 @@
 #include <linux/kernel.h>
 #include <linux/mfd/altera-sysmgr.h>
 #include <linux/mfd/syscon.h>
-#include <linux/notifier.h>
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
 #include <linux/of_platform.h>
@@ -24,6 +23,7 @@
 #include <linux/platform_device.h>
 #include <linux/regmap.h>
 #include <linux/types.h>
+#include <linux/kexec.h>
 #include <linux/uaccess.h>
 
 #include "altera_edac.h"
@@ -2063,22 +2063,30 @@ static const struct irq_domain_ops a10_eccmgr_ic_ops = {
 };
 
 /************** Stratix 10 EDAC Double Bit Error Handler ************/
-#define to_a10edac(p, m) container_of(p, struct altr_arria10_edac, m)
-
 #ifdef CONFIG_64BIT
 /* panic routine issues reboot on non-zero panic_timeout */
 extern int panic_timeout;
 
+#define to_a10edac(p, m) container_of(p, struct altr_arria10_edac, m)
+
 /*
  * The double bit error is handled through SError which is fatal. This is
  * called as a panic notifier to printout ECC error info as part of the panic.
+ *
+ * Notice that if kdump is set, we take the risk avoidance approach and
+ * skip the notifier, given that users are expected to have access to a
+ * full vmcore.
  */
 static int s10_edac_dberr_handler(struct notifier_block *this,
 				  unsigned long event, void *ptr)
 {
-	struct altr_arria10_edac *edac = to_a10edac(this, panic_notifier);
+	struct altr_arria10_edac *edac;
 	int err_addr, dberror;
 
+	if (kexec_crash_loaded())
+		return NOTIFY_DONE;
+
+	edac = to_a10edac(this, panic_notifier);
 	regmap_read(edac->ecc_mgr_map, S10_SYSMGR_ECC_INTSTAT_DERR_OFST,
 		    &dberror);
 	regmap_write(edac->ecc_mgr_map, S10_SYSMGR_UE_VAL_OFST, dberror);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-19 22:17 ` [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded Guilherme G. Piccoli
@ 2022-09-18 14:10   ` Guilherme G. Piccoli
  2022-10-17 14:05     ` Guilherme G. Piccoli
  2022-11-22 13:33     ` Guilherme G. Piccoli
  2022-12-09 16:03   ` Petr Mladek
  1 sibling, 2 replies; 8+ messages in thread
From: Guilherme G. Piccoli @ 2022-09-18 14:10 UTC (permalink / raw)
  To: bhe, kexec, Dinh Nguyen, Tony Luck, linux-edac, bp
  Cc: pmladek, akpm, linux-kernel, linux-hyperv, netdev, x86,
	kernel-dev, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, corbet, d.hatayama, dave.hansen, dyoung,
	feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	xuqiang36

On 19/08/2022 19:17, Guilherme G. Piccoli wrote:
> The altera_edac panic notifier performs some data collection with
> regards errors detected; such code relies in the regmap layer to
> perform reads/writes, so the code is abstracted and there is some
> risk level to execute that, since the panic path runs in atomic
> context, with interrupts/preemption and secondary CPUs disabled.
> 
> Users want the information collected in this panic notifier though,
> so in order to balance the risk/benefit, let's skip the altera panic
> notifier if kdump is loaded. While at it, remove a useless header
> and encompass a macro inside the sole ifdef block it is used.
> 
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Acked-by: Dinh Nguyen <dinguyen@kernel.org>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> ---
> 
> V3:
> - added the ack tag from Dinh - thanks!
> - had a good discussion with Boris about that in V2 [0],
> hopefully we can continue and reach a consensus in this V3.
> [0] https://lore.kernel.org/lkml/46137c67-25b4-6657-33b7-cffdc7afc0d7@igalia.com/
> 
> V2:
> - new patch, based on the discussion in [1].
> [1] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/
> 
> [...]

Hi Dinh, Tony, Boris - sorry for the ping.

Appreciate reviews on this one - Dinh already ACKed the patch but Boris
raised some points in the past version [0], so any opinions or
discussions are welcome!

Thanks,


Guilherme


[0]
https://lore.kernel.org/lkml/46137c67-25b4-6657-33b7-cffdc7afc0d7@igalia.com/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-09-18 14:10   ` Guilherme G. Piccoli
@ 2022-10-17 14:05     ` Guilherme G. Piccoli
  2022-11-22 13:33     ` Guilherme G. Piccoli
  1 sibling, 0 replies; 8+ messages in thread
From: Guilherme G. Piccoli @ 2022-10-17 14:05 UTC (permalink / raw)
  To: Dinh Nguyen, Tony Luck, linux-edac, bp
  Cc: kexec, bhe, pmladek, akpm, linux-kernel, linux-hyperv, netdev,
	x86, kernel-dev, kernel, halves, fabiomirmar,
	alejandro.j.jimenez, andriy.shevchenko, arnd, corbet, d.hatayama,
	dave.hansen, dyoung, feng.tang, gregkh, mikelley,
	hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
	mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will, xuqiang36

On 18/09/2022 11:10, Guilherme G. Piccoli wrote:
> On 19/08/2022 19:17, Guilherme G. Piccoli wrote:
>> The altera_edac panic notifier performs some data collection with
>> regards errors detected; such code relies in the regmap layer to
>> perform reads/writes, so the code is abstracted and there is some
>> risk level to execute that, since the panic path runs in atomic
>> context, with interrupts/preemption and secondary CPUs disabled.
>>
>> Users want the information collected in this panic notifier though,
>> so in order to balance the risk/benefit, let's skip the altera panic
>> notifier if kdump is loaded. While at it, remove a useless header
>> and encompass a macro inside the sole ifdef block it is used.
>>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Petr Mladek <pmladek@suse.com>
>> Cc: Tony Luck <tony.luck@intel.com>
>> Acked-by: Dinh Nguyen <dinguyen@kernel.org>
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
>>
>> ---
>>
>> V3:
>> - added the ack tag from Dinh - thanks!
>> - had a good discussion with Boris about that in V2 [0],
>> hopefully we can continue and reach a consensus in this V3.
>> [0] https://lore.kernel.org/lkml/46137c67-25b4-6657-33b7-cffdc7afc0d7@igalia.com/
>>
>> V2:
>> - new patch, based on the discussion in [1].
>> [1] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/
>>
>> [...]
> 
> Hi Dinh, Tony, Boris - sorry for the ping.

Hey folks, apologies for the new ping.

Is there anything to improve here maybe? Reviews / opinions are very
appreciated!
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-09-18 14:10   ` Guilherme G. Piccoli
  2022-10-17 14:05     ` Guilherme G. Piccoli
@ 2022-11-22 13:33     ` Guilherme G. Piccoli
  2022-11-22 15:06       ` Borislav Petkov
  1 sibling, 1 reply; 8+ messages in thread
From: Guilherme G. Piccoli @ 2022-11-22 13:33 UTC (permalink / raw)
  To: Dinh Nguyen, Tony Luck, linux-edac, bp
  Cc: kexec, pmladek, linux-kernel, linux-hyperv, netdev, x86,
	kernel-dev, kernel

On 18/09/2022 11:10, Guilherme G. Piccoli wrote:
> On 19/08/2022 19:17, Guilherme G. Piccoli wrote:
>> The altera_edac panic notifier performs some data collection with
>> regards errors detected; such code relies in the regmap layer to
>> perform reads/writes, so the code is abstracted and there is some
>> risk level to execute that, since the panic path runs in atomic
>> context, with interrupts/preemption and secondary CPUs disabled.
>>
>> Users want the information collected in this panic notifier though,
>> so in order to balance the risk/benefit, let's skip the altera panic
>> notifier if kdump is loaded. While at it, remove a useless header
>> and encompass a macro inside the sole ifdef block it is used.
>>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Petr Mladek <pmladek@suse.com>
>> Cc: Tony Luck <tony.luck@intel.com>
>> Acked-by: Dinh Nguyen <dinguyen@kernel.org>
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
>>
>> ---
>>
>> V3:
>> - added the ack tag from Dinh - thanks!
>> - had a good discussion with Boris about that in V2 [0],
>> hopefully we can continue and reach a consensus in this V3.
>> [0] https://lore.kernel.org/lkml/46137c67-25b4-6657-33b7-cffdc7afc0d7@igalia.com/
>>
>> V2:
>> - new patch, based on the discussion in [1].
>> [1] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/
>>
>> [...]
> 
> Hi Dinh, Tony, Boris - sorry for the ping.
> 
> Appreciate reviews on this one - Dinh already ACKed the patch but Boris
> raised some points in the past version [0], so any opinions or
> discussions are welcome!


Hi folks, monthly ping heheh
Apologies for the re-pings, please let me know if there is anything
required to move on this patch.

Cheers,


Guilherme


P.S. I've been trimming the huge CC list in the series, done it here as
well.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-11-22 13:33     ` Guilherme G. Piccoli
@ 2022-11-22 15:06       ` Borislav Petkov
  2023-02-10 16:01         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2022-11-22 15:06 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Tony Luck, Rabara Niravkumar L
  Cc: Dinh Nguyen, linux-edac, kexec, pmladek, linux-kernel,
	linux-hyperv, netdev, x86, kernel-dev, kernel

On Tue, Nov 22, 2022 at 10:33:12AM -0300, Guilherme G. Piccoli wrote:

Leaving in the whole thing for newly added people.

> On 18/09/2022 11:10, Guilherme G. Piccoli wrote:
> > On 19/08/2022 19:17, Guilherme G. Piccoli wrote:
> >> The altera_edac panic notifier performs some data collection with
> >> regards errors detected; such code relies in the regmap layer to
> >> perform reads/writes, so the code is abstracted and there is some
> >> risk level to execute that, since the panic path runs in atomic
> >> context, with interrupts/preemption and secondary CPUs disabled.
> >>
> >> Users want the information collected in this panic notifier though,
> >> so in order to balance the risk/benefit, let's skip the altera panic
> >> notifier if kdump is loaded. While at it, remove a useless header
> >> and encompass a macro inside the sole ifdef block it is used.
> >>
> >> Cc: Borislav Petkov <bp@alien8.de>
> >> Cc: Petr Mladek <pmladek@suse.com>
> >> Cc: Tony Luck <tony.luck@intel.com>
> >> Acked-by: Dinh Nguyen <dinguyen@kernel.org>
> >> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> >>
> >> ---
> >>
> >> V3:
> >> - added the ack tag from Dinh - thanks!
> >> - had a good discussion with Boris about that in V2 [0],
> >> hopefully we can continue and reach a consensus in this V3.
> >> [0] https://lore.kernel.org/lkml/46137c67-25b4-6657-33b7-cffdc7afc0d7@igalia.com/
> >>
> >> V2:
> >> - new patch, based on the discussion in [1].
> >> [1] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/
> >>
> >> [...]
> > 
> > Hi Dinh, Tony, Boris - sorry for the ping.
> > 
> > Appreciate reviews on this one - Dinh already ACKed the patch but Boris
> > raised some points in the past version [0], so any opinions or
> > discussions are welcome!
> 
> 
> Hi folks, monthly ping heheh
> Apologies for the re-pings, please let me know if there is anything
> required to move on this patch.

Looking at this again, I really don't like the sprinkling of

	if (kexec_crash_loaded())

in unrelated code. And I still think that the real fix here is to kill
this

	edac->panic_notifier

thing. And replace it with simply logging the error from the double bit
error interrupt handle. That DBERR IRQ thing altr_edac_a10_irq_handler().
Because this is what this panic notifier does - dump double-bit errors.

Now, if Dinh doesn't move, I guess we can ask Tony and/or Rabara (he has
sent a patch for this driver recently and Altera belongs to Intel now)
to find someone who can test such a change and we (you could give it a
try first :)) can do that change.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-08-19 22:17 ` [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded Guilherme G. Piccoli
  2022-09-18 14:10   ` Guilherme G. Piccoli
@ 2022-12-09 16:03   ` Petr Mladek
  1 sibling, 0 replies; 8+ messages in thread
From: Petr Mladek @ 2022-12-09 16:03 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: akpm, bhe, kexec, linux-kernel, linux-hyperv, netdev, x86,
	kernel-dev, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	xuqiang36, linux-edac, Tony Luck, Dinh Nguyen

On Fri 2022-08-19 19:17:28, Guilherme G. Piccoli wrote:
> The altera_edac panic notifier performs some data collection with
> regards errors detected; such code relies in the regmap layer to
> perform reads/writes, so the code is abstracted and there is some
> risk level to execute that, since the panic path runs in atomic
> context, with interrupts/preemption and secondary CPUs disabled.
> 
> Users want the information collected in this panic notifier though,
> so in order to balance the risk/benefit, let's skip the altera panic
> notifier if kdump is loaded. While at it, remove a useless header
> and encompass a macro inside the sole ifdef block it is used.
> 
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Acked-by: Dinh Nguyen <dinguyen@kernel.org>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> ---
> 
> V3:
> - added the ack tag from Dinh - thanks!
> - had a good discussion with Boris about that in V2 [0],
> hopefully we can continue and reach a consensus in this V3.
> [0] https://lore.kernel.org/lkml/46137c67-25b4-6657-33b7-cffdc7afc0d7@igalia.com/
> 
> V2:
> - new patch, based on the discussion in [1].
> [1] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/
> 
> 
>  drivers/edac/altera_edac.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
> index e7e8e624a436..741fe5539154 100644
> --- a/drivers/edac/altera_edac.c
> +++ b/drivers/edac/altera_edac.c
> @@ -16,7 +16,6 @@
>  #include <linux/kernel.h>
>  #include <linux/mfd/altera-sysmgr.h>
>  #include <linux/mfd/syscon.h>
> -#include <linux/notifier.h>
>  #include <linux/of_address.h>
>  #include <linux/of_irq.h>
>  #include <linux/of_platform.h>
> @@ -24,6 +23,7 @@
>  #include <linux/platform_device.h>
>  #include <linux/regmap.h>
>  #include <linux/types.h>
> +#include <linux/kexec.h>
>  #include <linux/uaccess.h>
>  
>  #include "altera_edac.h"
> @@ -2063,22 +2063,30 @@ static const struct irq_domain_ops a10_eccmgr_ic_ops = {
>  };
>  
>  /************** Stratix 10 EDAC Double Bit Error Handler ************/
> -#define to_a10edac(p, m) container_of(p, struct altr_arria10_edac, m)
> -
>  #ifdef CONFIG_64BIT
>  /* panic routine issues reboot on non-zero panic_timeout */
>  extern int panic_timeout;
>  
> +#define to_a10edac(p, m) container_of(p, struct altr_arria10_edac, m)
> +
>  /*
>   * The double bit error is handled through SError which is fatal. This is
>   * called as a panic notifier to printout ECC error info as part of the panic.
> + *
> + * Notice that if kdump is set, we take the risk avoidance approach and
> + * skip the notifier, given that users are expected to have access to a
> + * full vmcore.
>   */
>  static int s10_edac_dberr_handler(struct notifier_block *this,
>  				  unsigned long event, void *ptr)
>  {
> -	struct altr_arria10_edac *edac = to_a10edac(this, panic_notifier);
> +	struct altr_arria10_edac *edac;
>  	int err_addr, dberror;
>  
> +	if (kexec_crash_loaded())
> +		return NOTIFY_DONE;

I have read the discussion about v2 [1] and this looks like a bad
approach from my POV.

My understanding is that the information provided by this notifier
could not be found in the crashdump. It means that people really
want to run this before crashdump in principle.

Of course, there is the question how much safe this code is. I mean
if the panic() code path might get blocked here.

I see two possibilities.

The best solution would be if we know that this is "always" safe or if
it can be done a safe way. Then we could keep it as it is or implement
the safe way.

Alternative solution would be to create a kernel parameter that
would enable/disable this particular report when kdump is enabled.
The question would be the default. It would depend on how risky
the code is and how useful the information is.

[1] https://lore.kernel.org/r/20220719195325.402745-11-gpiccoli@igalia.com

> +	edac = to_a10edac(this, panic_notifier);
>  	regmap_read(edac->ecc_mgr_map, S10_SYSMGR_ECC_INTSTAT_DERR_OFST,
>  		    &dberror);
>  	regmap_write(edac->ecc_mgr_map, S10_SYSMGR_UE_VAL_OFST, dberror);

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded
  2022-11-22 15:06       ` Borislav Petkov
@ 2023-02-10 16:01         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 8+ messages in thread
From: Guilherme G. Piccoli @ 2023-02-10 16:01 UTC (permalink / raw)
  To: Borislav Petkov, pmladek
  Cc: linux-edac, kexec, linux-kernel, linux-hyperv, netdev, x86,
	kernel-dev, kernel, Dinh Nguyen, Rabara Niravkumar L, Tony Luck

Hi Boris and Petr, first of all thanks for your great analysis and
really sorry for the huge delay in my response.

Below I'm pasting the 2 relevant responses from both Petr and Boris.


On 22/11/2022 12:06, Borislav Petkov wrote:
> On Tue, Nov 22, 2022 at 10:33:12AM -0300, Guilherme G. Piccoli wrote:
> 
> Leaving in the whole thing for newly added people.
> 
>> On 18/09/2022 11:10, Guilherme G. Piccoli wrote:
>>> On 19/08/2022 19:17, Guilherme G. Piccoli wrote:
>>>> The altera_edac panic notifier performs some data collection with
>>>> regards errors detected; such code relies in the regmap layer to
>>>> perform reads/writes, so the code is abstracted and there is some
>>>> risk level to execute that, since the panic path runs in atomic
>>>> context, with interrupts/preemption and secondary CPUs disabled.
>>>>
>>>> Users want the information collected in this panic notifier though,
>>>> so in order to balance the risk/benefit, let's skip the altera panic
>>>> notifier if kdump is loaded. While at it, remove a useless header
>>>> and encompass a macro inside the sole ifdef block it is used.
>>>>
>>>> Cc: Borislav Petkov <bp@alien8.de>
>>>> Cc: Petr Mladek <pmladek@suse.com>
>>>> Cc: Tony Luck <tony.luck@intel.com>
>>>> Acked-by: Dinh Nguyen <dinguyen@kernel.org>
>>>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
>>>>
>>>> ---
>>>>
>>>> V3:
>>>> - added the ack tag from Dinh - thanks!
>>>> - had a good discussion with Boris about that in V2 [0],
>>>> hopefully we can continue and reach a consensus in this V3.
>>>> [0] https://lore.kernel.org/lkml/46137c67-25b4-6657-33b7-cffdc7afc0d7@igalia.com/
>>>>
>>>> V2:
>>>> - new patch, based on the discussion in [1].
>>>> [1] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@igalia.com/
>>>>
>>>> [...]
>>>
>>> Hi Dinh, Tony, Boris - sorry for the ping.
>>>
>>> Appreciate reviews on this one - Dinh already ACKed the patch but Boris
>>> raised some points in the past version [0], so any opinions or
>>> discussions are welcome!
>>
>>
>> Hi folks, monthly ping heheh
>> Apologies for the re-pings, please let me know if there is anything
>> required to move on this patch.
> 
> Looking at this again, I really don't like the sprinkling of
> 
> 	if (kexec_crash_loaded())
> 
> in unrelated code. And I still think that the real fix here is to kill
> this
> 
> 	edac->panic_notifier
> 
> thing. And replace it with simply logging the error from the double bit
> error interrupt handle. That DBERR IRQ thing altr_edac_a10_irq_handler().
> Because this is what this panic notifier does - dump double-bit errors.
> 
> Now, if Dinh doesn't move, I guess we can ask Tony and/or Rabara (he has
> sent a patch for this driver recently and Altera belongs to Intel now)
> to find someone who can test such a change and we (you could give it a
> try first :)) can do that change.
> 
> Thx.
> 

On 09/12/2022 13:03, Petr Mladek wrote:> [...]>
> I have read the discussion about v2 [1] and this looks like a bad
> approach from my POV.
>
> My understanding is that the information provided by this notifier
> could not be found in the crashdump. It means that people really
> want to run this before crashdump in principle.
>
> Of course, there is the question how much safe this code is. I mean
> if the panic() code path might get blocked here.
>
> I see two possibilities.
>
> The best solution would be if we know that this is "always" safe or if
> it can be done a safe way. Then we could keep it as it is or implement
> the safe way.
>
> Alternative solution would be to create a kernel parameter that
> would enable/disable this particular report when kdump is enabled.
> The question would be the default. It would depend on how risky
> the code is and how useful the information is.
>
> [1] https://lore.kernel.org/r/20220719195325.402745-11-gpiccoli@igalia.com
>


So, for me Petr approach is the more straightforward, though we could
rethink the idea of this notifier being...a notifier, as suggest Boris heh

Anyway, what I plan to do is: I'll re-submit a simple clean-up for this
code (header / ifdef stuff), not functional-changing the code path.

After that, when re-submitting the V2 or the notifiers refactor (which
I'm pending for some good months =O ), I'll deal with this code
properly, factoring the ideas and proposing a meaningful change.

So, let's discard this patch for now.
Thanks again,


Guilherme

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-02-10 16:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-19 22:17 [PATCH V3 00/11] The panic notifiers refactor - fixes/clean-ups (V3) Guilherme G. Piccoli
2022-08-19 22:17 ` [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded Guilherme G. Piccoli
2022-09-18 14:10   ` Guilherme G. Piccoli
2022-10-17 14:05     ` Guilherme G. Piccoli
2022-11-22 13:33     ` Guilherme G. Piccoli
2022-11-22 15:06       ` Borislav Petkov
2023-02-10 16:01         ` Guilherme G. Piccoli
2022-12-09 16:03   ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).