linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] EDAC/igen6: Fix the issue of no error events
@ 2023-07-25  8:04 Qiuxu Zhuo
  2023-07-25 15:51 ` Luck, Tony
  2023-08-02 20:18 ` Luck, Tony
  0 siblings, 2 replies; 4+ messages in thread
From: Qiuxu Zhuo @ 2023-07-25  8:04 UTC (permalink / raw)
  To: Tony Luck
  Cc: Qiuxu Zhuo, Borislav Petkov, Aristeu Rozanski,
	Mauro Carvalho Chehab, linux-edac, linux-kernel, Ee Wey Lim

Current igen6_edac checks for pending errors before the registration
of the error handler. However, there is a possibility that the error
occurs during the registration process, leading to unhandled pending
errors and no future error events. This issue can be reproduced by
repeatedly injecting errors during the loading of the igen6_edac.

Fix this issue by moving the pending error handler after the registration
of the error handler, ensuring that no pending errors are left unhandled.

Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Reported-by: Ee Wey Lim <ee.wey.lim@intel.com>
Tested-by: Ee Wey Lim <ee.wey.lim@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 drivers/edac/igen6_edac.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 544dd19072ea..1a18693294db 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -27,7 +27,7 @@
 #include "edac_mc.h"
 #include "edac_module.h"
 
-#define IGEN6_REVISION	"v2.5"
+#define IGEN6_REVISION	"v2.5.1"
 
 #define EDAC_MOD_STR	"igen6_edac"
 #define IGEN6_NMI_NAME	"igen6_ibecc"
@@ -1216,9 +1216,6 @@ static int igen6_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	INIT_WORK(&ecclog_work, ecclog_work_cb);
 	init_irq_work(&ecclog_irq_work, ecclog_irq_work_cb);
 
-	/* Check if any pending errors before registering the NMI handler */
-	ecclog_handler();
-
 	rc = register_err_handler();
 	if (rc)
 		goto fail3;
@@ -1230,6 +1227,9 @@ static int igen6_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto fail4;
 	}
 
+	/* Check if any pending errors before/during the registration of the error handler */
+	ecclog_handler();
+
 	igen6_debug_setup();
 	return 0;
 fail4:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* RE: [PATCH 1/1] EDAC/igen6: Fix the issue of no error events
  2023-07-25  8:04 [PATCH 1/1] EDAC/igen6: Fix the issue of no error events Qiuxu Zhuo
@ 2023-07-25 15:51 ` Luck, Tony
  2023-07-26  2:31   ` Zhuo, Qiuxu
  2023-08-02 20:18 ` Luck, Tony
  1 sibling, 1 reply; 4+ messages in thread
From: Luck, Tony @ 2023-07-25 15:51 UTC (permalink / raw)
  To: Zhuo, Qiuxu
  Cc: Borislav Petkov, Aristeu Rozanski, Mauro Carvalho Chehab,
	linux-edac, linux-kernel, Lim, Ee Wey

> Fix this issue by moving the pending error handler after the registration
> of the error handler, ensuring that no pending errors are left unhandled.

Do you think drivers/edac/e7xxx_edac.c has the same issue?

491
492         /* clear any pending errors, or initial state bits */
493         e7xxx_get_error_info(mci, &discard);
494
495         /* Here we assume that we will never see multiple instances of this
496          * type of memory controller.  The ID is therefore hardcoded to 0.
497          */
498         if (edac_mc_add_mc(mci)) {
499                 edac_dbg(3, "failed edac_mc_add_mc()\n");
500                 goto fail1;
501         }
502
503         /* allocating generic PCI control info */
504         e7xxx_pci = edac_pci_create_generic_ctl(&pdev->dev, EDAC_MOD_STR);

Though it might be hard to find such an old system to test.

-Tony


^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH 1/1] EDAC/igen6: Fix the issue of no error events
  2023-07-25 15:51 ` Luck, Tony
@ 2023-07-26  2:31   ` Zhuo, Qiuxu
  0 siblings, 0 replies; 4+ messages in thread
From: Zhuo, Qiuxu @ 2023-07-26  2:31 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Aristeu Rozanski, Mauro Carvalho Chehab,
	linux-edac, linux-kernel, Lim, Ee Wey

> From: Luck, Tony <tony.luck@intel.com>
> ...
> Subject: RE: [PATCH 1/1] EDAC/igen6: Fix the issue of no error events
> 
> > Fix this issue by moving the pending error handler after the
> > registration of the error handler, ensuring that no pending errors are left
> unhandled.
> 
> Do you think drivers/edac/e7xxx_edac.c has the same issue?

Hi Tony,

Based on the code [1], the e7xxx_edac works in polling mode and the pending 
errors can be handled within one period after the error handler registration.

So, I don't think the e7xxx_edac has the same issue.

[1] e7xxx_probe1()-> mci->edac_check = e7xxx_check;
                                  -> edac_mc_add_mc() -> if (mci->edac_check) { mci->op_state = OP_RUNNING_POLL; ... }

> 491
> 492         /* clear any pending errors, or initial state bits */
> 493         e7xxx_get_error_info(mci, &discard);

                   This function is also invoked periodically in polling mode:
                   mci->edac_check() -> e7xxx_check() -> e7xxx_get_error_info()

> 494
> 495         /* Here we assume that we will never see multiple instances of this
> 496          * type of memory controller.  The ID is therefore hardcoded to 0.
> 497          */
> 498         if (edac_mc_add_mc(mci)) {
> 499                 edac_dbg(3, "failed edac_mc_add_mc()\n");
> 500                 goto fail1;
> 501         }
> 502
> 503         /* allocating generic PCI control info */
> 504         e7xxx_pci = edac_pci_create_generic_ctl(&pdev->dev,
> EDAC_MOD_STR);
> 
> Though it might be hard to find such an old system to test.
> 
> -Tony


^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH 1/1] EDAC/igen6: Fix the issue of no error events
  2023-07-25  8:04 [PATCH 1/1] EDAC/igen6: Fix the issue of no error events Qiuxu Zhuo
  2023-07-25 15:51 ` Luck, Tony
@ 2023-08-02 20:18 ` Luck, Tony
  1 sibling, 0 replies; 4+ messages in thread
From: Luck, Tony @ 2023-08-02 20:18 UTC (permalink / raw)
  To: Zhuo, Qiuxu
  Cc: Borislav Petkov, Aristeu Rozanski, Mauro Carvalho Chehab,
	linux-edac, linux-kernel, Lim, Ee Wey

> Current igen6_edac checks for pending errors before the registration
> of the error handler. However, there is a possibility that the error
> occurs during the registration process, leading to unhandled pending
> errors and no future error events. This issue can be reproduced by
> repeatedly injecting errors during the loading of the igen6_edac.
>
> Fix this issue by moving the pending error handler after the registration
> of the error handler, ensuring that no pending errors are left unhandled.
>
> Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
> Reported-by: Ee Wey Lim <ee.wey.lim@intel.com>
> Tested-by: Ee Wey Lim <ee.wey.lim@intel.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

Applied to RAS tree for next merge window.

Thanks

-Tony

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-08-02 20:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-25  8:04 [PATCH 1/1] EDAC/igen6: Fix the issue of no error events Qiuxu Zhuo
2023-07-25 15:51 ` Luck, Tony
2023-07-26  2:31   ` Zhuo, Qiuxu
2023-08-02 20:18 ` Luck, Tony

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).