linux-edac.vger.kernel.org archive mirror
* EDAC instances probing
@ 2020-12-11 18:19 Borislav Petkov
  2020-12-11 20:35 ` Yazen Ghannam
From: Borislav Petkov @ 2020-12-11 18:19 UTC
  To: Tony Luck, Yazen Ghannam; +Cc: linux-edac

Hi guys,

so we converted a couple of EDAC drivers to per-CPU-family autoprobing
instead of matching on PCI device IDs, which needed new device IDs added
all the time.

As easy as the new probing is, it spams dmesg: the driver tries to load
once per CPU and, when there are no ECC DIMMs or ECC is disabled, each
attempt logs the same failure. Here's the output from a 128-CPU box:

$ grep EDAC dmesg.log | sed 's/\[.*\] //' | sort | uniq -c
    128 EDAC amd64: F17h detected (node 0).
    128 EDAC amd64: Node 0: DRAM ECC disabled.
      1 EDAC MC: Ver: 3.0.0

That's 2 lines per CPU.

Btw, people have complained about the spamming.

So I tried something clumsy, see below, which trims the output down to
what it should be:

$ dmesg | grep EDAC
[    2.693470] EDAC MC: Ver: 3.0.0
[    8.284461] EDAC amd64: F17h detected (node 0).
[    8.287953] EDAC amd64: Node 0: DRAM ECC disabled.
[    8.381430] EDAC amd64: F17h detected (node 1).
[    8.384684] EDAC amd64: Node 1: DRAM ECC disabled.
[    8.461902] EDAC amd64: F17h detected (node 2).
[    8.461993] EDAC amd64: Node 2: DRAM ECC disabled.
[    8.536907] EDAC amd64: F17h detected (node 3).
[    8.538923] EDAC amd64: Node 3: DRAM ECC disabled.
[    8.643213] EDAC amd64: F17h detected (node 4).
[    8.645474] EDAC amd64: Node 4: DRAM ECC disabled.
[    8.713411] EDAC amd64: F17h detected (node 5).
[    8.714818] EDAC amd64: Node 5: DRAM ECC disabled.
[    8.807825] EDAC amd64: F17h detected (node 6).
[    8.809882] EDAC amd64: Node 6: DRAM ECC disabled.
[    8.908043] EDAC amd64: F17h detected (node 7).
[    8.910883] EDAC amd64: Node 7: DRAM ECC disabled.

That is, once per driver instance, however each driver defines an
instance - logical node, physical node, whatever.
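
For illustration only, here is a small userspace C sketch of the idea; it
is not kernel code. It assumes each module load attempt walks the nodes in
order and aborts at the first probe failure, and that every node reports
ECC disabled. simulate_autoprobe(), driver_init(), fake_probe_one_instance()
and the use_guard flag are made-up names for this sketch, and a plain
uint64_t stands in for the kernel bitmap.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_INSTANCES 64	/* same size assumption as the bitmap in the patch */

/* Stand-in for the shared bitmap: it outlives each failed load attempt. */
static uint64_t probed_instances;

static bool instance_probed(unsigned int nid)
{
	return nid < MAX_INSTANCES && ((probed_instances >> nid) & 1);
}

static void mark_probed(unsigned int nid)
{
	if (nid < MAX_INSTANCES)
		probed_instances |= UINT64_C(1) << nid;
}

/* Hypothetical probe: logs two lines and always fails (ECC disabled). */
static int fake_probe_one_instance(unsigned int nid, unsigned int *lines)
{
	printf("EDAC amd64: F17h detected (node %u).\n", nid);
	printf("EDAC amd64: Node %u: DRAM ECC disabled.\n", nid);
	*lines += 2;
	mark_probed(nid);	/* remember the node even though the probe failed */
	return -1;
}

/* One module load attempt, as triggered once per CPU by autoprobing. */
static int driver_init(unsigned int nr_nodes, bool use_guard, unsigned int *lines)
{
	for (unsigned int i = 0; i < nr_nodes; i++) {
		if (use_guard && instance_probed(i))
			continue;	/* already reported, stay quiet */

		int err = fake_probe_one_instance(i, lines);
		if (err)
			return err;	/* first failure aborts this attempt */
	}
	return 0;
}

/* Returns the number of log lines emitted over nr_cpus load attempts. */
unsigned int simulate_autoprobe(unsigned int nr_cpus, unsigned int nr_nodes,
				bool use_guard)
{
	unsigned int lines = 0;

	probed_instances = 0;
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
		driver_init(nr_nodes, use_guard, &lines);

	return lines;
}
```

Under these assumptions, simulate_autoprobe(128, 8, true) counts 16 log
lines, in line with the 8-node output above, while
simulate_autoprobe(128, 8, false) counts 256, i.e. the 128 x 2 lines from
the first listing.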

So the patch looks like this. Do you guys think it is too ugly to live?

---
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f7087ddddb90..de37d0d9a27b 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3581,6 +3581,7 @@ static int probe_one_instance(unsigned int nid)
 
 	dump_misc_regs(pvt);
 
+	set_bit(nid, edac_get_probed_instances());
 	return ret;
 
 err_enable:
@@ -3591,6 +3592,7 @@ static int probe_one_instance(unsigned int nid)
 	kfree(s);
 	ecc_stngs[nid] = NULL;
 
+	set_bit(nid, edac_get_probed_instances());
 err_out:
 	return ret;
 }
@@ -3674,6 +3676,10 @@ static int __init amd64_edac_init(void)
 		goto err_free;
 
 	for (i = 0; i < amd_nb_num(); i++) {
+
+		if (test_bit(i, edac_get_probed_instances()))
+			continue;
+
 		err = probe_one_instance(i);
 		if (err) {
 			/* unwind properly */
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index f6d462d0be2d..f97186237ccc 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -53,6 +53,15 @@ static LIST_HEAD(mc_devices);
  */
 static const char *edac_mc_owner;
 
+/* bitmap of already probed driver instances, 64 should be big enough. :-P */
+static DECLARE_BITMAP(probed_instances, 64);
+
+unsigned long *edac_get_probed_instances(void)
+{
+	return probed_instances;
+}
+EXPORT_SYMBOL_GPL(edac_get_probed_instances);
+
 static struct mem_ctl_info *error_desc_to_mci(struct edac_raw_error_desc *e)
 {
 	return container_of(e, struct mem_ctl_info, error_desc);
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 881b00eadf7a..7c0d4ac7c35a 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -255,4 +255,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
  */
 extern char *edac_op_state_to_string(int op_state);
 
+unsigned long *edac_get_probed_instances(void);
+
 #endif				/* _EDAC_MC_H_ */

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Thread overview: 5+ messages
2020-12-11 18:19 EDAC instances probing Borislav Petkov
2020-12-11 20:35 ` Yazen Ghannam
2020-12-11 20:58   ` Borislav Petkov
2021-01-13 20:33     ` Borislav Petkov
2021-01-23  4:45       ` Yazen Ghannam
