All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] EDAC: Convert ghes_edac to a normal module
@ 2017-07-26  8:48 Borislav Petkov
  2017-07-26  8:48   ` [1/3] " Borislav Petkov
                   ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-26  8:48 UTC (permalink / raw)
  To: linux-edac
  Cc: Tony Luck, Mauro Carvalho Chehab, Toshimitsu Kani,
	Rafael J. Wysocki, LKML

From: Borislav Petkov <bp@suse.de>

Hi,

here's the conversion of ghes_edac to a proper module using a notifier.
This is only the bare minimum stuff and we should do all outstanding
work ontop. Which right now is:

* platform whitelist

* Tony: "Additional cleanup: edac_ghes.c should really call
cper_mem_err_location() instead of doing pretty much the same thing but
with subtle formatting differences ("node:%d" vs. "node: %d")."

Yes, I think we should take a look at doing that eventually as a
sensible cleanup.

* Tony: "This driver isn't going to set the EDAC error counts since it
just does:

        e->top_layer = -1;
        e->mid_layer = -1;
        e->low_layer = -1;
"

Right, that we should address by making layer 0 EDAC_MC_LAYER_SLOT and
no other layers as we can't get any channel, branch or whatever DIMM
topology from GHES. At least I'm not aware of any.

But that should be enough as this info is good enough for pinpointing
the DIMM. From our discussion with Tony:

>>  The FRU identifier is:
>>
>> "The module number of the memory error location. (NODE, CARD, and MODULE
>> should provide the information necessary to identify the failing FRU)."
>
> Sounds pretty good. Generally people don't care about the channel. They
> want to know which DIMM to replace. This will give them what they want.

Thanks.

Borislav Petkov (3):
  EDAC: Add edac_pr_err/info macros
  ACPI/GHES: Add an EDAC notifier chain
  EDAC, ghes: Make it a proper module

 drivers/acpi/apei/ghes.c |  32 +++++++-----
 drivers/edac/Kconfig     |   4 +-
 drivers/edac/edac_mc.h   |   3 ++
 drivers/edac/ghes_edac.c | 129 ++++++++++++++++++++++++-----------------------
 include/acpi/ghes.h      |  27 +---------
 5 files changed, 93 insertions(+), 102 deletions(-)

-- 
2.14.0.rc0

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/3] EDAC: Add edac_pr_err/info macros
@ 2017-07-26  8:48   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-26  8:48 UTC (permalink / raw)
  To: linux-edac
  Cc: Tony Luck, Mauro Carvalho Chehab, Toshimitsu Kani,
	Rafael J. Wysocki, LKML

From: Borislav Petkov <bp@suse.de>

Analogue to the generic pr_* variants but issuing the EDAC prefix.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/edac/edac_mc.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 5357800e418d..6d46f30dc657 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -60,6 +60,9 @@
 #define edac_pci_printk(ctl, level, fmt, arg...) \
 	printk(level "EDAC PCI%d: " fmt, ctl->pci_idx, ##arg)
 
+#define edac_pr_err(fmt, arg...)	edac_printk(KERN_ERR, "", fmt, ##arg)
+#define edac_pr_info(fmt, arg...)	edac_printk(KERN_INFO, "", fmt, ##arg)
+
 /* prefixes for edac_printk() and edac_mc_printk() */
 #define EDAC_MC "MC"
 #define EDAC_PCI "PCI"
-- 
2.14.0.rc0

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [1/3] EDAC: Add edac_pr_err/info macros
@ 2017-07-26  8:48   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-26  8:48 UTC (permalink / raw)
  To: linux-edac
  Cc: Tony Luck, Mauro Carvalho Chehab, Toshimitsu Kani,
	Rafael J. Wysocki, LKML

From: Borislav Petkov <bp@suse.de>

Analogue to the generic pr_* variants but issuing the EDAC prefix.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/edac/edac_mc.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 5357800e418d..6d46f30dc657 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -60,6 +60,9 @@
 #define edac_pci_printk(ctl, level, fmt, arg...) \
 	printk(level "EDAC PCI%d: " fmt, ctl->pci_idx, ##arg)
 
+#define edac_pr_err(fmt, arg...)	edac_printk(KERN_ERR, "", fmt, ##arg)
+#define edac_pr_info(fmt, arg...)	edac_printk(KERN_INFO, "", fmt, ##arg)
+
 /* prefixes for edac_printk() and edac_mc_printk() */
 #define EDAC_MC "MC"
 #define EDAC_PCI "PCI"

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 2/3] ACPI/GHES: Add an EDAC notifier chain
@ 2017-07-26  8:48   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-26  8:48 UTC (permalink / raw)
  To: linux-edac
  Cc: Tony Luck, Mauro Carvalho Chehab, Toshimitsu Kani,
	Rafael J. Wysocki, LKML

From: Borislav Petkov <bp@suse.de>

... so that GHES can dump memory errors into it and consumers like
ghes_edac can register.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c | 14 ++++++++++++++
 include/acpi/ghes.h      |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d661d452b238..3f05f5ff3461 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -140,6 +140,20 @@ static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
 
+static ATOMIC_NOTIFIER_HEAD(ghes_edac_chain);
+
+void ghes_register_edac_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_register(&ghes_edac_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_register_edac_chain);
+
+void ghes_unregister_edac_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_unregister(&ghes_edac_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_unregister_edac_chain);
+
 static int ghes_ioremap_init(void)
 {
 	ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 9f26e01186ae..506b6a197738 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -76,6 +76,8 @@ static inline void ghes_edac_unregister(struct ghes *ghes)
 {
 }
 #endif
+void ghes_register_edac_chain(struct notifier_block *nb);
+void ghes_unregister_edac_chain(struct notifier_block *nb);
 
 static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
 {
-- 
2.14.0.rc0

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [2/3] ACPI/GHES: Add an EDAC notifier chain
@ 2017-07-26  8:48   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-26  8:48 UTC (permalink / raw)
  To: linux-edac
  Cc: Tony Luck, Mauro Carvalho Chehab, Toshimitsu Kani,
	Rafael J. Wysocki, LKML

From: Borislav Petkov <bp@suse.de>

... so that GHES can dump memory errors into it and consumers like
ghes_edac can register.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c | 14 ++++++++++++++
 include/acpi/ghes.h      |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d661d452b238..3f05f5ff3461 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -140,6 +140,20 @@ static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
 
+static ATOMIC_NOTIFIER_HEAD(ghes_edac_chain);
+
+void ghes_register_edac_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_register(&ghes_edac_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_register_edac_chain);
+
+void ghes_unregister_edac_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_unregister(&ghes_edac_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_unregister_edac_chain);
+
 static int ghes_ioremap_init(void)
 {
 	ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 9f26e01186ae..506b6a197738 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -76,6 +76,8 @@ static inline void ghes_edac_unregister(struct ghes *ghes)
 {
 }
 #endif
+void ghes_register_edac_chain(struct notifier_block *nb);
+void ghes_unregister_edac_chain(struct notifier_block *nb);
 
 static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
 {

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26  8:48   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-26  8:48 UTC (permalink / raw)
  To: linux-edac
  Cc: Tony Luck, Mauro Carvalho Chehab, Toshimitsu Kani,
	Rafael J. Wysocki, LKML

From: Borislav Petkov <bp@suse.de>

Register with the GHES notifier chain so that there's no need to call
into the module with ghes_edac_report_mem_error().

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c |  18 +++----
 drivers/edac/Kconfig     |   4 +-
 drivers/edac/ghes_edac.c | 129 ++++++++++++++++++++++++-----------------------
 include/acpi/ghes.h      |  25 ---------
 4 files changed, 74 insertions(+), 102 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 3f05f5ff3461..37cd698cacd2 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -475,11 +475,11 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 static void ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
-	int sev, sec_sev;
 	struct acpi_hest_generic_data *gdata;
 	guid_t *sec_type;
 	guid_t *fru_id = &NULL_UUID_LE;
 	char *fru_text = "";
+	int sev, sec_sev;
 
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
@@ -494,7 +494,8 @@ static void ghes_do_proc(struct ghes *ghes,
 		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
-			ghes_edac_report_mem_error(ghes, sev, mem_err);
+
+			atomic_notifier_call_chain(&ghes_edac_chain, sev, &mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
 			ghes_handle_memory_failure(gdata, sev);
@@ -1153,10 +1154,6 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		goto err;
 	}
 
-	rc = ghes_edac_register(ghes, &ghes_dev->dev);
-	if (rc < 0)
-		goto err;
-
 	switch (generic->notify.type) {
 	case ACPI_HEST_NOTIFY_POLLED:
 		setup_deferrable_timer(&ghes->timer, ghes_poll_func,
@@ -1169,13 +1166,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		if (rc) {
 			pr_err(GHES_PFX "Failed to map GSI to IRQ for generic hardware error source: %d\n",
 			       generic->header.source_id);
-			goto err_edac_unreg;
+			goto err;
 		}
 		rc = request_irq(ghes->irq, ghes_irq_func, 0, "GHES IRQ", ghes);
 		if (rc) {
 			pr_err(GHES_PFX "Failed to register IRQ for generic hardware error source: %d\n",
 			       generic->header.source_id);
-			goto err_edac_unreg;
+			goto err;
 		}
 		break;
 
@@ -1204,8 +1201,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	ghes_proc(ghes);
 
 	return 0;
-err_edac_unreg:
-	ghes_edac_unregister(ghes);
+
 err:
 	if (ghes) {
 		ghes_fini(ghes);
@@ -1255,8 +1251,6 @@ static int ghes_remove(struct platform_device *ghes_dev)
 
 	ghes_fini(ghes);
 
-	ghes_edac_unregister(ghes);
-
 	kfree(ghes);
 
 	platform_set_drvdata(ghes_dev, NULL);
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 96afb2aeed18..fdd8278ca89a 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -53,8 +53,8 @@ config EDAC_DECODE_MCE
 	  has been initialized.
 
 config EDAC_GHES
-	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
-	depends on ACPI_APEI_GHES && (EDAC=y)
+	tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"
+	depends on ACPI_APEI_GHES
 	help
 	  Not all machines support hardware-driven error report. Some of those
 	  provide a BIOS-driven error report mechanism via ACPI, using the
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..ecd34b8bd27e 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -5,6 +5,9 @@
  * License version 2.
  *
  * Copyright (c) 2013 by Mauro Carvalho Chehab
+ *	     (c) 2017 Borislav Petkov
+ *
+ * Borislav Petkov: turn it into a proper module.
  *
  * Red Hat Inc. http://www.redhat.com
  */
@@ -28,10 +31,10 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
+static struct ghes_edac_pvt *ghes_pvt;
 
+/* Hand it into EDAC's core so that we have a device to operate on. */
+static struct device dummy_dev;
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -163,24 +166,21 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 	}
 }
 
-void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
-				struct cper_sec_mem_err *mem_err)
+static int report_mem_error(struct notifier_block *nb, unsigned long sev, void *data)
 {
+	struct cper_sec_mem_err *mem_err = data;
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
-	char *p;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
 	u8 grain_bits;
+	char *p;
 
-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
-		pr_err("Internal error: Can't find EDAC structure\n");
-		return;
+		edac_pr_err("Internal error: Can't find EDAC structure\n");
+		return NOTIFY_DONE;
 	}
+
 	mci = pvt->mci;
 	e = &mci->error_desc;
 
@@ -400,23 +400,40 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 
 	/* Report the error via EDAC API */
 	edac_raw_mc_handle_error(type, mci, e);
+
+	return NOTIFY_DONE;
 }
-EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static struct notifier_block ghes_nb = {
+	.notifier_call  = report_mem_error,
+};
+
+static const char * const fake_msg =
+"This EDAC driver relies on BIOS to enumerate memory and get error reports.\n"
+"Unfortunately, not all BIOSes reflect the memory layout correctly.\n"
+"So, the end result of using this driver varies from vendor to vendor.\n"
+"If you find incorrect reports, please contact your hardware vendor\n"
+"to correct its BIOS.";
+
+static const char * const super_crap_msg =
+"This system has a very crappy BIOS: It doesn't even list the DIMMS.\n"
+"Its SMBIOS info is wrong. It is doubtful that the error report would\n"
+"work on such system. Use this driver with caution.";
+
+static int __init ghes_edac_register(void)
 {
+	struct ghes_edac_pvt *pvt = ghes_pvt;
 	bool fake = false;
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
 
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
 	/* Check if we've got a bogus BIOS */
-	if (num_dimm == 0) {
+	if (!num_dimm) {
 		fake = true;
 		num_dimm = 1;
 	}
@@ -429,21 +446,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
 	 * to avoid duplicated memory controller numbers
 	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(*pvt));
 	if (!mci) {
-		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
+		edac_pr_err("Can't allocate memory for EDAC data\n");
 		return -ENOMEM;
 	}
 
 	pvt = mci->pvt_info;
 	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
 	pvt->mci  = mci;
-	mci->pdev = dev;
+
+	mci->pdev = &dummy_dev;
 
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
@@ -452,21 +465,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
-	}
+	if (!fake)
+		edac_pr_info("%s\n", fake_msg);
+	else
+		edac_pr_info("%s\n", super_crap_msg);
+
+	edac_pr_info("This system has %d DIMM sockets.\n", num_dimm);
 
 	if (!fake) {
 		/*
@@ -475,13 +479,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		 * Keep it in blank for the other memory controllers, as
 		 * there's no reliable way to properly credit each DIMM to
 		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
+		 * DMI bank location fields in different ways.
 		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -495,30 +497,31 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 
 	rc = edac_mc_add_mc(mci);
 	if (rc < 0) {
-		pr_info("Can't register at EDAC core\n");
+		edac_pr_err("Can't register with EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}
 
-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
+	ghes_register_edac_chain(&ghes_nb);
+
 	return 0;
 }
-EXPORT_SYMBOL_GPL(ghes_edac_register);
+module_init(ghes_edac_register);
 
-void ghes_edac_unregister(struct ghes *ghes)
+static void __exit ghes_edac_unregister(void)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	ghes_unregister_edac_chain(&ghes_nb);
+
+	mci = find_mci_by_dev(&dummy_dev);
+	WARN_ON(!mci);
+
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
+
 }
-EXPORT_SYMBOL_GPL(ghes_edac_unregister);
+module_exit(ghes_edac_unregister);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("GHES error decoding module");
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 506b6a197738..c02b8eb91bd6 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -51,31 +51,6 @@ enum {
 	GHES_SEV_PANIC = 0x3,
 };
 
-/* From drivers/edac/ghes_edac.c */
-
-#ifdef CONFIG_EDAC_GHES
-void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
-				struct cper_sec_mem_err *mem_err);
-
-int ghes_edac_register(struct ghes *ghes, struct device *dev);
-
-void ghes_edac_unregister(struct ghes *ghes);
-
-#else
-static inline void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
-				       struct cper_sec_mem_err *mem_err)
-{
-}
-
-static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
-{
-	return 0;
-}
-
-static inline void ghes_edac_unregister(struct ghes *ghes)
-{
-}
-#endif
 void ghes_register_edac_chain(struct notifier_block *nb);
 void ghes_unregister_edac_chain(struct notifier_block *nb);
 
-- 
2.14.0.rc0

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26  8:48   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-26  8:48 UTC (permalink / raw)
  To: linux-edac
  Cc: Tony Luck, Mauro Carvalho Chehab, Toshimitsu Kani,
	Rafael J. Wysocki, LKML

From: Borislav Petkov <bp@suse.de>

Register with the GHES notifier chain so that there's no need to call
into the module with ghes_edac_report_mem_error().

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c |  18 +++----
 drivers/edac/Kconfig     |   4 +-
 drivers/edac/ghes_edac.c | 129 ++++++++++++++++++++++++-----------------------
 include/acpi/ghes.h      |  25 ---------
 4 files changed, 74 insertions(+), 102 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 3f05f5ff3461..37cd698cacd2 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -475,11 +475,11 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 static void ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
-	int sev, sec_sev;
 	struct acpi_hest_generic_data *gdata;
 	guid_t *sec_type;
 	guid_t *fru_id = &NULL_UUID_LE;
 	char *fru_text = "";
+	int sev, sec_sev;
 
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
@@ -494,7 +494,8 @@ static void ghes_do_proc(struct ghes *ghes,
 		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
-			ghes_edac_report_mem_error(ghes, sev, mem_err);
+
+			atomic_notifier_call_chain(&ghes_edac_chain, sev, &mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
 			ghes_handle_memory_failure(gdata, sev);
@@ -1153,10 +1154,6 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		goto err;
 	}
 
-	rc = ghes_edac_register(ghes, &ghes_dev->dev);
-	if (rc < 0)
-		goto err;
-
 	switch (generic->notify.type) {
 	case ACPI_HEST_NOTIFY_POLLED:
 		setup_deferrable_timer(&ghes->timer, ghes_poll_func,
@@ -1169,13 +1166,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		if (rc) {
 			pr_err(GHES_PFX "Failed to map GSI to IRQ for generic hardware error source: %d\n",
 			       generic->header.source_id);
-			goto err_edac_unreg;
+			goto err;
 		}
 		rc = request_irq(ghes->irq, ghes_irq_func, 0, "GHES IRQ", ghes);
 		if (rc) {
 			pr_err(GHES_PFX "Failed to register IRQ for generic hardware error source: %d\n",
 			       generic->header.source_id);
-			goto err_edac_unreg;
+			goto err;
 		}
 		break;
 
@@ -1204,8 +1201,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	ghes_proc(ghes);
 
 	return 0;
-err_edac_unreg:
-	ghes_edac_unregister(ghes);
+
 err:
 	if (ghes) {
 		ghes_fini(ghes);
@@ -1255,8 +1251,6 @@ static int ghes_remove(struct platform_device *ghes_dev)
 
 	ghes_fini(ghes);
 
-	ghes_edac_unregister(ghes);
-
 	kfree(ghes);
 
 	platform_set_drvdata(ghes_dev, NULL);
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 96afb2aeed18..fdd8278ca89a 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -53,8 +53,8 @@ config EDAC_DECODE_MCE
 	  has been initialized.
 
 config EDAC_GHES
-	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
-	depends on ACPI_APEI_GHES && (EDAC=y)
+	tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"
+	depends on ACPI_APEI_GHES
 	help
 	  Not all machines support hardware-driven error report. Some of those
 	  provide a BIOS-driven error report mechanism via ACPI, using the
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..ecd34b8bd27e 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -5,6 +5,9 @@
  * License version 2.
  *
  * Copyright (c) 2013 by Mauro Carvalho Chehab
+ *	     (c) 2017 Borislav Petkov
+ *
+ * Borislav Petkov: turn it into a proper module.
  *
  * Red Hat Inc. http://www.redhat.com
  */
@@ -28,10 +31,10 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
+static struct ghes_edac_pvt *ghes_pvt;
 
+/* Hand it into EDAC's core so that we have a device to operate on. */
+static struct device dummy_dev;
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -163,24 +166,21 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 	}
 }
 
-void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
-				struct cper_sec_mem_err *mem_err)
+static int report_mem_error(struct notifier_block *nb, unsigned long sev, void *data)
 {
+	struct cper_sec_mem_err *mem_err = data;
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
-	char *p;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
 	u8 grain_bits;
+	char *p;
 
-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
-		pr_err("Internal error: Can't find EDAC structure\n");
-		return;
+		edac_pr_err("Internal error: Can't find EDAC structure\n");
+		return NOTIFY_DONE;
 	}
+
 	mci = pvt->mci;
 	e = &mci->error_desc;
 
@@ -400,23 +400,40 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 
 	/* Report the error via EDAC API */
 	edac_raw_mc_handle_error(type, mci, e);
+
+	return NOTIFY_DONE;
 }
-EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static struct notifier_block ghes_nb = {
+	.notifier_call  = report_mem_error,
+};
+
+static const char * const fake_msg =
+"This EDAC driver relies on BIOS to enumerate memory and get error reports.\n"
+"Unfortunately, not all BIOSes reflect the memory layout correctly.\n"
+"So, the end result of using this driver varies from vendor to vendor.\n"
+"If you find incorrect reports, please contact your hardware vendor\n"
+"to correct its BIOS.";
+
+static const char * const super_crap_msg =
+"This system has a very crappy BIOS: It doesn't even list the DIMMS.\n"
+"Its SMBIOS info is wrong. It is doubtful that the error report would\n"
+"work on such system. Use this driver with caution.";
+
+static int __init ghes_edac_register(void)
 {
+	struct ghes_edac_pvt *pvt = ghes_pvt;
 	bool fake = false;
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
 
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
 	/* Check if we've got a bogus BIOS */
-	if (num_dimm == 0) {
+	if (!num_dimm) {
 		fake = true;
 		num_dimm = 1;
 	}
@@ -429,21 +446,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
 	 * to avoid duplicated memory controller numbers
 	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(*pvt));
 	if (!mci) {
-		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
+		edac_pr_err("Can't allocate memory for EDAC data\n");
 		return -ENOMEM;
 	}
 
 	pvt = mci->pvt_info;
 	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
 	pvt->mci  = mci;
-	mci->pdev = dev;
+
+	mci->pdev = &dummy_dev;
 
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
@@ -452,21 +465,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
-	}
+	if (!fake)
+		edac_pr_info("%s\n", fake_msg);
+	else
+		edac_pr_info("%s\n", super_crap_msg);
+
+	edac_pr_info("This system has %d DIMM sockets.\n", num_dimm);
 
 	if (!fake) {
 		/*
@@ -475,13 +479,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		 * Keep it in blank for the other memory controllers, as
 		 * there's no reliable way to properly credit each DIMM to
 		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
+		 * DMI bank location fields in different ways.
 		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -495,30 +497,31 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 
 	rc = edac_mc_add_mc(mci);
 	if (rc < 0) {
-		pr_info("Can't register at EDAC core\n");
+		edac_pr_err("Can't register with EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}
 
-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
+	ghes_register_edac_chain(&ghes_nb);
+
 	return 0;
 }
-EXPORT_SYMBOL_GPL(ghes_edac_register);
+module_init(ghes_edac_register);
 
-void ghes_edac_unregister(struct ghes *ghes)
+static void __exit ghes_edac_unregister(void)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	ghes_unregister_edac_chain(&ghes_nb);
+
+	mci = find_mci_by_dev(&dummy_dev);
+	WARN_ON(!mci);
+
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
+
 }
-EXPORT_SYMBOL_GPL(ghes_edac_unregister);
+module_exit(ghes_edac_unregister);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("GHES error decoding module");
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 506b6a197738..c02b8eb91bd6 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -51,31 +51,6 @@ enum {
 	GHES_SEV_PANIC = 0x3,
 };
 
-/* From drivers/edac/ghes_edac.c */
-
-#ifdef CONFIG_EDAC_GHES
-void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
-				struct cper_sec_mem_err *mem_err);
-
-int ghes_edac_register(struct ghes *ghes, struct device *dev);
-
-void ghes_edac_unregister(struct ghes *ghes);
-
-#else
-static inline void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
-				       struct cper_sec_mem_err *mem_err)
-{
-}
-
-static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
-{
-	return 0;
-}
-
-static inline void ghes_edac_unregister(struct ghes *ghes)
-{
-}
-#endif
 void ghes_register_edac_chain(struct notifier_block *nb);
 void ghes_unregister_edac_chain(struct notifier_block *nb);
 

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 10:24     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 38+ messages in thread
From: Mauro Carvalho Chehab @ 2017-07-26 10:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Tony Luck, Toshimitsu Kani, Rafael J. Wysocki, LKML

Em Wed, 26 Jul 2017 10:48:27 +0200
Borislav Petkov <bp@alien8.de> escreveu:

> From: Borislav Petkov <bp@suse.de>
> 
> Register with the GHES notifier chain so that there's no need to call
> into the module with ghes_edac_report_mem_error().

Hmm... I'm not seeing any implementation that would allow setting
between firmware first, hardware first or "auto", as we've discussed.

> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  drivers/acpi/apei/ghes.c |  18 +++----
>  drivers/edac/Kconfig     |   4 +-
>  drivers/edac/ghes_edac.c | 129 ++++++++++++++++++++++++-----------------------
>  include/acpi/ghes.h      |  25 ---------
>  4 files changed, 74 insertions(+), 102 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 3f05f5ff3461..37cd698cacd2 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -475,11 +475,11 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
>  static void ghes_do_proc(struct ghes *ghes,
>  			 const struct acpi_hest_generic_status *estatus)
>  {
> -	int sev, sec_sev;
>  	struct acpi_hest_generic_data *gdata;
>  	guid_t *sec_type;
>  	guid_t *fru_id = &NULL_UUID_LE;
>  	char *fru_text = "";
> +	int sev, sec_sev;
>  
>  	sev = ghes_severity(estatus->error_severity);
>  	apei_estatus_for_each_section(estatus, gdata) {
> @@ -494,7 +494,8 @@ static void ghes_do_proc(struct ghes *ghes,
>  		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
>  			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
>  
> -			ghes_edac_report_mem_error(ghes, sev, mem_err);
> +
> +			atomic_notifier_call_chain(&ghes_edac_chain, sev, &mem_err);
>  
>  			arch_apei_report_mem_error(sev, mem_err);
>  			ghes_handle_memory_failure(gdata, sev);
> @@ -1153,10 +1154,6 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		goto err;
>  	}
>  
> -	rc = ghes_edac_register(ghes, &ghes_dev->dev);
> -	if (rc < 0)
> -		goto err;
> -
>  	switch (generic->notify.type) {
>  	case ACPI_HEST_NOTIFY_POLLED:
>  		setup_deferrable_timer(&ghes->timer, ghes_poll_func,
> @@ -1169,13 +1166,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		if (rc) {
>  			pr_err(GHES_PFX "Failed to map GSI to IRQ for generic hardware error source: %d\n",
>  			       generic->header.source_id);
> -			goto err_edac_unreg;
> +			goto err;
>  		}
>  		rc = request_irq(ghes->irq, ghes_irq_func, 0, "GHES IRQ", ghes);
>  		if (rc) {
>  			pr_err(GHES_PFX "Failed to register IRQ for generic hardware error source: %d\n",
>  			       generic->header.source_id);
> -			goto err_edac_unreg;
> +			goto err;
>  		}
>  		break;
>  
> @@ -1204,8 +1201,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  	ghes_proc(ghes);
>  
>  	return 0;
> -err_edac_unreg:
> -	ghes_edac_unregister(ghes);
> +
>  err:
>  	if (ghes) {
>  		ghes_fini(ghes);
> @@ -1255,8 +1251,6 @@ static int ghes_remove(struct platform_device *ghes_dev)
>  
>  	ghes_fini(ghes);
>  
> -	ghes_edac_unregister(ghes);
> -
>  	kfree(ghes);
>  
>  	platform_set_drvdata(ghes_dev, NULL);
> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
> index 96afb2aeed18..fdd8278ca89a 100644
> --- a/drivers/edac/Kconfig
> +++ b/drivers/edac/Kconfig
> @@ -53,8 +53,8 @@ config EDAC_DECODE_MCE
>  	  has been initialized.
>  
>  config EDAC_GHES
> -	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
> -	depends on ACPI_APEI_GHES && (EDAC=y)
> +	tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"
> +	depends on ACPI_APEI_GHES
>  	help
>  	  Not all machines support hardware-driven error report. Some of those
>  	  provide a BIOS-driven error report mechanism via ACPI, using the
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 6f80eb65c26c..ecd34b8bd27e 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -5,6 +5,9 @@
>   * License version 2.
>   *
>   * Copyright (c) 2013 by Mauro Carvalho Chehab
> + *	     (c) 2017 Borislav Petkov
> + *
> + * Borislav Petkov: turn it into a proper module.
>   *
>   * Red Hat Inc. http://www.redhat.com
>   */
> @@ -28,10 +31,10 @@ struct ghes_edac_pvt {
>  	char msg[80];
>  };
>  
> -static LIST_HEAD(ghes_reglist);
> -static DEFINE_MUTEX(ghes_edac_lock);
> -static int ghes_edac_mc_num;
> +static struct ghes_edac_pvt *ghes_pvt;
>  
> +/* Hand it into EDAC's core so that we have a device to operate on. */
> +static struct device dummy_dev;
>  
>  /* Memory Device - Type 17 of SMBIOS spec */
>  struct memdev_dmi_entry {
> @@ -163,24 +166,21 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
>  	}
>  }
>  
> -void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
> -				struct cper_sec_mem_err *mem_err)
> +static int report_mem_error(struct notifier_block *nb, unsigned long sev, void *data)
>  {
> +	struct cper_sec_mem_err *mem_err = data;
>  	enum hw_event_mc_err_type type;
>  	struct edac_raw_error_desc *e;
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt = NULL;
> -	char *p;
> +	struct ghes_edac_pvt *pvt = ghes_pvt;
>  	u8 grain_bits;
> +	char *p;
>  
> -	list_for_each_entry(pvt, &ghes_reglist, list) {
> -		if (ghes == pvt->ghes)
> -			break;
> -	}
>  	if (!pvt) {
> -		pr_err("Internal error: Can't find EDAC structure\n");
> -		return;
> +		edac_pr_err("Internal error: Can't find EDAC structure\n");
> +		return NOTIFY_DONE;
>  	}
> +
>  	mci = pvt->mci;
>  	e = &mci->error_desc;
>  
> @@ -400,23 +400,40 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
>  
>  	/* Report the error via EDAC API */
>  	edac_raw_mc_handle_error(type, mci, e);
> +
> +	return NOTIFY_DONE;
>  }
> -EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
>  
> -int ghes_edac_register(struct ghes *ghes, struct device *dev)
> +static struct notifier_block ghes_nb = {
> +	.notifier_call  = report_mem_error,
> +};
> +
> +static const char * const fake_msg =
> +"This EDAC driver relies on BIOS to enumerate memory and get error reports.\n"
> +"Unfortunately, not all BIOSes reflect the memory layout correctly.\n"
> +"So, the end result of using this driver varies from vendor to vendor.\n"
> +"If you find incorrect reports, please contact your hardware vendor\n"
> +"to correct its BIOS.";
> +
> +static const char * const super_crap_msg =
> +"This system has a very crappy BIOS: It doesn't even list the DIMMS.\n"
> +"Its SMBIOS info is wrong. It is doubtful that the error report would\n"
> +"work on such system. Use this driver with caution.";

I would also add a note saying that the BIOS may be masking errors.

> +
> +static int __init ghes_edac_register(void)
>  {
> +	struct ghes_edac_pvt *pvt = ghes_pvt;
>  	bool fake = false;
>  	int rc, num_dimm = 0;
>  	struct mem_ctl_info *mci;
>  	struct edac_mc_layer layers[1];
> -	struct ghes_edac_pvt *pvt;
>  	struct ghes_edac_dimm_fill dimm_fill;
>  
>  	/* Get the number of DIMMs */
>  	dmi_walk(ghes_edac_count_dimms, &num_dimm);
>  
>  	/* Check if we've got a bogus BIOS */
> -	if (num_dimm == 0) {
> +	if (!num_dimm) {
>  		fake = true;
>  		num_dimm = 1;
>  	}
> @@ -429,21 +446,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
>  	 * to avoid duplicated memory controller numbers
>  	 */
> -	mutex_lock(&ghes_edac_lock);
> -	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
> -			    sizeof(*pvt));
> +	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(*pvt));
>  	if (!mci) {
> -		pr_info("Can't allocate memory for EDAC data\n");
> -		mutex_unlock(&ghes_edac_lock);
> +		edac_pr_err("Can't allocate memory for EDAC data\n");
>  		return -ENOMEM;
>  	}
>  
>  	pvt = mci->pvt_info;
>  	memset(pvt, 0, sizeof(*pvt));
> -	list_add_tail(&pvt->list, &ghes_reglist);
> -	pvt->ghes = ghes;
>  	pvt->mci  = mci;
> -	mci->pdev = dev;
> +
> +	mci->pdev = &dummy_dev;
>  
>  	mci->mtype_cap = MEM_FLAG_EMPTY;
>  	mci->edac_ctl_cap = EDAC_FLAG_NONE;
> @@ -452,21 +465,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	mci->ctl_name = "ghes_edac";
>  	mci->dev_name = "ghes";
>  
> -	if (!ghes_edac_mc_num) {
> -		if (!fake) {
> -			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
> -			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
> -			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
> -			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
> -			pr_info("to correct its BIOS.\n");
> -			pr_info("This system has %d DIMM sockets.\n",
> -				num_dimm);
> -		} else {
> -			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
> -			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
> -			pr_info("work on such system. Use this driver with caution\n");
> -		}
> -	}
> +	if (!fake)
> +		edac_pr_info("%s\n", fake_msg);
> +	else
> +		edac_pr_info("%s\n", super_crap_msg);
> +
> +	edac_pr_info("This system has %d DIMM sockets.\n", num_dimm);
>  
>  	if (!fake) {
>  		/*
> @@ -475,13 +479,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  		 * Keep it in blank for the other memory controllers, as
>  		 * there's no reliable way to properly credit each DIMM to
>  		 * the memory controller, as different BIOSes fill the
> -		 * DMI bank location fields on different ways
> +		 * DMI bank location fields in different ways.
>  		 */
> -		if (!ghes_edac_mc_num) {
> -			dimm_fill.count = 0;
> -			dimm_fill.mci = mci;
> -			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
> -		}
> +		dimm_fill.count = 0;
> +		dimm_fill.mci = mci;
> +		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
>  	} else {
>  		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
>  						       mci->n_layers, 0, 0, 0);
> @@ -495,30 +497,31 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  
>  	rc = edac_mc_add_mc(mci);
>  	if (rc < 0) {
> -		pr_info("Can't register at EDAC core\n");
> +		edac_pr_err("Can't register with EDAC core\n");
>  		edac_mc_free(mci);
> -		mutex_unlock(&ghes_edac_lock);
>  		return -ENODEV;
>  	}
>  
> -	ghes_edac_mc_num++;
> -	mutex_unlock(&ghes_edac_lock);
> +	ghes_register_edac_chain(&ghes_nb);
> +
>  	return 0;
>  }
> -EXPORT_SYMBOL_GPL(ghes_edac_register);
> +module_init(ghes_edac_register);
>  
> -void ghes_edac_unregister(struct ghes *ghes)
> +static void __exit ghes_edac_unregister(void)
>  {
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt, *tmp;
> -
> -	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
> -		if (ghes == pvt->ghes) {
> -			mci = pvt->mci;
> -			edac_mc_del_mc(mci->pdev);
> -			edac_mc_free(mci);
> -			list_del(&pvt->list);
> -		}
> -	}
> +
> +	ghes_unregister_edac_chain(&ghes_nb);
> +
> +	mci = find_mci_by_dev(&dummy_dev);
> +	WARN_ON(!mci);
> +
> +	edac_mc_del_mc(mci->pdev);
> +	edac_mc_free(mci);
> +
>  }
> -EXPORT_SYMBOL_GPL(ghes_edac_unregister);
> +module_exit(ghes_edac_unregister);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("GHES error decoding module");
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 506b6a197738..c02b8eb91bd6 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -51,31 +51,6 @@ enum {
>  	GHES_SEV_PANIC = 0x3,
>  };
>  
> -/* From drivers/edac/ghes_edac.c */
> -
> -#ifdef CONFIG_EDAC_GHES
> -void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
> -				struct cper_sec_mem_err *mem_err);
> -
> -int ghes_edac_register(struct ghes *ghes, struct device *dev);
> -
> -void ghes_edac_unregister(struct ghes *ghes);
> -
> -#else
> -static inline void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
> -				       struct cper_sec_mem_err *mem_err)
> -{
> -}
> -
> -static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
> -{
> -	return 0;
> -}
> -
> -static inline void ghes_edac_unregister(struct ghes *ghes)
> -{
> -}
> -#endif
>  void ghes_register_edac_chain(struct notifier_block *nb);
>  void ghes_unregister_edac_chain(struct notifier_block *nb);
>  



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 10:24     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 38+ messages in thread
From: Mauro Carvalho Chehab @ 2017-07-26 10:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Tony Luck, Toshimitsu Kani, Rafael J. Wysocki, LKML

Em Wed, 26 Jul 2017 10:48:27 +0200
Borislav Petkov <bp@alien8.de> escreveu:

> From: Borislav Petkov <bp@suse.de>
> 
> Register with the GHES notifier chain so that there's no need to call
> into the module with ghes_edac_report_mem_error().

Hmm... I'm not seeing any implementation that would allow setting
between firmware first, hardware first or "auto", as we've discussed.

> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  drivers/acpi/apei/ghes.c |  18 +++----
>  drivers/edac/Kconfig     |   4 +-
>  drivers/edac/ghes_edac.c | 129 ++++++++++++++++++++++++-----------------------
>  include/acpi/ghes.h      |  25 ---------
>  4 files changed, 74 insertions(+), 102 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 3f05f5ff3461..37cd698cacd2 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -475,11 +475,11 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
>  static void ghes_do_proc(struct ghes *ghes,
>  			 const struct acpi_hest_generic_status *estatus)
>  {
> -	int sev, sec_sev;
>  	struct acpi_hest_generic_data *gdata;
>  	guid_t *sec_type;
>  	guid_t *fru_id = &NULL_UUID_LE;
>  	char *fru_text = "";
> +	int sev, sec_sev;
>  
>  	sev = ghes_severity(estatus->error_severity);
>  	apei_estatus_for_each_section(estatus, gdata) {
> @@ -494,7 +494,8 @@ static void ghes_do_proc(struct ghes *ghes,
>  		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
>  			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
>  
> -			ghes_edac_report_mem_error(ghes, sev, mem_err);
> +
> +			atomic_notifier_call_chain(&ghes_edac_chain, sev, &mem_err);
>  
>  			arch_apei_report_mem_error(sev, mem_err);
>  			ghes_handle_memory_failure(gdata, sev);
> @@ -1153,10 +1154,6 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		goto err;
>  	}
>  
> -	rc = ghes_edac_register(ghes, &ghes_dev->dev);
> -	if (rc < 0)
> -		goto err;
> -
>  	switch (generic->notify.type) {
>  	case ACPI_HEST_NOTIFY_POLLED:
>  		setup_deferrable_timer(&ghes->timer, ghes_poll_func,
> @@ -1169,13 +1166,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		if (rc) {
>  			pr_err(GHES_PFX "Failed to map GSI to IRQ for generic hardware error source: %d\n",
>  			       generic->header.source_id);
> -			goto err_edac_unreg;
> +			goto err;
>  		}
>  		rc = request_irq(ghes->irq, ghes_irq_func, 0, "GHES IRQ", ghes);
>  		if (rc) {
>  			pr_err(GHES_PFX "Failed to register IRQ for generic hardware error source: %d\n",
>  			       generic->header.source_id);
> -			goto err_edac_unreg;
> +			goto err;
>  		}
>  		break;
>  
> @@ -1204,8 +1201,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  	ghes_proc(ghes);
>  
>  	return 0;
> -err_edac_unreg:
> -	ghes_edac_unregister(ghes);
> +
>  err:
>  	if (ghes) {
>  		ghes_fini(ghes);
> @@ -1255,8 +1251,6 @@ static int ghes_remove(struct platform_device *ghes_dev)
>  
>  	ghes_fini(ghes);
>  
> -	ghes_edac_unregister(ghes);
> -
>  	kfree(ghes);
>  
>  	platform_set_drvdata(ghes_dev, NULL);
> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
> index 96afb2aeed18..fdd8278ca89a 100644
> --- a/drivers/edac/Kconfig
> +++ b/drivers/edac/Kconfig
> @@ -53,8 +53,8 @@ config EDAC_DECODE_MCE
>  	  has been initialized.
>  
>  config EDAC_GHES
> -	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
> -	depends on ACPI_APEI_GHES && (EDAC=y)
> +	tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"
> +	depends on ACPI_APEI_GHES
>  	help
>  	  Not all machines support hardware-driven error report. Some of those
>  	  provide a BIOS-driven error report mechanism via ACPI, using the
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 6f80eb65c26c..ecd34b8bd27e 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -5,6 +5,9 @@
>   * License version 2.
>   *
>   * Copyright (c) 2013 by Mauro Carvalho Chehab
> + *	     (c) 2017 Borislav Petkov
> + *
> + * Borislav Petkov: turn it into a proper module.
>   *
>   * Red Hat Inc. http://www.redhat.com
>   */
> @@ -28,10 +31,10 @@ struct ghes_edac_pvt {
>  	char msg[80];
>  };
>  
> -static LIST_HEAD(ghes_reglist);
> -static DEFINE_MUTEX(ghes_edac_lock);
> -static int ghes_edac_mc_num;
> +static struct ghes_edac_pvt *ghes_pvt;
>  
> +/* Hand it into EDAC's core so that we have a device to operate on. */
> +static struct device dummy_dev;
>  
>  /* Memory Device - Type 17 of SMBIOS spec */
>  struct memdev_dmi_entry {
> @@ -163,24 +166,21 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
>  	}
>  }
>  
> -void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
> -				struct cper_sec_mem_err *mem_err)
> +static int report_mem_error(struct notifier_block *nb, unsigned long sev, void *data)
>  {
> +	struct cper_sec_mem_err *mem_err = data;
>  	enum hw_event_mc_err_type type;
>  	struct edac_raw_error_desc *e;
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt = NULL;
> -	char *p;
> +	struct ghes_edac_pvt *pvt = ghes_pvt;
>  	u8 grain_bits;
> +	char *p;
>  
> -	list_for_each_entry(pvt, &ghes_reglist, list) {
> -		if (ghes == pvt->ghes)
> -			break;
> -	}
>  	if (!pvt) {
> -		pr_err("Internal error: Can't find EDAC structure\n");
> -		return;
> +		edac_pr_err("Internal error: Can't find EDAC structure\n");
> +		return NOTIFY_DONE;
>  	}
> +
>  	mci = pvt->mci;
>  	e = &mci->error_desc;
>  
> @@ -400,23 +400,40 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
>  
>  	/* Report the error via EDAC API */
>  	edac_raw_mc_handle_error(type, mci, e);
> +
> +	return NOTIFY_DONE;
>  }
> -EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
>  
> -int ghes_edac_register(struct ghes *ghes, struct device *dev)
> +static struct notifier_block ghes_nb = {
> +	.notifier_call  = report_mem_error,
> +};
> +
> +static const char * const fake_msg =
> +"This EDAC driver relies on BIOS to enumerate memory and get error reports.\n"
> +"Unfortunately, not all BIOSes reflect the memory layout correctly.\n"
> +"So, the end result of using this driver varies from vendor to vendor.\n"
> +"If you find incorrect reports, please contact your hardware vendor\n"
> +"to correct its BIOS.";
> +
> +static const char * const super_crap_msg =
> +"This system has a very crappy BIOS: It doesn't even list the DIMMS.\n"
> +"Its SMBIOS info is wrong. It is doubtful that the error report would\n"
> +"work on such system. Use this driver with caution.";

I would also add a note saying that the BIOS may be masking errors.

> +
> +static int __init ghes_edac_register(void)
>  {
> +	struct ghes_edac_pvt *pvt = ghes_pvt;
>  	bool fake = false;
>  	int rc, num_dimm = 0;
>  	struct mem_ctl_info *mci;
>  	struct edac_mc_layer layers[1];
> -	struct ghes_edac_pvt *pvt;
>  	struct ghes_edac_dimm_fill dimm_fill;
>  
>  	/* Get the number of DIMMs */
>  	dmi_walk(ghes_edac_count_dimms, &num_dimm);
>  
>  	/* Check if we've got a bogus BIOS */
> -	if (num_dimm == 0) {
> +	if (!num_dimm) {
>  		fake = true;
>  		num_dimm = 1;
>  	}
> @@ -429,21 +446,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
>  	 * to avoid duplicated memory controller numbers
>  	 */
> -	mutex_lock(&ghes_edac_lock);
> -	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
> -			    sizeof(*pvt));
> +	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(*pvt));
>  	if (!mci) {
> -		pr_info("Can't allocate memory for EDAC data\n");
> -		mutex_unlock(&ghes_edac_lock);
> +		edac_pr_err("Can't allocate memory for EDAC data\n");
>  		return -ENOMEM;
>  	}
>  
>  	pvt = mci->pvt_info;
>  	memset(pvt, 0, sizeof(*pvt));
> -	list_add_tail(&pvt->list, &ghes_reglist);
> -	pvt->ghes = ghes;
>  	pvt->mci  = mci;
> -	mci->pdev = dev;
> +
> +	mci->pdev = &dummy_dev;
>  
>  	mci->mtype_cap = MEM_FLAG_EMPTY;
>  	mci->edac_ctl_cap = EDAC_FLAG_NONE;
> @@ -452,21 +465,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	mci->ctl_name = "ghes_edac";
>  	mci->dev_name = "ghes";
>  
> -	if (!ghes_edac_mc_num) {
> -		if (!fake) {
> -			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
> -			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
> -			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
> -			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
> -			pr_info("to correct its BIOS.\n");
> -			pr_info("This system has %d DIMM sockets.\n",
> -				num_dimm);
> -		} else {
> -			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
> -			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
> -			pr_info("work on such system. Use this driver with caution\n");
> -		}
> -	}
> +	if (!fake)
> +		edac_pr_info("%s\n", fake_msg);
> +	else
> +		edac_pr_info("%s\n", super_crap_msg);
> +
> +	edac_pr_info("This system has %d DIMM sockets.\n", num_dimm);
>  
>  	if (!fake) {
>  		/*
> @@ -475,13 +479,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  		 * Keep it in blank for the other memory controllers, as
>  		 * there's no reliable way to properly credit each DIMM to
>  		 * the memory controller, as different BIOSes fill the
> -		 * DMI bank location fields on different ways
> +		 * DMI bank location fields in different ways.
>  		 */
> -		if (!ghes_edac_mc_num) {
> -			dimm_fill.count = 0;
> -			dimm_fill.mci = mci;
> -			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
> -		}
> +		dimm_fill.count = 0;
> +		dimm_fill.mci = mci;
> +		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
>  	} else {
>  		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
>  						       mci->n_layers, 0, 0, 0);
> @@ -495,30 +497,31 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  
>  	rc = edac_mc_add_mc(mci);
>  	if (rc < 0) {
> -		pr_info("Can't register at EDAC core\n");
> +		edac_pr_err("Can't register with EDAC core\n");
>  		edac_mc_free(mci);
> -		mutex_unlock(&ghes_edac_lock);
>  		return -ENODEV;
>  	}
>  
> -	ghes_edac_mc_num++;
> -	mutex_unlock(&ghes_edac_lock);
> +	ghes_register_edac_chain(&ghes_nb);
> +
>  	return 0;
>  }
> -EXPORT_SYMBOL_GPL(ghes_edac_register);
> +module_init(ghes_edac_register);
>  
> -void ghes_edac_unregister(struct ghes *ghes)
> +static void __exit ghes_edac_unregister(void)
>  {
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt, *tmp;
> -
> -	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
> -		if (ghes == pvt->ghes) {
> -			mci = pvt->mci;
> -			edac_mc_del_mc(mci->pdev);
> -			edac_mc_free(mci);
> -			list_del(&pvt->list);
> -		}
> -	}
> +
> +	ghes_unregister_edac_chain(&ghes_nb);
> +
> +	mci = find_mci_by_dev(&dummy_dev);
> +	WARN_ON(!mci);
> +
> +	edac_mc_del_mc(mci->pdev);
> +	edac_mc_free(mci);
> +
>  }
> -EXPORT_SYMBOL_GPL(ghes_edac_unregister);
> +module_exit(ghes_edac_unregister);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("GHES error decoding module");
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 506b6a197738..c02b8eb91bd6 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -51,31 +51,6 @@ enum {
>  	GHES_SEV_PANIC = 0x3,
>  };
>  
> -/* From drivers/edac/ghes_edac.c */
> -
> -#ifdef CONFIG_EDAC_GHES
> -void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
> -				struct cper_sec_mem_err *mem_err);
> -
> -int ghes_edac_register(struct ghes *ghes, struct device *dev);
> -
> -void ghes_edac_unregister(struct ghes *ghes);
> -
> -#else
> -static inline void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
> -				       struct cper_sec_mem_err *mem_err)
> -{
> -}
> -
> -static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
> -{
> -	return 0;
> -}
> -
> -static inline void ghes_edac_unregister(struct ghes *ghes)
> -{
> -}
> -#endif
>  void ghes_register_edac_chain(struct notifier_block *nb);
>  void ghes_unregister_edac_chain(struct notifier_block *nb);
>  



Thanks,
Mauro
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 10:37       ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-26 10:37 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-edac, Tony Luck, Toshimitsu Kani, Rafael J. Wysocki, LKML

On Wed, Jul 26, 2017 at 07:24:04AM -0300, Mauro Carvalho Chehab wrote:
> Hmm... I'm not seeing any implementation that would allow setting
> between firmware first, hardware first or "auto", as we've discussed.

This is all coming up. As the 0/3 message said, these 3 patches are the
bare minimum of reorganizing stuff only and should serve as a base.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 10:37       ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-26 10:37 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-edac, Tony Luck, Toshimitsu Kani, Rafael J. Wysocki, LKML

On Wed, Jul 26, 2017 at 07:24:04AM -0300, Mauro Carvalho Chehab wrote:
> Hmm... I'm not seeing any implementation that would allow setting
> between firmware first, hardware first or "auto", as we've discussed.

This is all coming up. As the 0/3 message said, these 3 patches are the
bare minimum of reorganizing stuff only and should serve as a base.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 10:51         ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 38+ messages in thread
From: Mauro Carvalho Chehab @ 2017-07-26 10:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Tony Luck, Toshimitsu Kani, Rafael J. Wysocki, LKML

Em Wed, 26 Jul 2017 12:37:08 +0200
Borislav Petkov <bp@alien8.de> escreveu:

> On Wed, Jul 26, 2017 at 07:24:04AM -0300, Mauro Carvalho Chehab wrote:
> > Hmm... I'm not seeing any implementation that would allow setting
> > between firmware first, hardware first or "auto", as we've discussed.
> 
> This is all coming up. As the 0/3 message said, these 3 patches are the
> bare minimum of reorganizing stuff only and should serve as a base.

I'll then wait for such patch before acking this series.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 10:51         ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 38+ messages in thread
From: Mauro Carvalho Chehab @ 2017-07-26 10:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Tony Luck, Toshimitsu Kani, Rafael J. Wysocki, LKML

Em Wed, 26 Jul 2017 12:37:08 +0200
Borislav Petkov <bp@alien8.de> escreveu:

> On Wed, Jul 26, 2017 at 07:24:04AM -0300, Mauro Carvalho Chehab wrote:
> > Hmm... I'm not seeing any implementation that would allow setting
> > between firmware first, hardware first or "auto", as we've discussed.
> 
> This is all coming up. As the 0/3 message said, these 3 patches are the
> bare minimum of reorganizing stuff only and should serve as a base.

I'll then wait for such patch before acking this series.

Thanks,
Mauro
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 17:27           ` Luck, Tony
  0 siblings, 0 replies; 38+ messages in thread
From: Luck, Tony @ 2017-07-26 17:27 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Borislav Petkov
  Cc: linux-edac, Toshimitsu Kani, Rafael J. Wysocki, LKML

> > > Hmm... I'm not seeing any implementation that would allow setting
> > > between firmware first, hardware first or "auto", as we've discussed.
> > 
> > This is all coming up. As the 0/3 message said, these 3 patches are the
> > bare minimum of reorganizing stuff only and should serve as a base.
>
> I'll then wait for such patch before acking this series.

I didn't think that a BIOS that set "firmware first" gave the OS any choice about this.

What exactly is this option going to do?  Fiddle with ACPI OSC??

-Tony

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 17:27           ` Luck, Tony
  0 siblings, 0 replies; 38+ messages in thread
From: Luck, Tony @ 2017-07-26 17:27 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Borislav Petkov
  Cc: linux-edac, Toshimitsu Kani, Rafael J. Wysocki, LKML

> > > Hmm... I'm not seeing any implementation that would allow setting
> > > between firmware first, hardware first or "auto", as we've discussed.
> > 
> > This is all coming up. As the 0/3 message said, these 3 patches are the
> > bare minimum of reorganizing stuff only and should serve as a base.
>
> I'll then wait for such patch before acking this series.

I didn't think that a BIOS that set "firmware first" gave the OS any choice about this.

What exactly is this option going to do?  Fiddle with ACPI OSC??

-Tony
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 18:17             ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 38+ messages in thread
From: Mauro Carvalho Chehab @ 2017-07-26 18:17 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, linux-edac, Toshimitsu Kani, Rafael J. Wysocki, LKML

Em Wed, 26 Jul 2017 17:27:12 +0000
"Luck, Tony" <tony.luck@intel.com> escreveu:

> > > > Hmm... I'm not seeing any implementation that would allow setting
> > > > between firmware first, hardware first or "auto", as we've discussed.
> > > 
> > > This is all coming up. As the 0/3 message said, these 3 patches are the
> > > bare minimum of reorganizing stuff only and should serve as a base.
> >
> > I'll then wait for such patch before acking this series.
> 
> I didn't think that a BIOS that set "firmware first" gave the OS any choice about this.
> 
> What exactly is this option going to do?  Fiddle with ACPI OSC??

Currently, my HP server that I use to build the Kernel is FF:

[    3.783803] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.

I didn't try to disable FF on its BIOS. Not sure if it is even possible.

Still, EDAC is working there using sb_edac. As I pointed before, one of the
MC channels is not being detected, but I don't use it on this machine.
Except for that, EDAC seems to be working fine there:

$ ras-mc-ctl --layout
       +-----------------------------------------------------------------------+
       |                mc0                |                mc1                |
       | channel0  | channel1  | channel2  | channel0  | channel1  | channel2  |
-------+-----------------------------------------------------------------------+
slot2: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
slot1: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
slot0: |  16384 MB  |     0 MB  |  16384 MB  |  16384 MB  |     0 MB  |  16384 MB  |
-------+---------------------------------------------------------------------------+

# ras-mc-ctl --guess-labels
memory stick 'PROC 1 DIMM 1' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 2' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 3' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 4' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 5' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 6' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 7' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 8' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 9' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 10' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 11' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 12' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 1' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 2' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 3' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 4' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 5' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 6' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 7' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 8' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 9' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 10' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 11' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 12' is located at 'Not Specified'

I didn't try to inject an error, as I'm not sure if EINJ feature is
enabled on this BIOS. Probably not.

At least on this machine, I very much prefer to use sb_edac driver.

As I explained earlier in the previous thread, I just don't if the
BIOS would be doing the right thing for CE, as I don't know its
internal algorithm. 

Also, as I'm maintaining the EDAC userspace tools (rasdaemon),
I would really love to get a few CE error reports there from time to
time, as it could be used to check if rasdaemon is doing do the right
thing to them.

So, I very much prefer to not have any threshold at all there at BIOS.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 18:17             ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 38+ messages in thread
From: Mauro Carvalho Chehab @ 2017-07-26 18:17 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, linux-edac, Toshimitsu Kani, Rafael J. Wysocki, LKML

Em Wed, 26 Jul 2017 17:27:12 +0000
"Luck, Tony" <tony.luck@intel.com> escreveu:

> > > > Hmm... I'm not seeing any implementation that would allow setting
> > > > between firmware first, hardware first or "auto", as we've discussed.
> > > 
> > > This is all coming up. As the 0/3 message said, these 3 patches are the
> > > bare minimum of reorganizing stuff only and should serve as a base.
> >
> > I'll then wait for such patch before acking this series.
> 
> I didn't think that a BIOS that set "firmware first" gave the OS any choice about this.
> 
> What exactly is this option going to do?  Fiddle with ACPI OSC??

Currently, my HP server that I use to build the Kernel is FF:

[    3.783803] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.

I didn't try to disable FF on its BIOS. Not sure if it is even possible.

Still, EDAC is working there using sb_edac. As I pointed before, one of the
MC channels is not being detected, but I don't use it on this machine.
Except for that, EDAC seems to be working fine there:

$ ras-mc-ctl --layout
       +-----------------------------------------------------------------------+
       |                mc0                |                mc1                |
       | channel0  | channel1  | channel2  | channel0  | channel1  | channel2  |
-------+-----------------------------------------------------------------------+
slot2: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
slot1: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
slot0: |  16384 MB  |     0 MB  |  16384 MB  |  16384 MB  |     0 MB  |  16384 MB  |
-------+---------------------------------------------------------------------------+

# ras-mc-ctl --guess-labels
memory stick 'PROC 1 DIMM 1' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 2' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 3' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 4' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 5' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 6' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 7' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 8' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 9' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 10' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 11' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 12' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 1' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 2' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 3' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 4' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 5' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 6' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 7' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 8' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 9' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 10' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 11' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 12' is located at 'Not Specified'

I didn't try to inject an error, as I'm not sure if EINJ feature is
enabled on this BIOS. Probably not.

At least on this machine, I very much prefer to use sb_edac driver.

As I explained earlier in the previous thread, I just don't if the
BIOS would be doing the right thing for CE, as I don't know its
internal algorithm. 

Also, as I'm maintaining the EDAC userspace tools (rasdaemon),
I would really love to get a few CE error reports there from time to
time, as it could be used to check if rasdaemon is doing do the right
thing to them.

So, I very much prefer to not have any threshold at all there at BIOS.

Thanks,
Mauro
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 19:24               ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Kani, Toshimitsu @ 2017-07-26 19:24 UTC (permalink / raw)
  To: mchehab, tony.luck; +Cc: rjw, linux-kernel, bp, linux-edac

On Wed, 2017-07-26 at 15:17 -0300, Mauro Carvalho Chehab wrote:
> Em Wed, 26 Jul 2017 17:27:12 +0000
 :
> I didn't try to inject an error, as I'm not sure if EINJ feature is
> enabled on this BIOS. Probably not.

I believe it has EINJ support.

> At least on this machine, I very much prefer to use sb_edac driver.
> 
> As I explained earlier in the previous thread, I just don't if the
> BIOS would be doing the right thing for CE, as I don't know its
> internal algorithm. 
> 
> Also, as I'm maintaining the EDAC userspace tools (rasdaemon),
> I would really love to get a few CE error reports there from time to
> time, as it could be used to check if rasdaemon is doing do the right
> thing to them.
> 
> So, I very much prefer to not have any threshold at all there at
> BIOS.

Using sb_edac does not change the fact that it is FF.  I do not think
you'd see normal CEs on your box.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 19:24               ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Toshi Kani @ 2017-07-26 19:24 UTC (permalink / raw)
  To: mchehab, tony.luck; +Cc: rjw, linux-kernel, bp, linux-edac

On Wed, 2017-07-26 at 15:17 -0300, Mauro Carvalho Chehab wrote:
> Em Wed, 26 Jul 2017 17:27:12 +0000
 :
> I didn't try to inject an error, as I'm not sure if EINJ feature is
> enabled on this BIOS. Probably not.

I believe it has EINJ support.

> At least on this machine, I very much prefer to use sb_edac driver.
> 
> As I explained earlier in the previous thread, I just don't if the
> BIOS would be doing the right thing for CE, as I don't know its
> internal algorithm. 
> 
> Also, as I'm maintaining the EDAC userspace tools (rasdaemon),
> I would really love to get a few CE error reports there from time to
> time, as it could be used to check if rasdaemon is doing do the right
> thing to them.
> 
> So, I very much prefer to not have any threshold at all there at
> BIOS.

Using sb_edac does not change the fact that it is FF.  I do not think
you'd see normal CEs on your box.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 19:49       ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Kani, Toshimitsu @ 2017-07-26 19:49 UTC (permalink / raw)
  To: mchehab, bp; +Cc: linux-kernel, tony.luck, linux-edac, rjw

On Wed, 2017-07-26 at 07:24 -0300, Mauro Carvalho Chehab wrote:
> Em Wed, 26 Jul 2017 10:48:27 +0200
> Borislav Petkov <bp@alien8.de> escreveu:
> 
> > From: Borislav Petkov <bp@suse.de>
> > 
> > Register with the GHES notifier chain so that there's no need to
> > call into the module with ghes_edac_report_mem_error().
> 
> Hmm... I'm not seeing any implementation that would allow setting
> between firmware first, hardware first or "auto", as we've discussed.

A minor nit for terminology - hardware is always the first one to
detect an error.  The difference is whether an error is first reported
to the kernel or firmware.  So, "kernel first" or "os first" would be
more appropriate as an alternative to "firmware first".  That said,
such wording may be misleading since it does not change the mode.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-26 19:49       ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Toshi Kani @ 2017-07-26 19:49 UTC (permalink / raw)
  To: mchehab, bp; +Cc: linux-kernel, tony.luck, linux-edac, rjw

T24gV2VkLCAyMDE3LTA3LTI2IGF0IDA3OjI0IC0wMzAwLCBNYXVybyBDYXJ2YWxobyBDaGVoYWIg
d3JvdGU6DQo+IEVtIFdlZCwgMjYgSnVsIDIwMTcgMTA6NDg6MjcgKzAyMDANCj4gQm9yaXNsYXYg
UGV0a292IDxicEBhbGllbjguZGU+IGVzY3JldmV1Og0KPiANCj4gPiBGcm9tOiBCb3Jpc2xhdiBQ
ZXRrb3YgPGJwQHN1c2UuZGU+DQo+ID4gDQo+ID4gUmVnaXN0ZXIgd2l0aCB0aGUgR0hFUyBub3Rp
ZmllciBjaGFpbiBzbyB0aGF0IHRoZXJlJ3Mgbm8gbmVlZCB0bw0KPiA+IGNhbGwgaW50byB0aGUg
bW9kdWxlIHdpdGggZ2hlc19lZGFjX3JlcG9ydF9tZW1fZXJyb3IoKS4NCj4gDQo+IEhtbS4uLiBJ
J20gbm90IHNlZWluZyBhbnkgaW1wbGVtZW50YXRpb24gdGhhdCB3b3VsZCBhbGxvdyBzZXR0aW5n
DQo+IGJldHdlZW4gZmlybXdhcmUgZmlyc3QsIGhhcmR3YXJlIGZpcnN0IG9yICJhdXRvIiwgYXMg
d2UndmUgZGlzY3Vzc2VkLg0KDQpBIG1pbm9yIG5pdCBmb3IgdGVybWlub2xvZ3kgLSBoYXJkd2Fy
ZSBpcyBhbHdheXMgdGhlIGZpcnN0IG9uZSB0bw0KZGV0ZWN0IGFuIGVycm9yLiAgVGhlIGRpZmZl
cmVuY2UgaXMgd2hldGhlciBhbiBlcnJvciBpcyBmaXJzdCByZXBvcnRlZA0KdG8gdGhlIGtlcm5l
bCBvciBmaXJtd2FyZS4gIFNvLCAia2VybmVsIGZpcnN0IiBvciAib3MgZmlyc3QiIHdvdWxkIGJl
DQptb3JlIGFwcHJvcHJpYXRlIGFzIGFuIGFsdGVybmF0aXZlIHRvICJmaXJtd2FyZSBmaXJzdCIu
ICBUaGF0IHNhaWQsDQpzdWNoIHdvcmRpbmcgbWF5IGJlIG1pc2xlYWRpbmcgc2luY2UgaXQgZG9l
cyBub3QgY2hhbmdlIHRoZSBtb2RlLg0KDQpUaGFua3MsDQotVG9zaGkNCg0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-27  5:20                 ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-27  5:20 UTC (permalink / raw)
  To: mchehab, tony.luck; +Cc: Kani, Toshimitsu, rjw, linux-kernel, linux-edac

On Wed, Jul 26, 2017 at 07:24:59PM +0000, Kani, Toshimitsu wrote:
> Using sb_edac does not change the fact that it is FF.  I do not think
> you'd see normal CEs on your box.

I guess we should add some blurb to EDAC to say that on FF systems,
error counts are unreliable or even non-existent.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-27  5:20                 ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-27  5:20 UTC (permalink / raw)
  To: mchehab, tony.luck; +Cc: Kani, Toshimitsu, rjw, linux-kernel, linux-edac

On Wed, Jul 26, 2017 at 07:24:59PM +0000, Kani, Toshimitsu wrote:
> Using sb_edac does not change the fact that it is FF.  I do not think
> you'd see normal CEs on your box.

I guess we should add some blurb to EDAC to say that on FF systems,
error counts are unreliable or even non-existent.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/3] EDAC: Convert ghes_edac to a normal module
  2017-07-26  8:48 [PATCH 0/3] EDAC: Convert ghes_edac to a normal module Borislav Petkov
                   ` (2 preceding siblings ...)
  2017-07-26  8:48   ` [3/3] " Borislav Petkov
@ 2017-07-27  5:54 ` Borislav Petkov
  3 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-27  5:54 UTC (permalink / raw)
  To: linux-edac
  Cc: Tony Luck, Mauro Carvalho Chehab, Toshimitsu Kani,
	Rafael J. Wysocki, LKML

On Wed, Jul 26, 2017 at 10:48:24AM +0200, Borislav Petkov wrote:
>   EDAC: Add edac_pr_err/info macros
>   ACPI/GHES: Add an EDAC notifier chain
>   EDAC, ghes: Make it a proper module

Pushed here:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=ghes

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-28 18:50     ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Kani, Toshimitsu @ 2017-07-28 18:50 UTC (permalink / raw)
  To: bp, linux-edac; +Cc: mchehab, linux-kernel, tony.luck, rjw

On Wed, 2017-07-26 at 10:48 +0200, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> Register with the GHES notifier chain so that there's no need to call
> into the module with ghes_edac_report_mem_error().
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
 :
> +static int report_mem_error(struct notifier_block *nb, unsigned long
> sev, void *data)
>  {
> +	struct cper_sec_mem_err *mem_err = data;
>  	enum hw_event_mc_err_type type;
>  	struct edac_raw_error_desc *e;
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt = NULL;
> -	char *p;
> +	struct ghes_edac_pvt *pvt = ghes_pvt;
>  	u8 grain_bits;
> +	char *p;
>  
> -	list_for_each_entry(pvt, &ghes_reglist, list) {
> -		if (ghes == pvt->ghes)
> -			break;
> -	}
>  	if (!pvt) {

I think it always hits this error condition.  See below.

> -		pr_err("Internal error: Can't find EDAC
> structure\n");
> -		return;
> +		edac_pr_err("Internal error: Can't find EDAC
> structure\n");
> +		return NOTIFY_DONE;
>  	}
> +
 :
> +static int __init ghes_edac_register(void)
>  {
> +	struct ghes_edac_pvt *pvt = ghes_pvt;

This simply sets NULL to pvt, and does not initialize ghes_pvt.

>  	bool fake = false;
>  	int rc, num_dimm = 0;
>  	struct mem_ctl_info *mci;
>  	struct edac_mc_layer layers[1];
> -	struct ghes_edac_pvt *pvt;
>  	struct ghes_edac_dimm_fill dimm_fill;
>  
 :
> -EXPORT_SYMBOL_GPL(ghes_edac_register);
> +module_init(ghes_edac_register);

Since this patch has removed the GHES-presence check and ordering hack
for ghes_edac, it now registers itself per the module_init() ordering
regardless of the presence of GHES.  As Mauro pointed out, some type of
GHES check needs to be in place before making this change.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-28 18:50     ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Toshi Kani @ 2017-07-28 18:50 UTC (permalink / raw)
  To: bp, linux-edac; +Cc: mchehab, linux-kernel, tony.luck, rjw

On Wed, 2017-07-26 at 10:48 +0200, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> Register with the GHES notifier chain so that there's no need to call
> into the module with ghes_edac_report_mem_error().
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
 :
> +static int report_mem_error(struct notifier_block *nb, unsigned long
> sev, void *data)
>  {
> +	struct cper_sec_mem_err *mem_err = data;
>  	enum hw_event_mc_err_type type;
>  	struct edac_raw_error_desc *e;
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt = NULL;
> -	char *p;
> +	struct ghes_edac_pvt *pvt = ghes_pvt;
>  	u8 grain_bits;
> +	char *p;
>  
> -	list_for_each_entry(pvt, &ghes_reglist, list) {
> -		if (ghes == pvt->ghes)
> -			break;
> -	}
>  	if (!pvt) {

I think it always hits this error condition.  See below.

> -		pr_err("Internal error: Can't find EDAC
> structure\n");
> -		return;
> +		edac_pr_err("Internal error: Can't find EDAC
> structure\n");
> +		return NOTIFY_DONE;
>  	}
> +
 :
> +static int __init ghes_edac_register(void)
>  {
> +	struct ghes_edac_pvt *pvt = ghes_pvt;

This simply sets NULL to pvt, and does not initialize ghes_pvt.

>  	bool fake = false;
>  	int rc, num_dimm = 0;
>  	struct mem_ctl_info *mci;
>  	struct edac_mc_layer layers[1];
> -	struct ghes_edac_pvt *pvt;
>  	struct ghes_edac_dimm_fill dimm_fill;
>  
 :
> -EXPORT_SYMBOL_GPL(ghes_edac_register);
> +module_init(ghes_edac_register);

Since this patch has removed the GHES-presence check and ordering hack
for ghes_edac, it now registers itself per the module_init() ordering
regardless of the presence of GHES.  As Mauro pointed out, some type of
GHES check needs to be in place before making this change.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-29  6:47       ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-29  6:47 UTC (permalink / raw)
  To: Kani, Toshimitsu; +Cc: linux-edac, mchehab, linux-kernel, tony.luck, rjw

On Fri, Jul 28, 2017 at 06:50:56PM +0000, Kani, Toshimitsu wrote:
> This simply sets NULL to pvt, and does not initialize ghes_pvt.

Yeah, I guess we need this ontop:

---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index ecd34b8bd27e..e158bf4ee337 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -422,7 +422,6 @@ static const char * const super_crap_msg =
 
 static int __init ghes_edac_register(void)
 {
-	struct ghes_edac_pvt *pvt = ghes_pvt;
 	bool fake = false;
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
@@ -446,15 +445,15 @@ static int __init ghes_edac_register(void)
 	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
 	 * to avoid duplicated memory controller numbers
 	 */
-	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		edac_pr_err("Can't allocate memory for EDAC data\n");
 		return -ENOMEM;
 	}
 
-	pvt = mci->pvt_info;
-	memset(pvt, 0, sizeof(*pvt));
-	pvt->mci  = mci;
+	ghes_pvt = mci->pvt_info;
+	memset(ghes_pvt, 0, sizeof(struct ghes_edac_pvt));
+	ghes_pvt->mci = mci;
 
 	mci->pdev = &dummy_dev;
 
---

>  As Mauro pointed out, some type of GHES check needs to be in place
> before making this change.

Your whitelist I guess.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-29  6:47       ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-07-29  6:47 UTC (permalink / raw)
  To: Kani, Toshimitsu; +Cc: linux-edac, mchehab, linux-kernel, tony.luck, rjw

On Fri, Jul 28, 2017 at 06:50:56PM +0000, Kani, Toshimitsu wrote:
> This simply sets NULL to pvt, and does not initialize ghes_pvt.

Yeah, I guess we need this ontop:
---
---

>  As Mauro pointed out, some type of GHES check needs to be in place
> before making this change.

Your whitelist I guess.

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index ecd34b8bd27e..e158bf4ee337 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -422,7 +422,6 @@ static const char * const super_crap_msg =
 
 static int __init ghes_edac_register(void)
 {
-	struct ghes_edac_pvt *pvt = ghes_pvt;
 	bool fake = false;
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
@@ -446,15 +445,15 @@ static int __init ghes_edac_register(void)
 	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
 	 * to avoid duplicated memory controller numbers
 	 */
-	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		edac_pr_err("Can't allocate memory for EDAC data\n");
 		return -ENOMEM;
 	}
 
-	pvt = mci->pvt_info;
-	memset(pvt, 0, sizeof(*pvt));
-	pvt->mci  = mci;
+	ghes_pvt = mci->pvt_info;
+	memset(ghes_pvt, 0, sizeof(struct ghes_edac_pvt));
+	ghes_pvt->mci = mci;
 
 	mci->pdev = &dummy_dev;
 

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-07-31 20:19         ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Kani, Toshimitsu @ 2017-07-31 20:19 UTC (permalink / raw)
  To: bp; +Cc: mchehab, linux-kernel, tony.luck, linux-edac, rjw

On Sat, 2017-07-29 at 08:47 +0200, Borislav Petkov wrote:
> On Fri, Jul 28, 2017 at 06:50:56PM +0000, Kani, Toshimitsu wrote:
> > This simply sets NULL to pvt, and does not initialize ghes_pvt.
> 
> Yeah, I guess we need this ontop:

Yes, this fix looks good.

> >  As Mauro pointed out, some type of GHES check needs to be in place
> > before making this change.
> 
> Your whitelist I guess.

We still need the two mechanisms provided by the existing code below. 
The whitelist simply compliments 1.

1. GHES-presence check.  (We can add APEI OSC bit check as well.)
2. Module priority.  ghes_edac has higher priority for registration.

I'd prefer to add the whitelist check to ghes_edac first.  This makes
the existing code to work.  We can then work on refactoring changes
like this on top of it without breaking the functionality.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-07-31 20:19         ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Toshi Kani @ 2017-07-31 20:19 UTC (permalink / raw)
  To: bp; +Cc: mchehab, linux-kernel, tony.luck, linux-edac, rjw

T24gU2F0LCAyMDE3LTA3LTI5IGF0IDA4OjQ3ICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIEZyaSwgSnVsIDI4LCAyMDE3IGF0IDA2OjUwOjU2UE0gKzAwMDAsIEthbmksIFRvc2hp
bWl0c3Ugd3JvdGU6DQo+ID4gVGhpcyBzaW1wbHkgc2V0cyBOVUxMIHRvIHB2dCwgYW5kIGRvZXMg
bm90IGluaXRpYWxpemUgZ2hlc19wdnQuDQo+IA0KPiBZZWFoLCBJIGd1ZXNzIHdlIG5lZWQgdGhp
cyBvbnRvcDoNCg0KWWVzLCB0aGlzIGZpeCBsb29rcyBnb29kLg0KDQo+ID4gwqBBcyBNYXVybyBw
b2ludGVkIG91dCwgc29tZSB0eXBlIG9mIEdIRVMgY2hlY2sgbmVlZHMgdG8gYmUgaW4gcGxhY2UN
Cj4gPiBiZWZvcmUgbWFraW5nIHRoaXMgY2hhbmdlLg0KPiANCj4gWW91ciB3aGl0ZWxpc3QgSSBn
dWVzcy4NCg0KV2Ugc3RpbGwgbmVlZCB0aGUgdHdvIG1lY2hhbmlzbXMgcHJvdmlkZWQgYnkgdGhl
IGV4aXN0aW5nIGNvZGUgYmVsb3cuIA0KVGhlIHdoaXRlbGlzdCBzaW1wbHkgY29tcGxpbWVudHMg
MS4NCg0KMS4gR0hFUy1wcmVzZW5jZSBjaGVjay4gIChXZSBjYW4gYWRkIEFQRUkgT1NDIGJpdCBj
aGVjayBhcyB3ZWxsLikNCjIuIE1vZHVsZSBwcmlvcml0eS4gIGdoZXNfZWRhYyBoYXMgaGlnaGVy
IHByaW9yaXR5IGZvciByZWdpc3RyYXRpb24uDQoNCkknZCBwcmVmZXIgdG8gYWRkIHRoZSB3aGl0
ZWxpc3QgY2hlY2sgdG8gZ2hlc19lZGFjIGZpcnN0LiAgVGhpcyBtYWtlcw0KdGhlIGV4aXN0aW5n
IGNvZGUgdG8gd29yay4gIFdlIGNhbiB0aGVuIHdvcmsgb24gcmVmYWN0b3JpbmcgY2hhbmdlcw0K
bGlrZSB0aGlzIG9uIHRvcCBvZiBpdCB3aXRob3V0IGJyZWFraW5nIHRoZSBmdW5jdGlvbmFsaXR5
Lg0KDQpUaGFua3MsDQotVG9zaGkNCg0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-08-01  9:46           ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-08-01  9:46 UTC (permalink / raw)
  To: Kani, Toshimitsu; +Cc: mchehab, linux-kernel, tony.luck, linux-edac, rjw

On Mon, Jul 31, 2017 at 08:19:32PM +0000, Kani, Toshimitsu wrote:
> I'd prefer to add the whitelist check to ghes_edac first.  This makes
> the existing code to work.  We can then work on refactoring changes
> like this on top of it without breaking the functionality.

Yes, but we want only the whitelist - not the FF testing because, as we
said, BIOS is notoriously buggy so we're going to load ghes_edac only on
known-good platforms.

Which brings the question about the priority.

And I *think* the easiest would be if the whitelist were in the core
edac.ko module, perhaps in edac_module.c (even though it doesn't really
matter, technically).

There we can set a "use_ghes" or so bool which the x86 platform drivers
would query through accessor functions and determine whether to load or
not.

In any case, something like that. I'm always open for better
suggestions, though.

I've pushed a rebased branch here:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=ghes

feel free to base your changes ontop.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-08-01  9:46           ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-08-01  9:46 UTC (permalink / raw)
  To: Kani, Toshimitsu; +Cc: mchehab, linux-kernel, tony.luck, linux-edac, rjw

On Mon, Jul 31, 2017 at 08:19:32PM +0000, Kani, Toshimitsu wrote:
> I'd prefer to add the whitelist check to ghes_edac first.  This makes
> the existing code to work.  We can then work on refactoring changes
> like this on top of it without breaking the functionality.

Yes, but we want only the whitelist - not the FF testing because, as we
said, BIOS is notoriously buggy so we're going to load ghes_edac only on
known-good platforms.

Which brings the question about the priority.

And I *think* the easiest would be if the whitelist were in the core
edac.ko module, perhaps in edac_module.c (even though it doesn't really
matter, technically).

There we can set a "use_ghes" or so bool which the x86 platform drivers
would query through accessor functions and determine whether to load or
not.

In any case, something like that. I'm always open for better
suggestions, though.

I've pushed a rebased branch here:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=ghes

feel free to base your changes ontop.

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-08-02  0:19             ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Kani, Toshimitsu @ 2017-08-02  0:19 UTC (permalink / raw)
  To: bp; +Cc: mchehab, linux-kernel, tony.luck, linux-edac, rjw

On Tue, 2017-08-01 at 11:46 +0200, Borislav Petkov wrote:
> On Mon, Jul 31, 2017 at 08:19:32PM +0000, Kani, Toshimitsu wrote:
> > I'd prefer to add the whitelist check to ghes_edac first.  This
> > makes the existing code to work.  We can then work on refactoring
> > changes like this on top of it without breaking the functionality.
> 
> Yes, but we want only the whitelist - not the FF testing because, as
> we said, BIOS is notoriously buggy so we're going to load ghes_edac
> only on known-good platforms.

This GHES-probe itself is appropriate and should remain.  Since not all
GHES firmware can be trusted, we will add the white-list as an
additional condition to complement this check.

> Which brings the question about the priority.
> 
> And I *think* the easiest would be if the whitelist were in the core
> edac.ko module, perhaps in edac_module.c (even though it doesn't
> really matter, technically).

I agree that adding the white-list into the core edac module is the
easiest when we make the change on top of yours.  Thinking further on
this, though, I now think that keeping the current implementation is
more reasonable with the reasons below.

1. Device-probing-logic should belong to a driver, and should remain
private to a driver.  When we add the white-list, it should be added to
ghes_edac.

2. ghes_edac is an extension to the ghes driver as they both are
specific to ghes.  ghes_edac is merely ghes driver's edac error-
reporting wrapper than an independent edac driver.  It looks OK to let
ghes_edac get registered as part of ghes_probe() and leave it as an
unconventional edac driver.

3. EDAC does not have its managed probe-chain.  All edac drivers are
called from module_init list.  They independently probe the hardware
and get unloaded when not needed.  The core edac is simply a set of
library to them.  I think it's good to keep them independent, and not
to introduce a new central mechanism for a special case like ghes_edac.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-08-02  0:19             ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Toshi Kani @ 2017-08-02  0:19 UTC (permalink / raw)
  To: bp; +Cc: mchehab, linux-kernel, tony.luck, linux-edac, rjw

On Tue, 2017-08-01 at 11:46 +0200, Borislav Petkov wrote:
> On Mon, Jul 31, 2017 at 08:19:32PM +0000, Kani, Toshimitsu wrote:
> > I'd prefer to add the whitelist check to ghes_edac first.  This
> > makes the existing code to work.  We can then work on refactoring
> > changes like this on top of it without breaking the functionality.
> 
> Yes, but we want only the whitelist - not the FF testing because, as
> we said, BIOS is notoriously buggy so we're going to load ghes_edac
> only on known-good platforms.

This GHES-probe itself is appropriate and should remain.  Since not all
GHES firmware can be trusted, we will add the white-list as an
additional condition to complement this check.

> Which brings the question about the priority.
> 
> And I *think* the easiest would be if the whitelist were in the core
> edac.ko module, perhaps in edac_module.c (even though it doesn't
> really matter, technically).

I agree that adding the white-list into the core edac module is the
easiest when we make the change on top of yours.  Thinking further on
this, though, I now think that keeping the current implementation is
more reasonable with the reasons below.

1. Device-probing-logic should belong to a driver, and should remain
private to a driver.  When we add the white-list, it should be added to
ghes_edac.

2. ghes_edac is an extension to the ghes driver as they both are
specific to ghes.  ghes_edac is merely ghes driver's edac error-
reporting wrapper than an independent edac driver.  It looks OK to let
ghes_edac get registered as part of ghes_probe() and leave it as an
unconventional edac driver.

3. EDAC does not have its managed probe-chain.  All edac drivers are
called from module_init list.  They independently probe the hardware
and get unloaded when not needed.  The core edac is simply a set of
library to them.  I think it's good to keep them independent, and not
to introduce a new central mechanism for a special case like ghes_edac.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-08-02  3:18               ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-08-02  3:18 UTC (permalink / raw)
  To: Kani, Toshimitsu; +Cc: mchehab, linux-kernel, tony.luck, linux-edac, rjw

On Wed, Aug 02, 2017 at 12:19:29AM +0000, Kani, Toshimitsu wrote:
> 1. Device-probing-logic should belong to a driver, and should remain
> private to a driver.  When we add the white-list, it should be added to
> ghes_edac.

Nonsense. There are a lot of examples where driver probing depends on
outside modalities like built-in quirks and such.

> 2. ghes_edac is an extension to the ghes driver as they both are
> specific to ghes.  ghes_edac is merely ghes driver's edac error-
> reporting wrapper than an independent edac driver.  It looks OK to let
> ghes_edac get registered as part of ghes_probe() and leave it as an
> unconventional edac driver.

Except that GHES wants to report into the EDAC infrastructure so it
better has a wrapper for it.

One of the directions I explored when looking at this is to stick
ghes_edac functionality into ghes.c or so and make it completely
independent from EDAC. Would've been much cleaner.

> 3. EDAC does not have its managed probe-chain.  All edac drivers are
> called from module_init list.  They independently probe the hardware
> and get unloaded when not needed.  The core edac is simply a set of
> library to them.  I think it's good to keep them independent, and not
> to introduce a new central mechanism for a special case like ghes_edac.

They're independent because before GHES we needed to load one driver per
system. Until the bolted-on thing came. And it is bolted on because the
already overwhelmed firmware decided to do error reporting too.

So the only real reason why I'm fine with keeping the current situation
is the whitelist. Because then, we can at least control what loads and
what not.

But then we need:

1. A clean mechanism for the platform drivers to query whether another
agent is loaded (ghes_edac) and not do any probing then.

2. ghes_edac needs to drop that multiple probing thing as its
dmi_walk(ghes_edac_count_dimms, &num_dimm) already probes *all* DIMMs on
the system so no need to do that multiple times.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-08-02  3:18               ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2017-08-02  3:18 UTC (permalink / raw)
  To: Kani, Toshimitsu; +Cc: mchehab, linux-kernel, tony.luck, linux-edac, rjw

On Wed, Aug 02, 2017 at 12:19:29AM +0000, Kani, Toshimitsu wrote:
> 1. Device-probing-logic should belong to a driver, and should remain
> private to a driver.  When we add the white-list, it should be added to
> ghes_edac.

Nonsense. There are a lot of examples where driver probing depends on
outside modalities like built-in quirks and such.

> 2. ghes_edac is an extension to the ghes driver as they both are
> specific to ghes.  ghes_edac is merely ghes driver's edac error-
> reporting wrapper than an independent edac driver.  It looks OK to let
> ghes_edac get registered as part of ghes_probe() and leave it as an
> unconventional edac driver.

Except that GHES wants to report into the EDAC infrastructure so it
better has a wrapper for it.

One of the directions I explored when looking at this is to stick
ghes_edac functionality into ghes.c or so and make it completely
independent from EDAC. Would've been much cleaner.

> 3. EDAC does not have its managed probe-chain.  All edac drivers are
> called from module_init list.  They independently probe the hardware
> and get unloaded when not needed.  The core edac is simply a set of
> library to them.  I think it's good to keep them independent, and not
> to introduce a new central mechanism for a special case like ghes_edac.

They're independent because before GHES we needed to load one driver per
system. Until the bolted-on thing came. And it is bolted on because the
already overwhelmed firmware decided to do error reporting too.

So the only real reason why I'm fine with keeping the current situation
is the whitelist. Because then, we can at least control what loads and
what not.

But then we need:

1. A clean mechanism for the platform drivers to query whether another
agent is loaded (ghes_edac) and not do any probing then.

2. ghes_edac needs to drop that multiple probing thing as its
dmi_walk(ghes_edac_count_dimms, &num_dimm) already probes *all* DIMMs on
the system so no need to do that multiple times.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
@ 2017-08-02 22:41                 ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Kani, Toshimitsu @ 2017-08-02 22:41 UTC (permalink / raw)
  To: bp; +Cc: mchehab, linux-kernel, tony.luck, linux-edac, rjw

On Wed, 2017-08-02 at 05:18 +0200, Borislav Petkov wrote:
> On Wed, Aug 02, 2017 at 12:19:29AM +0000, Kani, Toshimitsu wrote:
> > 1. Device-probing-logic should belong to a driver, and should
> > remain private to a driver.  When we add the white-list, it should
> > be added to ghes_edac.
> 
> Nonsense. There are a lot of examples where driver probing depends on
> outside modalities like built-in quirks and such.
> 
> > 2. ghes_edac is an extension to the ghes driver as they both are
> > specific to ghes.  ghes_edac is merely ghes driver's edac error-
> > reporting wrapper than an independent edac driver.  It looks OK to
> > let ghes_edac get registered as part of ghes_probe() and leave it
> > as an unconventional edac driver.
> 
> Except that GHES wants to report into the EDAC infrastructure so it
> better has a wrapper for it.
> 
> One of the directions I explored when looking at this is to stick
> ghes_edac functionality into ghes.c or so and make it completely
> independent from EDAC. Would've been much cleaner.

Agreed. I think the current model aimed at this direction while it was
needed to depend on EDAC.

> > 3. EDAC does not have its managed probe-chain.  All edac drivers
> > are called from module_init list.  They independently probe the
> > hardware and get unloaded when not needed.  The core edac is simply
> > a set of library to them.  I think it's good to keep them
> > independent, and not to introduce a new central mechanism for a
> > special case like ghes_edac.
> 
> They're independent because before GHES we needed to load one driver
> per system. Until the bolted-on thing came. And it is bolted on
> because the already overwhelmed firmware decided to do error
> reporting too.
> 
> So the only real reason why I'm fine with keeping the current
> situation is the whitelist. Because then, we can at least control
> what loads and what not.
> 
> But then we need:
> 
> 1. A clean mechanism for the platform drivers to query whether
> another agent is loaded (ghes_edac) and not do any probing then.
> 
> 2. ghes_edac needs to drop that multiple probing thing as its
> dmi_walk(ghes_edac_count_dimms, &num_dimm) already probes *all* DIMMs
> on the system so no need to do that multiple times.

Sounds good. I will keep the current model and address the above
points.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [3/3] EDAC, ghes: Make it a proper module
@ 2017-08-02 22:41                 ` Toshi Kani
  0 siblings, 0 replies; 38+ messages in thread
From: Toshi Kani @ 2017-08-02 22:41 UTC (permalink / raw)
  To: bp; +Cc: mchehab, linux-kernel, tony.luck, linux-edac, rjw

T24gV2VkLCAyMDE3LTA4LTAyIGF0IDA1OjE4ICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIFdlZCwgQXVnIDAyLCAyMDE3IGF0IDEyOjE5OjI5QU0gKzAwMDAsIEthbmksIFRvc2hp
bWl0c3Ugd3JvdGU6DQo+ID4gMS4gRGV2aWNlLXByb2JpbmctbG9naWMgc2hvdWxkIGJlbG9uZyB0
byBhIGRyaXZlciwgYW5kIHNob3VsZA0KPiA+IHJlbWFpbiBwcml2YXRlIHRvIGEgZHJpdmVyLsKg
wqBXaGVuIHdlIGFkZCB0aGUgd2hpdGUtbGlzdCwgaXQgc2hvdWxkDQo+ID4gYmUgYWRkZWQgdG8g
Z2hlc19lZGFjLg0KPiANCj4gTm9uc2Vuc2UuIFRoZXJlIGFyZSBhIGxvdCBvZiBleGFtcGxlcyB3
aGVyZSBkcml2ZXIgcHJvYmluZyBkZXBlbmRzIG9uDQo+IG91dHNpZGUgbW9kYWxpdGllcyBsaWtl
IGJ1aWx0LWluIHF1aXJrcyBhbmQgc3VjaC4NCj4gDQo+ID4gMi4gZ2hlc19lZGFjIGlzIGFuIGV4
dGVuc2lvbiB0byB0aGUgZ2hlcyBkcml2ZXIgYXMgdGhleSBib3RoIGFyZQ0KPiA+IHNwZWNpZmlj
IHRvIGdoZXMuwqDCoGdoZXNfZWRhYyBpcyBtZXJlbHkgZ2hlcyBkcml2ZXIncyBlZGFjIGVycm9y
LQ0KPiA+IHJlcG9ydGluZyB3cmFwcGVyIHRoYW4gYW4gaW5kZXBlbmRlbnQgZWRhYyBkcml2ZXIu
wqDCoEl0IGxvb2tzIE9LIHRvDQo+ID4gbGV0IGdoZXNfZWRhYyBnZXQgcmVnaXN0ZXJlZCBhcyBw
YXJ0IG9mIGdoZXNfcHJvYmUoKSBhbmQgbGVhdmUgaXQNCj4gPiBhcyBhbiB1bmNvbnZlbnRpb25h
bCBlZGFjIGRyaXZlci4NCj4gDQo+IEV4Y2VwdCB0aGF0IEdIRVMgd2FudHMgdG8gcmVwb3J0IGlu
dG8gdGhlIEVEQUMgaW5mcmFzdHJ1Y3R1cmUgc28gaXQNCj4gYmV0dGVyIGhhcyBhIHdyYXBwZXIg
Zm9yIGl0Lg0KPiANCj4gT25lIG9mIHRoZSBkaXJlY3Rpb25zIEkgZXhwbG9yZWQgd2hlbiBsb29r
aW5nIGF0IHRoaXMgaXMgdG8gc3RpY2sNCj4gZ2hlc19lZGFjIGZ1bmN0aW9uYWxpdHkgaW50byBn
aGVzLmMgb3Igc28gYW5kIG1ha2UgaXQgY29tcGxldGVseQ0KPiBpbmRlcGVuZGVudCBmcm9tIEVE
QUMuIFdvdWxkJ3ZlIGJlZW4gbXVjaCBjbGVhbmVyLg0KDQpBZ3JlZWQuIEkgdGhpbmsgdGhlIGN1
cnJlbnQgbW9kZWwgYWltZWQgYXQgdGhpcyBkaXJlY3Rpb24gd2hpbGUgaXQgd2FzDQpuZWVkZWQg
dG8gZGVwZW5kIG9uIEVEQUMuDQoNCj4gPiAzLiBFREFDIGRvZXMgbm90IGhhdmUgaXRzIG1hbmFn
ZWQgcHJvYmUtY2hhaW4uwqDCoEFsbCBlZGFjIGRyaXZlcnMNCj4gPiBhcmUgY2FsbGVkIGZyb20g
bW9kdWxlX2luaXQgbGlzdC7CoMKgVGhleSBpbmRlcGVuZGVudGx5IHByb2JlIHRoZQ0KPiA+IGhh
cmR3YXJlIGFuZCBnZXQgdW5sb2FkZWQgd2hlbiBub3QgbmVlZGVkLsKgwqBUaGUgY29yZSBlZGFj
IGlzIHNpbXBseQ0KPiA+IGEgc2V0IG9mIGxpYnJhcnkgdG8gdGhlbS7CoMKgSSB0aGluayBpdCdz
IGdvb2QgdG8ga2VlcCB0aGVtDQo+ID4gaW5kZXBlbmRlbnQsIGFuZCBub3QgdG8gaW50cm9kdWNl
IGEgbmV3IGNlbnRyYWwgbWVjaGFuaXNtIGZvciBhDQo+ID4gc3BlY2lhbCBjYXNlIGxpa2UgZ2hl
c19lZGFjLg0KPiANCj4gVGhleSdyZSBpbmRlcGVuZGVudCBiZWNhdXNlIGJlZm9yZSBHSEVTIHdl
IG5lZWRlZCB0byBsb2FkIG9uZSBkcml2ZXINCj4gcGVyIHN5c3RlbS4gVW50aWwgdGhlIGJvbHRl
ZC1vbiB0aGluZyBjYW1lLiBBbmQgaXQgaXMgYm9sdGVkIG9uDQo+IGJlY2F1c2UgdGhlIGFscmVh
ZHkgb3ZlcndoZWxtZWQgZmlybXdhcmUgZGVjaWRlZCB0byBkbyBlcnJvcg0KPiByZXBvcnRpbmcg
dG9vLg0KPiANCj4gU28gdGhlIG9ubHkgcmVhbCByZWFzb24gd2h5IEknbSBmaW5lIHdpdGgga2Vl
cGluZyB0aGUgY3VycmVudA0KPiBzaXR1YXRpb24gaXMgdGhlIHdoaXRlbGlzdC4gQmVjYXVzZSB0
aGVuLCB3ZSBjYW4gYXQgbGVhc3QgY29udHJvbA0KPiB3aGF0IGxvYWRzIGFuZCB3aGF0IG5vdC4N
Cj4gDQo+IEJ1dCB0aGVuIHdlIG5lZWQ6DQo+IA0KPiAxLiBBIGNsZWFuIG1lY2hhbmlzbSBmb3Ig
dGhlIHBsYXRmb3JtIGRyaXZlcnMgdG8gcXVlcnkgd2hldGhlcg0KPiBhbm90aGVyIGFnZW50IGlz
IGxvYWRlZCAoZ2hlc19lZGFjKSBhbmQgbm90IGRvIGFueSBwcm9iaW5nIHRoZW4uDQo+IA0KPiAy
LiBnaGVzX2VkYWMgbmVlZHMgdG8gZHJvcCB0aGF0IG11bHRpcGxlIHByb2JpbmcgdGhpbmcgYXMg
aXRzDQo+IGRtaV93YWxrKGdoZXNfZWRhY19jb3VudF9kaW1tcywgJm51bV9kaW1tKSBhbHJlYWR5
IHByb2JlcyAqYWxsKiBESU1Ncw0KPiBvbiB0aGUgc3lzdGVtIHNvIG5vIG5lZWQgdG8gZG8gdGhh
dCBtdWx0aXBsZSB0aW1lcy4NCg0KU291bmRzIGdvb2QuIEkgd2lsbCBrZWVwIHRoZSBjdXJyZW50
IG1vZGVsIGFuZCBhZGRyZXNzIHRoZSBhYm92ZQ0KcG9pbnRzLg0KDQpUaGFua3MsDQotVG9zaGkN
Cg0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2017-08-02 22:41 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-26  8:48 [PATCH 0/3] EDAC: Convert ghes_edac to a normal module Borislav Petkov
2017-07-26  8:48 ` [PATCH 1/3] EDAC: Add edac_pr_err/info macros Borislav Petkov
2017-07-26  8:48   ` [1/3] " Borislav Petkov
2017-07-26  8:48 ` [PATCH 2/3] ACPI/GHES: Add an EDAC notifier chain Borislav Petkov
2017-07-26  8:48   ` [2/3] " Borislav Petkov
2017-07-26  8:48 ` [PATCH 3/3] EDAC, ghes: Make it a proper module Borislav Petkov
2017-07-26  8:48   ` [3/3] " Borislav Petkov
2017-07-26 10:24   ` [PATCH 3/3] " Mauro Carvalho Chehab
2017-07-26 10:24     ` [3/3] " Mauro Carvalho Chehab
2017-07-26 10:37     ` [PATCH 3/3] " Borislav Petkov
2017-07-26 10:37       ` [3/3] " Borislav Petkov
2017-07-26 10:51       ` [PATCH 3/3] " Mauro Carvalho Chehab
2017-07-26 10:51         ` [3/3] " Mauro Carvalho Chehab
2017-07-26 17:27         ` [PATCH 3/3] " Luck, Tony
2017-07-26 17:27           ` [3/3] " Luck, Tony
2017-07-26 18:17           ` [PATCH 3/3] " Mauro Carvalho Chehab
2017-07-26 18:17             ` [3/3] " Mauro Carvalho Chehab
2017-07-26 19:24             ` [PATCH 3/3] " Kani, Toshimitsu
2017-07-26 19:24               ` [3/3] " Toshi Kani
2017-07-27  5:20               ` [PATCH 3/3] " Borislav Petkov
2017-07-27  5:20                 ` [3/3] " Borislav Petkov
2017-07-26 19:49     ` [PATCH 3/3] " Kani, Toshimitsu
2017-07-26 19:49       ` [3/3] " Toshi Kani
2017-07-28 18:50   ` [PATCH 3/3] " Kani, Toshimitsu
2017-07-28 18:50     ` [3/3] " Toshi Kani
2017-07-29  6:47     ` [PATCH 3/3] " Borislav Petkov
2017-07-29  6:47       ` [3/3] " Borislav Petkov
2017-07-31 20:19       ` [PATCH 3/3] " Kani, Toshimitsu
2017-07-31 20:19         ` [3/3] " Toshi Kani
2017-08-01  9:46         ` [PATCH 3/3] " Borislav Petkov
2017-08-01  9:46           ` [3/3] " Borislav Petkov
2017-08-02  0:19           ` [PATCH 3/3] " Kani, Toshimitsu
2017-08-02  0:19             ` [3/3] " Toshi Kani
2017-08-02  3:18             ` [PATCH 3/3] " Borislav Petkov
2017-08-02  3:18               ` [3/3] " Borislav Petkov
2017-08-02 22:41               ` [PATCH 3/3] " Kani, Toshimitsu
2017-08-02 22:41                 ` [3/3] " Toshi Kani
2017-07-27  5:54 ` [PATCH 0/3] EDAC: Convert ghes_edac to a normal module Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.