linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/8] Make ghes_edac a proper module
@ 2022-08-31  7:40 Jia He
  2022-08-31  7:40 ` [PATCH v4 1/8] efi/cper: export several helpers for ghes_edac to use Jia He
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Jia He @ 2022-08-31  7:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
apci_init()") introduced a bug that ghes_edac_register() would be invoked
before edac_init(). Because at that time, the bus "edac" hasn't been even
registered, this created sysfs /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.

The proper solution is to make ghes_edac a proper module.

Changelog:
v4:
 - move the kernel boot option to ghes module parameter
 - collapse th ghes_present and ghes_edac_preferred into one patch
v3: https://lore.kernel.org/lkml/20220822154048.188253-1-justin.he@arm.com/
 - refine the commit logs
 - introduce ghes preferred and present flag (by Toshi)
 - move force_load to setup parameter
 - add the ghes_edac_preferred() check for x86/Arm edac drivers (incompleted list)
v2: https://lore.kernel.org/lkml/20220817143458.335938-1-justin.he@arm.com/
 - add acked-by tag of Patch 1 from Ard
 - split the notifier patch
 - add 2 patch to get regular drivers selected when ghes edac is not loaded
 - fix an errno in igen6 driver
 - add a patch to fix the sparse warning of ghes
 - refine the commit logs
v1: https://lore.kernel.org/lkml/20220811091713.10427-1-justin.he@arm.com/

Jia He (8):
  efi/cper: export several helpers for ghes_edac to use
  EDAC/ghes: Add a notifier for reporting memory errors
  EDAC:ghes: Move ghes_edac.force_load to ghes module parameter
  ghes: Introduce a helper ghes_edac_preferred()
  EDAC/ghes: Make ghes_edac a proper module to remove the dependency on
    ghes
  EDAC: Add the ghes_edac_preferred check for chipset-specific edac
    drivers
  apei/ghes: Use unrcu_pointer for cmpxchg
  EDAC/igen6: Return consistent errno when another edac driver is
    enabled

 drivers/acpi/apei/ghes.c       | 85 +++++++++++++++++++++++++++++++---
 drivers/edac/Kconfig           |  4 +-
 drivers/edac/amd64_edac.c      |  3 ++
 drivers/edac/armada_xp_edac.c  |  3 ++
 drivers/edac/edac_module.h     |  1 +
 drivers/edac/ghes_edac.c       | 77 +++++++++++++++++++-----------
 drivers/edac/i10nm_base.c      |  3 ++
 drivers/edac/igen6_edac.c      |  5 +-
 drivers/edac/layerscape_edac.c |  3 ++
 drivers/edac/pnd2_edac.c       |  3 ++
 drivers/edac/sb_edac.c         |  3 ++
 drivers/edac/skx_base.c        |  3 ++
 drivers/edac/thunderx_edac.c   |  3 ++
 drivers/edac/xgene_edac.c      |  3 ++
 drivers/firmware/efi/cper.c    |  3 ++
 include/acpi/apei.h            |  2 +
 include/acpi/ghes.h            | 36 +++++---------
 17 files changed, 179 insertions(+), 61 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 1/8] efi/cper: export several helpers for ghes_edac to use
  2022-08-31  7:40 [PATCH v4 0/8] Make ghes_edac a proper module Jia He
@ 2022-08-31  7:40 ` Jia He
  2022-08-31  7:40 ` [PATCH v4 2/8] EDAC/ghes: Add a notifier for reporting memory errors Jia He
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Jia He @ 2022-08-31  7:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

Before ghes_edac can be turned back into a proper module again, export
several helpers which are going to be used by it.

Signed-off-by: Jia He <justin.he@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/cper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index e4e5ea7ce910..053eae13f409 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -290,6 +290,7 @@ int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(cper_mem_err_location);
 
 int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
 {
@@ -310,6 +311,7 @@ int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(cper_dimm_err_location);
 
 void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
 		       struct cper_mem_err_compact *cmem)
@@ -331,6 +333,7 @@ void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
 	cmem->mem_array_handle = mem->mem_array_handle;
 	cmem->mem_dev_handle = mem->mem_dev_handle;
 }
+EXPORT_SYMBOL_GPL(cper_mem_err_pack);
 
 const char *cper_mem_err_unpack(struct trace_seq *p,
 				struct cper_mem_err_compact *cmem)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 2/8] EDAC/ghes: Add a notifier for reporting memory errors
  2022-08-31  7:40 [PATCH v4 0/8] Make ghes_edac a proper module Jia He
  2022-08-31  7:40 ` [PATCH v4 1/8] efi/cper: export several helpers for ghes_edac to use Jia He
@ 2022-08-31  7:40 ` Jia He
  2022-08-31  7:40 ` [PATCH v4 3/8] EDAC:ghes: Move ghes_edac.force_load to ghes module parameter Jia He
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Jia He @ 2022-08-31  7:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

To make a proper module, add a notifier for reporting memory errors. Use
an atomic notifier because calls sites like ghes_proc_in_irq() run in
interrupt context.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 16 +++++++++++++++-
 drivers/edac/ghes_edac.c | 19 +++++++++++++++++--
 include/acpi/ghes.h      | 10 +++-------
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d91ad378c00d..8cb65f757d06 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -94,6 +94,8 @@
 #define FIX_APEI_GHES_SDEI_CRITICAL	__end_of_fixed_addresses
 #endif
 
+static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
+
 static inline bool is_hest_type_generic_v2(struct ghes *ghes)
 {
 	return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
@@ -645,7 +647,7 @@ static bool ghes_do_proc(struct ghes *ghes,
 		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
-			ghes_edac_report_mem_error(sev, mem_err);
+			atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
 			queued = ghes_handle_memory_failure(gdata, sev);
@@ -1497,3 +1499,15 @@ void __init acpi_ghes_init(void)
 	else
 		pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n");
 }
+
+void ghes_register_report_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_register(&ghes_report_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_register_report_chain);
+
+void ghes_unregister_report_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_unregister(&ghes_report_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_unregister_report_chain);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index c8fa7dcfdbd0..7b8d56a769f6 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -14,6 +14,7 @@
 #include <linux/dmi.h>
 #include "edac_module.h"
 #include <ras/ras_event.h>
+#include <linux/notifier.h>
 
 #define OTHER_DETAIL_LEN	400
 
@@ -267,11 +268,14 @@ static int print_mem_error_other_detail(const struct cper_sec_mem_err *mem, char
 	return n;
 }
 
-void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
+static int ghes_edac_report_mem_error(struct notifier_block *nb,
+				      unsigned long val, void *data)
 {
+	struct cper_sec_mem_err *mem_err = (struct cper_sec_mem_err *)data;
 	struct cper_mem_err_compact cmem;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
+	unsigned long sev = val;
 	struct ghes_pvt *pvt;
 	unsigned long flags;
 	char *p;
@@ -282,7 +286,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	 * know.
 	 */
 	if (WARN_ON_ONCE(in_nmi()))
-		return;
+		return NOTIFY_OK;
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
@@ -374,8 +378,15 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 unlock:
 	spin_unlock_irqrestore(&ghes_lock, flags);
+
+	return NOTIFY_OK;
 }
 
+static struct notifier_block ghes_edac_mem_err_nb = {
+	.notifier_call	= ghes_edac_report_mem_error,
+	.priority	= 0,
+};
+
 /*
  * Known systems that are safe to enable this module.
  */
@@ -503,6 +514,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	ghes_pvt = pvt;
 	spin_unlock_irqrestore(&ghes_lock, flags);
 
+	ghes_register_report_chain(&ghes_edac_mem_err_nb);
+
 	/* only set on success */
 	refcount_set(&ghes_refcount, 1);
 
@@ -548,6 +561,8 @@ void ghes_edac_unregister(struct ghes *ghes)
 	if (mci)
 		edac_mc_free(mci);
 
+	ghes_unregister_report_chain(&ghes_edac_mem_err_nb);
+
 unlock:
 	mutex_unlock(&ghes_reg_mutex);
 }
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 34fb3431a8f3..5cbd38b6e4e1 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -76,18 +76,11 @@ int ghes_estatus_pool_init(int num_ghes);
 /* From drivers/edac/ghes_edac.c */
 
 #ifdef CONFIG_EDAC_GHES
-void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
-
 int ghes_edac_register(struct ghes *ghes, struct device *dev);
 
 void ghes_edac_unregister(struct ghes *ghes);
 
 #else
-static inline void ghes_edac_report_mem_error(int sev,
-				       struct cper_sec_mem_err *mem_err)
-{
-}
-
 static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	return -ENODEV;
@@ -145,4 +138,7 @@ int ghes_notify_sea(void);
 static inline int ghes_notify_sea(void) { return -ENOENT; }
 #endif
 
+struct notifier_block;
+extern void ghes_register_report_chain(struct notifier_block *nb);
+extern void ghes_unregister_report_chain(struct notifier_block *nb);
 #endif /* GHES_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 3/8] EDAC:ghes: Move ghes_edac.force_load to ghes module parameter
  2022-08-31  7:40 [PATCH v4 0/8] Make ghes_edac a proper module Jia He
  2022-08-31  7:40 ` [PATCH v4 1/8] efi/cper: export several helpers for ghes_edac to use Jia He
  2022-08-31  7:40 ` [PATCH v4 2/8] EDAC/ghes: Add a notifier for reporting memory errors Jia He
@ 2022-08-31  7:40 ` Jia He
  2022-09-02 14:54   ` Kani, Toshi
  2022-08-31  7:40 ` [PATCH v4 4/8] ghes: Introduce a helper ghes_edac_preferred() Jia He
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Jia He @ 2022-08-31  7:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

ghes_edac_init() is too late to set this module flag ghes_edac.force_load.
Also, other edac drivers should not be able to control this flag.

Move this flag to the module parameter in ghes instead.

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 11 +++++++++++
 drivers/edac/ghes_edac.c | 10 +++-------
 include/acpi/apei.h      |  2 ++
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 8cb65f757d06..6bc9c78f916b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -109,6 +109,14 @@ static inline bool is_hest_type_generic_v2(struct ghes *ghes)
 bool ghes_disable;
 module_param_named(disable, ghes_disable, bool, 0);
 
+/*
+ * "ghes.edac_force_load" forcibly loads ghes_edac and skips the platform
+ * check.
+ */
+bool ghes_edac_force_load;
+module_param_named(edac_force_load, ghes_edac_force_load, bool, 0);
+EXPORT_SYMBOL(ghes_edac_force_load);
+
 /*
  * All error sources notified with HED (Hardware Error Device) share a
  * single notifier callback, so they need to be linked and checked one
@@ -1483,6 +1491,9 @@ void __init acpi_ghes_init(void)
 		return;
 	}
 
+	if (!IS_ENABLED(CONFIG_X86))
+		ghes_edac_force_load = true;
+
 	ghes_nmi_init_cxt();
 
 	rc = platform_driver_register(&ghes_platform_driver);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 7b8d56a769f6..f716d5d8f1df 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -54,10 +54,6 @@ static DEFINE_MUTEX(ghes_reg_mutex);
  */
 static DEFINE_SPINLOCK(ghes_lock);
 
-/* "ghes_edac.force_load=1" skips the platform check */
-static bool __read_mostly force_load;
-module_param(force_load, bool, 0);
-
 static bool system_scanned;
 
 /* Memory Device - Type 17 of SMBIOS spec */
@@ -408,10 +404,10 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (IS_ENABLED(CONFIG_X86)) {
 		/* Check if safe to enable on this system */
 		idx = acpi_match_platform_list(plat_list);
-		if (!force_load && idx < 0)
+		if (!ghes_edac_force_load && idx < 0)
 			return -ENODEV;
 	} else {
-		force_load = true;
+		ghes_edac_force_load = true;
 		idx = 0;
 	}
 
@@ -535,7 +531,7 @@ void ghes_edac_unregister(struct ghes *ghes)
 	struct mem_ctl_info *mci;
 	unsigned long flags;
 
-	if (!force_load)
+	if (!ghes_edac_force_load)
 		return;
 
 	mutex_lock(&ghes_reg_mutex);
diff --git a/include/acpi/apei.h b/include/acpi/apei.h
index dc60f7db5524..356bdf4bb66e 100644
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -27,9 +27,11 @@ extern int hest_disable;
 extern int erst_disable;
 #ifdef CONFIG_ACPI_APEI_GHES
 extern bool ghes_disable;
+extern bool ghes_edac_force_load;
 void __init acpi_ghes_init(void);
 #else
 #define ghes_disable 1
+#define ghes_edac_force_load 0
 static inline void acpi_ghes_init(void) { }
 #endif
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 4/8] ghes: Introduce a helper ghes_edac_preferred()
  2022-08-31  7:40 [PATCH v4 0/8] Make ghes_edac a proper module Jia He
                   ` (2 preceding siblings ...)
  2022-08-31  7:40 ` [PATCH v4 3/8] EDAC:ghes: Move ghes_edac.force_load to ghes module parameter Jia He
@ 2022-08-31  7:40 ` Jia He
  2022-09-02 15:30   ` Kani, Toshi
  2022-08-31  7:40 ` [PATCH v4 5/8] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Jia He @ 2022-08-31  7:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

Introduce a flag ghes_present to differentiate between the system ROM
really not offering GHES vs. the ghes module not running. ghes-present
is latched on ghes_probe() and never unlatched since ACPI GHES table
is static.

Introduce a helper ghes_edac_preferred() and it is true if the
platform-check passes.

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 29 +++++++++++++++++++++++++++++
 include/acpi/ghes.h      |  2 ++
 2 files changed, 31 insertions(+)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6bc9c78f916b..d798a58fbbd9 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -117,6 +117,8 @@ bool ghes_edac_force_load;
 module_param_named(edac_force_load, ghes_edac_force_load, bool, 0);
 EXPORT_SYMBOL(ghes_edac_force_load);
 
+static bool ghes_present;
+
 /*
  * All error sources notified with HED (Hardware Error Device) share a
  * single notifier callback, so they need to be linked and checked one
@@ -1388,6 +1390,8 @@ static int ghes_probe(struct platform_device *ghes_dev)
 
 	ghes_edac_register(ghes, &ghes_dev->dev);
 
+	ghes_present = true;
+
 	/* Handle any pending errors right away */
 	spin_lock_irqsave(&ghes_notify_lock_irq, flags);
 	ghes_proc(ghes);
@@ -1511,6 +1515,31 @@ void __init acpi_ghes_init(void)
 		pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n");
 }
 
+/*
+ * Known x86 systems that prefer GHES error reporting:
+ */
+static struct acpi_platform_list plat_list[] = {
+	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+	{ } /* End */
+};
+
+bool ghes_edac_preferred(void)
+{
+	int idx = -1;
+
+	if (!ghes_present)
+		return false;
+
+	if (IS_ENABLED(CONFIG_X86)) {
+		idx = acpi_match_platform_list(plat_list);
+		if (idx < 0 && !ghes_edac_force_load)
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(ghes_edac_preferred);
+
 void ghes_register_report_chain(struct notifier_block *nb)
 {
 	atomic_notifier_chain_register(&ghes_report_chain, nb);
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 5cbd38b6e4e1..7ce91c0ff3eb 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -80,6 +80,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev);
 
 void ghes_edac_unregister(struct ghes *ghes);
 
+bool ghes_edac_preferred(void);
 #else
 static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
@@ -89,6 +90,7 @@ static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
 static inline void ghes_edac_unregister(struct ghes *ghes)
 {
 }
+static bool ghes_edac_preferred(void) { return false; }
 #endif
 
 static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 5/8] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-31  7:40 [PATCH v4 0/8] Make ghes_edac a proper module Jia He
                   ` (3 preceding siblings ...)
  2022-08-31  7:40 ` [PATCH v4 4/8] ghes: Introduce a helper ghes_edac_preferred() Jia He
@ 2022-08-31  7:40 ` Jia He
  2022-09-02 17:13   ` Kani, Toshi
  2022-08-31  7:40 ` [PATCH v4 6/8] EDAC: Add the ghes_edac_preferred check for chipset-specific edac drivers Jia He
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Jia He @ 2022-08-31  7:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
apci_init()") introduced a bug that ghes_edac_register() would be invoked
before edac_init(). Because at that time, the bus "edac" hadn't been even
registered, this created sysfs /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.

To remove the dependency of ghes_edac on ghes, make it a proper module. Use
a list to save the probing devices in ghes_probe(), and defer the
ghes_edac_register() to module_init() of the new ghes_edac module by
iterating over the devices list.

Co-developed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/acpi/apei/ghes.c | 22 ++++++++++++++--
 drivers/edac/Kconfig     |  4 +--
 drivers/edac/ghes_edac.c | 56 ++++++++++++++++++++++++----------------
 include/acpi/ghes.h      | 24 +++++------------
 4 files changed, 62 insertions(+), 44 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d798a58fbbd9..d11e3b2e5345 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -130,6 +130,9 @@ static bool ghes_present;
 static LIST_HEAD(ghes_hed);
 static DEFINE_MUTEX(ghes_list_mutex);
 
+static LIST_HEAD(ghes_devs);
+static DEFINE_MUTEX(ghes_devs_mutex);
+
 /*
  * Because the memory area used to transfer hardware error information
  * from BIOS to Linux can be determined only in NMI, IRQ or timer
@@ -1388,7 +1391,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 
 	platform_set_drvdata(ghes_dev, ghes);
 
-	ghes_edac_register(ghes, &ghes_dev->dev);
+	ghes->dev = &ghes_dev->dev;
+
+	mutex_lock(&ghes_devs_mutex);
+	list_add_tail(&ghes->elist, &ghes_devs);
+	mutex_unlock(&ghes_devs_mutex);
 
 	ghes_present = true;
 
@@ -1454,7 +1461,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 
 	ghes_fini(ghes);
 
-	ghes_edac_unregister(ghes);
+	mutex_lock(&ghes_devs_mutex);
+	list_del(&ghes->elist);
+	mutex_unlock(&ghes_devs_mutex);
 
 	kfree(ghes);
 
@@ -1540,6 +1549,15 @@ bool ghes_edac_preferred(void)
 }
 EXPORT_SYMBOL_GPL(ghes_edac_preferred);
 
+struct list_head *ghes_get_devices(void)
+{
+	if (!ghes_edac_preferred())
+		return NULL;
+
+	return &ghes_devs;
+}
+EXPORT_SYMBOL_GPL(ghes_get_devices);
+
 void ghes_register_report_chain(struct notifier_block *nb)
 {
 	atomic_notifier_chain_register(&ghes_report_chain, nb);
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 17562cf1fe97..df45db81858b 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -53,8 +53,8 @@ config EDAC_DECODE_MCE
 	  has been initialized.
 
 config EDAC_GHES
-	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
-	depends on ACPI_APEI_GHES && (EDAC=y)
+	tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"
+	depends on ACPI_APEI_GHES
 	select UEFI_CPER
 	help
 	  Not all machines support hardware-driven error report. Some of those
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index f716d5d8f1df..ec654222e227 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -56,6 +56,8 @@ static DEFINE_SPINLOCK(ghes_lock);
 
 static bool system_scanned;
 
+static struct list_head *ghes_devs;
+
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
 	u8 type;
@@ -383,34 +385,15 @@ static struct notifier_block ghes_edac_mem_err_nb = {
 	.priority	= 0,
 };
 
-/*
- * Known systems that are safe to enable this module.
- */
-static struct acpi_platform_list plat_list[] = {
-	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
-	{ } /* End */
-};
-
-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static int ghes_edac_register(struct device *dev)
 {
 	bool fake = false;
 	struct mem_ctl_info *mci;
 	struct ghes_pvt *pvt;
 	struct edac_mc_layer layers[1];
 	unsigned long flags;
-	int idx = -1;
 	int rc = 0;
 
-	if (IS_ENABLED(CONFIG_X86)) {
-		/* Check if safe to enable on this system */
-		idx = acpi_match_platform_list(plat_list);
-		if (!ghes_edac_force_load && idx < 0)
-			return -ENODEV;
-	} else {
-		ghes_edac_force_load = true;
-		idx = 0;
-	}
-
 	/* finish another registration/unregistration instance first */
 	mutex_lock(&ghes_reg_mutex);
 
@@ -454,7 +437,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
 		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
 		pr_info("work on such system. Use this driver with caution\n");
-	} else if (idx < 0) {
+	} else if (ghes_edac_force_load) {
 		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
 		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
 		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
@@ -526,7 +509,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	return rc;
 }
 
-void ghes_edac_unregister(struct ghes *ghes)
+static void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
 	unsigned long flags;
@@ -562,3 +545,32 @@ void ghes_edac_unregister(struct ghes *ghes)
 unlock:
 	mutex_unlock(&ghes_reg_mutex);
 }
+
+static int __init ghes_edac_init(void)
+{
+	struct ghes *g, *g_tmp;
+
+	ghes_devs = ghes_get_devices();
+	if (!ghes_devs)
+		return -ENODEV;
+
+	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
+		ghes_edac_register(g->dev);
+	}
+
+	return 0;
+}
+module_init(ghes_edac_init);
+
+static void __exit ghes_edac_exit(void)
+{
+	struct ghes *g, *g_tmp;
+
+	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
+		ghes_edac_unregister(g);
+	}
+}
+module_exit(ghes_edac_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Output ACPI APEI/GHES BIOS detected errors module via EDAC");
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 7ce91c0ff3eb..08f12dde99ae 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -27,6 +27,8 @@ struct ghes {
 		struct timer_list timer;
 		unsigned int irq;
 	};
+	struct device *dev;
+	struct list_head elist;
 };
 
 struct ghes_estatus_node {
@@ -69,30 +71,16 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
  * @nb: pointer to the notifier_block structure of the vendor record handler.
  */
 void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
-#endif
-
-int ghes_estatus_pool_init(int num_ghes);
-
-/* From drivers/edac/ghes_edac.c */
-
-#ifdef CONFIG_EDAC_GHES
-int ghes_edac_register(struct ghes *ghes, struct device *dev);
-
-void ghes_edac_unregister(struct ghes *ghes);
 
+struct list_head *ghes_get_devices(void);
 bool ghes_edac_preferred(void);
 #else
-static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
-{
-	return -ENODEV;
-}
-
-static inline void ghes_edac_unregister(struct ghes *ghes)
-{
-}
+static inline struct list_head *ghes_get_devices(void) { return NULL; }
 static bool ghes_edac_preferred(void) { return false; }
 #endif
 
+int ghes_estatus_pool_init(int num_ghes);
+
 static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
 {
 	return gdata->revision >> 8;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 6/8] EDAC: Add the ghes_edac_preferred check for chipset-specific edac drivers
  2022-08-31  7:40 [PATCH v4 0/8] Make ghes_edac a proper module Jia He
                   ` (4 preceding siblings ...)
  2022-08-31  7:40 ` [PATCH v4 5/8] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
@ 2022-08-31  7:40 ` Jia He
  2022-09-02 15:48   ` Kani, Toshi
  2022-08-31  7:40 ` [PATCH v4 7/8] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
  2022-08-31  7:40 ` [PATCH v4 8/8] EDAC/igen6: Return consistent errno when another edac driver is enabled Jia He
  7 siblings, 1 reply; 14+ messages in thread
From: Jia He @ 2022-08-31  7:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

edac_mc_add_mc* is too late in the init path and that check should happen
as the very first thing in the driver init function.
Add the newly introduced helper ghes_edac_preferred() for the platform
check.

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/edac/amd64_edac.c      | 3 +++
 drivers/edac/armada_xp_edac.c  | 3 +++
 drivers/edac/edac_module.h     | 1 +
 drivers/edac/i10nm_base.c      | 3 +++
 drivers/edac/igen6_edac.c      | 3 +++
 drivers/edac/layerscape_edac.c | 3 +++
 drivers/edac/pnd2_edac.c       | 3 +++
 drivers/edac/sb_edac.c         | 3 +++
 drivers/edac/skx_base.c        | 3 +++
 drivers/edac/thunderx_edac.c   | 3 +++
 drivers/edac/xgene_edac.c      | 3 +++
 11 files changed, 31 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2f854feeeb23..5314a934c2bf 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -4329,6 +4329,9 @@ static int __init amd64_edac_init(void)
 	int err = -ENODEV;
 	int i;
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/armada_xp_edac.c b/drivers/edac/armada_xp_edac.c
index 038abbb83f4b..486532b92ce0 100644
--- a/drivers/edac/armada_xp_edac.c
+++ b/drivers/edac/armada_xp_edac.c
@@ -599,6 +599,9 @@ static int __init armada_xp_edac_init(void)
 {
 	int res;
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	/* only polling is supported */
 	edac_op_state = EDAC_OPSTATE_POLL;
 
diff --git a/drivers/edac/edac_module.h b/drivers/edac/edac_module.h
index 96f6de0c8ff6..3826f82de487 100644
--- a/drivers/edac/edac_module.h
+++ b/drivers/edac/edac_module.h
@@ -11,6 +11,7 @@
 #ifndef	__EDAC_MODULE_H__
 #define	__EDAC_MODULE_H__
 
+#include <acpi/ghes.h>
 #include "edac_mc.h"
 #include "edac_pci.h"
 #include "edac_device.h"
diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c
index 6cf50ee0b77c..691df9b51622 100644
--- a/drivers/edac/i10nm_base.c
+++ b/drivers/edac/i10nm_base.c
@@ -548,6 +548,9 @@ static int __init i10nm_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index a07bbfd075d0..4ac6d0c533ec 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1271,6 +1271,9 @@ static int __init igen6_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -ENODEV;
diff --git a/drivers/edac/layerscape_edac.c b/drivers/edac/layerscape_edac.c
index 94cac7686a56..60ff4c6674cd 100644
--- a/drivers/edac/layerscape_edac.c
+++ b/drivers/edac/layerscape_edac.c
@@ -38,6 +38,9 @@ static int __init fsl_ddr_mc_init(void)
 {
 	int res;
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	/* make sure error reporting method is sane */
 	switch (edac_op_state) {
 	case EDAC_OPSTATE_POLL:
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index a20b299f1202..4ddc43e6c420 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -1528,6 +1528,9 @@ static int __init pnd2_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 9678ab97c7ac..3ff604a5e95a 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -3506,6 +3506,9 @@ static int __init sbridge_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c
index 1abc020d49ab..286370728552 100644
--- a/drivers/edac/skx_base.c
+++ b/drivers/edac/skx_base.c
@@ -653,6 +653,9 @@ static int __init skx_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/thunderx_edac.c b/drivers/edac/thunderx_edac.c
index f13674081cb6..2c4baa6817a9 100644
--- a/drivers/edac/thunderx_edac.c
+++ b/drivers/edac/thunderx_edac.c
@@ -2114,6 +2114,9 @@ static int __init thunderx_edac_init(void)
 {
 	int rc = 0;
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	rc = pci_register_driver(&thunderx_lmc_driver);
 	if (rc)
 		return rc;
diff --git a/drivers/edac/xgene_edac.c b/drivers/edac/xgene_edac.c
index 54081403db4f..9aa68220b625 100644
--- a/drivers/edac/xgene_edac.c
+++ b/drivers/edac/xgene_edac.c
@@ -2004,6 +2004,9 @@ static int __init xgene_edac_init(void)
 {
 	int rc;
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	/* Make sure error reporting method is sane */
 	switch (edac_op_state) {
 	case EDAC_OPSTATE_POLL:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 7/8] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-08-31  7:40 [PATCH v4 0/8] Make ghes_edac a proper module Jia He
                   ` (5 preceding siblings ...)
  2022-08-31  7:40 ` [PATCH v4 6/8] EDAC: Add the ghes_edac_preferred check for chipset-specific edac drivers Jia He
@ 2022-08-31  7:40 ` Jia He
  2022-08-31  7:40 ` [PATCH v4 8/8] EDAC/igen6: Return consistent errno when another edac driver is enabled Jia He
  7 siblings, 0 replies; 14+ messages in thread
From: Jia He @ 2022-08-31  7:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He, kernel test robot

ghes_estatus_caches should be add rcu annotation to avoid sparse warnings.
   drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
   drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache *
   drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
   drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache *

unrcu_pointer is to strip the __rcu in cmpxchg.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d11e3b2e5345..bbd96e91bec1 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -153,7 +153,7 @@ struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
 
-static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
@@ -843,8 +843,9 @@ static void ghes_estatus_cache_add(
 	}
 	/* new_cache must be put into array after its contents are written */
 	smp_wmb();
-	if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
-				  slot_cache, new_cache) == slot_cache) {
+	if (slot != -1 && unrcu_pointer(cmpxchg(ghes_estatus_caches + slot,
+				RCU_INITIALIZER(slot_cache),
+				RCU_INITIALIZER(new_cache))) == slot_cache) {
 		if (slot_cache)
 			call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
 	} else
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 8/8] EDAC/igen6: Return consistent errno when another edac driver is enabled
  2022-08-31  7:40 [PATCH v4 0/8] Make ghes_edac a proper module Jia He
                   ` (6 preceding siblings ...)
  2022-08-31  7:40 ` [PATCH v4 7/8] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
@ 2022-08-31  7:40 ` Jia He
  7 siblings, 0 replies; 14+ messages in thread
From: Jia He @ 2022-08-31  7:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

Only a single edac driver can be enabled for EDAC MC. The igen6_init()
should be returned with EBUSY instead of ENODEV, which is consistent with
other edac drivers.

Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/edac/igen6_edac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 4ac6d0c533ec..4646cb72b9f8 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1276,7 +1276,7 @@ static int __init igen6_init(void)
 
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
-		return -ENODEV;
+		return -EBUSY;
 
 	edac_op_state = EDAC_OPSTATE_NMI;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* RE: [PATCH v4 3/8] EDAC:ghes: Move ghes_edac.force_load to ghes module parameter
  2022-08-31  7:40 ` [PATCH v4 3/8] EDAC:ghes: Move ghes_edac.force_load to ghes module parameter Jia He
@ 2022-09-02 14:54   ` Kani, Toshi
  0 siblings, 0 replies; 14+ messages in thread
From: Kani, Toshi @ 2022-09-02 14:54 UTC (permalink / raw)
  To: Jia He, Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd

On Wednesday, August 31, 2022 1:40 AM, Jia He wrote:
> +/*
> + * "ghes.edac_force_load" forcibly loads ghes_edac and skips the platform
> + * check.
> + */
> +bool ghes_edac_force_load;
> +module_param_named(edac_force_load, ghes_edac_force_load, bool, 0);
> +EXPORT_SYMBOL(ghes_edac_force_load);

Since ghes has no ability to load ghes_edac module, I suggest changing
the option name to ghes.edac_force_enable or ghes.edac_enable.

Toshi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v4 4/8] ghes: Introduce a helper ghes_edac_preferred()
  2022-08-31  7:40 ` [PATCH v4 4/8] ghes: Introduce a helper ghes_edac_preferred() Jia He
@ 2022-09-02 15:30   ` Kani, Toshi
  0 siblings, 0 replies; 14+ messages in thread
From: Kani, Toshi @ 2022-09-02 15:30 UTC (permalink / raw)
  To: Jia He, Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd


On Wednesday, August 31, 2022 1:40 AM, Jia He wrote:
> Introduce a flag ghes_present to differentiate between the system ROM
> really not offering GHES vs. the ghes module not running. ghes-present
> is latched on ghes_probe() and never unlatched since ACPI GHES table
> is static.
> 
> Introduce a helper ghes_edac_preferred() and it is true if the
> platform-check passes.

I'd suggest to rephase a bit, something like:
==
Introduce a helper ghes_edac_preferred(), which returns true when
ACPI GHES table is present and the platform-check passes on the system.
ghes_present is set once in ghes_probe() since ACPI GHES table is static.
==

Otherwise, the patch looks good to me.  
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>

Toshi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v4 6/8] EDAC: Add the ghes_edac_preferred check for chipset-specific edac drivers
  2022-08-31  7:40 ` [PATCH v4 6/8] EDAC: Add the ghes_edac_preferred check for chipset-specific edac drivers Jia He
@ 2022-09-02 15:48   ` Kani, Toshi
  0 siblings, 0 replies; 14+ messages in thread
From: Kani, Toshi @ 2022-09-02 15:48 UTC (permalink / raw)
  To: Jia He, Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd


On Wednesday, August 31, 2022 1:40 AM, Jia He wrote:
> edac_mc_add_mc* is too late in the init path and that check should happen
> as the very first thing in the driver init function.
> Add the newly introduced helper ghes_edac_preferred() for the platform
> check.

I'd suggest to rephase a bit, something like:
==
Add ghes_edac_preferred() check to chipset-specific edac drivers to ensure
that ghes_edac is used on the platform where ghes_edac is preferred over
chipset-specific edac driver.

Unlike the existing edac_get_owner() check, the ghes_edac_preferred() check
works independent to the module_init ordering.
==

Otherwise, the patch looks good to me.  
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>

Toshi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v4 5/8] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-31  7:40 ` [PATCH v4 5/8] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
@ 2022-09-02 17:13   ` Kani, Toshi
  2022-09-05 14:33     ` Justin He
  0 siblings, 1 reply; 14+ messages in thread
From: Kani, Toshi @ 2022-09-02 17:13 UTC (permalink / raw)
  To: Jia He, Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd

On Wednesday, August 31, 2022 1:40 AM, Jia He wrote:
> @@ -454,7 +437,7 @@ int ghes_edac_register(struct ghes *ghes, struct
> device *dev)
>  		pr_info("This system has a very crappy BIOS: It doesn't even
> list the DIMMS.\n");
>  		pr_info("Its SMBIOS info is wrong. It is doubtful that the error
> report would\n");
>  		pr_info("work on such system. Use this driver with
> caution\n");
> -	} else if (idx < 0) {
> +	} else if (ghes_edac_force_load) {

This change causes the following messages to start showing up on Arm.
Is that what you intend to do?

The messages can be avoided by not setting the force flag on Arm unconditionally.
This will need some change to the flag check in ghes_edac_unregister() though.

>  		pr_info("This EDAC driver relies on BIOS to enumerate
> memory and get error reports.\n");
>  		pr_info("Unfortunately, not all BIOSes reflect the memory
> layout correctly.\n");
>  		pr_info("So, the end result of using this driver varies from
> vendor to vendor.\n");

 Toshi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v4 5/8] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-09-02 17:13   ` Kani, Toshi
@ 2022-09-05 14:33     ` Justin He
  0 siblings, 0 replies; 14+ messages in thread
From: Justin He @ 2022-09-05 14:33 UTC (permalink / raw)
  To: Kani, Toshi, Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd

Hi Kani
As per your review for previous patches, I will update the commit log in the next version.
Please see my comments below:

> -----Original Message-----
> > @@ -454,7 +437,7 @@ int ghes_edac_register(struct ghes *ghes, struct
> > device *dev)
> >  		pr_info("This system has a very crappy BIOS: It doesn't even list
> > the DIMMS.\n");
> >  		pr_info("Its SMBIOS info is wrong. It is doubtful that the error
> > report would\n");
> >  		pr_info("work on such system. Use this driver with caution\n");
> > -	} else if (idx < 0) {
> > +	} else if (ghes_edac_force_load) {
> 
> This change causes the following messages to start showing up on Arm.
> Is that what you intend to do?
> 
No

> The messages can be avoided by not setting the force flag on Arm
> unconditionally.
> This will need some change to the flag check in ghes_edac_unregister() though.
Yes, looks like I can remove the ghes_edac_force_load check in the ghes_edac_unregister()
since it had been checked in the ghes_edac_init



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-09-05 14:33 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-31  7:40 [PATCH v4 0/8] Make ghes_edac a proper module Jia He
2022-08-31  7:40 ` [PATCH v4 1/8] efi/cper: export several helpers for ghes_edac to use Jia He
2022-08-31  7:40 ` [PATCH v4 2/8] EDAC/ghes: Add a notifier for reporting memory errors Jia He
2022-08-31  7:40 ` [PATCH v4 3/8] EDAC:ghes: Move ghes_edac.force_load to ghes module parameter Jia He
2022-09-02 14:54   ` Kani, Toshi
2022-08-31  7:40 ` [PATCH v4 4/8] ghes: Introduce a helper ghes_edac_preferred() Jia He
2022-09-02 15:30   ` Kani, Toshi
2022-08-31  7:40 ` [PATCH v4 5/8] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
2022-09-02 17:13   ` Kani, Toshi
2022-09-05 14:33     ` Justin He
2022-08-31  7:40 ` [PATCH v4 6/8] EDAC: Add the ghes_edac_preferred check for chipset-specific edac drivers Jia He
2022-09-02 15:48   ` Kani, Toshi
2022-08-31  7:40 ` [PATCH v4 7/8] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
2022-08-31  7:40 ` [PATCH v4 8/8] EDAC/igen6: Return consistent errno when another edac driver is enabled Jia He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).