All of lore.kernel.org
 help / color / mirror / Atom feed
* [RESEND PATCH v3 0/9] Make ghes_edac a proper module
@ 2022-08-22 15:40 Jia He
  2022-08-22 15:40 ` [RESEND PATCH v3 1/9] efi/cper: export several helpers for ghes_edac to use Jia He
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: Jia He @ 2022-08-22 15:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, Jia He

Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
apci_init()") introduced a bug that ghes_edac_register() would be invoked
before edac_init(). Because at that time, the bus "edac" hasn't been even
registered, this created sysfs /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.

Changelog:
v3:
 - refine the commit logs
 - introduce ghes preferred and present flag (by Toshi)
 - move force_load to setup parameter
 - add the ghes_edac_preferred() check for x86/Arm edac drivers (incompleted list)
 
v2:
 - add acked-by tag of Patch 1 from Ard
 - split the notifier patch
 - add 2 patch to get regular drivers selected when ghes edac is not loaded
 - fix an errno in igen6 driver
 - add a patch to fix the sparse warning of ghes
 - refine the commit logs

Jia He (9):
  efi/cper: export several helpers for ghes_edac to use
  EDAC/ghes: Add a notifier for reporting memory errors
  EDAC/ghes: Make ghes_edac a proper module to remove the dependency on
    ghes
  EDAC/ghes: Move ghes_edac.force_load to setup parameter
  EDAC: Don't load chipset-specific edac drivers when ghes_edac is
    preferred
  ghes: Introduce a flag ghes_present
  apei/ghes: Use unrcu_pointer for cmpxchg
  EDAC/igen6: Keep returned errno consistent when edac mc has been
    enabled
  edac: Don't load Arm specific edac drivers when ghes_edac is preferred

 .../admin-guide/kernel-parameters.txt         |   5 +
 drivers/acpi/apei/ghes.c                      |  94 ++++++++++++++--
 drivers/edac/Kconfig                          |   4 +-
 drivers/edac/amd64_edac.c                     |   3 +
 drivers/edac/armada_xp_edac.c                 |   3 +
 drivers/edac/edac_module.h                    |   1 +
 drivers/edac/ghes_edac.c                      | 102 +++++++++++-------
 drivers/edac/i10nm_base.c                     |   3 +
 drivers/edac/igen6_edac.c                     |   5 +-
 drivers/edac/layerscape_edac.c                |   3 +
 drivers/edac/pnd2_edac.c                      |   3 +
 drivers/edac/sb_edac.c                        |   3 +
 drivers/edac/skx_base.c                       |   3 +
 drivers/edac/thunderx_edac.c                  |   3 +
 drivers/edac/xgene_edac.c                     |   3 +
 drivers/firmware/efi/cper.c                   |   3 +
 include/acpi/ghes.h                           |  38 +++----
 17 files changed, 204 insertions(+), 75 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RESEND PATCH v3 1/9] efi/cper: export several helpers for ghes_edac to use
  2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
@ 2022-08-22 15:40 ` Jia He
  2022-08-22 15:40 ` [RESEND PATCH v3 2/9] EDAC/ghes: Add a notifier for reporting memory errors Jia He
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Jia He @ 2022-08-22 15:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, Jia He

Before ghes_edac can be turned back into a proper module again, export
several helpers which are going to be used by it.

Signed-off-by: Jia He <justin.he@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/cper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index e4e5ea7ce910..053eae13f409 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -290,6 +290,7 @@ int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(cper_mem_err_location);
 
 int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
 {
@@ -310,6 +311,7 @@ int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(cper_dimm_err_location);
 
 void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
 		       struct cper_mem_err_compact *cmem)
@@ -331,6 +333,7 @@ void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
 	cmem->mem_array_handle = mem->mem_array_handle;
 	cmem->mem_dev_handle = mem->mem_dev_handle;
 }
+EXPORT_SYMBOL_GPL(cper_mem_err_pack);
 
 const char *cper_mem_err_unpack(struct trace_seq *p,
 				struct cper_mem_err_compact *cmem)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND PATCH v3 2/9] EDAC/ghes: Add a notifier for reporting memory errors
  2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
  2022-08-22 15:40 ` [RESEND PATCH v3 1/9] efi/cper: export several helpers for ghes_edac to use Jia He
@ 2022-08-22 15:40 ` Jia He
  2022-08-22 15:40 ` [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Jia He @ 2022-08-22 15:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, Jia He

To make a proper module, add a notifier for reporting memory errors. Use
an atomic notifier because calls sites like ghes_proc_in_irq() run in
interrupt context.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 16 +++++++++++++++-
 drivers/edac/ghes_edac.c | 19 +++++++++++++++++--
 include/acpi/ghes.h      | 10 +++-------
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d91ad378c00d..8cb65f757d06 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -94,6 +94,8 @@
 #define FIX_APEI_GHES_SDEI_CRITICAL	__end_of_fixed_addresses
 #endif
 
+static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
+
 static inline bool is_hest_type_generic_v2(struct ghes *ghes)
 {
 	return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
@@ -645,7 +647,7 @@ static bool ghes_do_proc(struct ghes *ghes,
 		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
-			ghes_edac_report_mem_error(sev, mem_err);
+			atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
 			queued = ghes_handle_memory_failure(gdata, sev);
@@ -1497,3 +1499,15 @@ void __init acpi_ghes_init(void)
 	else
 		pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n");
 }
+
+void ghes_register_report_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_register(&ghes_report_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_register_report_chain);
+
+void ghes_unregister_report_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_unregister(&ghes_report_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_unregister_report_chain);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index c8fa7dcfdbd0..7b8d56a769f6 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -14,6 +14,7 @@
 #include <linux/dmi.h>
 #include "edac_module.h"
 #include <ras/ras_event.h>
+#include <linux/notifier.h>
 
 #define OTHER_DETAIL_LEN	400
 
@@ -267,11 +268,14 @@ static int print_mem_error_other_detail(const struct cper_sec_mem_err *mem, char
 	return n;
 }
 
-void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
+static int ghes_edac_report_mem_error(struct notifier_block *nb,
+				      unsigned long val, void *data)
 {
+	struct cper_sec_mem_err *mem_err = (struct cper_sec_mem_err *)data;
 	struct cper_mem_err_compact cmem;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
+	unsigned long sev = val;
 	struct ghes_pvt *pvt;
 	unsigned long flags;
 	char *p;
@@ -282,7 +286,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	 * know.
 	 */
 	if (WARN_ON_ONCE(in_nmi()))
-		return;
+		return NOTIFY_OK;
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
@@ -374,8 +378,15 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 unlock:
 	spin_unlock_irqrestore(&ghes_lock, flags);
+
+	return NOTIFY_OK;
 }
 
+static struct notifier_block ghes_edac_mem_err_nb = {
+	.notifier_call	= ghes_edac_report_mem_error,
+	.priority	= 0,
+};
+
 /*
  * Known systems that are safe to enable this module.
  */
@@ -503,6 +514,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	ghes_pvt = pvt;
 	spin_unlock_irqrestore(&ghes_lock, flags);
 
+	ghes_register_report_chain(&ghes_edac_mem_err_nb);
+
 	/* only set on success */
 	refcount_set(&ghes_refcount, 1);
 
@@ -548,6 +561,8 @@ void ghes_edac_unregister(struct ghes *ghes)
 	if (mci)
 		edac_mc_free(mci);
 
+	ghes_unregister_report_chain(&ghes_edac_mem_err_nb);
+
 unlock:
 	mutex_unlock(&ghes_reg_mutex);
 }
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 34fb3431a8f3..5cbd38b6e4e1 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -76,18 +76,11 @@ int ghes_estatus_pool_init(int num_ghes);
 /* From drivers/edac/ghes_edac.c */
 
 #ifdef CONFIG_EDAC_GHES
-void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
-
 int ghes_edac_register(struct ghes *ghes, struct device *dev);
 
 void ghes_edac_unregister(struct ghes *ghes);
 
 #else
-static inline void ghes_edac_report_mem_error(int sev,
-				       struct cper_sec_mem_err *mem_err)
-{
-}
-
 static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	return -ENODEV;
@@ -145,4 +138,7 @@ int ghes_notify_sea(void);
 static inline int ghes_notify_sea(void) { return -ENOENT; }
 #endif
 
+struct notifier_block;
+extern void ghes_register_report_chain(struct notifier_block *nb);
+extern void ghes_unregister_report_chain(struct notifier_block *nb);
 #endif /* GHES_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
  2022-08-22 15:40 ` [RESEND PATCH v3 1/9] efi/cper: export several helpers for ghes_edac to use Jia He
  2022-08-22 15:40 ` [RESEND PATCH v3 2/9] EDAC/ghes: Add a notifier for reporting memory errors Jia He
@ 2022-08-22 15:40 ` Jia He
  2022-08-24 15:37   ` Borislav Petkov
  2022-08-26 22:42   ` Elliott, Robert (Servers)
  2022-08-22 15:40 ` [RESEND PATCH v3 4/9] EDAC/ghes: Move ghes_edac.force_load to setup parameter Jia He
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Jia He @ 2022-08-22 15:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, Jia He, stable

Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
apci_init()") introduced a bug that ghes_edac_register() would be invoked
before edac_init(). Because at that time, the bus "edac" hadn't been even
registered, this created sysfs /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.

To remove the dependency of ghes_edac on ghes, make it a proper module. Use
a list to save the probing devices in ghes_probe(), and defer the
ghes_edac_register() to module_init() of the new ghes_edac module by
iterating over the devices list.

Co-developed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Cc: stable@kernel.org
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/acpi/apei/ghes.c | 35 ++++++++++++++++++++++--
 drivers/edac/Kconfig     |  4 +--
 drivers/edac/ghes_edac.c | 59 +++++++++++++++++++++++++---------------
 include/acpi/ghes.h      | 23 ++++------------
 4 files changed, 77 insertions(+), 44 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 8cb65f757d06..9c52183e3ad9 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -120,6 +120,9 @@ module_param_named(disable, ghes_disable, bool, 0);
 static LIST_HEAD(ghes_hed);
 static DEFINE_MUTEX(ghes_list_mutex);
 
+static LIST_HEAD(ghes_devs);
+static DEFINE_MUTEX(ghes_devs_mutex);
+
 /*
  * Because the memory area used to transfer hardware error information
  * from BIOS to Linux can be determined only in NMI, IRQ or timer
@@ -1378,7 +1381,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 
 	platform_set_drvdata(ghes_dev, ghes);
 
-	ghes_edac_register(ghes, &ghes_dev->dev);
+	ghes->dev = &ghes_dev->dev;
+
+	mutex_lock(&ghes_devs_mutex);
+	list_add_tail(&ghes->elist, &ghes_devs);
+	mutex_unlock(&ghes_devs_mutex);
 
 	/* Handle any pending errors right away */
 	spin_lock_irqsave(&ghes_notify_lock_irq, flags);
@@ -1442,7 +1449,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 
 	ghes_fini(ghes);
 
-	ghes_edac_unregister(ghes);
+	mutex_lock(&ghes_devs_mutex);
+	list_del_rcu(&ghes->elist);
+	mutex_unlock(&ghes_devs_mutex);
 
 	kfree(ghes);
 
@@ -1500,6 +1509,28 @@ void __init acpi_ghes_init(void)
 		pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n");
 }
 
+/*
+ * Known x86 systems that prefer GHES error reporting:
+ */
+static struct acpi_platform_list plat_list[] = {
+	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+	{ } /* End */
+};
+
+struct list_head *ghes_get_devices(bool force)
+{
+	int idx = -1;
+
+	if (IS_ENABLED(CONFIG_X86)) {
+		idx = acpi_match_platform_list(plat_list);
+		if (idx < 0 && !force)
+			return NULL;
+	}
+
+	return &ghes_devs;
+}
+EXPORT_SYMBOL_GPL(ghes_get_devices);
+
 void ghes_register_report_chain(struct notifier_block *nb)
 {
 	atomic_notifier_chain_register(&ghes_report_chain, nb);
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 17562cf1fe97..df45db81858b 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -53,8 +53,8 @@ config EDAC_DECODE_MCE
 	  has been initialized.
 
 config EDAC_GHES
-	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
-	depends on ACPI_APEI_GHES && (EDAC=y)
+	tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"
+	depends on ACPI_APEI_GHES
 	select UEFI_CPER
 	help
 	  Not all machines support hardware-driven error report. Some of those
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 7b8d56a769f6..bb3ea42ba70b 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -60,6 +60,8 @@ module_param(force_load, bool, 0);
 
 static bool system_scanned;
 
+static struct list_head *ghes_devs;
+
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
 	u8 type;
@@ -387,34 +389,15 @@ static struct notifier_block ghes_edac_mem_err_nb = {
 	.priority	= 0,
 };
 
-/*
- * Known systems that are safe to enable this module.
- */
-static struct acpi_platform_list plat_list[] = {
-	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
-	{ } /* End */
-};
-
-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static int ghes_edac_register(struct device *dev)
 {
 	bool fake = false;
 	struct mem_ctl_info *mci;
 	struct ghes_pvt *pvt;
 	struct edac_mc_layer layers[1];
 	unsigned long flags;
-	int idx = -1;
 	int rc = 0;
 
-	if (IS_ENABLED(CONFIG_X86)) {
-		/* Check if safe to enable on this system */
-		idx = acpi_match_platform_list(plat_list);
-		if (!force_load && idx < 0)
-			return -ENODEV;
-	} else {
-		force_load = true;
-		idx = 0;
-	}
-
 	/* finish another registration/unregistration instance first */
 	mutex_lock(&ghes_reg_mutex);
 
@@ -458,7 +441,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
 		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
 		pr_info("work on such system. Use this driver with caution\n");
-	} else if (idx < 0) {
+	} else if (force_load) {
 		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
 		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
 		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
@@ -530,7 +513,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	return rc;
 }
 
-void ghes_edac_unregister(struct ghes *ghes)
+static void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
 	unsigned long flags;
@@ -566,3 +549,35 @@ void ghes_edac_unregister(struct ghes *ghes)
 unlock:
 	mutex_unlock(&ghes_reg_mutex);
 }
+
+static int __init ghes_edac_init(void)
+{
+	struct ghes *g, *g_tmp;
+
+	if (!IS_ENABLED(CONFIG_X86))
+		force_load = true;
+
+	ghes_devs = ghes_get_devices(force_load);
+	if (!ghes_devs)
+		return -ENODEV;
+
+	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
+		ghes_edac_register(g->dev);
+	}
+
+	return 0;
+}
+module_init(ghes_edac_init);
+
+static void __exit ghes_edac_exit(void)
+{
+	struct ghes *g, *g_tmp;
+
+	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
+		ghes_edac_unregister(g);
+	}
+}
+module_exit(ghes_edac_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Output ACPI APEI/GHES BIOS detected errors module via EDAC");
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 5cbd38b6e4e1..150c0b9500d6 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -27,6 +27,8 @@ struct ghes {
 		struct timer_list timer;
 		unsigned int irq;
 	};
+	struct device *dev;
+	struct list_head elist;
 };
 
 struct ghes_estatus_node {
@@ -69,28 +71,13 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
  * @nb: pointer to the notifier_block structure of the vendor record handler.
  */
 void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
+struct list_head *ghes_get_devices(bool force);
+#else
+static inline struct list_head *ghes_get_devices(bool force) { return NULL; }
 #endif
 
 int ghes_estatus_pool_init(int num_ghes);
 
-/* From drivers/edac/ghes_edac.c */
-
-#ifdef CONFIG_EDAC_GHES
-int ghes_edac_register(struct ghes *ghes, struct device *dev);
-
-void ghes_edac_unregister(struct ghes *ghes);
-
-#else
-static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
-{
-	return -ENODEV;
-}
-
-static inline void ghes_edac_unregister(struct ghes *ghes)
-{
-}
-#endif
-
 static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
 {
 	return gdata->revision >> 8;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND PATCH v3 4/9] EDAC/ghes: Move ghes_edac.force_load to setup parameter
  2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
                   ` (2 preceding siblings ...)
  2022-08-22 15:40 ` [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
@ 2022-08-22 15:40 ` Jia He
  2022-08-24 15:52   ` Borislav Petkov
  2022-08-22 15:40 ` [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac drivers when ghes_edac is preferred Jia He
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Jia He @ 2022-08-22 15:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, Jia He

ghes_edac_init() is too late to set this module flag ghes_edac.force_load.
Also, other edac drivers should not be able to control this flag.

Move this flag to setup parameter in ghes.

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 +++
 drivers/acpi/apei/ghes.c                      | 24 +++++++++++-
 drivers/edac/ghes_edac.c                      | 38 +++++++------------
 include/acpi/ghes.h                           |  7 +++-
 4 files changed, 46 insertions(+), 28 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d7f30902fda0..a5f0ee0d7727 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1593,6 +1593,11 @@
 			When zero, profiling data is discarded and associated
 			debugfs files are removed at module unload time.
 
+	ghes_edac_force= [X86] Skip the platform check and forcibly load the
+			ghes_edac modules.
+			Format: <bool>
+			default: false (0)
+
 	goldfish	[X86] Enable the goldfish android emulator platform.
 			Don't use this when you are not running on the
 			android emulator
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 9c52183e3ad9..e17e0ee8f842 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -94,6 +94,26 @@
 #define FIX_APEI_GHES_SDEI_CRITICAL	__end_of_fixed_addresses
 #endif
 
+/*
+ * "ghes_edac_force=1" forcibly loads ghes_edac and skips the platform
+ * check.
+ */
+bool __read_mostly ghes_edac_force;
+EXPORT_SYMBOL(ghes_edac_force);
+
+static int __init setup_ghes_edac_load(char *str)
+{
+	if (str)
+		if (!strcmp("true", str) || !strcmp("1", str))
+			ghes_edac_force = true;
+
+	if (!IS_ENABLED(CONFIG_X86))
+		ghes_edac_force = true;
+
+	return 1;
+}
+__setup("ghes_edac_force=", setup_ghes_edac_load);
+
 static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
 
 static inline bool is_hest_type_generic_v2(struct ghes *ghes)
@@ -1517,13 +1537,13 @@ static struct acpi_platform_list plat_list[] = {
 	{ } /* End */
 };
 
-struct list_head *ghes_get_devices(bool force)
+struct list_head *ghes_get_devices(void)
 {
 	int idx = -1;
 
 	if (IS_ENABLED(CONFIG_X86)) {
 		idx = acpi_match_platform_list(plat_list);
-		if (idx < 0 && !force)
+		if (idx < 0 && !ghes_edac_force)
 			return NULL;
 	}
 
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index bb3ea42ba70b..6a2b54cc7240 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -54,10 +54,6 @@ static DEFINE_MUTEX(ghes_reg_mutex);
  */
 static DEFINE_SPINLOCK(ghes_lock);
 
-/* "ghes_edac.force_load=1" skips the platform check */
-static bool __read_mostly force_load;
-module_param(force_load, bool, 0);
-
 static bool system_scanned;
 
 static struct list_head *ghes_devs;
@@ -437,23 +433,12 @@ static int ghes_edac_register(struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (fake) {
-		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-		pr_info("work on such system. Use this driver with caution\n");
-	} else if (force_load) {
-		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-		pr_info("to correct its BIOS.\n");
-		pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
-	}
-
 	if (!fake) {
 		struct dimm_info *src, *dst;
 		int i = 0;
 
+		pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
+
 		mci_for_each_dimm(mci, dst) {
 			src = &ghes_hw.dimms[i];
 
@@ -478,6 +463,17 @@ static int ghes_edac_register(struct device *dev)
 	} else {
 		struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
 
+		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+		pr_info("work on such system. Use this driver with caution\n");
+
+		if (ghes_edac_force) {
+			pr_info("This EDAC driver relies on BIOS to enumerate memory and get\n");
+			pr_info("error reports. Unfortunately, not all BIOSes reflect the\n");
+			pr_info("memory layout correctly. If you find incorrect reports, please\n");
+			pr_info("contact your hardware vendor for its in correct BIOS.\n");
+		}
+
 		dimm->nr_pages = 1;
 		dimm->grain = 128;
 		dimm->mtype = MEM_UNKNOWN;
@@ -518,9 +514,6 @@ static void ghes_edac_unregister(struct ghes *ghes)
 	struct mem_ctl_info *mci;
 	unsigned long flags;
 
-	if (!force_load)
-		return;
-
 	mutex_lock(&ghes_reg_mutex);
 
 	system_scanned = false;
@@ -554,10 +547,7 @@ static int __init ghes_edac_init(void)
 {
 	struct ghes *g, *g_tmp;
 
-	if (!IS_ENABLED(CONFIG_X86))
-		force_load = true;
-
-	ghes_devs = ghes_get_devices(force_load);
+	ghes_devs = ghes_get_devices();
 	if (!ghes_devs)
 		return -ENODEV;
 
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 150c0b9500d6..e29327ee0b83 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -71,9 +71,12 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
  * @nb: pointer to the notifier_block structure of the vendor record handler.
  */
 void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
-struct list_head *ghes_get_devices(bool force);
+
+struct list_head *ghes_get_devices(void);
+extern bool ghes_edac_force;
 #else
-static inline struct list_head *ghes_get_devices(bool force) { return NULL; }
+static inline struct list_head *ghes_get_devices(void) { return NULL; }
+static bool ghes_edac_force;
 #endif
 
 int ghes_estatus_pool_init(int num_ghes);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac drivers when ghes_edac is preferred
  2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
                   ` (3 preceding siblings ...)
  2022-08-22 15:40 ` [RESEND PATCH v3 4/9] EDAC/ghes: Move ghes_edac.force_load to setup parameter Jia He
@ 2022-08-22 15:40 ` Jia He
  2022-08-24 23:04   ` Kani, Toshi
  2022-08-22 15:40 ` [RESEND PATCH v3 6/9] ghes: Introduce a flag ghes_present Jia He
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Jia He @ 2022-08-22 15:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, Jia He

edac_mc_add_mc* is too late in the init path and that check should happen
as the very first thing in the driver init function.

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c   | 13 +++++++++++--
 drivers/edac/amd64_edac.c  |  3 +++
 drivers/edac/edac_module.h |  1 +
 drivers/edac/i10nm_base.c  |  3 +++
 drivers/edac/igen6_edac.c  |  3 +++
 drivers/edac/pnd2_edac.c   |  3 +++
 drivers/edac/sb_edac.c     |  3 +++
 drivers/edac/skx_base.c    |  3 +++
 include/acpi/ghes.h        |  2 ++
 9 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e17e0ee8f842..327386f3cf33 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1537,16 +1537,25 @@ static struct acpi_platform_list plat_list[] = {
 	{ } /* End */
 };
 
-struct list_head *ghes_get_devices(void)
+bool ghes_edac_preferred(void)
 {
 	int idx = -1;
 
 	if (IS_ENABLED(CONFIG_X86)) {
 		idx = acpi_match_platform_list(plat_list);
 		if (idx < 0 && !ghes_edac_force)
-			return NULL;
+			return false;
 	}
 
+	return true;
+}
+EXPORT_SYMBOL_GPL(ghes_edac_preferred);
+
+struct list_head *ghes_get_devices(void)
+{
+	if (!ghes_edac_preferred())
+		return NULL;
+
 	return &ghes_devs;
 }
 EXPORT_SYMBOL_GPL(ghes_get_devices);
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2f854feeeb23..5314a934c2bf 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -4329,6 +4329,9 @@ static int __init amd64_edac_init(void)
 	int err = -ENODEV;
 	int i;
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/edac_module.h b/drivers/edac/edac_module.h
index 96f6de0c8ff6..3826f82de487 100644
--- a/drivers/edac/edac_module.h
+++ b/drivers/edac/edac_module.h
@@ -11,6 +11,7 @@
 #ifndef	__EDAC_MODULE_H__
 #define	__EDAC_MODULE_H__
 
+#include <acpi/ghes.h>
 #include "edac_mc.h"
 #include "edac_pci.h"
 #include "edac_device.h"
diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c
index 6cf50ee0b77c..691df9b51622 100644
--- a/drivers/edac/i10nm_base.c
+++ b/drivers/edac/i10nm_base.c
@@ -548,6 +548,9 @@ static int __init i10nm_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index a07bbfd075d0..4ac6d0c533ec 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1271,6 +1271,9 @@ static int __init igen6_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -ENODEV;
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index a20b299f1202..4ddc43e6c420 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -1528,6 +1528,9 @@ static int __init pnd2_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 9678ab97c7ac..3ff604a5e95a 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -3506,6 +3506,9 @@ static int __init sbridge_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c
index 1abc020d49ab..286370728552 100644
--- a/drivers/edac/skx_base.c
+++ b/drivers/edac/skx_base.c
@@ -653,6 +653,9 @@ static int __init skx_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index e29327ee0b83..1c47018a1e43 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -73,9 +73,11 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
 void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
 
 struct list_head *ghes_get_devices(void);
+bool ghes_edac_preferred(void);
 extern bool ghes_edac_force;
 #else
 static inline struct list_head *ghes_get_devices(void) { return NULL; }
+static bool ghes_edac_preferred(void) { return false; }
 static bool ghes_edac_force;
 #endif
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND PATCH v3 6/9] ghes: Introduce a flag ghes_present
  2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
                   ` (4 preceding siblings ...)
  2022-08-22 15:40 ` [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac drivers when ghes_edac is preferred Jia He
@ 2022-08-22 15:40 ` Jia He
  2022-08-22 15:40 ` [RESEND PATCH v3 7/9] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Jia He @ 2022-08-22 15:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, Jia He

Introduce a flag ghes_present to differentiate between the system ROM
really not offering GHES vs. the ghes module not running. The true of
ghes_present means ghes_probe() has been called.

If ghes_edac is not enabled (but ghes is enabled) on a system with GHES
present & preferred, no edac driver gets registered.

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 327386f3cf33..31c674639e86 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -114,6 +114,8 @@ static int __init setup_ghes_edac_load(char *str)
 }
 __setup("ghes_edac_force=", setup_ghes_edac_load);
 
+static bool ghes_present;
+
 static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
 
 static inline bool is_hest_type_generic_v2(struct ghes *ghes)
@@ -1407,6 +1409,8 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	list_add_tail(&ghes->elist, &ghes_devs);
 	mutex_unlock(&ghes_devs_mutex);
 
+	ghes_present = true;
+
 	/* Handle any pending errors right away */
 	spin_lock_irqsave(&ghes_notify_lock_irq, flags);
 	ghes_proc(ghes);
@@ -1541,6 +1545,9 @@ bool ghes_edac_preferred(void)
 {
 	int idx = -1;
 
+	if (!ghes_present)
+		return false;
+
 	if (IS_ENABLED(CONFIG_X86)) {
 		idx = acpi_match_platform_list(plat_list);
 		if (idx < 0 && !ghes_edac_force)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND PATCH v3 7/9] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
                   ` (5 preceding siblings ...)
  2022-08-22 15:40 ` [RESEND PATCH v3 6/9] ghes: Introduce a flag ghes_present Jia He
@ 2022-08-22 15:40 ` Jia He
  2022-08-22 15:40 ` [RESEND PATCH v3 8/9] EDAC/igen6: Keep returned errno consistent when edac mc has been enabled Jia He
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Jia He @ 2022-08-22 15:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, Jia He,
	kernel test robot

ghes_estatus_caches should be add rcu annotation to avoid sparse warnings.
   drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
   drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache *
   drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
   drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache *

unrcu_pointer is to strip the __rcu in cmpxchg.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 31c674639e86..94e3a15c9b06 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -165,7 +165,7 @@ struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
 
-static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
@@ -855,8 +855,9 @@ static void ghes_estatus_cache_add(
 	}
 	/* new_cache must be put into array after its contents are written */
 	smp_wmb();
-	if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
-				  slot_cache, new_cache) == slot_cache) {
+	if (slot != -1 && unrcu_pointer(cmpxchg(ghes_estatus_caches + slot,
+				RCU_INITIALIZER(slot_cache),
+				RCU_INITIALIZER(new_cache))) == slot_cache) {
 		if (slot_cache)
 			call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
 	} else
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND PATCH v3 8/9] EDAC/igen6: Keep returned errno consistent when edac mc has been enabled
  2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
                   ` (6 preceding siblings ...)
  2022-08-22 15:40 ` [RESEND PATCH v3 7/9] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
@ 2022-08-22 15:40 ` Jia He
  2022-08-22 15:40 ` [RESEND PATCH v3 9/9] edac: Don't load Arm specific edac drivers when ghes_edac is preferred Jia He
  2022-08-23  1:49 ` [RESEND PATCH v3 0/9] Make ghes_edac a proper module Justin He
  9 siblings, 0 replies; 27+ messages in thread
From: Jia He @ 2022-08-22 15:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, Jia He

Only a single edac driver can be enabled for EDAC MC. The igen6_init()
should be returned with EBUSY instead of ENODEV, which is consistent with
other edac drivers.

Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/edac/igen6_edac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 4ac6d0c533ec..4646cb72b9f8 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1276,7 +1276,7 @@ static int __init igen6_init(void)
 
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
-		return -ENODEV;
+		return -EBUSY;
 
 	edac_op_state = EDAC_OPSTATE_NMI;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND PATCH v3 9/9] edac: Don't load Arm specific edac drivers when ghes_edac is preferred
  2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
                   ` (7 preceding siblings ...)
  2022-08-22 15:40 ` [RESEND PATCH v3 8/9] EDAC/igen6: Keep returned errno consistent when edac mc has been enabled Jia He
@ 2022-08-22 15:40 ` Jia He
  2022-08-23  1:49 ` [RESEND PATCH v3 0/9] Make ghes_edac a proper module Justin He
  9 siblings, 0 replies; 27+ messages in thread
From: Jia He @ 2022-08-22 15:40 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, Jia He

edac_mc_add_mc* is too late in the init path and that check should happen
as the very first thing in the driver init function.

Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/edac/armada_xp_edac.c  | 3 +++
 drivers/edac/layerscape_edac.c | 3 +++
 drivers/edac/thunderx_edac.c   | 3 +++
 drivers/edac/xgene_edac.c      | 3 +++
 4 files changed, 12 insertions(+)

diff --git a/drivers/edac/armada_xp_edac.c b/drivers/edac/armada_xp_edac.c
index 038abbb83f4b..486532b92ce0 100644
--- a/drivers/edac/armada_xp_edac.c
+++ b/drivers/edac/armada_xp_edac.c
@@ -599,6 +599,9 @@ static int __init armada_xp_edac_init(void)
 {
 	int res;
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	/* only polling is supported */
 	edac_op_state = EDAC_OPSTATE_POLL;
 
diff --git a/drivers/edac/layerscape_edac.c b/drivers/edac/layerscape_edac.c
index 94cac7686a56..60ff4c6674cd 100644
--- a/drivers/edac/layerscape_edac.c
+++ b/drivers/edac/layerscape_edac.c
@@ -38,6 +38,9 @@ static int __init fsl_ddr_mc_init(void)
 {
 	int res;
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	/* make sure error reporting method is sane */
 	switch (edac_op_state) {
 	case EDAC_OPSTATE_POLL:
diff --git a/drivers/edac/thunderx_edac.c b/drivers/edac/thunderx_edac.c
index f13674081cb6..2c4baa6817a9 100644
--- a/drivers/edac/thunderx_edac.c
+++ b/drivers/edac/thunderx_edac.c
@@ -2114,6 +2114,9 @@ static int __init thunderx_edac_init(void)
 {
 	int rc = 0;
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	rc = pci_register_driver(&thunderx_lmc_driver);
 	if (rc)
 		return rc;
diff --git a/drivers/edac/xgene_edac.c b/drivers/edac/xgene_edac.c
index 54081403db4f..9aa68220b625 100644
--- a/drivers/edac/xgene_edac.c
+++ b/drivers/edac/xgene_edac.c
@@ -2004,6 +2004,9 @@ static int __init xgene_edac_init(void)
 {
 	int rc;
 
+	if (ghes_edac_preferred())
+		return -EBUSY;
+
 	/* Make sure error reporting method is sane */
 	switch (edac_op_state) {
 	case EDAC_OPSTATE_POLL:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* RE: [RESEND PATCH v3 0/9] Make ghes_edac a proper module
  2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
                   ` (8 preceding siblings ...)
  2022-08-22 15:40 ` [RESEND PATCH v3 9/9] edac: Don't load Arm specific edac drivers when ghes_edac is preferred Jia He
@ 2022-08-23  1:49 ` Justin He
  2022-08-23 17:19     ` [Devel] " Rafael J. Wysocki
  9 siblings, 1 reply; 27+ messages in thread
From: Justin He @ 2022-08-23  1:49 UTC (permalink / raw)
  To: Justin He, Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc

Hi, 
Sorry for resending the v3.
There is an exceptional interrupt when I tried to post v3 at the first time.
Maybe it is caused by a comma "," inside the mail name.
E.g. 
Signed-off-by: Some, one <someone@site.com>
Looks like a git sendemail issue? 

Anyway, sorry for the inconvenience.
--
Cheers,
Justin (Jia He)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND PATCH v3 0/9] Make ghes_edac a proper module
@ 2022-08-23 17:19     ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2022-08-23 17:19 UTC (permalink / raw)
  To: Justin He
  Cc: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh,
	Kani Toshi, Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac,
	devel, Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi,
	nd, Paul E. McKenney, Andrew Morton, Neeraj Upadhyay,
	Randy Dunlap, Damien Le Moal, Muchun Song, linux-doc

On Tue, Aug 23, 2022 at 3:50 AM Justin He <Justin.He@arm.com> wrote:
>
> Hi,
> Sorry for resending the v3.
> There is an exceptional interrupt when I tried to post v3 at the first time.
> Maybe it is caused by a comma "," inside the mail name.
> E.g.
> Signed-off-by: Some, one <someone@site.com>
> Looks like a git sendemail issue?
>
> Anyway, sorry for the inconvenience.

I've received it twice, but no problem.

I need Boris to tell me what to do with this series.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Devel] Re: [RESEND PATCH v3 0/9] Make ghes_edac a proper module
@ 2022-08-23 17:19     ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2022-08-23 17:19 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 486 bytes --]

On Tue, Aug 23, 2022 at 3:50 AM Justin He <Justin.He(a)arm.com> wrote:
>
> Hi,
> Sorry for resending the v3.
> There is an exceptional interrupt when I tried to post v3 at the first time.
> Maybe it is caused by a comma "," inside the mail name.
> E.g.
> Signed-off-by: Some, one <someone(a)site.com>
> Looks like a git sendemail issue?
>
> Anyway, sorry for the inconvenience.

I've received it twice, but no problem.

I need Boris to tell me what to do with this series.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-22 15:40 ` [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
@ 2022-08-24 15:37   ` Borislav Petkov
  2022-08-25 12:21     ` Justin He
  2022-08-26 22:42   ` Elliott, Robert (Servers)
  1 sibling, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2022-08-24 15:37 UTC (permalink / raw)
  To: Jia He
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jonathan Corbet, Jan Luebbe, Khuong Dinh, Kani Toshi,
	Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, stable

On Mon, Aug 22, 2022 at 03:40:42PM +0000, Jia He wrote:
> Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
> apci_init()") introduced a bug that ghes_edac_register() would be invoked
> before edac_init(). Because at that time, the bus "edac" hadn't been even
> registered, this created sysfs /devices/mc0 instead of
> /sys/devices/system/edac/mc/mc0 on an Ampere eMag server.
> 
> To remove the dependency of ghes_edac on ghes, make it a proper module. Use
> a list to save the probing devices in ghes_probe(), and defer the
> ghes_edac_register() to module_init() of the new ghes_edac module by
> iterating over the devices list.
> 
> Co-developed-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Jia He <justin.he@arm.com>
> Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
> Cc: stable@kernel.org

Why is this marked for stable?

The prerequisite patches are needed too. I guess this needs to be
communicated to stable folks somehow by doing

Cc: stable@kernel.org # needs commits X, Y, ...

but I guess the committer needs to do that because only at commit time
will X and Y be known...

So, is there any particular reason why this should be in stable?

> @@ -1442,7 +1449,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>  
>  	ghes_fini(ghes);
>  
> -	ghes_edac_unregister(ghes);
> +	mutex_lock(&ghes_devs_mutex);
> +	list_del_rcu(&ghes->elist);

Is that list RCU-protected?

> +	mutex_unlock(&ghes_devs_mutex);
>  
>  	kfree(ghes);

...

> @@ -566,3 +549,35 @@ void ghes_edac_unregister(struct ghes *ghes)
>  unlock:
>  	mutex_unlock(&ghes_reg_mutex);
>  }
> +
> +static int __init ghes_edac_init(void)
> +{
> +	struct ghes *g, *g_tmp;
> +
> +	if (!IS_ENABLED(CONFIG_X86))
> +		force_load = true;

No, this is not how this works.

> +	ghes_devs = ghes_get_devices(force_load);
> +	if (!ghes_devs)
> +		return -ENODEV;

You simply need to check force_load here.

> +	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
> +		ghes_edac_register(g->dev);
> +	}
> +
> +	return 0;
> +}

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND PATCH v3 4/9] EDAC/ghes: Move ghes_edac.force_load to setup parameter
  2022-08-22 15:40 ` [RESEND PATCH v3 4/9] EDAC/ghes: Move ghes_edac.force_load to setup parameter Jia He
@ 2022-08-24 15:52   ` Borislav Petkov
  2022-08-25  9:42     ` Justin He
  2022-08-30  1:21     ` Justin He
  0 siblings, 2 replies; 27+ messages in thread
From: Borislav Petkov @ 2022-08-24 15:52 UTC (permalink / raw)
  To: Jia He
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jonathan Corbet, Jan Luebbe, Khuong Dinh, Kani Toshi,
	Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc

On Mon, Aug 22, 2022 at 03:40:43PM +0000, Jia He wrote:
> ghes_edac_init() is too late to set this module flag ghes_edac.force_load.
> Also, other edac drivers should not be able to control this flag.
> 
> Move this flag to setup parameter in ghes.
> 
> Suggested-by: Toshi Kani <toshi.kani@hpe.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  5 +++
>  drivers/acpi/apei/ghes.c                      | 24 +++++++++++-
>  drivers/edac/ghes_edac.c                      | 38 +++++++------------
>  include/acpi/ghes.h                           |  7 +++-
>  4 files changed, 46 insertions(+), 28 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index d7f30902fda0..a5f0ee0d7727 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1593,6 +1593,11 @@
>  			When zero, profiling data is discarded and associated
>  			debugfs files are removed at module unload time.
>  
> +	ghes_edac_force= [X86] Skip the platform check and forcibly load the

So there already is ghes.disable which is using the module param thing.
Why don't you do that too?

> +			ghes_edac modules.

"module" - singular.

> +			Format: <bool>
> +			default: false (0)
> +
>  	goldfish	[X86] Enable the goldfish android emulator platform.
>  			Don't use this when you are not running on the
>  			android emulator
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 9c52183e3ad9..e17e0ee8f842 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -94,6 +94,26 @@
>  #define FIX_APEI_GHES_SDEI_CRITICAL	__end_of_fixed_addresses
>  #endif
>  
> +/*
> + * "ghes_edac_force=1" forcibly loads ghes_edac and skips the platform
> + * check.
> + */
> +bool __read_mostly ghes_edac_force;
> +EXPORT_SYMBOL(ghes_edac_force);
> +
> +static int __init setup_ghes_edac_load(char *str)
> +{
> +	if (str)
> +		if (!strcmp("true", str) || !strcmp("1", str))
> +			ghes_edac_force = true;
> +
> +	if (!IS_ENABLED(CONFIG_X86))
> +		ghes_edac_force = true;
> +
> +	return 1;
> +}
> +__setup("ghes_edac_force=", setup_ghes_edac_load);

Why all that?

Isn't specifying

ghes.edac_force_load

on the kernel command line enough? I.e., you don't need to parse the
passed in option - just the presence of the parameter is enough.

> +
>  static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
>  
>  static inline bool is_hest_type_generic_v2(struct ghes *ghes)
> @@ -1517,13 +1537,13 @@ static struct acpi_platform_list plat_list[] = {
>  	{ } /* End */
>  };
>  
> -struct list_head *ghes_get_devices(bool force)
> +struct list_head *ghes_get_devices(void)
>  {
>  	int idx = -1;
>  
>  	if (IS_ENABLED(CONFIG_X86)) {
>  		idx = acpi_match_platform_list(plat_list);
> -		if (idx < 0 && !force)
> +		if (idx < 0 && !ghes_edac_force)
>  			return NULL;
>  	}
>  
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index bb3ea42ba70b..6a2b54cc7240 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -54,10 +54,6 @@ static DEFINE_MUTEX(ghes_reg_mutex);
>   */
>  static DEFINE_SPINLOCK(ghes_lock);
>  
> -/* "ghes_edac.force_load=1" skips the platform check */
> -static bool __read_mostly force_load;
> -module_param(force_load, bool, 0);
> -
>  static bool system_scanned;
>  
>  static struct list_head *ghes_devs;
> @@ -437,23 +433,12 @@ static int ghes_edac_register(struct device *dev)
>  	mci->ctl_name = "ghes_edac";
>  	mci->dev_name = "ghes";
>  
> -	if (fake) {
> -		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
> -		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
> -		pr_info("work on such system. Use this driver with caution\n");
> -	} else if (force_load) {
> -		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
> -		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
> -		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
> -		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
> -		pr_info("to correct its BIOS.\n");
> -		pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
> -	}
> -
>  	if (!fake) {
>  		struct dimm_info *src, *dst;
>  		int i = 0;
>  
> +		pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
> +
>  		mci_for_each_dimm(mci, dst) {
>  			src = &ghes_hw.dimms[i];
>  

This hunk...

> @@ -478,6 +463,17 @@ static int ghes_edac_register(struct device *dev)
>  	} else {
>  		struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
>  
> +		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
> +		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
> +		pr_info("work on such system. Use this driver with caution\n");
> +
> +		if (ghes_edac_force) {
> +			pr_info("This EDAC driver relies on BIOS to enumerate memory and get\n");
> +			pr_info("error reports. Unfortunately, not all BIOSes reflect the\n");
> +			pr_info("memory layout correctly. If you find incorrect reports, please\n");
> +			pr_info("contact your hardware vendor for its in correct BIOS.\n");
> +		}
> +
>  		dimm->nr_pages = 1;
>  		dimm->grain = 128;
>  		dimm->mtype = MEM_UNKNOWN;

... and this hunk look unrelated to what this patch is doing. What are
they for?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac drivers when ghes_edac is preferred
  2022-08-22 15:40 ` [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac drivers when ghes_edac is preferred Jia He
@ 2022-08-24 23:04   ` Kani, Toshi
  2022-08-25  9:45     ` Justin He
  0 siblings, 1 reply; 27+ messages in thread
From: Kani, Toshi @ 2022-08-24 23:04 UTC (permalink / raw)
  To: Jia He, Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc

On Monday, August 22, 2022 9:41 AM, Jia He wrote:
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index e17e0ee8f842..327386f3cf33 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -1537,16 +1537,25 @@ static struct acpi_platform_list plat_list[] = {
>  	{ } /* End */
>  };
> 
> -struct list_head *ghes_get_devices(void)
> +bool ghes_edac_preferred(void)
>  {
>  	int idx = -1;
> 
>  	if (IS_ENABLED(CONFIG_X86)) {
>  		idx = acpi_match_platform_list(plat_list);
>  		if (idx < 0 && !ghes_edac_force)
> -			return NULL;
> +			return false;
>  	}
> 
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(ghes_edac_preferred);
> +
> +struct list_head *ghes_get_devices(void)
> +{
> +	if (!ghes_edac_preferred())
> +		return NULL;
> +
>  	return &ghes_devs;
>  }

ghes_get_devices() changing multiple times in the series is 
confusing to me.   Can you simply introduce ghes_get_devices()
and ghes_preferred() in the right state in a patch?  Perhaps,
patch #2, #5, #6 can collapse to introduce the two funcs?

The rest of patch #5 adding the call to ghes_edac_preferred()
into other edac drivers can remain as a separate patch.

Toshi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RESEND PATCH v3 4/9] EDAC/ghes: Move ghes_edac.force_load to setup parameter
  2022-08-24 15:52   ` Borislav Petkov
@ 2022-08-25  9:42     ` Justin He
  2022-08-30  1:21     ` Justin He
  1 sibling, 0 replies; 27+ messages in thread
From: Justin He @ 2022-08-25  9:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jonathan Corbet, Jan Luebbe, Khuong Dinh, Kani Toshi,
	Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc



> -----Original Message-----
[...]
> > +static int __init setup_ghes_edac_load(char *str) {
> > +	if (str)
> > +		if (!strcmp("true", str) || !strcmp("1", str))
> > +			ghes_edac_force = true;
> > +
> > +	if (!IS_ENABLED(CONFIG_X86))
> > +		ghes_edac_force = true;
> > +
> > +	return 1;
> > +}
> > +__setup("ghes_edac_force=", setup_ghes_edac_load);
> 
> Why all that?
> 
> Isn't specifying
> 
> ghes.edac_force_load

Ok, I will use module parameter ghes.edac_force_load.
> 
> on the kernel command line enough? I.e., you don't need to parse the passed
> in option - just the presence of the parameter is enough.
> 
> > +
> >  static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
> >
> >  static inline bool is_hest_type_generic_v2(struct ghes *ghes) @@
> > -1517,13 +1537,13 @@ static struct acpi_platform_list plat_list[] = {
> >  	{ } /* End */
> >  };
> >
> > -struct list_head *ghes_get_devices(bool force)
> > +struct list_head *ghes_get_devices(void)
> >  {
> >  	int idx = -1;
> >
> >  	if (IS_ENABLED(CONFIG_X86)) {
> >  		idx = acpi_match_platform_list(plat_list);
> > -		if (idx < 0 && !force)
> > +		if (idx < 0 && !ghes_edac_force)
> >  			return NULL;
> >  	}
> >
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index
> > bb3ea42ba70b..6a2b54cc7240 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -54,10 +54,6 @@ static DEFINE_MUTEX(ghes_reg_mutex);
> >   */
> >  static DEFINE_SPINLOCK(ghes_lock);
> >
> > -/* "ghes_edac.force_load=1" skips the platform check */ -static bool
> > __read_mostly force_load; -module_param(force_load, bool, 0);
> > -
> >  static bool system_scanned;
> >
> >  static struct list_head *ghes_devs;
> > @@ -437,23 +433,12 @@ static int ghes_edac_register(struct device *dev)
> >  	mci->ctl_name = "ghes_edac";
> >  	mci->dev_name = "ghes";
> >
> > -	if (fake) {
> > -		pr_info("This system has a very crappy BIOS: It doesn't even list the
> DIMMS.\n");
> > -		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report
> would\n");
> > -		pr_info("work on such system. Use this driver with caution\n");
> > -	} else if (force_load) {
> > -		pr_info("This EDAC driver relies on BIOS to enumerate memory and
> get error reports.\n");
> > -		pr_info("Unfortunately, not all BIOSes reflect the memory layout
> correctly.\n");
> > -		pr_info("So, the end result of using this driver varies from vendor to
> vendor.\n");
> > -		pr_info("If you find incorrect reports, please contact your hardware
> vendor\n");
> > -		pr_info("to correct its BIOS.\n");
> > -		pr_info("This system has %d DIMM sockets.\n",
> ghes_hw.num_dimms);
> > -	}
> > -
> >  	if (!fake) {
> >  		struct dimm_info *src, *dst;
> >  		int i = 0;
> >
> > +		pr_info("This system has %d DIMM sockets.\n",
> ghes_hw.num_dimms);
> > +
> >  		mci_for_each_dimm(mci, dst) {
> >  			src = &ghes_hw.dimms[i];
> >
> 
> This hunk...

Sorry, should move pr_info after this line

> 
> > @@ -478,6 +463,17 @@ static int ghes_edac_register(struct device *dev)
> >  	} else {
> >  		struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
> >
> > +		pr_info("This system has a very crappy BIOS: It doesn't even list the
> DIMMS.\n");
> > +		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report
> would\n");
> > +		pr_info("work on such system. Use this driver with caution\n");
> > +
> > +		if (ghes_edac_force) {
> > +			pr_info("This EDAC driver relies on BIOS to enumerate memory
> and get\n");
> > +			pr_info("error reports. Unfortunately, not all BIOSes reflect
> the\n");
> > +			pr_info("memory layout correctly. If you find incorrect reports,
> please\n");
> > +			pr_info("contact your hardware vendor for its in correct
> BIOS.\n");
> > +		}
> > +
> >  		dimm->nr_pages = 1;
> >  		dimm->grain = 128;
> >  		dimm->mtype = MEM_UNKNOWN;
> 
> ... and this hunk look unrelated to what this patch is doing. What are they for?
> 
I tried to keep the previous same logic.
 if (fake)
	print_something
 - !fake && ghes not preferred.
	print_otherthings


--
Cheers,
Justin (Jia He)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac drivers when ghes_edac is preferred
  2022-08-24 23:04   ` Kani, Toshi
@ 2022-08-25  9:45     ` Justin He
  2022-08-25 23:38       ` Kani, Toshi
  0 siblings, 1 reply; 27+ messages in thread
From: Justin He @ 2022-08-25  9:45 UTC (permalink / raw)
  To: Kani, Toshi, Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc



> -----Original Message-----
> From: Kani, Toshi <toshi.kani@hpe.com>
> Sent: Thursday, August 25, 2022 7:04 AM
> To: Justin He <Justin.He@arm.com>; Len Brown <lenb@kernel.org>; James
> Morse <James.Morse@arm.com>; Tony Luck <tony.luck@intel.com>; Borislav
> Petkov <bp@alien8.de>; Mauro Carvalho Chehab <mchehab@kernel.org>;
> Robert Richter <rric@kernel.org>; Robert Moore <robert.moore@intel.com>;
> Qiuxu Zhuo <qiuxu.zhuo@intel.com>; Yazen Ghannam
> <yazen.ghannam@amd.com>; Jonathan Corbet <corbet@lwn.net>; Jan
> Luebbe <jlu@pengutronix.de>; Khuong Dinh
> <khuong@os.amperecomputing.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>; linux-acpi@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; devel@acpica.org;
> Rafael J . Wysocki <rafael@kernel.org>; Shuai Xue
> <xueshuai@linux.alibaba.com>; Jarkko Sakkinen <jarkko@kernel.org>;
> linux-efi@vger.kernel.org; nd <nd@arm.com>; Paul E. McKenney
> <paulmck@kernel.org>; Andrew Morton <akpm@linux-foundation.org>;
> Neeraj Upadhyay <quic_neeraju@quicinc.com>; Randy Dunlap
> <rdunlap@infradead.org>; Damien Le Moal
> <damien.lemoal@opensource.wdc.com>; Muchun Song
> <songmuchun@bytedance.com>; linux-doc@vger.kernel.org
> Subject: RE: [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac
> drivers when ghes_edac is preferred
> 
> On Monday, August 22, 2022 9:41 AM, Jia He wrote:
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
> > e17e0ee8f842..327386f3cf33 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -1537,16 +1537,25 @@ static struct acpi_platform_list plat_list[] = {
> >  	{ } /* End */
> >  };
> >
> > -struct list_head *ghes_get_devices(void)
> > +bool ghes_edac_preferred(void)
> >  {
> >  	int idx = -1;
> >
> >  	if (IS_ENABLED(CONFIG_X86)) {
> >  		idx = acpi_match_platform_list(plat_list);
> >  		if (idx < 0 && !ghes_edac_force)
> > -			return NULL;
> > +			return false;
> >  	}
> >
> > +	return true;
> > +}
> > +EXPORT_SYMBOL_GPL(ghes_edac_preferred);
> > +
> > +struct list_head *ghes_get_devices(void) {
> > +	if (!ghes_edac_preferred())
> > +		return NULL;
> > +
> >  	return &ghes_devs;
> >  }
> 
> ghes_get_devices() changing multiple times in the series is
> confusing to me.   Can you simply introduce ghes_get_devices()
> and ghes_preferred() in the right state in a patch?  Perhaps, patch #2, #5, #6
> can collapse to introduce the two funcs?
> 

My purpose was to make it easy for review. I am ok to merge these patches into one.

> The rest of patch #5 adding the call to ghes_edac_preferred() into other edac
> drivers can remain as a separate patch.

Okay, I assume you mean to merge all of the ghes_edac_preferred() checking for intel and
Arm edac drivers into 1 separate patch, am I understanding well?


--
Cheers,
Justin (Jia He)



^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-24 15:37   ` Borislav Petkov
@ 2022-08-25 12:21     ` Justin He
  2022-08-26 19:30       ` Kani, Toshi
  0 siblings, 1 reply; 27+ messages in thread
From: Justin He @ 2022-08-25 12:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jonathan Corbet, Jan Luebbe, Khuong Dinh, Kani Toshi,
	Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, stable

Hi Borislav

> -----Original Message-----
> On Mon, Aug 22, 2022 at 03:40:42PM +0000, Jia He wrote:
> > Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
> > apci_init()") introduced a bug that ghes_edac_register() would be
> > invoked before edac_init(). Because at that time, the bus "edac"
> > hadn't been even registered, this created sysfs /devices/mc0 instead
> > of
> > /sys/devices/system/edac/mc/mc0 on an Ampere eMag server.
> >
> > To remove the dependency of ghes_edac on ghes, make it a proper
> > module. Use a list to save the probing devices in ghes_probe(), and
> > defer the
> > ghes_edac_register() to module_init() of the new ghes_edac module by
> > iterating over the devices list.
> >
> > Co-developed-by: Borislav Petkov <bp@alien8.de>
> > Signed-off-by: Borislav Petkov <bp@alien8.de>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
> > apci_init()")
> > Cc: stable@kernel.org
> 
> Why is this marked for stable?
> 
> The prerequisite patches are needed too. I guess this needs to be
> communicated to stable folks somehow by doing
> 
> Cc: stable@kernel.org # needs commits X, Y, ...
> 
> but I guess the committer needs to do that because only at commit time will X
> and Y be known...
> 
> So, is there any particular reason why this should be in stable?

Okay, I am fine with removing the stable line if dc4e8c07e9e2 will not be included in
any stable tree branch.

> 
> > @@ -1442,7 +1449,9 @@ static int ghes_remove(struct platform_device
> > *ghes_dev)
> >
> >  	ghes_fini(ghes);
> >
> > -	ghes_edac_unregister(ghes);
> > +	mutex_lock(&ghes_devs_mutex);
> > +	list_del_rcu(&ghes->elist);
> 
> Is that list RCU-protected?

No, I will remove the "rcu" suffix since I use list_add_tail.
 
> 
> > +	mutex_unlock(&ghes_devs_mutex);
> >
> >  	kfree(ghes);
> 
> ...
> 
> > @@ -566,3 +549,35 @@ void ghes_edac_unregister(struct ghes *ghes)
> >  unlock:
> >  	mutex_unlock(&ghes_reg_mutex);
> >  }
> > +
> > +static int __init ghes_edac_init(void) {
> > +	struct ghes *g, *g_tmp;
> > +
> > +	if (!IS_ENABLED(CONFIG_X86))
> > +		force_load = true;
> 
> No, this is not how this works.
> 
> > +	ghes_devs = ghes_get_devices(force_load);
> > +	if (!ghes_devs)
> > +		return -ENODEV;
> 
> You simply need to check force_load here.
> 

Okay, hence should I export the *ghes_devs* in ghes?


--
Cheers,
Justin (Jia He)



^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac drivers when ghes_edac is preferred
  2022-08-25  9:45     ` Justin He
@ 2022-08-25 23:38       ` Kani, Toshi
  0 siblings, 0 replies; 27+ messages in thread
From: Kani, Toshi @ 2022-08-25 23:38 UTC (permalink / raw)
  To: Justin He, Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc

On Thursday, August 25, 2022 3:46 AM, Justin He wrote:
> > ghes_get_devices() changing multiple times in the series is
> > confusing to me.   Can you simply introduce ghes_get_devices()
> > and ghes_preferred() in the right state in a patch?  Perhaps, patch #2, #5,
> > #6 can collapse to introduce the two funcs?
> 
> My purpose was to make it easy for review. I am ok to merge these patches
> into one.

This series starts with your original patchset and then has additional
patches to address the issues with the original patchset.  The number
of the patches continues to increase in this way...  You do not need to 
record the history of discussions and design changes in the series.
 
> > The rest of patch #5 adding the call to ghes_edac_preferred() into other edac
> > drivers can remain as a separate patch.
> 
> Okay, I assume you mean to merge all of the ghes_edac_preferred()
> checking for intel and
> Arm edac drivers into 1 separate patch, am I understanding well?

No, that's not what I meant.

ghes_get_devices() and ghes_edac_preferred() look good to me after
all patches are applied.  So, instead of introducing the original design of
ghes_get_devices() and then changing its design multiple times in the
series, please simply add the final version of ghes_get_devices() and 
ghes_edac_preferred() in a single patch.  They are small functions anyway.

That is, your series includes something like:
  - PATCH: Add ghes_get_devices() and ghes_edac_preferred() interfaces
  - PATCH: Add ghes_edac_preferred check to chipset-specific edac drivers

Toshi





^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-25 12:21     ` Justin He
@ 2022-08-26 19:30       ` Kani, Toshi
  0 siblings, 0 replies; 27+ messages in thread
From: Kani, Toshi @ 2022-08-26 19:30 UTC (permalink / raw)
  To: Justin He, Borislav Petkov
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jonathan Corbet, Jan Luebbe, Khuong Dinh, Ard Biesheuvel,
	linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, nd, Paul E. McKenney,
	Andrew Morton, Neeraj Upadhyay, Randy Dunlap, Damien Le Moal,
	Muchun Song, linux-doc, stable

On Thursday, August 25, 2022 6:21 AM, Justin He wrote:
> > > @@ -566,3 +549,35 @@ void ghes_edac_unregister(struct ghes *ghes)
> > >  unlock:
> > >  	mutex_unlock(&ghes_reg_mutex);
> > >  }
> > > +
> > > +static int __init ghes_edac_init(void) {
> > > +	struct ghes *g, *g_tmp;
> > > +
> > > +	if (!IS_ENABLED(CONFIG_X86))
> > > +		force_load = true;
> >
> > No, this is not how this works.
> >
> > > +	ghes_devs = ghes_get_devices(force_load);
> > > +	if (!ghes_devs)
> > > +		return -ENODEV;
> >
> > You simply need to check force_load here.
> >
> 
> Okay, hence should I export the *ghes_devs* in ghes?

It does not matter.  This series then moves the force_load check
to ghes_edac_preferred().  It is confusing for reviewers...
Please divide patches based on the final design. 

Toshi

 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-22 15:40 ` [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
  2022-08-24 15:37   ` Borislav Petkov
@ 2022-08-26 22:42   ` Elliott, Robert (Servers)
  2022-08-27  5:22     ` Borislav Petkov
  1 sibling, 1 reply; 27+ messages in thread
From: Elliott, Robert (Servers) @ 2022-08-26 22:42 UTC (permalink / raw)
  To: Jia He, Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jonathan Corbet, Jan Luebbe, Khuong Dinh, Kani,
	Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, stable



> -----Original Message-----
> From: Jia He <justin.he@arm.com>
> Sent: Monday, August 22, 2022 10:41 AM
> Subject: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to
> remove the dependency on ghes

1. I suggest adding:
    MODULE_ALIAS("acpi*")

so udev will automatically load the module on any system with ACPI. 

> drivers/edac/Kconfig
> config EDAC_GHES
> +	tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"

2. I suggest:
    tristate "APEI (ACPI Platform Error Interfaces) GHES (Generic Hardware Error Source)"

That's in a menu of EDAC drivers, so no suffix is needed.

3. The Kconfig help text needs some updates, since the drivers are now ordering
themselves to avoid race conditions.

Current:
          Not all machines support hardware-driven error report. Some of those
          provide a BIOS-driven error report mechanism via ACPI, using the
          APEI/GHES driver. By enabling this option, the error reports provided
          by GHES are sent to userspace via the EDAC API.

          When this option is enabled, it will disable the hardware-driven
          mechanisms, if a GHES BIOS is detected, entering into the
          "Firmware First" mode.

          It should be noticed that keeping both GHES and a hardware-driven
          error mechanism won't work well, as BIOS will race with OS, while
          reading the error registers. So, if you want to not use "Firmware
          first" GHES error mechanism, you should disable GHES either at
          compilation time or by passing "ghes.disable=1" Kernel parameter
          at boot time.

          In doubt, say 'Y'.

Suggestion:
  Support for error detection and correction based on APEI (ACPI Platform
  Error Interfaces), which allows system firmware to report hardware errors 
  via the HEST (Hardware Error Source Table) using GHES (Generic Hardware
  Error Source) records. Some systems perform "firmware first" processing
  of errors before reporting them.

  This module is supported in systems supporting GHES. If the architecture
  is x86, the module only loads if the platform is listed in a known-good
  platform list (see drivers/edac/ghes_edac.c) or if ghes.force_load=1
  is specified on the kernel command line).


4. In the help text for each module that looks for GHES and refuses to load 
(e.g., EDAC_AMD64), add a sentence:

  This module does not load on a system supporting ACPI GHES.

> drivers/acpi/apei/ghes.c
> +MODULE_DESCRIPTION("Output ACPI APEI/GHES BIOS detected errors module via EDAC");

5. I suggest:
    MODULE_DESCRIPTION("APEI (ACPI Platform Error Interfaces) GHES (Generic Hardware Error Source) EDAC driver")



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-26 22:42   ` Elliott, Robert (Servers)
@ 2022-08-27  5:22     ` Borislav Petkov
  2022-08-29 15:59       ` Yazen Ghannam
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2022-08-27  5:22 UTC (permalink / raw)
  To: Elliott, Robert (Servers)
  Cc: Jia He, Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jonathan Corbet, Jan Luebbe, Khuong Dinh, Kani, Toshi,
	Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc, stable

On Fri, Aug 26, 2022 at 10:42:13PM +0000, Elliott, Robert (Servers) wrote:
> 4. In the help text for each module that looks for GHES and refuses to load 
> (e.g., EDAC_AMD64), add a sentence:
> 
>   This module does not load on a system supporting ACPI GHES.

It is not "system supporting ACPI GHES." - it is on a system which is
*known* to have a more or less tested GHES implementation. The notoriety
of firmware RAS brokenness is well known.

So please stop this - there's a world outside HP BIOS.

None of this is needed for this patchset.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-27  5:22     ` Borislav Petkov
@ 2022-08-29 15:59       ` Yazen Ghannam
  2022-08-29 20:39         ` Borislav Petkov
  0 siblings, 1 reply; 27+ messages in thread
From: Yazen Ghannam @ 2022-08-29 15:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Elliott, Robert (Servers),
	Jia He, Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Jonathan Corbet,
	Jan Luebbe, Khuong Dinh, Kani, Toshi, Ard Biesheuvel, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, Paul E. McKenney, Andrew Morton,
	Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
	linux-doc, stable

On Sat, Aug 27, 2022 at 07:22:48AM +0200, Borislav Petkov wrote:
> On Fri, Aug 26, 2022 at 10:42:13PM +0000, Elliott, Robert (Servers) wrote:
> > 4. In the help text for each module that looks for GHES and refuses to load 
> > (e.g., EDAC_AMD64), add a sentence:
> > 
> >   This module does not load on a system supporting ACPI GHES.
> 
> It is not "system supporting ACPI GHES." - it is on a system which is
> *known* to have a more or less tested GHES implementation. The notoriety
> of firmware RAS brokenness is well known.
> 
> So please stop this - there's a world outside HP BIOS.
> 
> None of this is needed for this patchset.
>

GHES can be used for more than just memory errors. There are platforms where
memory errors are handled through the OS MCA, and PCIe AER errors are handled
through the FW, for example.

Is the HPE Server platform guaranteed to always provide memory errors through
GHES regardless of CPU vendor/architecture?

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-29 15:59       ` Yazen Ghannam
@ 2022-08-29 20:39         ` Borislav Petkov
  2022-08-29 21:37           ` Kani, Toshi
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2022-08-29 20:39 UTC (permalink / raw)
  To: Yazen Ghannam, Elliott, Robert (Servers), Kani, Toshi
  Cc: Jia He, Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Jonathan Corbet,
	Jan Luebbe, Khuong Dinh, Ard Biesheuvel, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, Paul E. McKenney, Andrew Morton,
	Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
	linux-doc, stable

On Mon, Aug 29, 2022 at 03:59:28PM +0000, Yazen Ghannam wrote:
> GHES can be used for more than just memory errors. There are platforms where
> memory errors are handled through the OS MCA, and PCIe AER errors are handled
> through the FW, for example.
> 
> Is the HPE Server platform guaranteed to always provide memory errors through
> GHES regardless of CPU vendor/architecture?

/me looks in the direction of HPE folks...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-08-29 20:39         ` Borislav Petkov
@ 2022-08-29 21:37           ` Kani, Toshi
  0 siblings, 0 replies; 27+ messages in thread
From: Kani, Toshi @ 2022-08-29 21:37 UTC (permalink / raw)
  To: Borislav Petkov, Yazen Ghannam, Elliott, Robert (Servers)
  Cc: Jia He, Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Jonathan Corbet,
	Jan Luebbe, Khuong Dinh, Ard Biesheuvel, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, Paul E. McKenney, Andrew Morton,
	Neeraj Upadhyay, Randy Dunlap, Damien Le Moal, Muchun Song,
	linux-doc, stable

On Monday, August 29, 2022 2:39 PM, Borislav Petkov wrote:
> On Mon, Aug 29, 2022 at 03:59:28PM +0000, Yazen Ghannam wrote:
> > GHES can be used for more than just memory errors. There are platforms where
> > memory errors are handled through the OS MCA, and PCIe AER errors are handled
> > through the FW, for example.
> >
> > Is the HPE Server platform guaranteed to always provide memory errors through
> > GHES regardless of CPU vendor/architecture?
> 
> /me looks in the direction of HPE folks...

The HPE platforms enabled by the platform check are guaranteed to be operating
in FW First mode, which FW decides which error to report to the OS via GHES or 
other means.  This may include multiple CPU vendors/architecture.
On such platforms, for instance, FW does not report corrected errors to the OS 
since FW manages the threshold & FRU notification.  Chipset-specific edac drivers,
designed for OS First mode, is not necessary on such platforms.  Disabling such OS
First edac driver is achieved by enabling ghes_edac as well.

OS MCA is still used for uncorrected errors, such as SRAR (software recoverable
action required) which requires recovery action synchronous to the execution via
MCE signalling.

Toshi 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RESEND PATCH v3 4/9] EDAC/ghes: Move ghes_edac.force_load to setup parameter
  2022-08-24 15:52   ` Borislav Petkov
  2022-08-25  9:42     ` Justin He
@ 2022-08-30  1:21     ` Justin He
  1 sibling, 0 replies; 27+ messages in thread
From: Justin He @ 2022-08-30  1:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jonathan Corbet, Jan Luebbe, Khuong Dinh, Kani Toshi,
	Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Paul E. McKenney, Andrew Morton, Neeraj Upadhyay, Randy Dunlap,
	Damien Le Moal, Muchun Song, linux-doc

Hi Borislav

> -----Original Message-----
[...]
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index d7f30902fda0..a5f0ee0d7727 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1593,6 +1593,11 @@
> >  			When zero, profiling data is discarded and associated
> >  			debugfs files are removed at module unload time.
> >
> > +	ghes_edac_force= [X86] Skip the platform check and forcibly load the
> 
> So there already is ghes.disable which is using the module param thing.
> Why don't you do that too?
> 
> > +			ghes_edac modules.
> 
> "module" - singular.
> 
> > +			Format: <bool>
> > +			default: false (0)
> > +
> >  	goldfish	[X86] Enable the goldfish android emulator platform.
> >  			Don't use this when you are not running on the
> >  			android emulator
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
> > 9c52183e3ad9..e17e0ee8f842 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -94,6 +94,26 @@
> >  #define FIX_APEI_GHES_SDEI_CRITICAL	__end_of_fixed_addresses
> >  #endif
> >
> > +/*
> > + * "ghes_edac_force=1" forcibly loads ghes_edac and skips the
> > +platform
> > + * check.
> > + */
> > +bool __read_mostly ghes_edac_force;
> > +EXPORT_SYMBOL(ghes_edac_force);
> > +
> > +static int __init setup_ghes_edac_load(char *str) {
> > +	if (str)
> > +		if (!strcmp("true", str) || !strcmp("1", str))
> > +			ghes_edac_force = true;
> > +
> > +	if (!IS_ENABLED(CONFIG_X86))
> > +		ghes_edac_force = true;
> > +
> > +	return 1;
> > +}
> > +__setup("ghes_edac_force=", setup_ghes_edac_load);
> 
> Why all that?
> 
> Isn't specifying
> 
> ghes.edac_force_load
> 
> on the kernel command line enough? I.e., you don't need to parse the passed
> in option - just the presence of the parameter is enough.
> 
> > +
> >  static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
> >
> >  static inline bool is_hest_type_generic_v2(struct ghes *ghes) @@
> > -1517,13 +1537,13 @@ static struct acpi_platform_list plat_list[] = {
> >  	{ } /* End */
> >  };
> >
> > -struct list_head *ghes_get_devices(bool force)
> > +struct list_head *ghes_get_devices(void)
> >  {
> >  	int idx = -1;
> >
> >  	if (IS_ENABLED(CONFIG_X86)) {
> >  		idx = acpi_match_platform_list(plat_list);
> > -		if (idx < 0 && !force)
> > +		if (idx < 0 && !ghes_edac_force)
> >  			return NULL;
> >  	}
> >
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index
> > bb3ea42ba70b..6a2b54cc7240 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -54,10 +54,6 @@ static DEFINE_MUTEX(ghes_reg_mutex);
> >   */
> >  static DEFINE_SPINLOCK(ghes_lock);
> >
> > -/* "ghes_edac.force_load=1" skips the platform check */ -static bool
> > __read_mostly force_load; -module_param(force_load, bool, 0);
> > -
> >  static bool system_scanned;
> >
> >  static struct list_head *ghes_devs;
> > @@ -437,23 +433,12 @@ static int ghes_edac_register(struct device *dev)
> >  	mci->ctl_name = "ghes_edac";
> >  	mci->dev_name = "ghes";
> >
> > -	if (fake) {
> > -		pr_info("This system has a very crappy BIOS: It doesn't even list the
> DIMMS.\n");
> > -		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report
> would\n");
> > -		pr_info("work on such system. Use this driver with caution\n");
> > -	} else if (force_load) {
> > -		pr_info("This EDAC driver relies on BIOS to enumerate memory and
> get error reports.\n");
> > -		pr_info("Unfortunately, not all BIOSes reflect the memory layout
> correctly.\n");
> > -		pr_info("So, the end result of using this driver varies from vendor to
> vendor.\n");
> > -		pr_info("If you find incorrect reports, please contact your hardware
> vendor\n");
> > -		pr_info("to correct its BIOS.\n");
> > -		pr_info("This system has %d DIMM sockets.\n",
> ghes_hw.num_dimms);
> > -	}
> > -
> >  	if (!fake) {
> >  		struct dimm_info *src, *dst;
> >  		int i = 0;
> >
> > +		pr_info("This system has %d DIMM sockets.\n",
> ghes_hw.num_dimms);
> > +
> >  		mci_for_each_dimm(mci, dst) {
> >  			src = &ghes_hw.dimms[i];
> >
> 
> This hunk...
> 
> > @@ -478,6 +463,17 @@ static int ghes_edac_register(struct device *dev)
> >  	} else {
> >  		struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
> >
> > +		pr_info("This system has a very crappy BIOS: It doesn't even list the
> DIMMS.\n");
> > +		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report
> would\n");
> > +		pr_info("work on such system. Use this driver with caution\n");
> > +
> > +		if (ghes_edac_force) {
> > +			pr_info("This EDAC driver relies on BIOS to enumerate memory
> and get\n");
> > +			pr_info("error reports. Unfortunately, not all BIOSes reflect
> the\n");
> > +			pr_info("memory layout correctly. If you find incorrect reports,
> please\n");
> > +			pr_info("contact your hardware vendor for its in correct
> BIOS.\n");
> > +		}
> > +
> >  		dimm->nr_pages = 1;
> >  		dimm->grain = 128;
> >  		dimm->mtype = MEM_UNKNOWN;
> 
> ... and this hunk look unrelated to what this patch is doing. What are they for?
> 
Another purpose for adjusting the pr_info hunk is to suppress the warning dmesg log
on Arm server.
On Arm, the ghes_edac_force_load (module parameter in ghes) should be true unconditionally.
So I moved the following block in fake==true check, is this reasonable?
+If (!fake)
+...
+else
+		if (ghes_edac_force) {
+			pr_info("This EDAC driver relies on BIOS to enumerate memory and get\n");
+			pr_info("error reports. Unfortunately, not all BIOSes reflect the\n");
+			pr_info("memory layout correctly. If you find incorrect reports, please\n");
+			pr_info("contact your hardware vendor for its in correct BIOS.\n");
+		}

What do you think of it? 

--
Cheers,
Justin (Jia He)


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-08-30  1:21 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-22 15:40 [RESEND PATCH v3 0/9] Make ghes_edac a proper module Jia He
2022-08-22 15:40 ` [RESEND PATCH v3 1/9] efi/cper: export several helpers for ghes_edac to use Jia He
2022-08-22 15:40 ` [RESEND PATCH v3 2/9] EDAC/ghes: Add a notifier for reporting memory errors Jia He
2022-08-22 15:40 ` [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
2022-08-24 15:37   ` Borislav Petkov
2022-08-25 12:21     ` Justin He
2022-08-26 19:30       ` Kani, Toshi
2022-08-26 22:42   ` Elliott, Robert (Servers)
2022-08-27  5:22     ` Borislav Petkov
2022-08-29 15:59       ` Yazen Ghannam
2022-08-29 20:39         ` Borislav Petkov
2022-08-29 21:37           ` Kani, Toshi
2022-08-22 15:40 ` [RESEND PATCH v3 4/9] EDAC/ghes: Move ghes_edac.force_load to setup parameter Jia He
2022-08-24 15:52   ` Borislav Petkov
2022-08-25  9:42     ` Justin He
2022-08-30  1:21     ` Justin He
2022-08-22 15:40 ` [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac drivers when ghes_edac is preferred Jia He
2022-08-24 23:04   ` Kani, Toshi
2022-08-25  9:45     ` Justin He
2022-08-25 23:38       ` Kani, Toshi
2022-08-22 15:40 ` [RESEND PATCH v3 6/9] ghes: Introduce a flag ghes_present Jia He
2022-08-22 15:40 ` [RESEND PATCH v3 7/9] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
2022-08-22 15:40 ` [RESEND PATCH v3 8/9] EDAC/igen6: Keep returned errno consistent when edac mc has been enabled Jia He
2022-08-22 15:40 ` [RESEND PATCH v3 9/9] edac: Don't load Arm specific edac drivers when ghes_edac is preferred Jia He
2022-08-23  1:49 ` [RESEND PATCH v3 0/9] Make ghes_edac a proper module Justin He
2022-08-23 17:19   ` Rafael J. Wysocki
2022-08-23 17:19     ` [Devel] " Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.