All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] Modularize ghes_edac driver
@ 2022-08-17 14:34 Jia He
  2022-08-17 14:34 ` [PATCH v2 1/7] efi/cper: export several helpers for ghes edac to use Jia He
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Jia He @ 2022-08-17 14:34 UTC (permalink / raw)
  To: Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, toshi.kani, nd, Jia He

Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
apci_init()") introduced a bug that ghes_edac_register() would be invoked
before edac_init(). Because at that time, the bus "edac" hasn't been even
registered, this created sysfs /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.

The solution is modularizing the ghes_edac driver.

Changelog:
v2:
 - add acked-by tag of Patch 1 from Ard
 - split the notifier patch
 - add 2 patch to get regular drivers selected when ghes edac is not loaded
 - fix an errno in igen6 driver
 - add a patch to fix the sparse warning of ghes
 - refine the commit logs

Jia He (7):
  efi/cper: export several helpers for ghes edac to use
  EDAC/ghes: Add notifier to report ghes_edac mem error
  EDAC/ghes: Modularize ghes_edac driver to remove the dependency on
    ghes
  EDAC: Get chipset-specific edac drivers selected only when ghes_edac
    is not enabled
  EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac
    is registered
  apei/ghes: Use unrcu_pointer for cmpxchg
  EDAC/igen6: Keep returned errno consistent when edac mc has been
    enabled

 drivers/acpi/apei/ghes.c    | 72 ++++++++++++++++++++++++++++++++---
 drivers/edac/Kconfig        |  4 +-
 drivers/edac/amd64_edac.c   |  3 ++
 drivers/edac/ghes_edac.c    | 76 ++++++++++++++++++++++++++-----------
 drivers/edac/igen6_edac.c   |  2 +-
 drivers/edac/pnd2_edac.c    |  3 ++
 drivers/edac/sb_edac.c      |  3 ++
 drivers/edac/skx_base.c     |  3 ++
 drivers/firmware/efi/cper.c |  3 ++
 include/acpi/ghes.h         | 37 ++++++------------
 10 files changed, 150 insertions(+), 56 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/7] efi/cper: export several helpers for ghes edac to use
  2022-08-17 14:34 [PATCH v2 0/7] Modularize ghes_edac driver Jia He
@ 2022-08-17 14:34 ` Jia He
  2022-08-18 14:38   ` Borislav Petkov
  2022-08-17 14:34 ` [PATCH v2 2/7] EDAC/ghes: Add notifier to report ghes_edac mem error Jia He
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Jia He @ 2022-08-17 14:34 UTC (permalink / raw)
  To: Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, toshi.kani, nd, Jia He

Before the ghes_edac codes are modularized, export several efi/cper helpers
to avoid linking error of ghes_edac.

Signed-off-by: Jia He <justin.he@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/cper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index e4e5ea7ce910..053eae13f409 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -290,6 +290,7 @@ int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(cper_mem_err_location);
 
 int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
 {
@@ -310,6 +311,7 @@ int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(cper_dimm_err_location);
 
 void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
 		       struct cper_mem_err_compact *cmem)
@@ -331,6 +333,7 @@ void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
 	cmem->mem_array_handle = mem->mem_array_handle;
 	cmem->mem_dev_handle = mem->mem_dev_handle;
 }
+EXPORT_SYMBOL_GPL(cper_mem_err_pack);
 
 const char *cper_mem_err_unpack(struct trace_seq *p,
 				struct cper_mem_err_compact *cmem)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/7] EDAC/ghes: Add notifier to report ghes_edac mem error
  2022-08-17 14:34 [PATCH v2 0/7] Modularize ghes_edac driver Jia He
  2022-08-17 14:34 ` [PATCH v2 1/7] efi/cper: export several helpers for ghes edac to use Jia He
@ 2022-08-17 14:34 ` Jia He
  2022-08-18 15:42   ` Borislav Petkov
  2022-08-17 14:34 ` [PATCH v2 3/7] EDAC/ghes: Modularize ghes_edac driver to remove the dependency on ghes Jia He
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Jia He @ 2022-08-17 14:34 UTC (permalink / raw)
  To: Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, toshi.kani, nd, Jia He

To modularize ghes_edac driver, any ghes_edac codes should not be invoked
in ghes. Add a notifier of registering the ghes_edac_report_mem_error() to
resolve the build dependency. The atomic notifier is used because
ghes_proc_in_irq() can be in the irq context.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 16 +++++++++++++++-
 drivers/edac/ghes_edac.c | 19 +++++++++++++++++--
 include/acpi/ghes.h      | 10 +++-------
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d91ad378c00d..8cb65f757d06 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -94,6 +94,8 @@
 #define FIX_APEI_GHES_SDEI_CRITICAL	__end_of_fixed_addresses
 #endif
 
+static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
+
 static inline bool is_hest_type_generic_v2(struct ghes *ghes)
 {
 	return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
@@ -645,7 +647,7 @@ static bool ghes_do_proc(struct ghes *ghes,
 		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
-			ghes_edac_report_mem_error(sev, mem_err);
+			atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
 			queued = ghes_handle_memory_failure(gdata, sev);
@@ -1497,3 +1499,15 @@ void __init acpi_ghes_init(void)
 	else
 		pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n");
 }
+
+void ghes_register_report_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_register(&ghes_report_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_register_report_chain);
+
+void ghes_unregister_report_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_unregister(&ghes_report_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_unregister_report_chain);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index c8fa7dcfdbd0..7b8d56a769f6 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -14,6 +14,7 @@
 #include <linux/dmi.h>
 #include "edac_module.h"
 #include <ras/ras_event.h>
+#include <linux/notifier.h>
 
 #define OTHER_DETAIL_LEN	400
 
@@ -267,11 +268,14 @@ static int print_mem_error_other_detail(const struct cper_sec_mem_err *mem, char
 	return n;
 }
 
-void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
+static int ghes_edac_report_mem_error(struct notifier_block *nb,
+				      unsigned long val, void *data)
 {
+	struct cper_sec_mem_err *mem_err = (struct cper_sec_mem_err *)data;
 	struct cper_mem_err_compact cmem;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
+	unsigned long sev = val;
 	struct ghes_pvt *pvt;
 	unsigned long flags;
 	char *p;
@@ -282,7 +286,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	 * know.
 	 */
 	if (WARN_ON_ONCE(in_nmi()))
-		return;
+		return NOTIFY_OK;
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
@@ -374,8 +378,15 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 unlock:
 	spin_unlock_irqrestore(&ghes_lock, flags);
+
+	return NOTIFY_OK;
 }
 
+static struct notifier_block ghes_edac_mem_err_nb = {
+	.notifier_call	= ghes_edac_report_mem_error,
+	.priority	= 0,
+};
+
 /*
  * Known systems that are safe to enable this module.
  */
@@ -503,6 +514,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	ghes_pvt = pvt;
 	spin_unlock_irqrestore(&ghes_lock, flags);
 
+	ghes_register_report_chain(&ghes_edac_mem_err_nb);
+
 	/* only set on success */
 	refcount_set(&ghes_refcount, 1);
 
@@ -548,6 +561,8 @@ void ghes_edac_unregister(struct ghes *ghes)
 	if (mci)
 		edac_mc_free(mci);
 
+	ghes_unregister_report_chain(&ghes_edac_mem_err_nb);
+
 unlock:
 	mutex_unlock(&ghes_reg_mutex);
 }
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 34fb3431a8f3..5cbd38b6e4e1 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -76,18 +76,11 @@ int ghes_estatus_pool_init(int num_ghes);
 /* From drivers/edac/ghes_edac.c */
 
 #ifdef CONFIG_EDAC_GHES
-void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
-
 int ghes_edac_register(struct ghes *ghes, struct device *dev);
 
 void ghes_edac_unregister(struct ghes *ghes);
 
 #else
-static inline void ghes_edac_report_mem_error(int sev,
-				       struct cper_sec_mem_err *mem_err)
-{
-}
-
 static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	return -ENODEV;
@@ -145,4 +138,7 @@ int ghes_notify_sea(void);
 static inline int ghes_notify_sea(void) { return -ENOENT; }
 #endif
 
+struct notifier_block;
+extern void ghes_register_report_chain(struct notifier_block *nb);
+extern void ghes_unregister_report_chain(struct notifier_block *nb);
 #endif /* GHES_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 3/7] EDAC/ghes: Modularize ghes_edac driver to remove the dependency on ghes
  2022-08-17 14:34 [PATCH v2 0/7] Modularize ghes_edac driver Jia He
  2022-08-17 14:34 ` [PATCH v2 1/7] efi/cper: export several helpers for ghes edac to use Jia He
  2022-08-17 14:34 ` [PATCH v2 2/7] EDAC/ghes: Add notifier to report ghes_edac mem error Jia He
@ 2022-08-17 14:34 ` Jia He
  2022-08-17 14:34 ` [PATCH v2 4/7] EDAC: Get chipset-specific edac drivers selected only when ghes_edac is not enabled Jia He
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Jia He @ 2022-08-17 14:34 UTC (permalink / raw)
  To: Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, toshi.kani, nd, Jia He,
	stable

Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
apci_init()") introduced a bug that ghes_edac_register() would be invoked
before edac_init(). Because at that time, the bus "edac" hadn't been even
registered, this created sysfs /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.

The solution is modularizing the ghes_edac driver. Use a list to save the
probing devices in ghes_probe(), and defer the ghes_edac_register() to the
new ghes_edac module_init() by iterating over the devices list.

Co-developed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Cc: stable@kernel.org
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/acpi/apei/ghes.c | 35 ++++++++++++++++++++++--
 drivers/edac/Kconfig     |  4 +--
 drivers/edac/ghes_edac.c | 59 +++++++++++++++++++++++++---------------
 include/acpi/ghes.h      | 23 ++++------------
 4 files changed, 77 insertions(+), 44 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 8cb65f757d06..9c52183e3ad9 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -120,6 +120,9 @@ module_param_named(disable, ghes_disable, bool, 0);
 static LIST_HEAD(ghes_hed);
 static DEFINE_MUTEX(ghes_list_mutex);
 
+static LIST_HEAD(ghes_devs);
+static DEFINE_MUTEX(ghes_devs_mutex);
+
 /*
  * Because the memory area used to transfer hardware error information
  * from BIOS to Linux can be determined only in NMI, IRQ or timer
@@ -1378,7 +1381,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 
 	platform_set_drvdata(ghes_dev, ghes);
 
-	ghes_edac_register(ghes, &ghes_dev->dev);
+	ghes->dev = &ghes_dev->dev;
+
+	mutex_lock(&ghes_devs_mutex);
+	list_add_tail(&ghes->elist, &ghes_devs);
+	mutex_unlock(&ghes_devs_mutex);
 
 	/* Handle any pending errors right away */
 	spin_lock_irqsave(&ghes_notify_lock_irq, flags);
@@ -1442,7 +1449,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 
 	ghes_fini(ghes);
 
-	ghes_edac_unregister(ghes);
+	mutex_lock(&ghes_devs_mutex);
+	list_del_rcu(&ghes->elist);
+	mutex_unlock(&ghes_devs_mutex);
 
 	kfree(ghes);
 
@@ -1500,6 +1509,28 @@ void __init acpi_ghes_init(void)
 		pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n");
 }
 
+/*
+ * Known x86 systems that prefer GHES error reporting:
+ */
+static struct acpi_platform_list plat_list[] = {
+	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+	{ } /* End */
+};
+
+struct list_head *ghes_get_devices(bool force)
+{
+	int idx = -1;
+
+	if (IS_ENABLED(CONFIG_X86)) {
+		idx = acpi_match_platform_list(plat_list);
+		if (idx < 0 && !force)
+			return NULL;
+	}
+
+	return &ghes_devs;
+}
+EXPORT_SYMBOL_GPL(ghes_get_devices);
+
 void ghes_register_report_chain(struct notifier_block *nb)
 {
 	atomic_notifier_chain_register(&ghes_report_chain, nb);
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 17562cf1fe97..df45db81858b 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -53,8 +53,8 @@ config EDAC_DECODE_MCE
 	  has been initialized.
 
 config EDAC_GHES
-	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
-	depends on ACPI_APEI_GHES && (EDAC=y)
+	tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"
+	depends on ACPI_APEI_GHES
 	select UEFI_CPER
 	help
 	  Not all machines support hardware-driven error report. Some of those
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 7b8d56a769f6..3bdf8469882d 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -60,6 +60,8 @@ module_param(force_load, bool, 0);
 
 static bool system_scanned;
 
+static struct list_head *ghes_devs;
+
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
 	u8 type;
@@ -387,34 +389,15 @@ static struct notifier_block ghes_edac_mem_err_nb = {
 	.priority	= 0,
 };
 
-/*
- * Known systems that are safe to enable this module.
- */
-static struct acpi_platform_list plat_list[] = {
-	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
-	{ } /* End */
-};
-
-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static int ghes_edac_register(struct device *dev)
 {
 	bool fake = false;
 	struct mem_ctl_info *mci;
 	struct ghes_pvt *pvt;
 	struct edac_mc_layer layers[1];
 	unsigned long flags;
-	int idx = -1;
 	int rc = 0;
 
-	if (IS_ENABLED(CONFIG_X86)) {
-		/* Check if safe to enable on this system */
-		idx = acpi_match_platform_list(plat_list);
-		if (!force_load && idx < 0)
-			return -ENODEV;
-	} else {
-		force_load = true;
-		idx = 0;
-	}
-
 	/* finish another registration/unregistration instance first */
 	mutex_lock(&ghes_reg_mutex);
 
@@ -458,7 +441,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
 		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
 		pr_info("work on such system. Use this driver with caution\n");
-	} else if (idx < 0) {
+	} else if (force_load) {
 		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
 		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
 		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
@@ -530,7 +513,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	return rc;
 }
 
-void ghes_edac_unregister(struct ghes *ghes)
+static void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
 	unsigned long flags;
@@ -566,3 +549,35 @@ void ghes_edac_unregister(struct ghes *ghes)
 unlock:
 	mutex_unlock(&ghes_reg_mutex);
 }
+
+static int __init ghes_edac_init(void)
+{
+	struct ghes *g, *g_tmp;
+
+	ghes_devs = ghes_get_devices(force_load);
+	if (!ghes_devs)
+		return -ENODEV;
+
+	if (!IS_ENABLED(CONFIG_X86))
+		force_load = true;
+
+	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
+		ghes_edac_register(g->dev);
+	}
+
+	return 0;
+}
+module_init(ghes_edac_init);
+
+static void __exit ghes_edac_exit(void)
+{
+	struct ghes *g, *g_tmp;
+
+	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
+		ghes_edac_unregister(g);
+	}
+}
+module_exit(ghes_edac_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Output ACPI APEI/GHES BIOS detected errors module via EDAC");
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 5cbd38b6e4e1..150c0b9500d6 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -27,6 +27,8 @@ struct ghes {
 		struct timer_list timer;
 		unsigned int irq;
 	};
+	struct device *dev;
+	struct list_head elist;
 };
 
 struct ghes_estatus_node {
@@ -69,28 +71,13 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
  * @nb: pointer to the notifier_block structure of the vendor record handler.
  */
 void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
+struct list_head *ghes_get_devices(bool force);
+#else
+static inline struct list_head *ghes_get_devices(bool force) { return NULL; }
 #endif
 
 int ghes_estatus_pool_init(int num_ghes);
 
-/* From drivers/edac/ghes_edac.c */
-
-#ifdef CONFIG_EDAC_GHES
-int ghes_edac_register(struct ghes *ghes, struct device *dev);
-
-void ghes_edac_unregister(struct ghes *ghes);
-
-#else
-static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
-{
-	return -ENODEV;
-}
-
-static inline void ghes_edac_unregister(struct ghes *ghes)
-{
-}
-#endif
-
 static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
 {
 	return gdata->revision >> 8;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 4/7] EDAC: Get chipset-specific edac drivers selected only when ghes_edac is not enabled
  2022-08-17 14:34 [PATCH v2 0/7] Modularize ghes_edac driver Jia He
                   ` (2 preceding siblings ...)
  2022-08-17 14:34 ` [PATCH v2 3/7] EDAC/ghes: Modularize ghes_edac driver to remove the dependency on ghes Jia He
@ 2022-08-17 14:34 ` Jia He
  2022-08-18 23:56   ` Kani, Toshi
  2022-08-17 14:34 ` [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered Jia He
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Jia He @ 2022-08-17 14:34 UTC (permalink / raw)
  To: Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, toshi.kani, nd, Jia He

When ghes_edac is loaded, a regular edac driver for the CPU type / platform
still attempts to register itself and fails in its module_init call.

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/edac/amd64_edac.c | 3 +++
 drivers/edac/pnd2_edac.c  | 3 +++
 drivers/edac/sb_edac.c    | 3 +++
 drivers/edac/skx_base.c   | 3 +++
 4 files changed, 12 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2f854feeeb23..e4eaf6668feb 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -4329,6 +4329,9 @@ static int __init amd64_edac_init(void)
 	int err = -ENODEV;
 	int i;
 
+	if (ghes_get_devices(0))
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index a20b299f1202..73f2ba0e64e3 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -1528,6 +1528,9 @@ static int __init pnd2_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_get_devices(0))
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 9678ab97c7ac..1d0520a16840 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -3506,6 +3506,9 @@ static int __init sbridge_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_get_devices(0))
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c
index 1abc020d49ab..fe267f8543f5 100644
--- a/drivers/edac/skx_base.c
+++ b/drivers/edac/skx_base.c
@@ -653,6 +653,9 @@ static int __init skx_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_get_devices(0))
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-17 14:34 [PATCH v2 0/7] Modularize ghes_edac driver Jia He
                   ` (3 preceding siblings ...)
  2022-08-17 14:34 ` [PATCH v2 4/7] EDAC: Get chipset-specific edac drivers selected only when ghes_edac is not enabled Jia He
@ 2022-08-17 14:34 ` Jia He
  2022-08-19  1:29   ` Kani, Toshi
  2022-08-17 14:34 ` [PATCH v2 6/7] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
  2022-08-17 14:34 ` [PATCH v2 7/7] EDAC/igen6: Keep returned errno consistent when edac mc has been enabled Jia He
  6 siblings, 1 reply; 25+ messages in thread
From: Jia He @ 2022-08-17 14:34 UTC (permalink / raw)
  To: Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, toshi.kani, nd, Jia He

Previous, there is only one edac can be registering in the EDAC core. After
ghes_edac is modularized, the registering of ghes devices may be deferred
when ghes_edac is loaded. Prevent the chipset-specific edac drivers from
loading after ghes_edac is registered, which allow the edac drivers to
get selected in e.g. HPE platform.

Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c  | 14 ++++++++++++++
 drivers/edac/amd64_edac.c |  2 +-
 drivers/edac/ghes_edac.c  |  2 ++
 drivers/edac/pnd2_edac.c  |  2 +-
 drivers/edac/sb_edac.c    |  2 +-
 drivers/edac/skx_base.c   |  2 +-
 include/acpi/ghes.h       |  4 ++++
 7 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 9c52183e3ad9..9272d963b57d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -95,6 +95,7 @@
 #endif
 
 static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
+static bool devs_registered;
 
 static inline bool is_hest_type_generic_v2(struct ghes *ghes)
 {
@@ -1266,6 +1267,18 @@ static int apei_sdei_unregister_ghes(struct ghes *ghes)
 	return sdei_unregister_ghes(ghes);
 }
 
+void set_ghes_devs_registered(bool flag)
+{
+	devs_registered = flag;
+}
+EXPORT_SYMBOL_GPL(set_ghes_devs_registered);
+
+bool ghes_devs_registered(void)
+{
+	return devs_registered;
+}
+EXPORT_SYMBOL_GPL(ghes_devs_registered);
+
 static int ghes_probe(struct platform_device *ghes_dev)
 {
 	struct acpi_hest_generic *generic;
@@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	platform_set_drvdata(ghes_dev, ghes);
 
 	ghes->dev = &ghes_dev->dev;
+	set_ghes_devs_registered(false);
 
 	mutex_lock(&ghes_devs_mutex);
 	list_add_tail(&ghes->elist, &ghes_devs);
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index e4eaf6668feb..f48507fa7b94 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -4329,7 +4329,7 @@ static int __init amd64_edac_init(void)
 	int err = -ENODEV;
 	int i;
 
-	if (ghes_get_devices(0))
+	if (ghes_get_devices(0) && ghes_devs_registered())
 		return -EBUSY;
 
 	owner = edac_get_owner();
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 3bdf8469882d..d26644b3bc47 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -564,6 +564,7 @@ static int __init ghes_edac_init(void)
 	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
 		ghes_edac_register(g->dev);
 	}
+	set_ghes_devs_registered(true);
 
 	return 0;
 }
@@ -576,6 +577,7 @@ static void __exit ghes_edac_exit(void)
 	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
 		ghes_edac_unregister(g);
 	}
+	set_ghes_devs_registered(false);
 }
 module_exit(ghes_edac_exit);
 
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index 73f2ba0e64e3..66d89d00c4b3 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -1528,7 +1528,7 @@ static int __init pnd2_init(void)
 
 	edac_dbg(2, "\n");
 
-	if (ghes_get_devices(0))
+	if (ghes_get_devices(0) && ghes_devs_registered())
 		return -EBUSY;
 
 	owner = edac_get_owner();
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 1d0520a16840..a3a89e366a74 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -3506,7 +3506,7 @@ static int __init sbridge_init(void)
 
 	edac_dbg(2, "\n");
 
-	if (ghes_get_devices(0))
+	if (ghes_get_devices(0) && ghes_devs_registered())
 		return -EBUSY;
 
 	owner = edac_get_owner();
diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c
index fe267f8543f5..efa1ae79c35a 100644
--- a/drivers/edac/skx_base.c
+++ b/drivers/edac/skx_base.c
@@ -653,7 +653,7 @@ static int __init skx_init(void)
 
 	edac_dbg(2, "\n");
 
-	if (ghes_get_devices(0))
+	if (ghes_get_devices(0) && ghes_devs_registered())
 		return -EBUSY;
 
 	owner = edac_get_owner();
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 150c0b9500d6..21b9d4d4c3e9 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -72,8 +72,12 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
  */
 void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
 struct list_head *ghes_get_devices(bool force);
+bool ghes_devs_registered(void);
+void set_ghes_devs_registered(bool flag);
 #else
 static inline struct list_head *ghes_get_devices(bool force) { return NULL; }
+static inline bool ghes_devs_registered(void) { return false; }
+static inline void set_ghes_devs_registered(bool flag) { return; }
 #endif
 
 int ghes_estatus_pool_init(int num_ghes);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-08-17 14:34 [PATCH v2 0/7] Modularize ghes_edac driver Jia He
                   ` (4 preceding siblings ...)
  2022-08-17 14:34 ` [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered Jia He
@ 2022-08-17 14:34 ` Jia He
  2022-08-17 15:38   ` David Laight
  2022-08-17 14:34 ` [PATCH v2 7/7] EDAC/igen6: Keep returned errno consistent when edac mc has been enabled Jia He
  6 siblings, 1 reply; 25+ messages in thread
From: Jia He @ 2022-08-17 14:34 UTC (permalink / raw)
  To: Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, toshi.kani, nd, Jia He,
	kernel test robot

ghes_estatus_caches should be add rcu annotation to avoid sparse warnings.
   drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
   drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache *
   drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
   drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache *

unrcu_pointer is to strip the __rcu in cmpxchg.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 9272d963b57d..92ae58f4f7bb 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -144,7 +144,7 @@ struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
 
-static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
@@ -834,8 +834,9 @@ static void ghes_estatus_cache_add(
 	}
 	/* new_cache must be put into array after its contents are written */
 	smp_wmb();
-	if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
-				  slot_cache, new_cache) == slot_cache) {
+	if (slot != -1 && unrcu_pointer(cmpxchg(ghes_estatus_caches + slot,
+				RCU_INITIALIZER(slot_cache),
+				RCU_INITIALIZER(new_cache)))) {
 		if (slot_cache)
 			call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
 	} else
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 7/7] EDAC/igen6: Keep returned errno consistent when edac mc has been enabled
  2022-08-17 14:34 [PATCH v2 0/7] Modularize ghes_edac driver Jia He
                   ` (5 preceding siblings ...)
  2022-08-17 14:34 ` [PATCH v2 6/7] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
@ 2022-08-17 14:34 ` Jia He
  6 siblings, 0 replies; 25+ messages in thread
From: Jia He @ 2022-08-17 14:34 UTC (permalink / raw)
  To: Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, toshi.kani, nd, Jia He

Only a single edac driver can be enabled for EDAC MC. The igen6_init()
should be returned with EBUSY instead of ENODEV, which is consistent with
other edac drivers.

Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/edac/igen6_edac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index a07bbfd075d0..41503b80347c 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1273,7 +1273,7 @@ static int __init igen6_init(void)
 
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
-		return -ENODEV;
+		return -EBUSY;
 
 	edac_op_state = EDAC_OPSTATE_NMI;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-08-17 14:34 ` [PATCH v2 6/7] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
@ 2022-08-17 15:38   ` David Laight
  2022-08-18  1:22     ` Justin He
  0 siblings, 1 reply; 25+ messages in thread
From: David Laight @ 2022-08-17 15:38 UTC (permalink / raw)
  To: 'Jia He',
	Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, toshi.kani, nd,
	kernel test robot

From: Jia He
> Sent: 17 August 2022 15:35
> 
> ghes_estatus_caches should be add rcu annotation to avoid sparse warnings.
>    drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types in comparison expression
> (different address spaces):
>    drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
>    drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache *
>    drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types in comparison expression
> (different address spaces):
>    drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
>    drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache *
> 
> unrcu_pointer is to strip the __rcu in cmpxchg.
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  drivers/acpi/apei/ghes.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 9272d963b57d..92ae58f4f7bb 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -144,7 +144,7 @@ struct ghes_vendor_record_entry {
>  static struct gen_pool *ghes_estatus_pool;
>  static unsigned long ghes_estatus_pool_size_request;
> 
> -static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> +static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
>  static atomic_t ghes_estatus_cache_alloced;
> 
>  static int ghes_panic_timeout __read_mostly = 30;
> @@ -834,8 +834,9 @@ static void ghes_estatus_cache_add(
>  	}
>  	/* new_cache must be put into array after its contents are written */
>  	smp_wmb();
> -	if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
> -				  slot_cache, new_cache) == slot_cache) {
> +	if (slot != -1 && unrcu_pointer(cmpxchg(ghes_estatus_caches + slot,
> +				RCU_INITIALIZER(slot_cache),
> +				RCU_INITIALIZER(new_cache)))) {

Did you test this?
There seems to be an == missing.

	David

>  		if (slot_cache)
>  			call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
>  	} else
> --
> 2.25.1

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-08-17 15:38   ` David Laight
@ 2022-08-18  1:22     ` Justin He
  0 siblings, 0 replies; 25+ messages in thread
From: Justin He @ 2022-08-18  1:22 UTC (permalink / raw)
  To: David Laight, Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, toshi.kani, nd,
	kernel test robot



> -----Original Message-----
> From: David Laight <David.Laight@ACULAB.COM>
> Sent: Wednesday, August 17, 2022 11:39 PM
> To: Justin He <Justin.He@arm.com>; Ard Biesheuvel <ardb@kernel.org>; Len
> Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>; Tony
> Luck <tony.luck@intel.com>; Borislav Petkov <bp@alien8.de>; Mauro
> Carvalho Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>;
> Robert Moore <robert.moore@intel.com>; Qiuxu Zhuo
> <qiuxu.zhuo@intel.com>; Yazen Ghannam <yazen.ghannam@amd.com>
> Cc: linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org;
> toshi.kani@hpe.com; nd <nd@arm.com>; kernel test robot <lkp@intel.com>
> Subject: RE: [PATCH v2 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> 
> From: Jia He
> > Sent: 17 August 2022 15:35
> >
> > ghes_estatus_caches should be add rcu annotation to avoid sparse
> warnings.
> >    drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types
> > in comparison expression (different address spaces):
> >    drivers/acpi/apei/ghes.c:733:25: sparse:    struct
> ghes_estatus_cache [noderef] __rcu *
> >    drivers/acpi/apei/ghes.c:733:25: sparse:    struct
> ghes_estatus_cache *
> >    drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types
> > in comparison expression (different address spaces):
> >    drivers/acpi/apei/ghes.c:813:25: sparse:    struct
> ghes_estatus_cache [noderef] __rcu *
> >    drivers/acpi/apei/ghes.c:813:25: sparse:    struct
> ghes_estatus_cache *
> >
> > unrcu_pointer is to strip the __rcu in cmpxchg.
> >
> > Reported-by: kernel test robot <lkp@intel.com>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >  drivers/acpi/apei/ghes.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
> > 9272d963b57d..92ae58f4f7bb 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -144,7 +144,7 @@ struct ghes_vendor_record_entry {  static struct
> > gen_pool *ghes_estatus_pool;  static unsigned long
> > ghes_estatus_pool_size_request;
> >
> > -static struct ghes_estatus_cache
> > *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> > +static struct ghes_estatus_cache __rcu
> > +*ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> >  static atomic_t ghes_estatus_cache_alloced;
> >
> >  static int ghes_panic_timeout __read_mostly = 30; @@ -834,8 +834,9
> @@
> > static void ghes_estatus_cache_add(
> >  	}
> >  	/* new_cache must be put into array after its contents are written */
> >  	smp_wmb();
> > -	if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
> > -				  slot_cache, new_cache) == slot_cache) {
> > +	if (slot != -1 && unrcu_pointer(cmpxchg(ghes_estatus_caches + slot,
> > +				RCU_INITIALIZER(slot_cache),
> > +				RCU_INITIALIZER(new_cache)))) {
> 
> Did you test this?
> There seems to be an == missing.

Sorry about it, 


--
Cheers,
Justin (Jia He)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] efi/cper: export several helpers for ghes edac to use
  2022-08-17 14:34 ` [PATCH v2 1/7] efi/cper: export several helpers for ghes edac to use Jia He
@ 2022-08-18 14:38   ` Borislav Petkov
  0 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2022-08-18 14:38 UTC (permalink / raw)
  To: Jia He
  Cc: Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi,
	toshi.kani, nd

On Wed, Aug 17, 2022 at 02:34:52PM +0000, Jia He wrote:
> Subject: Re: [PATCH v2 1/7] efi/cper: export several helpers for ghes edac to use

"ghes_edac"

> Before the ghes_edac codes are modularized, export several efi/cper helpers

"Before ghes_edac can be turned back into a proper module again, export
several helpers which are going to be used by it."

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/7] EDAC/ghes: Add notifier to report ghes_edac mem error
  2022-08-17 14:34 ` [PATCH v2 2/7] EDAC/ghes: Add notifier to report ghes_edac mem error Jia He
@ 2022-08-18 15:42   ` Borislav Petkov
  0 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2022-08-18 15:42 UTC (permalink / raw)
  To: Jia He
  Cc: Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi,
	toshi.kani, nd

On Wed, Aug 17, 2022 at 02:34:53PM +0000, Jia He wrote:
> Subject: Re: [PATCH v2 2/7] EDAC/ghes: Add notifier to report ghes_edac mem error

Subject: ...: Add a notifier for reporting memory errors.

> To modularize ghes_edac driver, any ghes_edac codes should not be invoked

s/modularize/make a proper module/

and replace that everywhere.

There's no such thing as "codes" - please try to write proper English.

> in ghes. Add a notifier of registering the ghes_edac_report_mem_error() to
> resolve the build dependency.

"Add a notifier for reporting memory errors."

When you say "to resolve the build dependency" it sounds like this is
some kind of a nuisance. But it isn't one - it is simply an improvement.

> The atomic notifier is used because
> ghes_proc_in_irq() can be in the irq context.

"Use an atomic notifier because calls sites like ghes_proc_in_irq() run
in interrupt context."

> Suggested-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  drivers/acpi/apei/ghes.c | 16 +++++++++++++++-
>  drivers/edac/ghes_edac.c | 19 +++++++++++++++++--
>  include/acpi/ghes.h      | 10 +++-------
>  3 files changed, 35 insertions(+), 10 deletions(-)

Patch itself looks ok.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 4/7] EDAC: Get chipset-specific edac drivers selected only when ghes_edac is not enabled
  2022-08-17 14:34 ` [PATCH v2 4/7] EDAC: Get chipset-specific edac drivers selected only when ghes_edac is not enabled Jia He
@ 2022-08-18 23:56   ` Kani, Toshi
  2022-08-19  1:57     ` Justin He
  0 siblings, 1 reply; 25+ messages in thread
From: Kani, Toshi @ 2022-08-18 23:56 UTC (permalink / raw)
  To: Jia He, Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, nd

On Wednesday, August 17, 2022 8:35 AM, Jia He wrote:
> When ghes_edac is loaded, a regular edac driver for the CPU type / platform
> still attempts to register itself and fails in its module_init call.
> 
> Suggested-by: Toshi Kani <toshi.kani@hpe.com>
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  drivers/edac/amd64_edac.c | 3 +++
>  drivers/edac/pnd2_edac.c  | 3 +++
>  drivers/edac/sb_edac.c    | 3 +++
>  drivers/edac/skx_base.c   | 3 +++
>  4 files changed, 12 insertions(+)

Can you change i10nm_base.c and igen6_edac.c as well?

They are listed in your list below.

On Monday, August 15, 2022 8:20 PM, Justin He wrote:
> I assume that all those edac drivers which used owner checking are
> impacted, right? So the impacted list should be:
> drivers/edac/pnd2_edac.c:1532:  if (owner && strncmp(owner,
> EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
> drivers/edac/sb_edac.c:3513:    if (owner && strncmp(owner,
> EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
> drivers/edac/amd64_edac.c:4333: if (owner && strncmp(owner,
> EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
> drivers/edac/i10nm_base.c:552:  if (owner && strncmp(owner,
> EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
> drivers/edac/skx_base.c:657:    if (owner && strncmp(owner,
> EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
> drivers/edac/igen6_edac.c:1275: if (owner && strncmp(owner,
> EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))

Thanks,
Toshi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-17 14:34 ` [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered Jia He
@ 2022-08-19  1:29   ` Kani, Toshi
  2022-08-19 10:34     ` Justin He
  0 siblings, 1 reply; 25+ messages in thread
From: Kani, Toshi @ 2022-08-19  1:29 UTC (permalink / raw)
  To: Jia He, Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, nd

On Wednesday, August 17, 2022 8:35 AM, Jia He wrote:
> Previous, there is only one edac can be registering in the EDAC core. After
> ghes_edac is modularized, the registering of ghes devices may be deferred
> when ghes_edac is loaded. Prevent the chipset-specific edac drivers from
> loading after ghes_edac is registered, which allow the edac drivers to
> get selected in e.g. HPE platform.
...
> @@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device
> *ghes_dev)
>  	platform_set_drvdata(ghes_dev, ghes);
> 
>  	ghes->dev = &ghes_dev->dev;
> +	set_ghes_devs_registered(false);

This does not look right to me.

The condition of using ghes_edac is (ghes-present) && (ghes-preferred),
where:
 - ghes-present is latched on ghes_probe()
 - ghes-preferred is true if the platform-check passes.

ghes_get_device() introduced in the previous patch works as the 
ghes-preferred check.

We cannot use ghes_edac registration as the ghes-present check in
this scheme since it is deferred to module_init().

I'd suggest the following changes:
- Update ghes_get_device() to check a flag latched on ghes_probe().
- Remove 'force' argument from ghes_get_device().  ghes_edac_init()
is too late to set this flag.  Also, other edac drivers should not be able
to control this flag.  I think this force_load kernel option needs to
belong to ghes with this scheme.
- ghes_get_device() exposes 'ghes_devs' to chipset-specific edac drivers,
which should be internal to ghes.  ghes_get_device() may be renamed
to something like ghes_edac_preferred() which returns true / false.

Toshi


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 4/7] EDAC: Get chipset-specific edac drivers selected only when ghes_edac is not enabled
  2022-08-18 23:56   ` Kani, Toshi
@ 2022-08-19  1:57     ` Justin He
  0 siblings, 0 replies; 25+ messages in thread
From: Justin He @ 2022-08-19  1:57 UTC (permalink / raw)
  To: Kani, Toshi, Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, nd



> -----Original Message-----
> From: Kani, Toshi <toshi.kani@hpe.com>
> Sent: Friday, August 19, 2022 7:57 AM
> To: Justin He <Justin.He@arm.com>; Ard Biesheuvel <ardb@kernel.org>; Len
> Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>; Tony Luck
> <tony.luck@intel.com>; Borislav Petkov <bp@alien8.de>; Mauro Carvalho
> Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert
> Moore <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>;
> Yazen Ghannam <yazen.ghannam@amd.com>
> Cc: linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>
> Subject: RE: [PATCH v2 4/7] EDAC: Get chipset-specific edac drivers selected
> only when ghes_edac is not enabled
> 
> On Wednesday, August 17, 2022 8:35 AM, Jia He wrote:
> > When ghes_edac is loaded, a regular edac driver for the CPU type /
> > platform still attempts to register itself and fails in its module_init call.
> >
> > Suggested-by: Toshi Kani <toshi.kani@hpe.com>
> > Suggested-by: Borislav Petkov <bp@alien8.de>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >  drivers/edac/amd64_edac.c | 3 +++
> >  drivers/edac/pnd2_edac.c  | 3 +++
> >  drivers/edac/sb_edac.c    | 3 +++
> >  drivers/edac/skx_base.c   | 3 +++
> >  4 files changed, 12 insertions(+)
> 
> Can you change i10nm_base.c and igen6_edac.c as well?
> 
> They are listed in your list below.
> 
Okay, will do. And will also include the ARM specific edac drivers just as 
Borislav mentioned.


--
Cheers,
Justin (Jia He)



^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-19  1:29   ` Kani, Toshi
@ 2022-08-19 10:34     ` Justin He
  2022-08-19 17:48       ` Kani, Toshi
  0 siblings, 1 reply; 25+ messages in thread
From: Justin He @ 2022-08-19 10:34 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, nd, Ard Biesheuvel,
	Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam

Hi Toshi,

> -----Original Message-----
> From: Kani, Toshi <toshi.kani@hpe.com>
> Sent: Friday, August 19, 2022 9:30 AM
> To: Justin He <Justin.He@arm.com>; Ard Biesheuvel <ardb@kernel.org>; Len
> Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>; Tony Luck
> <tony.luck@intel.com>; Borislav Petkov <bp@alien8.de>; Mauro Carvalho
> Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert
> Moore <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>;
> Yazen Ghannam <yazen.ghannam@amd.com>
> Cc: linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>
> Subject: RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from
> loading after ghes_edac is registered
> 
> On Wednesday, August 17, 2022 8:35 AM, Jia He wrote:
> > Previous, there is only one edac can be registering in the EDAC core.
> > After ghes_edac is modularized, the registering of ghes devices may be
> > deferred when ghes_edac is loaded. Prevent the chipset-specific edac
> > drivers from loading after ghes_edac is registered, which allow the
> > edac drivers to get selected in e.g. HPE platform.
> ...
> > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device
> > *ghes_dev)
> >  	platform_set_drvdata(ghes_dev, ghes);
> >
> >  	ghes->dev = &ghes_dev->dev;
> > +	set_ghes_devs_registered(false);
> 
> This does not look right to me.
> 
> The condition of using ghes_edac is (ghes-present) && (ghes-preferred),
> where:
>  - ghes-present is latched on ghes_probe()
>  - ghes-preferred is true if the platform-check passes.
> 
> ghes_get_device() introduced in the previous patch works as the
> ghes-preferred check.
> 
> We cannot use ghes_edac registration as the ghes-present check in this
> scheme since it is deferred to module_init().

What is the logic for ghes-present check? In this patch, I assumed it is equal to 
"ghes_edac devices have been registered". Seems it is not correct.
But should we consider the case as follows:
What if sbridge_init () is invoked before ghes_edac_init()? i.e. Should we get
sb_edac driver selected when ghes_edac is not loaded yet (e.g. on HPE)?

> 
> I'd suggest the following changes:
> - Update ghes_get_device() to check a flag latched on ghes_probe().

Still need your elaborating about the details of the flag. i.e. When is this flag
latched? When is it unlatched?

> - Remove 'force' argument from ghes_get_device().  ghes_edac_init() is too
> late to set this flag.  Also, other edac drivers should not be able to control this
> flag.  I think this force_load kernel option needs to belong to ghes with this
> scheme.
Agree, I will remove force_load in ghes_get_device(), and put all check/update of
force_load in ghes

> - ghes_get_device() exposes 'ghes_devs' to chipset-specific edac drivers, which
> should be internal to ghes.  ghes_get_device() may be renamed to something
> like ghes_edac_preferred() which returns true / false.
> 

Okay, agree

--
Cheers,
Justin (Jia He)



^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-19 10:34     ` Justin He
@ 2022-08-19 17:48       ` Kani, Toshi
  2022-08-19 18:29         ` Borislav Petkov
  2022-08-19 18:57         ` Elliott, Robert (Servers)
  0 siblings, 2 replies; 25+ messages in thread
From: Kani, Toshi @ 2022-08-19 17:48 UTC (permalink / raw)
  To: Justin He
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, nd, Ard Biesheuvel,
	Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam

On Friday, August 19, 2022 4:35 AM, Justin He wrote:
> > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device
> > > *ghes_dev)
> > >  	platform_set_drvdata(ghes_dev, ghes);
> > >
> > >  	ghes->dev = &ghes_dev->dev;
> > > +	set_ghes_devs_registered(false);
> >
> > This does not look right to me.
> >
> > The condition of using ghes_edac is (ghes-present) && (ghes-preferred),
> > where:
> >  - ghes-present is latched on ghes_probe()
> >  - ghes-preferred is true if the platform-check passes.
> >
> > ghes_get_device() introduced in the previous patch works as the
> > ghes-preferred check.
> >
> > We cannot use ghes_edac registration as the ghes-present check in this
> > scheme since it is deferred to module_init().
> 
> What is the logic for ghes-present check? In this patch, I assumed it is equal to
> "ghes_edac devices have been registered". Seems it is not correct.

Using (ghes_edac-registered) is a wrong check in this scheme
since ghes_edac registration is deferred.  This check is fine in
the current scheme since ghes_edac gets registered before
any other chipset-specific edac drivers.

> But should we consider the case as follows:
> What if sbridge_init () is invoked before ghes_edac_init()? i.e. Should we get
> sb_edac driver selected when ghes_edac is not loaded yet (e.g. on HPE)?

No.  The point is that ghes_edac driver needs to be selected,
"regardless of the module ordering", on a system with GHES
present & preferred.

Note that this new scheme leads to the following condition
change: 
- Current: (ghes-present) && (ghes-preferred) && (ghes_edac registered)
- New: (ghes-present) && (ghes-preferred)

The option I suggested previously keeps the current condition,
but this new scheme does not for better modularity.

What this means is that if ghes_edac is not enabled (but ghes
is enabled) on a system with GHES present & preferred, no edac
driver gets registered.  This change is fine from my (HPE) perspective
and should be fine for other GHES systems.  GHES systems have
chipset-specific edac driver in FW.  OS-based chipset-specific edac
driver is not necessary and may lead to a conflict of chipset register
ownership.

> > I'd suggest the following changes:
> > - Update ghes_get_device() to check a flag latched on ghes_probe().
> 
> Still need your elaborating about the details of the flag. i.e. When is this flag
> latched? When is it unlatched?

This flag is a static variable, say ghes_present, which is set to false initially.
ghes_probe() sets it to true.  ghes_edac_preferred() (aka. ghes_get_device)
checks this flag at beginning and returns false if this flag is false.  It does not
get unlatched since ACPI GHES table is static.

Toshi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-19 17:48       ` Kani, Toshi
@ 2022-08-19 18:29         ` Borislav Petkov
  2022-08-19 18:46           ` Kani, Toshi
  2022-08-19 18:57         ` Elliott, Robert (Servers)
  1 sibling, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2022-08-19 18:29 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: Justin He, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam

On Fri, Aug 19, 2022 at 05:48:39PM +0000, Kani, Toshi wrote:
> This flag is a static variable, say ghes_present, which is set to
> false initially. ghes_probe() sets it to true. ghes_edac_preferred()
> (aka. ghes_get_device) checks this flag at beginning and returns false
> if this flag is false. It does not get unlatched since ACPI GHES table
> is static.

What is that flag needed for at all?

There are two possibilities:

1. ghes_probe() succeeds. ghes_edac loads properly and other drivers use
ghes_get_devices() to know when to load.

2. ghes_probe() fails and that is caught during platform testing of all
those platforms who wish to use ghes_edac. BIOS is fixed and goto 1.

No need for funky flags whatsoever.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-19 18:29         ` Borislav Petkov
@ 2022-08-19 18:46           ` Kani, Toshi
  2022-08-19 18:50             ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Kani, Toshi @ 2022-08-19 18:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Justin He, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam

On Friday, August 19, 2022 12:30 PM, Borislav Petkov wrote:
> > This flag is a static variable, say ghes_present, which is set to
> > false initially. ghes_probe() sets it to true. ghes_edac_preferred()
> > (aka. ghes_get_device) checks this flag at beginning and returns false
> > if this flag is false. It does not get unlatched since ACPI GHES table
> > is static.
> 
> What is that flag needed for at all?

Because ghes_get_device() always returns &ghes_devs on Arm,
which is !NULL.  It does not check if GHES is present.

> There are two possibilities:
> 
> 1. ghes_probe() succeeds. ghes_edac loads properly and other drivers use
> ghes_get_devices() to know when to load.
> 
> 2. ghes_probe() fails and that is caught during platform testing of all
> those platforms who wish to use ghes_edac. BIOS is fixed and goto 1.
> 
> No need for funky flags whatsoever.

3. ghes_probe() is not called, but ghes_get_device() is called from
other edac drivers.

Toshi


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-19 18:46           ` Kani, Toshi
@ 2022-08-19 18:50             ` Borislav Petkov
  2022-08-19 18:53               ` Kani, Toshi
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2022-08-19 18:50 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: Justin He, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam

On Fri, Aug 19, 2022 at 06:46:34PM +0000, Kani, Toshi wrote:
> 3. ghes_probe() is not called,

When is ghes_probe() not called on ARM?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-19 18:50             ` Borislav Petkov
@ 2022-08-19 18:53               ` Kani, Toshi
  2022-08-19 19:31                 ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Kani, Toshi @ 2022-08-19 18:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Justin He, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam

On Friday, August 19, 2022 12:50 PM, Borislav Petkov wrote:
> > 3. ghes_probe() is not called,
> 
> When is ghes_probe() not called on ARM?

When the system does not implement ACPI GHES table,
which I suppose is most of the case on ARM.

Toshi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-19 17:48       ` Kani, Toshi
  2022-08-19 18:29         ` Borislav Petkov
@ 2022-08-19 18:57         ` Elliott, Robert (Servers)
  2022-08-19 19:32           ` Borislav Petkov
  1 sibling, 1 reply; 25+ messages in thread
From: Elliott, Robert (Servers) @ 2022-08-19 18:57 UTC (permalink / raw)
  To: Kani, Toshi, Justin He
  Cc: linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, nd, Ard Biesheuvel,
	Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam



> -----Original Message-----
> From: Kani, Toshi <toshi.kani@hpe.com>
> Sent: Friday, August 19, 2022 12:49 PM
> To: Justin He <Justin.He@arm.com>
> Cc: linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd
> <nd@arm.com>; Ard Biesheuvel <ardb@kernel.org>; Len Brown
> <lenb@kernel.org>; James Morse <James.Morse@arm.com>; Tony Luck
> <tony.luck@intel.com>; Borislav Petkov <bp@alien8.de>; Mauro Carvalho
> Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert
> Moore <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>;
> Yazen Ghannam <yazen.ghannam@amd.com>
> Subject: RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac
> from loading after ghes_edac is registered
> 
> On Friday, August 19, 2022 4:35 AM, Justin He wrote:
> > > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct
> platform_device
> > > > *ghes_dev)
> > > >  	platform_set_drvdata(ghes_dev, ghes);
> > > >
> > > >  	ghes->dev = &ghes_dev->dev;
> > > > +	set_ghes_devs_registered(false);
> > >
> > > This does not look right to me.
> > >
> > > The condition of using ghes_edac is (ghes-present) && (ghes-
> preferred),
> > > where:
> > >  - ghes-present is latched on ghes_probe()
> > >  - ghes-preferred is true if the platform-check passes.
> > >
> > > ghes_get_device() introduced in the previous patch works as the
> > > ghes-preferred check.
> > >
> > > We cannot use ghes_edac registration as the ghes-present check in
> this
> > > scheme since it is deferred to module_init().
> >
> > What is the logic for ghes-present check? In this patch, I assumed it
> is equal to
> > "ghes_edac devices have been registered". Seems it is not correct.
> 
> Using (ghes_edac-registered) is a wrong check in this scheme
> since ghes_edac registration is deferred.  This check is fine in
> the current scheme since ghes_edac gets registered before
> any other chipset-specific edac drivers.
> 
> > But should we consider the case as follows:
> > What if sbridge_init () is invoked before ghes_edac_init()? i.e.
> Should we get
> > sb_edac driver selected when ghes_edac is not loaded yet (e.g. on
> HPE)?
> 
> No.  The point is that ghes_edac driver needs to be selected,
> "regardless of the module ordering", on a system with GHES
> present & preferred.
>
> Note that this new scheme leads to the following condition
> change:
> - Current: (ghes-present) && (ghes-preferred) && (ghes_edac registered)
> - New: (ghes-present) && (ghes-preferred)
> 
> The option I suggested previously keeps the current condition,
> but this new scheme does not for better modularity.
> 
> What this means is that if ghes_edac is not enabled (but ghes
> is enabled) on a system with GHES present & preferred, no edac
> driver gets registered.  This change is fine from my (HPE) perspective
> and should be fine for other GHES systems.  GHES systems have
> chipset-specific edac driver in FW.  OS-based chipset-specific edac
> driver is not necessary and may lead to a conflict of chipset register
> ownership.

Currently, running with this on the kernel command line
	ghes.disable

causes the ACPI ghes driver to quit early in acpi_ghes_init():

  /*
   * This driver isn't really modular, however for the time being,
   * continuing to use module_param is the easiest way to remain
   * compatible with existing boot arg use cases.
   */
  bool ghes_disable;
  module_param_named(disable, ghes_disable, bool, 0);

which results in the skx_edac module assuming it should run:
  [   33.628140] calling  skx_init+0x0/0xe5a [skx_edac] @ 1444
  [   33.628531] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:36:0a.0 (INTERRUPT)
  [   33.641432] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:36:0c.0 (INTERRUPT)
  [   33.653256] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
  [   33.665055] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
  [   33.676801] initcall skx_init+0x0/0xe5a [skx_edac] returned 0 after 36343 usecs

We might need to differentiate between the system ROM really not
offering GHES vs. the ghes module not running.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-19 18:53               ` Kani, Toshi
@ 2022-08-19 19:31                 ` Borislav Petkov
  2022-08-19 19:37                   ` Kani, Toshi
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2022-08-19 19:31 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: Justin He, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam

On Fri, Aug 19, 2022 at 06:53:41PM +0000, Kani, Toshi wrote:
> When the system does not implement ACPI GHES table,
> which I suppose is most of the case on ARM.

That should be detected in ghes_get_devices() - just like on x86.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-19 18:57         ` Elliott, Robert (Servers)
@ 2022-08-19 19:32           ` Borislav Petkov
  0 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2022-08-19 19:32 UTC (permalink / raw)
  To: Elliott, Robert (Servers)
  Cc: Kani, Toshi, Justin He, linux-acpi, linux-kernel, linux-edac,
	devel, Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi,
	nd, Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam

On Fri, Aug 19, 2022 at 06:57:11PM +0000, Elliott, Robert (Servers) wrote:
> We might need to differentiate between the system ROM really not
> offering GHES vs. the ghes module not running.

All detectable in ghes_get_devices().

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered
  2022-08-19 19:31                 ` Borislav Petkov
@ 2022-08-19 19:37                   ` Kani, Toshi
  0 siblings, 0 replies; 25+ messages in thread
From: Kani, Toshi @ 2022-08-19 19:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Justin He, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Ard Biesheuvel, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam

On Friday, August 19, 2022 1:31 PM, Borislav Petkov wrote:
> > When the system does not implement ACPI GHES table,
> > which I suppose is most of the case on ARM.
> 
> That should be detected in ghes_get_devices() - just like on x86.

Agreed.  And that is the check to ghes_present flag...

Toshi

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-08-19 19:38 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-17 14:34 [PATCH v2 0/7] Modularize ghes_edac driver Jia He
2022-08-17 14:34 ` [PATCH v2 1/7] efi/cper: export several helpers for ghes edac to use Jia He
2022-08-18 14:38   ` Borislav Petkov
2022-08-17 14:34 ` [PATCH v2 2/7] EDAC/ghes: Add notifier to report ghes_edac mem error Jia He
2022-08-18 15:42   ` Borislav Petkov
2022-08-17 14:34 ` [PATCH v2 3/7] EDAC/ghes: Modularize ghes_edac driver to remove the dependency on ghes Jia He
2022-08-17 14:34 ` [PATCH v2 4/7] EDAC: Get chipset-specific edac drivers selected only when ghes_edac is not enabled Jia He
2022-08-18 23:56   ` Kani, Toshi
2022-08-19  1:57     ` Justin He
2022-08-17 14:34 ` [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered Jia He
2022-08-19  1:29   ` Kani, Toshi
2022-08-19 10:34     ` Justin He
2022-08-19 17:48       ` Kani, Toshi
2022-08-19 18:29         ` Borislav Petkov
2022-08-19 18:46           ` Kani, Toshi
2022-08-19 18:50             ` Borislav Petkov
2022-08-19 18:53               ` Kani, Toshi
2022-08-19 19:31                 ` Borislav Petkov
2022-08-19 19:37                   ` Kani, Toshi
2022-08-19 18:57         ` Elliott, Robert (Servers)
2022-08-19 19:32           ` Borislav Petkov
2022-08-17 14:34 ` [PATCH v2 6/7] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
2022-08-17 15:38   ` David Laight
2022-08-18  1:22     ` Justin He
2022-08-17 14:34 ` [PATCH v2 7/7] EDAC/igen6: Keep returned errno consistent when edac mc has been enabled Jia He

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.