linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 0/7] Make ghes_edac a proper module
@ 2022-10-10  2:35 Jia He
  2022-10-10  2:35 ` [PATCH v8 1/7] efi/cper: export several helpers for ghes_edac to use Jia He
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Jia He @ 2022-10-10  2:35 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
apci_init()") introduced a bug that ghes_edac_register() would be invoked
before edac_init(). Because at that time, the bus "edac" hasn't been even
registered, this created sysfs /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.

The solution is to make ghes_edac a proper module.

Changelog:
v8:
 - merge v7 two force_enable and ghes_get_devices() patches into one
 - make force_enable static
v7:https://lore.kernel.org/lkml/20220929023726.73727-1-justin.he@arm.com/
 - remove the ghes_edac_preferred and ghes_present (suggested by Borislav)
 - adjust the patch splitting, no major functional changes
 - remove the r-b tag in those changed patches
v6:https://www.spinics.net/lists/kernel/msg4511453.html
 - no code changes from v5 patches
 - add the reviewed and acked by from Toshi
 - describe the removal of ghes_edac_force_enable checking in Patch 05
v5: https://www.spinics.net/lists/kernel/msg4502787.html
 - add the review-by from Toshi for patch 04 and 06
 - refine the commit msg
 - remove the unconditional set of ghes_edac_force_enable on Arm
v4: https://lore.kernel.org/lkml/20220831074027.13849-6-justin.he@arm.com/
 - move the kernel boot option to ghes module parameter
 - collapse th ghes_present and ghes_edac_preferred into one patch
v3: https://lore.kernel.org/lkml/20220822154048.188253-1-justin.he@arm.com/
 - refine the commit logs
 - introduce ghes preferred and present flag (by Toshi)
 - move force_load to setup parameter
 - add the ghes_edac_preferred() check for x86/Arm edac drivers
v2: https://lore.kernel.org/lkml/20220817143458.335938-1-justin.he@arm.com/
 - add acked-by tag of Patch 1 from Ard
 - split the notifier patch
 - add 2 patch to get regular drivers selected when ghes edac is not loaded
 - fix an errno in igen6 driver
 - add a patch to fix the sparse warning of ghes
 - refine the commit logs
v1: https://lore.kernel.org/lkml/20220811091713.10427-1-justin.he@arm.com/

Jia He (7):
  efi/cper: export several helpers for ghes_edac to use
  EDAC/ghes: Add a notifier for reporting memory errors
  EDAC/ghes: Prepare to make ghes_edac a proper module
  EDAC/ghes: Make ghes_edac a proper module to remove the dependency on
    ghes
  EDAC: Add the ghes_get_devices() check for chipset-specific edac
    drivers
  apei/ghes: Use unrcu_pointer for cmpxchg
  EDAC/igen6: Return consistent errno when another edac driver is
    enabled

 drivers/acpi/apei/ghes.c       | 69 +++++++++++++++++++++++---
 drivers/edac/Kconfig           |  4 +-
 drivers/edac/amd64_edac.c      |  3 ++
 drivers/edac/armada_xp_edac.c  |  3 ++
 drivers/edac/edac_module.h     |  1 +
 drivers/edac/ghes_edac.c       | 90 +++++++++++++++++++++-------------
 drivers/edac/i10nm_base.c      |  3 ++
 drivers/edac/igen6_edac.c      |  5 +-
 drivers/edac/layerscape_edac.c |  3 ++
 drivers/edac/pnd2_edac.c       |  3 ++
 drivers/edac/sb_edac.c         |  3 ++
 drivers/edac/skx_base.c        |  3 ++
 drivers/edac/thunderx_edac.c   |  3 ++
 drivers/edac/xgene_edac.c      |  3 ++
 drivers/firmware/efi/cper.c    |  3 ++
 include/acpi/ghes.h            | 34 ++++---------
 16 files changed, 164 insertions(+), 69 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v8 1/7] efi/cper: export several helpers for ghes_edac to use
  2022-10-10  2:35 [PATCH v8 0/7] Make ghes_edac a proper module Jia He
@ 2022-10-10  2:35 ` Jia He
  2022-10-10  2:35 ` [PATCH v8 2/7] EDAC/ghes: Add a notifier for reporting memory errors Jia He
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Jia He @ 2022-10-10  2:35 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

Before ghes_edac can be turned back into a proper module again, export
several helpers which are going to be used by it.

Signed-off-by: Jia He <justin.he@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/cper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index e4e5ea7ce910..053eae13f409 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -290,6 +290,7 @@ int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(cper_mem_err_location);
 
 int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
 {
@@ -310,6 +311,7 @@ int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(cper_dimm_err_location);
 
 void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
 		       struct cper_mem_err_compact *cmem)
@@ -331,6 +333,7 @@ void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
 	cmem->mem_array_handle = mem->mem_array_handle;
 	cmem->mem_dev_handle = mem->mem_dev_handle;
 }
+EXPORT_SYMBOL_GPL(cper_mem_err_pack);
 
 const char *cper_mem_err_unpack(struct trace_seq *p,
 				struct cper_mem_err_compact *cmem)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v8 2/7] EDAC/ghes: Add a notifier for reporting memory errors
  2022-10-10  2:35 [PATCH v8 0/7] Make ghes_edac a proper module Jia He
  2022-10-10  2:35 ` [PATCH v8 1/7] efi/cper: export several helpers for ghes_edac to use Jia He
@ 2022-10-10  2:35 ` Jia He
  2022-10-10  2:35 ` [PATCH v8 3/7] EDAC/ghes: Prepare to make ghes_edac a proper module Jia He
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Jia He @ 2022-10-10  2:35 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

To make a proper module, add a notifier for reporting memory errors. Use
an atomic notifier because calls sites like ghes_proc_in_irq() run in
interrupt context.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 16 +++++++++++++++-
 drivers/edac/ghes_edac.c | 19 +++++++++++++++++--
 include/acpi/ghes.h      | 10 +++-------
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d91ad378c00d..8cb65f757d06 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -94,6 +94,8 @@
 #define FIX_APEI_GHES_SDEI_CRITICAL	__end_of_fixed_addresses
 #endif
 
+static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
+
 static inline bool is_hest_type_generic_v2(struct ghes *ghes)
 {
 	return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
@@ -645,7 +647,7 @@ static bool ghes_do_proc(struct ghes *ghes,
 		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
-			ghes_edac_report_mem_error(sev, mem_err);
+			atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
 			queued = ghes_handle_memory_failure(gdata, sev);
@@ -1497,3 +1499,15 @@ void __init acpi_ghes_init(void)
 	else
 		pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n");
 }
+
+void ghes_register_report_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_register(&ghes_report_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_register_report_chain);
+
+void ghes_unregister_report_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_unregister(&ghes_report_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_unregister_report_chain);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index c8fa7dcfdbd0..7b8d56a769f6 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -14,6 +14,7 @@
 #include <linux/dmi.h>
 #include "edac_module.h"
 #include <ras/ras_event.h>
+#include <linux/notifier.h>
 
 #define OTHER_DETAIL_LEN	400
 
@@ -267,11 +268,14 @@ static int print_mem_error_other_detail(const struct cper_sec_mem_err *mem, char
 	return n;
 }
 
-void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
+static int ghes_edac_report_mem_error(struct notifier_block *nb,
+				      unsigned long val, void *data)
 {
+	struct cper_sec_mem_err *mem_err = (struct cper_sec_mem_err *)data;
 	struct cper_mem_err_compact cmem;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
+	unsigned long sev = val;
 	struct ghes_pvt *pvt;
 	unsigned long flags;
 	char *p;
@@ -282,7 +286,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	 * know.
 	 */
 	if (WARN_ON_ONCE(in_nmi()))
-		return;
+		return NOTIFY_OK;
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
@@ -374,8 +378,15 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 unlock:
 	spin_unlock_irqrestore(&ghes_lock, flags);
+
+	return NOTIFY_OK;
 }
 
+static struct notifier_block ghes_edac_mem_err_nb = {
+	.notifier_call	= ghes_edac_report_mem_error,
+	.priority	= 0,
+};
+
 /*
  * Known systems that are safe to enable this module.
  */
@@ -503,6 +514,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	ghes_pvt = pvt;
 	spin_unlock_irqrestore(&ghes_lock, flags);
 
+	ghes_register_report_chain(&ghes_edac_mem_err_nb);
+
 	/* only set on success */
 	refcount_set(&ghes_refcount, 1);
 
@@ -548,6 +561,8 @@ void ghes_edac_unregister(struct ghes *ghes)
 	if (mci)
 		edac_mc_free(mci);
 
+	ghes_unregister_report_chain(&ghes_edac_mem_err_nb);
+
 unlock:
 	mutex_unlock(&ghes_reg_mutex);
 }
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 34fb3431a8f3..5cbd38b6e4e1 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -76,18 +76,11 @@ int ghes_estatus_pool_init(int num_ghes);
 /* From drivers/edac/ghes_edac.c */
 
 #ifdef CONFIG_EDAC_GHES
-void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
-
 int ghes_edac_register(struct ghes *ghes, struct device *dev);
 
 void ghes_edac_unregister(struct ghes *ghes);
 
 #else
-static inline void ghes_edac_report_mem_error(int sev,
-				       struct cper_sec_mem_err *mem_err)
-{
-}
-
 static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	return -ENODEV;
@@ -145,4 +138,7 @@ int ghes_notify_sea(void);
 static inline int ghes_notify_sea(void) { return -ENOENT; }
 #endif
 
+struct notifier_block;
+extern void ghes_register_report_chain(struct notifier_block *nb);
+extern void ghes_unregister_report_chain(struct notifier_block *nb);
 #endif /* GHES_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v8 3/7] EDAC/ghes: Prepare to make ghes_edac a proper module
  2022-10-10  2:35 [PATCH v8 0/7] Make ghes_edac a proper module Jia He
  2022-10-10  2:35 ` [PATCH v8 1/7] efi/cper: export several helpers for ghes_edac to use Jia He
  2022-10-10  2:35 ` [PATCH v8 2/7] EDAC/ghes: Add a notifier for reporting memory errors Jia He
@ 2022-10-10  2:35 ` Jia He
  2022-10-10  2:35 ` [PATCH v8 4/7] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Jia He @ 2022-10-10  2:35 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

To make ghes_edac a proper module, prepare to decouple its dependencies
on ghes.

ghes_edac_register() is too late to set the module flag ghes_edac.force_load.
Also, other edac drivers should not be able to control this flag. Move this
flag to the module parameter in ghes and make it static.

Introduce a helper ghes_get_devices(), which returns the dev list GHES
probed when the platform-check passes on the system.

The previous force_load check is not needed in ghes_edac_unregister()
since it will be checked in module_init of ghes_edac later.

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 46 ++++++++++++++++++++++++++++++++++++++++
 drivers/edac/ghes_edac.c | 35 ++----------------------------
 include/acpi/ghes.h      |  6 ++++++
 3 files changed, 54 insertions(+), 33 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 8cb65f757d06..6a0dcb8c0901 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -109,6 +109,13 @@ static inline bool is_hest_type_generic_v2(struct ghes *ghes)
 bool ghes_disable;
 module_param_named(disable, ghes_disable, bool, 0);
 
+/*
+ * "ghes.edac_force_enable" forcibly enables ghes_edac and skips the platform
+ * check.
+ */
+static bool ghes_edac_force_enable;
+module_param_named(edac_force_enable, ghes_edac_force_enable, bool, 0);
+
 /*
  * All error sources notified with HED (Hardware Error Device) share a
  * single notifier callback, so they need to be linked and checked one
@@ -120,6 +127,9 @@ module_param_named(disable, ghes_disable, bool, 0);
 static LIST_HEAD(ghes_hed);
 static DEFINE_MUTEX(ghes_list_mutex);
 
+static LIST_HEAD(ghes_devs);
+static DEFINE_MUTEX(ghes_devs_mutex);
+
 /*
  * Because the memory area used to transfer hardware error information
  * from BIOS to Linux can be determined only in NMI, IRQ or timer
@@ -1380,6 +1390,12 @@ static int ghes_probe(struct platform_device *ghes_dev)
 
 	ghes_edac_register(ghes, &ghes_dev->dev);
 
+	ghes->dev = &ghes_dev->dev;
+
+	mutex_lock(&ghes_devs_mutex);
+	list_add_tail(&ghes->elist, &ghes_devs);
+	mutex_unlock(&ghes_devs_mutex);
+
 	/* Handle any pending errors right away */
 	spin_lock_irqsave(&ghes_notify_lock_irq, flags);
 	ghes_proc(ghes);
@@ -1444,6 +1460,10 @@ static int ghes_remove(struct platform_device *ghes_dev)
 
 	ghes_edac_unregister(ghes);
 
+	mutex_lock(&ghes_devs_mutex);
+	list_del(&ghes->elist);
+	mutex_unlock(&ghes_devs_mutex);
+
 	kfree(ghes);
 
 	platform_set_drvdata(ghes_dev, NULL);
@@ -1500,6 +1520,32 @@ void __init acpi_ghes_init(void)
 		pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n");
 }
 
+/*
+ * Known x86 systems that prefer GHES error reporting:
+ */
+static struct acpi_platform_list plat_list[] = {
+	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+	{ } /* End */
+};
+
+struct list_head *ghes_get_devices(void)
+{
+	int idx = -1;
+
+	if (IS_ENABLED(CONFIG_X86)) {
+		idx = acpi_match_platform_list(plat_list);
+		if (idx < 0) {
+			if (!ghes_edac_force_enable)
+				return NULL;
+
+			pr_warn_once("Force-loading ghes_edac on an unsupported platform. You're on your own!\n");
+		}
+	}
+
+	return &ghes_devs;
+}
+EXPORT_SYMBOL_GPL(ghes_get_devices);
+
 void ghes_register_report_chain(struct notifier_block *nb)
 {
 	atomic_notifier_chain_register(&ghes_report_chain, nb);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 7b8d56a769f6..b85a545d1cb0 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -54,10 +54,6 @@ static DEFINE_MUTEX(ghes_reg_mutex);
  */
 static DEFINE_SPINLOCK(ghes_lock);
 
-/* "ghes_edac.force_load=1" skips the platform check */
-static bool __read_mostly force_load;
-module_param(force_load, bool, 0);
-
 static bool system_scanned;
 
 /* Memory Device - Type 17 of SMBIOS spec */
@@ -387,14 +383,6 @@ static struct notifier_block ghes_edac_mem_err_nb = {
 	.priority	= 0,
 };
 
-/*
- * Known systems that are safe to enable this module.
- */
-static struct acpi_platform_list plat_list[] = {
-	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
-	{ } /* End */
-};
-
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	bool fake = false;
@@ -402,19 +390,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	struct ghes_pvt *pvt;
 	struct edac_mc_layer layers[1];
 	unsigned long flags;
-	int idx = -1;
 	int rc = 0;
 
-	if (IS_ENABLED(CONFIG_X86)) {
-		/* Check if safe to enable on this system */
-		idx = acpi_match_platform_list(plat_list);
-		if (!force_load && idx < 0)
-			return -ENODEV;
-	} else {
-		force_load = true;
-		idx = 0;
-	}
-
 	/* finish another registration/unregistration instance first */
 	mutex_lock(&ghes_reg_mutex);
 
@@ -458,15 +435,10 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
 		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
 		pr_info("work on such system. Use this driver with caution\n");
-	} else if (idx < 0) {
-		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-		pr_info("to correct its BIOS.\n");
-		pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
 	}
 
+	pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
+
 	if (!fake) {
 		struct dimm_info *src, *dst;
 		int i = 0;
@@ -535,9 +507,6 @@ void ghes_edac_unregister(struct ghes *ghes)
 	struct mem_ctl_info *mci;
 	unsigned long flags;
 
-	if (!force_load)
-		return;
-
 	mutex_lock(&ghes_reg_mutex);
 
 	system_scanned = false;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 5cbd38b6e4e1..ce693e9f07a0 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -27,6 +27,8 @@ struct ghes {
 		struct timer_list timer;
 		unsigned int irq;
 	};
+	struct device *dev;
+	struct list_head elist;
 };
 
 struct ghes_estatus_node {
@@ -80,6 +82,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev);
 
 void ghes_edac_unregister(struct ghes *ghes);
 
+struct list_head *ghes_get_devices(void);
+
 #else
 static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
@@ -89,6 +93,8 @@ static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
 static inline void ghes_edac_unregister(struct ghes *ghes)
 {
 }
+
+static inline struct list_head *ghes_get_devices(void) { return NULL; }
 #endif
 
 static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v8 4/7] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes
  2022-10-10  2:35 [PATCH v8 0/7] Make ghes_edac a proper module Jia He
                   ` (2 preceding siblings ...)
  2022-10-10  2:35 ` [PATCH v8 3/7] EDAC/ghes: Prepare to make ghes_edac a proper module Jia He
@ 2022-10-10  2:35 ` Jia He
  2022-10-10  2:35 ` [PATCH v8 5/7] EDAC: Add the ghes_get_devices() check for chipset-specific edac drivers Jia He
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Jia He @ 2022-10-10  2:35 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
apci_init()") introduced a bug that ghes_edac_register() would be invoked
before edac_init(). Because at that time, the bus "edac" hadn't been even
registered, this created sysfs /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.

To remove the dependency of ghes_edac on ghes, make it a proper module. Use
a list to save the probing devices in ghes_probe(), and defer the
ghes_edac_register() to module_init() of the new ghes_edac module by
iterating over the devices list.

Co-developed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/acpi/apei/ghes.c |  4 ----
 drivers/edac/Kconfig     |  4 ++--
 drivers/edac/ghes_edac.c | 40 ++++++++++++++++++++++++++++++++++++++--
 include/acpi/ghes.h      | 22 ++--------------------
 4 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6a0dcb8c0901..27c72b175e4b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1388,8 +1388,6 @@ static int ghes_probe(struct platform_device *ghes_dev)
 
 	platform_set_drvdata(ghes_dev, ghes);
 
-	ghes_edac_register(ghes, &ghes_dev->dev);
-
 	ghes->dev = &ghes_dev->dev;
 
 	mutex_lock(&ghes_devs_mutex);
@@ -1458,8 +1456,6 @@ static int ghes_remove(struct platform_device *ghes_dev)
 
 	ghes_fini(ghes);
 
-	ghes_edac_unregister(ghes);
-
 	mutex_lock(&ghes_devs_mutex);
 	list_del(&ghes->elist);
 	mutex_unlock(&ghes_devs_mutex);
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 17562cf1fe97..df45db81858b 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -53,8 +53,8 @@ config EDAC_DECODE_MCE
 	  has been initialized.
 
 config EDAC_GHES
-	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
-	depends on ACPI_APEI_GHES && (EDAC=y)
+	tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"
+	depends on ACPI_APEI_GHES
 	select UEFI_CPER
 	help
 	  Not all machines support hardware-driven error report. Some of those
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index b85a545d1cb0..dec1bb96e4b3 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -56,6 +56,8 @@ static DEFINE_SPINLOCK(ghes_lock);
 
 static bool system_scanned;
 
+static struct list_head *ghes_devs;
+
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
 	u8 type;
@@ -383,7 +385,7 @@ static struct notifier_block ghes_edac_mem_err_nb = {
 	.priority	= 0,
 };
 
-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static int ghes_edac_register(struct device *dev)
 {
 	bool fake = false;
 	struct mem_ctl_info *mci;
@@ -502,7 +504,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	return rc;
 }
 
-void ghes_edac_unregister(struct ghes *ghes)
+static void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
 	unsigned long flags;
@@ -535,3 +537,37 @@ void ghes_edac_unregister(struct ghes *ghes)
 unlock:
 	mutex_unlock(&ghes_reg_mutex);
 }
+
+static int __init ghes_edac_init(void)
+{
+	struct ghes *g, *g_tmp;
+
+	ghes_devs = ghes_get_devices();
+	if (!ghes_devs)
+		return -ENODEV;
+
+	if (list_empty(ghes_devs)) {
+		pr_info("GHES probing device list is empty");
+		return -ENODEV;
+	}
+
+	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
+		ghes_edac_register(g->dev);
+	}
+
+	return 0;
+}
+module_init(ghes_edac_init);
+
+static void __exit ghes_edac_exit(void)
+{
+	struct ghes *g, *g_tmp;
+
+	list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
+		ghes_edac_unregister(g);
+	}
+}
+module_exit(ghes_edac_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Output ACPI APEI/GHES BIOS detected errors module via EDAC");
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index ce693e9f07a0..2e785d3554d8 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -71,32 +71,14 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
  * @nb: pointer to the notifier_block structure of the vendor record handler.
  */
 void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
-#endif
-
-int ghes_estatus_pool_init(int num_ghes);
-
-/* From drivers/edac/ghes_edac.c */
-
-#ifdef CONFIG_EDAC_GHES
-int ghes_edac_register(struct ghes *ghes, struct device *dev);
-
-void ghes_edac_unregister(struct ghes *ghes);
 
 struct list_head *ghes_get_devices(void);
-
 #else
-static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
-{
-	return -ENODEV;
-}
-
-static inline void ghes_edac_unregister(struct ghes *ghes)
-{
-}
-
 static inline struct list_head *ghes_get_devices(void) { return NULL; }
 #endif
 
+int ghes_estatus_pool_init(int num_ghes);
+
 static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
 {
 	return gdata->revision >> 8;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v8 5/7] EDAC: Add the ghes_get_devices() check for chipset-specific edac drivers
  2022-10-10  2:35 [PATCH v8 0/7] Make ghes_edac a proper module Jia He
                   ` (3 preceding siblings ...)
  2022-10-10  2:35 ` [PATCH v8 4/7] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
@ 2022-10-10  2:35 ` Jia He
  2022-10-10  2:35 ` [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
  2022-10-10  2:35 ` [PATCH v8 7/7] EDAC/igen6: Return consistent errno when another edac driver is enabled Jia He
  6 siblings, 0 replies; 28+ messages in thread
From: Jia He @ 2022-10-10  2:35 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

Add ghes_get_devices() check for chipset-specific edac drivers to ensure
that ghes_edac is used on the platform where ghes_edac is preferred over
chipset-specific edac driver.

Unlike the existing edac_get_owner() check, the ghes_get_devices()
check works independent to the module_init ordering.

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/edac/amd64_edac.c      | 3 +++
 drivers/edac/armada_xp_edac.c  | 3 +++
 drivers/edac/edac_module.h     | 1 +
 drivers/edac/i10nm_base.c      | 3 +++
 drivers/edac/igen6_edac.c      | 3 +++
 drivers/edac/layerscape_edac.c | 3 +++
 drivers/edac/pnd2_edac.c       | 3 +++
 drivers/edac/sb_edac.c         | 3 +++
 drivers/edac/skx_base.c        | 3 +++
 drivers/edac/thunderx_edac.c   | 3 +++
 drivers/edac/xgene_edac.c      | 3 +++
 11 files changed, 31 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2f854feeeb23..e3318e5575a3 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -4329,6 +4329,9 @@ static int __init amd64_edac_init(void)
 	int err = -ENODEV;
 	int i;
 
+	if (ghes_get_devices())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/armada_xp_edac.c b/drivers/edac/armada_xp_edac.c
index 038abbb83f4b..c4bd2fb9c46b 100644
--- a/drivers/edac/armada_xp_edac.c
+++ b/drivers/edac/armada_xp_edac.c
@@ -599,6 +599,9 @@ static int __init armada_xp_edac_init(void)
 {
 	int res;
 
+	if (ghes_get_devices())
+		return -EBUSY;
+
 	/* only polling is supported */
 	edac_op_state = EDAC_OPSTATE_POLL;
 
diff --git a/drivers/edac/edac_module.h b/drivers/edac/edac_module.h
index 96f6de0c8ff6..3826f82de487 100644
--- a/drivers/edac/edac_module.h
+++ b/drivers/edac/edac_module.h
@@ -11,6 +11,7 @@
 #ifndef	__EDAC_MODULE_H__
 #define	__EDAC_MODULE_H__
 
+#include <acpi/ghes.h>
 #include "edac_mc.h"
 #include "edac_pci.h"
 #include "edac_device.h"
diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c
index 6cf50ee0b77c..75211ee4cd12 100644
--- a/drivers/edac/i10nm_base.c
+++ b/drivers/edac/i10nm_base.c
@@ -548,6 +548,9 @@ static int __init i10nm_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_get_devices())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index a07bbfd075d0..d33c666221f9 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1271,6 +1271,9 @@ static int __init igen6_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_get_devices())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -ENODEV;
diff --git a/drivers/edac/layerscape_edac.c b/drivers/edac/layerscape_edac.c
index 94cac7686a56..35ceaca578e1 100644
--- a/drivers/edac/layerscape_edac.c
+++ b/drivers/edac/layerscape_edac.c
@@ -38,6 +38,9 @@ static int __init fsl_ddr_mc_init(void)
 {
 	int res;
 
+	if (ghes_get_devices())
+		return -EBUSY;
+
 	/* make sure error reporting method is sane */
 	switch (edac_op_state) {
 	case EDAC_OPSTATE_POLL:
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index a20b299f1202..2b306f2cc605 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -1528,6 +1528,9 @@ static int __init pnd2_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_get_devices())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 9678ab97c7ac..2c860adf54a7 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -3506,6 +3506,9 @@ static int __init sbridge_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_get_devices())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c
index 1abc020d49ab..80a7334111b1 100644
--- a/drivers/edac/skx_base.c
+++ b/drivers/edac/skx_base.c
@@ -653,6 +653,9 @@ static int __init skx_init(void)
 
 	edac_dbg(2, "\n");
 
+	if (ghes_get_devices())
+		return -EBUSY;
+
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
diff --git a/drivers/edac/thunderx_edac.c b/drivers/edac/thunderx_edac.c
index f13674081cb6..0bcd9f02c84a 100644
--- a/drivers/edac/thunderx_edac.c
+++ b/drivers/edac/thunderx_edac.c
@@ -2114,6 +2114,9 @@ static int __init thunderx_edac_init(void)
 {
 	int rc = 0;
 
+	if (ghes_get_devices())
+		return -EBUSY;
+
 	rc = pci_register_driver(&thunderx_lmc_driver);
 	if (rc)
 		return rc;
diff --git a/drivers/edac/xgene_edac.c b/drivers/edac/xgene_edac.c
index 54081403db4f..c52b9dd9154c 100644
--- a/drivers/edac/xgene_edac.c
+++ b/drivers/edac/xgene_edac.c
@@ -2004,6 +2004,9 @@ static int __init xgene_edac_init(void)
 {
 	int rc;
 
+	if (ghes_get_devices())
+		return -EBUSY;
+
 	/* Make sure error reporting method is sane */
 	switch (edac_op_state) {
 	case EDAC_OPSTATE_POLL:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-10  2:35 [PATCH v8 0/7] Make ghes_edac a proper module Jia He
                   ` (4 preceding siblings ...)
  2022-10-10  2:35 ` [PATCH v8 5/7] EDAC: Add the ghes_get_devices() check for chipset-specific edac drivers Jia He
@ 2022-10-10  2:35 ` Jia He
  2022-10-11 10:33   ` Borislav Petkov
  2022-10-10  2:35 ` [PATCH v8 7/7] EDAC/igen6: Return consistent errno when another edac driver is enabled Jia He
  6 siblings, 1 reply; 28+ messages in thread
From: Jia He @ 2022-10-10  2:35 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He, kernel test robot

ghes_estatus_caches should be add rcu annotation to avoid sparse warnings.
   drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
   drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache *
   drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
   drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache *

unrcu_pointer is to strip the __rcu in cmpxchg.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 27c72b175e4b..c2fc0dd05bf7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -150,7 +150,7 @@ struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
 
-static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
@@ -840,8 +840,9 @@ static void ghes_estatus_cache_add(
 	}
 	/* new_cache must be put into array after its contents are written */
 	smp_wmb();
-	if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
-				  slot_cache, new_cache) == slot_cache) {
+	if (slot != -1 && unrcu_pointer(cmpxchg(ghes_estatus_caches + slot,
+				RCU_INITIALIZER(slot_cache),
+				RCU_INITIALIZER(new_cache))) == slot_cache) {
 		if (slot_cache)
 			call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
 	} else
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v8 7/7] EDAC/igen6: Return consistent errno when another edac driver is enabled
  2022-10-10  2:35 [PATCH v8 0/7] Make ghes_edac a proper module Jia He
                   ` (5 preceding siblings ...)
  2022-10-10  2:35 ` [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
@ 2022-10-10  2:35 ` Jia He
  6 siblings, 0 replies; 28+ messages in thread
From: Jia He @ 2022-10-10  2:35 UTC (permalink / raw)
  To: Len Brown, James Morse, Tony Luck, Borislav Petkov,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi
  Cc: Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	Jia He

Only a single edac driver can be enabled for EDAC MC. The igen6_init()
should be returned with EBUSY instead of ENODEV, which is consistent with
other edac drivers.

Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/edac/igen6_edac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index d33c666221f9..544dd19072ea 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1276,7 +1276,7 @@ static int __init igen6_init(void)
 
 	owner = edac_get_owner();
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
-		return -ENODEV;
+		return -EBUSY;
 
 	edac_op_state = EDAC_OPSTATE_NMI;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-10  2:35 ` [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
@ 2022-10-11 10:33   ` Borislav Petkov
  2022-10-11 14:32     ` Justin He
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2022-10-11 10:33 UTC (permalink / raw)
  To: Jia He
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jan Luebbe, Khuong Dinh, Kani Toshi, Ard Biesheuvel, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot

On Mon, Oct 10, 2022 at 02:35:58AM +0000, Jia He wrote:
> ghes_estatus_caches should be add rcu annotation to avoid sparse warnings.
>    drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
>    drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
>    drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache *
>    drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
>    drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
>    drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache *
> 
> unrcu_pointer is to strip the __rcu in cmpxchg.

Is this only to shut up sparse or actually fixing anything?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-11 10:33   ` Borislav Petkov
@ 2022-10-11 14:32     ` Justin He
  2022-10-11 14:45       ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Justin He @ 2022-10-11 14:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jan Luebbe, Khuong Dinh, Kani Toshi, Ard Biesheuvel, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot

Hi Borislav

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Tuesday, October 11, 2022 6:34 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Len Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>;
> Tony Luck <tony.luck@intel.com>; Mauro Carvalho Chehab
> <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert Moore
> <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; Yazen
> Ghannam <yazen.ghannam@amd.com>; Jan Luebbe <jlu@pengutronix.de>;
> Khuong Dinh <khuong@os.amperecomputing.com>; Kani Toshi
> <toshi.kani@hpe.com>; Ard Biesheuvel <ardb@kernel.org>;
> linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>;
> kernel test robot <lkp@intel.com>
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> 
> On Mon, Oct 10, 2022 at 02:35:58AM +0000, Jia He wrote:
> > ghes_estatus_caches should be add rcu annotation to avoid sparse warnings.
> >    drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types in
> comparison expression (different address spaces):
> >    drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache
> [noderef] __rcu *
> >    drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache *
> >    drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types in
> comparison expression (different address spaces):
> >    drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache
> [noderef] __rcu *
> >    drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache *
> >
> > unrcu_pointer is to strip the __rcu in cmpxchg.
> 
> Is this only to shut up sparse or actually fixing anything?

My original purpose is to make it pass the sparse checking. 

--
Cheers,
Justin (Jia He)



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-11 14:32     ` Justin He
@ 2022-10-11 14:45       ` Borislav Petkov
  2022-10-12  4:35         ` Justin He
  2022-10-12 12:04         ` Justin He
  0 siblings, 2 replies; 28+ messages in thread
From: Borislav Petkov @ 2022-10-11 14:45 UTC (permalink / raw)
  To: Justin He
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jan Luebbe, Khuong Dinh, Kani Toshi, Ard Biesheuvel, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot

On Tue, Oct 11, 2022 at 02:32:48PM +0000, Justin He wrote:
> My original purpose is to make it pass the sparse checking.

Then do this pls.

This is a combined diff - do a second patch which does only remove the
smp_wmb(). The smp_wmb() there is not needed as the cmpxchg() already
implies a smp_mb() so there's no need for that separate, explicit one.

But it would be prudent to have it in a separate patch just in case.

Thx.

---
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 85acfc589fb7..fd285bf8d114 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -154,7 +154,7 @@ struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
 
-static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
@@ -842,9 +842,7 @@ static void ghes_estatus_cache_add(
 			slot_cache = cache;
 		}
 	}
-	/* new_cache must be put into array after its contents are written */
-	smp_wmb();
-	if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
+	if (slot != -1 && cmpxchg((struct ghes_estatus_cache ** __force)(ghes_estatus_caches + slot),
 				  slot_cache, new_cache) == slot_cache) {
 		if (slot_cache)
 			call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-11 14:45       ` Borislav Petkov
@ 2022-10-12  4:35         ` Justin He
  2022-10-12 12:04         ` Justin He
  1 sibling, 0 replies; 28+ messages in thread
From: Justin He @ 2022-10-12  4:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jan Luebbe, Khuong Dinh, Kani Toshi, Ard Biesheuvel, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot



> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Tuesday, October 11, 2022 10:46 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Len Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>;
> Tony Luck <tony.luck@intel.com>; Mauro Carvalho Chehab
> <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert Moore
> <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; Yazen
> Ghannam <yazen.ghannam@amd.com>; Jan Luebbe <jlu@pengutronix.de>;
> Khuong Dinh <khuong@os.amperecomputing.com>; Kani Toshi
> <toshi.kani@hpe.com>; Ard Biesheuvel <ardb@kernel.org>;
> linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>;
> kernel test robot <lkp@intel.com>
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> 
> On Tue, Oct 11, 2022 at 02:32:48PM +0000, Justin He wrote:
> > My original purpose is to make it pass the sparse checking.
> 
> Then do this pls.
> 
> This is a combined diff - do a second patch which does only remove the
> smp_wmb(). The smp_wmb() there is not needed as the cmpxchg() already
> implies a smp_mb() so there's no need for that separate, explicit one.
> 
> But it would be prudent to have it in a separate patch just in case.
> 
Sure, I will insert/append your patch in my v9 series if there are no more comments.

Thanks for the 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-11 14:45       ` Borislav Petkov
  2022-10-12  4:35         ` Justin He
@ 2022-10-12 12:04         ` Justin He
  2022-10-13 13:37           ` Borislav Petkov
  2022-10-13 16:41           ` Peter Zijlstra
  1 sibling, 2 replies; 28+ messages in thread
From: Justin He @ 2022-10-12 12:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jan Luebbe, Khuong Dinh, Kani Toshi, Ard Biesheuvel, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot

Hi Borislav

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Tuesday, October 11, 2022 10:46 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Len Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>;
> Tony Luck <tony.luck@intel.com>; Mauro Carvalho Chehab
> <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert Moore
> <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; Yazen
> Ghannam <yazen.ghannam@amd.com>; Jan Luebbe <jlu@pengutronix.de>;
> Khuong Dinh <khuong@os.amperecomputing.com>; Kani Toshi
> <toshi.kani@hpe.com>; Ard Biesheuvel <ardb@kernel.org>;
> linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>;
> kernel test robot <lkp@intel.com>
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> 
> On Tue, Oct 11, 2022 at 02:32:48PM +0000, Justin He wrote:
> > My original purpose is to make it pass the sparse checking.
> 
> Then do this pls.
> 
> This is a combined diff - do a second patch which does only remove the
> smp_wmb(). The smp_wmb() there is not needed as the cmpxchg() already
> implies a smp_mb() so there's no need for that separate, explicit one.
> 
I have a concern about what if cmpxchg failed? Do we have to still guarantee the ordering since cmpxchg will not imply a smp_mb if it failed.

Besides, I didn't find the paired smp_mb or smp_rmb
for this smp_wmb. Do you have any ideas?

--
Cheers,
Justin (Jia He)




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-12 12:04         ` Justin He
@ 2022-10-13 13:37           ` Borislav Petkov
  2022-10-13 15:41             ` Ard Biesheuvel
  2022-10-13 16:41           ` Peter Zijlstra
  1 sibling, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2022-10-13 13:37 UTC (permalink / raw)
  To: Justin He
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jan Luebbe, Khuong Dinh, Kani Toshi, Ard Biesheuvel, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot

On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:
> I have a concern about what if cmpxchg failed? Do we have to still
> guarantee the ordering since cmpxchg will not imply a smp_mb if it
> failed.

Of course it will imply that. At least on x86 it does. smp_wmb() is a
compiler barrier there and cmpxchg() already has that barrier semantics
by clobbering "memory". I'm pretty sure you should have the same thing
on ARM.

And even if that weren't the case, the write barrier is, as the comment
says, "new_cache must be put into array after its contents are written".

Are we writing anything into the cache if cmpxchg fails?

> Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb.

Why would there be pairs? I don't understand that statement here.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-13 13:37           ` Borislav Petkov
@ 2022-10-13 15:41             ` Ard Biesheuvel
  2022-10-13 16:37               ` Borislav Petkov
                                 ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2022-10-13 15:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Justin He, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot

On Thu, 13 Oct 2022 at 15:37, Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:
> > I have a concern about what if cmpxchg failed? Do we have to still
> > guarantee the ordering since cmpxchg will not imply a smp_mb if it
> > failed.
>
> Of course it will imply that. At least on x86 it does. smp_wmb() is a
> compiler barrier there and cmpxchg() already has that barrier semantics
> by clobbering "memory". I'm pretty sure you should have the same thing
> on ARM.
>

No it definitely does not imply that. A memory clobber is a codegen
construct, and the hardware could still complete the writes in a way
that could result in another observer seeing a mix of old and new
values that is inconsistent with the ordering of the stores as issued
by the compiler.

> says, "new_cache must be put into array after its contents are written".
>
> Are we writing anything into the cache if cmpxchg fails?
>

The cache fields get updated but the pointer to the struct is never
shared globally if the cmpxchg() fails so not having the barrier on
failure should not be an issue here.

> > Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb.
>
> Why would there be pairs? I don't understand that statement here.
>

Typically, the other observer pairs the write barrier with a read barrier.

In this case, the other observer appears to be ghes_estatus_cached(),
and the reads of the cache struct fields must be ordered after the
read of the cache struct's address. However, there is an implicit
ordering there through an address dependency (you cannot dereference a
struct without knowing its address) so the ordering is implied (and
rcu_dereference() has a READ_ONCE() inside so we are guaranteed to
always dereference the same struct, even if the array slot gets
updated concurrently.)

If you want to get rid of the barrier, you could drop it and change
the cmpxchg() to cmpxchg_release().

Justin: so why are the RCU_INITIALIZER()s needed here?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-13 15:41             ` Ard Biesheuvel
@ 2022-10-13 16:37               ` Borislav Petkov
  2022-10-13 16:45               ` Peter Zijlstra
  2022-10-14 12:00               ` Justin He
  2 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2022-10-13 16:37 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Justin He, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot

On Thu, Oct 13, 2022 at 05:41:06PM +0200, Ard Biesheuvel wrote:
> No it definitely does not imply that. A memory clobber is a codegen
> construct, and the hardware could still complete the writes in a way
> that could result in another observer seeing a mix of old and new
> values that is inconsistent with the ordering of the stores as issued
> by the compiler.

Yes, but look at the code. There's a:

	smp_wmb();

which on x86 is

	#define smp_wmb() barrier()

which is

	#define barrier() __asm__ __volatile__("": : :"memory")

so there wasn't a hardware memory barrier there in the first place.

Unless ARM does something else in those primitives...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-12 12:04         ` Justin He
  2022-10-13 13:37           ` Borislav Petkov
@ 2022-10-13 16:41           ` Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2022-10-13 16:41 UTC (permalink / raw)
  To: Justin He
  Cc: Borislav Petkov, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi,
	Ard Biesheuvel, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi, nd,
	kernel test robot

On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:

> > This is a combined diff - do a second patch which does only remove the
> > smp_wmb(). The smp_wmb() there is not needed as the cmpxchg() already
> > implies a smp_mb() so there's no need for that separate, explicit one.
> > 
> I have a concern about what if cmpxchg failed? Do we have to still
> guarantee the ordering since cmpxchg will not imply a smp_mb if it
> failed.
> 
> Besides, I didn't find the paired smp_mb or smp_rmb
> for this smp_wmb. Do you have any ideas?

failed cmpxchg does indeed not imply smp_mb; but in that case I can't
find a store it orders against; and the comment is utterly shite since
it doesn't spell out the ordering in any case.

The way I read that code is that the cmpxchg effectively publishes the
data and all it wants to ensure is that if the pointer is publised the
object is complete -- in which case the cpmxchg() is sufficient, on
success the object is published and you get the ordering, on failure the
object isn't published and nobody cares about the ordering anyway.

If there's anything else, the comment is in dire need of fixing anyway.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-13 15:41             ` Ard Biesheuvel
  2022-10-13 16:37               ` Borislav Petkov
@ 2022-10-13 16:45               ` Peter Zijlstra
  2022-10-13 17:42                 ` Borislav Petkov
  2022-10-14 12:00               ` Justin He
  2 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2022-10-13 16:45 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Justin He, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot

On Thu, Oct 13, 2022 at 05:41:06PM +0200, Ard Biesheuvel wrote:
> On Thu, 13 Oct 2022 at 15:37, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:
> > > I have a concern about what if cmpxchg failed? Do we have to still
> > > guarantee the ordering since cmpxchg will not imply a smp_mb if it
> > > failed.
> >
> > Of course it will imply that. At least on x86 it does. smp_wmb() is a
> > compiler barrier there and cmpxchg() already has that barrier semantics
> > by clobbering "memory". I'm pretty sure you should have the same thing
> > on ARM.
> >
> 
> No it definitely does not imply that. A memory clobber is a codegen
> construct, and the hardware could still complete the writes in a way
> that could result in another observer seeing a mix of old and new
> values that is inconsistent with the ordering of the stores as issued
> by the compiler.

Borislav is thinking too much x86. Failed cmpxchg() does indeed not
imply any memory ordering for all architectures -- and while the memory
clobber (aka. barrier()) is equivalent to an smp_wmb() on x86, that most
certainly doesn't hold for non x86 code.

> > says, "new_cache must be put into array after its contents are written".
> >
> > Are we writing anything into the cache if cmpxchg fails?
> >
> 
> The cache fields get updated but the pointer to the struct is never
> shared globally if the cmpxchg() fails so not having the barrier on
> failure should not be an issue here.

That is how I read the code too; so if the cmpxchg fails the object is
not published and nobody cares about the ordering.

> 
> > > Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb.
> >
> > Why would there be pairs? I don't understand that statement here.
> >
> 
> Typically, the other observer pairs the write barrier with a read barrier.
> 
> In this case, the other observer appears to be ghes_estatus_cached(),
> and the reads of the cache struct fields must be ordered after the
> read of the cache struct's address. However, there is an implicit
> ordering there through an address dependency (you cannot dereference a
> struct without knowing its address) so the ordering is implied (and
> rcu_dereference() has a READ_ONCE() inside so we are guaranteed to
> always dereference the same struct, even if the array slot gets
> updated concurrently.)
> 
> If you want to get rid of the barrier, you could drop it and change
> the cmpxchg() to cmpxchg_release().

cmpxchg_release() is strictly weaker than cmpxchg(); so I don't see the
point there other than optimizing for weak architectures. It can't
fundamentally fix anything.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-13 16:45               ` Peter Zijlstra
@ 2022-10-13 17:42                 ` Borislav Petkov
  2022-10-14  9:40                   ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2022-10-13 17:42 UTC (permalink / raw)
  To: Peter Zijlstra, Huang Ying
  Cc: Ard Biesheuvel, Justin He, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot

On Thu, Oct 13, 2022 at 06:45:44PM +0200, Peter Zijlstra wrote:
> Borislav is thinking too much x86. Failed cmpxchg() does indeed not
> imply any memory ordering for all architectures -- and while the memory
> clobber (aka. barrier()) is equivalent to an smp_wmb() on x86, that most
> certainly doesn't hold for non x86 code.

Right, but the patch was addied by an Intel person, CCed:

152cef40a808 ("ACPI, APEI, GHES, Error records content based throttle")a

So I don't think he was thinking about ARM when doing that.

And that commit message doesn't say one whit why that memory barrier is
needed there.

Reading that comment, it sounds like he wanted a hw memory barrier -
MFENCE - but I don't see how normal data dependency wouldn't enforce the proper order
already...

So that barrier looks out of place there.

Btw, this is the next perfect example why I'm asking people to write
proper commit messages so that when we do git archeology later, we can
figure out why something was done the way it has been.

And in this case, we can't. ;-\

Because writing proper commit messages is for losers. Yeah, right.</sarcasm>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-13 17:42                 ` Borislav Petkov
@ 2022-10-14  9:40                   ` Ard Biesheuvel
  2022-10-14 19:40                     ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2022-10-14  9:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Huang Ying, Justin He, Len Brown, James Morse,
	Tony Luck, Mauro Carvalho Chehab, Robert Richter, Robert Moore,
	Qiuxu Zhuo, Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi,
	linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, nd, kernel test robot

On Thu, 13 Oct 2022 at 19:42, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Oct 13, 2022 at 06:45:44PM +0200, Peter Zijlstra wrote:
> > Borislav is thinking too much x86. Failed cmpxchg() does indeed not
> > imply any memory ordering for all architectures -- and while the memory
> > clobber (aka. barrier()) is equivalent to an smp_wmb() on x86, that most
> > certainly doesn't hold for non x86 code.
>
> Right, but the patch was addied by an Intel person, CCed:
>
> 152cef40a808 ("ACPI, APEI, GHES, Error records content based throttle")a
>
> So I don't think he was thinking about ARM when doing that.
>
> And that commit message doesn't say one whit why that memory barrier is
> needed there.
>
> Reading that comment, it sounds like he wanted a hw memory barrier -
> MFENCE - but I don't see how normal data dependency wouldn't enforce the proper order
> already...
>
> So that barrier looks out of place there.
>

The cache struct pointer should not be published until after the
struct itself is fully populated. So on the producer side, some kind
of hardware barrier is definitely needed, or the struct may appear
half-baked to other cores that can read the updated pointer.

But as Peter points out, cmpxchg() itself has the required barrier
semantics already, so the separate smp_wmb() is likely unnecessary.
And as I suggested earlier, a full barrier is not necessary so we
could relax this to cmpxchg_release() if desired.

OTOH the code seems to be working fine as is, so why modify it at all?
(apart from the purely cosmetic changes)

> Btw, this is the next perfect example why I'm asking people to write
> proper commit messages so that when we do git archeology later, we can
> figure out why something was done the way it has been.
>
> And in this case, we can't. ;-\
>
> Because writing proper commit messages is for losers. Yeah, right.</sarcasm>
>

Yeah, agree there.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-13 15:41             ` Ard Biesheuvel
  2022-10-13 16:37               ` Borislav Petkov
  2022-10-13 16:45               ` Peter Zijlstra
@ 2022-10-14 12:00               ` Justin He
  2022-10-14 14:31                 ` Ard Biesheuvel
  2 siblings, 1 reply; 28+ messages in thread
From: Justin He @ 2022-10-14 12:00 UTC (permalink / raw)
  To: Ard Biesheuvel, Borislav Petkov
  Cc: Len Brown, James Morse, Tony Luck, Mauro Carvalho Chehab,
	Robert Richter, Robert Moore, Qiuxu Zhuo, Yazen Ghannam,
	Jan Luebbe, Khuong Dinh, Kani Toshi, linux-acpi, linux-kernel,
	linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, nd, kernel test robot

Hi Ard

> -----Original Message-----
> From: Ard Biesheuvel <ardb@kernel.org>
> Sent: Thursday, October 13, 2022 11:41 PM
> To: Borislav Petkov <bp@alien8.de>
> Cc: Justin He <Justin.He@arm.com>; Len Brown <lenb@kernel.org>; James
> Morse <James.Morse@arm.com>; Tony Luck <tony.luck@intel.com>; Mauro
> Carvalho Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>;
> Robert Moore <robert.moore@intel.com>; Qiuxu Zhuo
> <qiuxu.zhuo@intel.com>; Yazen Ghannam <yazen.ghannam@amd.com>; Jan
> Luebbe <jlu@pengutronix.de>; Khuong Dinh
> <khuong@os.amperecomputing.com>; Kani Toshi <toshi.kani@hpe.com>;
> linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>;
> kernel test robot <lkp@intel.com>
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> 
> On Thu, 13 Oct 2022 at 15:37, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:
> > > I have a concern about what if cmpxchg failed? Do we have to still
> > > guarantee the ordering since cmpxchg will not imply a smp_mb if it
> > > failed.
> >
> > Of course it will imply that. At least on x86 it does. smp_wmb() is a
> > compiler barrier there and cmpxchg() already has that barrier
> > semantics by clobbering "memory". I'm pretty sure you should have the
> > same thing on ARM.
> >
> 
> No it definitely does not imply that. A memory clobber is a codegen construct,
> and the hardware could still complete the writes in a way that could result in
> another observer seeing a mix of old and new values that is inconsistent with
> the ordering of the stores as issued by the compiler.
> 
> > says, "new_cache must be put into array after its contents are written".
> >
> > Are we writing anything into the cache if cmpxchg fails?
> >
> 
> The cache fields get updated but the pointer to the struct is never shared
> globally if the cmpxchg() fails so not having the barrier on failure should not be
> an issue here.
> 
> > > Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb.
> >
> > Why would there be pairs? I don't understand that statement here.
> >
> 
> Typically, the other observer pairs the write barrier with a read barrier.
> 
> In this case, the other observer appears to be ghes_estatus_cached(), and the
> reads of the cache struct fields must be ordered after the read of the cache
> struct's address. However, there is an implicit ordering there through an address
> dependency (you cannot dereference a struct without knowing its address) so
> the ordering is implied (and
> rcu_dereference() has a READ_ONCE() inside so we are guaranteed to always
> dereference the same struct, even if the array slot gets updated concurrently.)
> 
> If you want to get rid of the barrier, you could drop it and change the cmpxchg()
> to cmpxchg_release().
> 
> Justin: so why are the RCU_INITIALIZER()s needed here?

In my this patch, I add the "__rcu" to the definition of ghes_estatus_caches. Hence
Sparse will still have the warning on X86 with this RCU_INITIALIZER cast.
drivers/acpi/apei/ghes.c:843:27: sparse: warning: incorrect type in initializer (different address spaces)
drivers/acpi/apei/ghes.c:843:27: sparse:    expected struct ghes_estatus_cache [noderef] __rcu *__old
drivers/acpi/apei/ghes.c:843:27: sparse:    got struct ghes_estatus_cache *[assigned] slot_cache
drivers/acpi/apei/ghes.c:843:27: sparse: warning: incorrect type in initializer (different address spaces)
drivers/acpi/apei/ghes.c:843:27: sparse:    expected struct ghes_estatus_cache [noderef] __rcu *__new
drivers/acpi/apei/ghes.c:843:27: sparse:    got struct ghes_estatus_cache *[assigned] new_cache

On Arm, IMO the macro cmpxchg doesn't care about it, that is, sparse will not report warnings with or
without RCU_INITIALIZER cast.

I tend to remain this cast, what do you think of it.

--
Cheers,
Justin (Jia He)



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-14 12:00               ` Justin He
@ 2022-10-14 14:31                 ` Ard Biesheuvel
  2022-10-14 15:10                   ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2022-10-14 14:31 UTC (permalink / raw)
  To: Justin He
  Cc: Borislav Petkov, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, kernel test robot, Peter Zijlstra

On Fri, 14 Oct 2022 at 14:00, Justin He <Justin.He@arm.com> wrote:
>
> Hi Ard
>
> > -----Original Message-----
> > From: Ard Biesheuvel <ardb@kernel.org>
> > Sent: Thursday, October 13, 2022 11:41 PM
> > To: Borislav Petkov <bp@alien8.de>
> > Cc: Justin He <Justin.He@arm.com>; Len Brown <lenb@kernel.org>; James
> > Morse <James.Morse@arm.com>; Tony Luck <tony.luck@intel.com>; Mauro
> > Carvalho Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>;
> > Robert Moore <robert.moore@intel.com>; Qiuxu Zhuo
> > <qiuxu.zhuo@intel.com>; Yazen Ghannam <yazen.ghannam@amd.com>; Jan
> > Luebbe <jlu@pengutronix.de>; Khuong Dinh
> > <khuong@os.amperecomputing.com>; Kani Toshi <toshi.kani@hpe.com>;
> > linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> > linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> > <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> > Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>;
> > kernel test robot <lkp@intel.com>
> > Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> >
> > On Thu, 13 Oct 2022 at 15:37, Borislav Petkov <bp@alien8.de> wrote:
> > >
> > > On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:
> > > > I have a concern about what if cmpxchg failed? Do we have to still
> > > > guarantee the ordering since cmpxchg will not imply a smp_mb if it
> > > > failed.
> > >
> > > Of course it will imply that. At least on x86 it does. smp_wmb() is a
> > > compiler barrier there and cmpxchg() already has that barrier
> > > semantics by clobbering "memory". I'm pretty sure you should have the
> > > same thing on ARM.
> > >
> >
> > No it definitely does not imply that. A memory clobber is a codegen construct,
> > and the hardware could still complete the writes in a way that could result in
> > another observer seeing a mix of old and new values that is inconsistent with
> > the ordering of the stores as issued by the compiler.
> >
> > > says, "new_cache must be put into array after its contents are written".
> > >
> > > Are we writing anything into the cache if cmpxchg fails?
> > >
> >
> > The cache fields get updated but the pointer to the struct is never shared
> > globally if the cmpxchg() fails so not having the barrier on failure should not be
> > an issue here.
> >
> > > > Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb.
> > >
> > > Why would there be pairs? I don't understand that statement here.
> > >
> >
> > Typically, the other observer pairs the write barrier with a read barrier.
> >
> > In this case, the other observer appears to be ghes_estatus_cached(), and the
> > reads of the cache struct fields must be ordered after the read of the cache
> > struct's address. However, there is an implicit ordering there through an address
> > dependency (you cannot dereference a struct without knowing its address) so
> > the ordering is implied (and
> > rcu_dereference() has a READ_ONCE() inside so we are guaranteed to always
> > dereference the same struct, even if the array slot gets updated concurrently.)
> >
> > If you want to get rid of the barrier, you could drop it and change the cmpxchg()
> > to cmpxchg_release().
> >
> > Justin: so why are the RCU_INITIALIZER()s needed here?
>
> In my this patch, I add the "__rcu" to the definition of ghes_estatus_caches. Hence
> Sparse will still have the warning on X86 with this RCU_INITIALIZER cast.
> drivers/acpi/apei/ghes.c:843:27: sparse: warning: incorrect type in initializer (different address spaces)
> drivers/acpi/apei/ghes.c:843:27: sparse:    expected struct ghes_estatus_cache [noderef] __rcu *__old
> drivers/acpi/apei/ghes.c:843:27: sparse:    got struct ghes_estatus_cache *[assigned] slot_cache
> drivers/acpi/apei/ghes.c:843:27: sparse: warning: incorrect type in initializer (different address spaces)
> drivers/acpi/apei/ghes.c:843:27: sparse:    expected struct ghes_estatus_cache [noderef] __rcu *__new
> drivers/acpi/apei/ghes.c:843:27: sparse:    got struct ghes_estatus_cache *[assigned] new_cache
>
> On Arm, IMO the macro cmpxchg doesn't care about it, that is, sparse will not report warnings with or
> without RCU_INITIALIZER cast.
>
> I tend to remain this cast, what do you think of it.
>

OK, fair enough, I had only tested the arm64 build myself.

But just putting unrcu_pointer() and RCU_INITIALIZER() arbitrarily to
shut up sparse is a bit sloppy, imho. Passing around pointers like
this code does makes that necessary, unfortunately, so it would be
nice if we could clean that up, by getting rid of the slot_cache
variable.

And now that I have spent some time looking at this code, I wonder
what the point of the cmpxchg() is in the first place.
ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a new cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.

Since we only insert new items, the race can only cause a failure if
the selected slot was updated with another new item concurrently,
which means that it is arbitrary which of those two items gets
dropped. This means we don't need the cmpxchg() and the special case,
and we can just drop the existing item unconditionally. Note that this
does not result in loss of error events, it simply means we might
cause a false cache miss, and report the same event one additional
time in quick succession even if the cache should have prevented that.


------------------------8<------------------------------
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 80ad530583c9..03acdfa35dab 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -138,7 +138,7 @@ struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;

-static struct ghes_estatus_cache
*ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu
*ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;

 static int ghes_panic_timeout __read_mostly = 30;
@@ -773,31 +773,26 @@ static struct ghes_estatus_cache
*ghes_estatus_cache_alloc(
        return cache;
 }

-static void ghes_estatus_cache_free(struct ghes_estatus_cache *cache)
+static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
 {
+       struct ghes_estatus_cache *cache;
        u32 len;

+       cache = container_of(head, struct ghes_estatus_cache, rcu);
        len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
        len = GHES_ESTATUS_CACHE_LEN(len);
        gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
        atomic_dec(&ghes_estatus_cache_alloced);
 }

-static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
-{
-       struct ghes_estatus_cache *cache;
-
-       cache = container_of(head, struct ghes_estatus_cache, rcu);
-       ghes_estatus_cache_free(cache);
-}
-
 static void ghes_estatus_cache_add(
        struct acpi_hest_generic *generic,
        struct acpi_hest_generic_status *estatus)
 {
        int i, slot = -1, count;
        unsigned long long now, duration, period, max_period = 0;
-       struct ghes_estatus_cache *cache, *slot_cache = NULL, *new_cache;
+       struct ghes_estatus_cache *cache, *new_cache;
+       struct ghes_estatus_cache __rcu *victim;

        new_cache = ghes_estatus_cache_alloc(generic, estatus);
        if (new_cache == NULL)
@@ -808,13 +803,11 @@ static void ghes_estatus_cache_add(
                cache = rcu_dereference(ghes_estatus_caches[i]);
                if (cache == NULL) {
                        slot = i;
-                       slot_cache = NULL;
                        break;
                }
                duration = now - cache->time_in;
                if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
                        slot = i;
-                       slot_cache = cache;
                        break;
                }
                count = atomic_read(&cache->count);
@@ -823,17 +816,28 @@ static void ghes_estatus_cache_add(
                if (period > max_period) {
                        max_period = period;
                        slot = i;
-                       slot_cache = cache;
                }
        }
-       /* new_cache must be put into array after its contents are written */
-       smp_wmb();
-       if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
-                                 slot_cache, new_cache) == slot_cache) {
-               if (slot_cache)
-                       call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
-       } else
-               ghes_estatus_cache_free(new_cache);
+       if (slot != -1) {
+               /*
+                * Use release semantics to ensure that ghes_estatus_cached()
+                * running on another CPU will see the updated cache fields if
+                * it can see the new value of the pointer.
+                */
+               victim = xchg_release(ghes_estatus_caches + slot,
+                                     RCU_INITIALIZER(new_cache));
+
+               /*
+                * At this point, victim may point to a cached item different
+                * from the one based on which we selected the slot. Instead of
+                * going to the loop again to pick another slot, let's just
+                * drop the other item anyway: this may cause a false cache
+                * miss later on, but that won't cause any problems.
+                */
+               if (victim)
+                       call_rcu(&rcu_dereference(victim)->rcu,
+                                ghes_estatus_cache_rcu_free);
+       }
        rcu_read_unlock();
 }

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-14 14:31                 ` Ard Biesheuvel
@ 2022-10-14 15:10                   ` Peter Zijlstra
  2022-10-14 15:24                     ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2022-10-14 15:10 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Justin He, Borislav Petkov, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, kernel test robot

On Fri, Oct 14, 2022 at 04:31:37PM +0200, Ard Biesheuvel wrote:
> +       if (slot != -1) {
> +               /*
> +                * Use release semantics to ensure that ghes_estatus_cached()
> +                * running on another CPU will see the updated cache fields if
> +                * it can see the new value of the pointer.
> +                */
> +               victim = xchg_release(ghes_estatus_caches + slot,
> +                                     RCU_INITIALIZER(new_cache));
> +
> +               /*
> +                * At this point, victim may point to a cached item different
> +                * from the one based on which we selected the slot. Instead of
> +                * going to the loop again to pick another slot, let's just
> +                * drop the other item anyway: this may cause a false cache
> +                * miss later on, but that won't cause any problems.
> +                */
> +               if (victim) {
> +                       call_rcu(&rcu_dereference(victim)->rcu,
> +                                ghes_estatus_cache_rcu_free);
		}

I think you can use unrcu_pointer() here instead, there should not be a
data dependency since the ->rcu member itself should be otherwise unused
(and if it were, we wouldn't care about its previous content anyway).

But only Alpha cares about that distinction anyway, so *shrug*.

While I much like the xchg() variant; I still don't really fancy the
verbage the sparse nonsense makes us do.

		victim = xchg_release(&ghes_estatus_caches[slot], new_cache);
		if (victim)
			call_rcu(&victim->rcu, ghes_estatus_cache_rcu_free);

is much nicer code.

Over all; I'd simply ignore sparse (I often do).

> +       }
>         rcu_read_unlock();
>  }

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-14 15:10                   ` Peter Zijlstra
@ 2022-10-14 15:24                     ` Ard Biesheuvel
  2022-10-17  8:47                       ` Justin He
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2022-10-14 15:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Justin He, Borislav Petkov, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Robert Richter, Robert Moore, Qiuxu Zhuo,
	Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi, linux-acpi,
	linux-kernel, linux-edac, devel, Rafael J . Wysocki, Shuai Xue,
	Jarkko Sakkinen, linux-efi, kernel test robot

On Fri, 14 Oct 2022 at 17:11, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Oct 14, 2022 at 04:31:37PM +0200, Ard Biesheuvel wrote:
> > +       if (slot != -1) {
> > +               /*
> > +                * Use release semantics to ensure that ghes_estatus_cached()
> > +                * running on another CPU will see the updated cache fields if
> > +                * it can see the new value of the pointer.
> > +                */
> > +               victim = xchg_release(ghes_estatus_caches + slot,
> > +                                     RCU_INITIALIZER(new_cache));
> > +
> > +               /*
> > +                * At this point, victim may point to a cached item different
> > +                * from the one based on which we selected the slot. Instead of
> > +                * going to the loop again to pick another slot, let's just
> > +                * drop the other item anyway: this may cause a false cache
> > +                * miss later on, but that won't cause any problems.
> > +                */
> > +               if (victim) {
> > +                       call_rcu(&rcu_dereference(victim)->rcu,
> > +                                ghes_estatus_cache_rcu_free);
>                 }
>
> I think you can use unrcu_pointer() here instead, there should not be a
> data dependency since the ->rcu member itself should be otherwise unused
> (and if it were, we wouldn't care about its previous content anyway).
>
> But only Alpha cares about that distinction anyway, so *shrug*.
>

Ah yeah good point - and we are not actually dereferencing the pointer
at all here, just adding an offset to get at the address of the rcu
member.

So we can take this block out of the rcu_read_lock() section as well.


> While I much like the xchg() variant; I still don't really fancy the
> verbage the sparse nonsense makes us do.
>
>                 victim = xchg_release(&ghes_estatus_caches[slot], new_cache);
>                 if (victim)
>                         call_rcu(&victim->rcu, ghes_estatus_cache_rcu_free);
>
> is much nicer code.
>
> Over all; I'd simply ignore sparse (I often do).
>

No disagreement there.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-14  9:40                   ` Ard Biesheuvel
@ 2022-10-14 19:40                     ` Borislav Petkov
  0 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2022-10-14 19:40 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Peter Zijlstra, Huang Ying, Justin He, Len Brown, James Morse,
	Tony Luck, Mauro Carvalho Chehab, Robert Richter, Robert Moore,
	Qiuxu Zhuo, Yazen Ghannam, Jan Luebbe, Khuong Dinh, Kani Toshi,
	linux-acpi, linux-kernel, linux-edac, devel, Rafael J . Wysocki,
	Shuai Xue, Jarkko Sakkinen, linux-efi, nd, kernel test robot

On Fri, Oct 14, 2022 at 11:40:39AM +0200, Ard Biesheuvel wrote:
> The cache struct pointer should not be published until after the
> struct itself is fully populated. So on the producer side, some kind
> of hardware barrier is definitely needed, or the struct may appear
> half-baked to other cores that can read the updated pointer.

Ah, right you are, ghes_estatus_cached() is called in all kinds of
contexts and by other cores, sure.

> OTOH the code seems to be working fine as is, so why modify it at all?
> (apart from the purely cosmetic changes)

peterz questioned that smp_wmb() there and then we started digging. And
frankly, even if removing that barrier won't make any difference, I'd
still prefer it gone and have the code simpler and cleaner.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-14 15:24                     ` Ard Biesheuvel
@ 2022-10-17  8:47                       ` Justin He
  2022-10-17  9:27                         ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Justin He @ 2022-10-17  8:47 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Peter Zijlstra, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam, Jan Luebbe, Khuong Dinh,
	Kani Toshi, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi,
	kernel test robot

Hi Ard

> -----Original Message-----
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
>
> On Fri, 14 Oct 2022 at 17:11, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Oct 14, 2022 at 04:31:37PM +0200, Ard Biesheuvel wrote:
> > > +       if (slot != -1) {
> > > +               /*
> > > +                * Use release semantics to ensure that
> ghes_estatus_cached()
> > > +                * running on another CPU will see the updated cache
> fields if
> > > +                * it can see the new value of the pointer.
> > > +                */
> > > +               victim = xchg_release(ghes_estatus_caches + slot,
> > > +
> RCU_INITIALIZER(new_cache));
> > > +
> > > +               /*
> > > +                * At this point, victim may point to a cached item
> different
> > > +                * from the one based on which we selected the slot.
> Instead of
> > > +                * going to the loop again to pick another slot, let's
> just
> > > +                * drop the other item anyway: this may cause a false
> cache
> > > +                * miss later on, but that won't cause any problems.
> > > +                */
> > > +               if (victim) {
> > > +                       call_rcu(&rcu_dereference(victim)->rcu,
> > > +                                ghes_estatus_cache_rcu_free);
> >                 }
> >
> > I think you can use unrcu_pointer() here instead, there should not be
> > a data dependency since the ->rcu member itself should be otherwise
> > unused (and if it were, we wouldn't care about its previous content anyway).
> >
> > But only Alpha cares about that distinction anyway, so *shrug*.
> >
>
> Ah yeah good point - and we are not actually dereferencing the pointer at all
> here, just adding an offset to get at the address of the rcu member.
>
> So we can take this block out of the rcu_read_lock() section as well.
>
>
> > While I much like the xchg() variant; I still don't really fancy the
> > verbage the sparse nonsense makes us do.
> >
> >                 victim = xchg_release(&ghes_estatus_caches[slot],
> new_cache);
> >                 if (victim)
> >                         call_rcu(&victim->rcu,
> > ghes_estatus_cache_rcu_free);
> >
> > is much nicer code.
> >
> > Over all; I'd simply ignore sparse (I often do).
> >
>
> No disagreement there.

What do you think of the updated patch:

apei/ghes: Use xchg() for updating new cache slot instead of
 cmpxchg()

From: Ard Biesheuvel <ardb@kernel.org>

ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a new cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.

Since only inserting new items is needed, the race can only cause a failure
if the selected slot was updated with another new item concurrently,
which means that it is arbitrary which of those two items gets
dropped. This means the cmpxchg() and the special case are not necessary,
and hence just drop the existing item unconditionally. Note that this
does not result in loss of error events, it simply means we might
cause a false cache miss, and report the same event one additional
time in quick succession even if the cache should have prevented that.

Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
[Justin]: I removed __rcu annotation of victim, removed the RCU_INITIALIZER
cast and added the unptr for xchg.

drivers/acpi/apei/ghes.c | 44 ++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 27c72b175e4b..5fc8a135450b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -150,7 +150,7 @@ struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;

-static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;

 static int ghes_panic_timeout __read_mostly = 30;
@@ -785,31 +785,26 @@ static struct ghes_estatus_cache *ghes_estatus_cache_alloc(
        return cache;
 }

-static void ghes_estatus_cache_free(struct ghes_estatus_cache *cache)
+static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
 {
+       struct ghes_estatus_cache *cache;
        u32 len;

+       cache = container_of(head, struct ghes_estatus_cache, rcu);
        len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
        len = GHES_ESTATUS_CACHE_LEN(len);
        gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
        atomic_dec(&ghes_estatus_cache_alloced);
 }

-static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
-{
-       struct ghes_estatus_cache *cache;
-
-       cache = container_of(head, struct ghes_estatus_cache, rcu);
-       ghes_estatus_cache_free(cache);
-}
-
 static void ghes_estatus_cache_add(
        struct acpi_hest_generic *generic,
        struct acpi_hest_generic_status *estatus)
 {
        int i, slot = -1, count;
        unsigned long long now, duration, period, max_period = 0;
-       struct ghes_estatus_cache *cache, *slot_cache = NULL, *new_cache;
+       struct ghes_estatus_cache *cache, *new_cache;
+       struct ghes_estatus_cache *victim;

        new_cache = ghes_estatus_cache_alloc(generic, estatus);
        if (new_cache == NULL)
@@ -820,13 +815,11 @@ static void ghes_estatus_cache_add(
                cache = rcu_dereference(ghes_estatus_caches[i]);
                if (cache == NULL) {
                        slot = i;
-                       slot_cache = NULL;
                        break;
                }
                duration = now - cache->time_in;
                if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
                        slot = i;
-                       slot_cache = cache;
                        break;
                }
                count = atomic_read(&cache->count);
@@ -835,17 +828,24 @@ static void ghes_estatus_cache_add(
                if (period > max_period) {
                        max_period = period;
                        slot = i;
-                       slot_cache = cache;
                }
        }
-       /* new_cache must be put into array after its contents are written */
-       smp_wmb();
-       if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
-                                 slot_cache, new_cache) == slot_cache) {
-               if (slot_cache)
-                       call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
-       } else
-               ghes_estatus_cache_free(new_cache);
+       if (slot != -1) {
+               /*
+                * Use release semantics to ensure that ghes_estatus_cached()
+                * running on another CPU will see the updated cache fields if
+                * it can see the new value of the pointer.
+                * At this point, victim may point to a cached item different
+                * from the one based on which we selected the slot. Instead of
+                * going to the loop again to pick another slot, let's just
+                * drop the other item anyway: this may cause a false cache
+                * miss later on, but that won't cause any problems.
+                */
+               victim = unrcu_pointer(xchg_release(&ghes_estatus_caches[slot],
+                                       new_cache));
+               if (victim)
+                       call_rcu(&victim->rcu, ghes_estatus_cache_rcu_free);
+       }
        rcu_read_unlock();
 }

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-17  8:47                       ` Justin He
@ 2022-10-17  9:27                         ` Ard Biesheuvel
  2022-10-17 11:57                           ` Justin He
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2022-10-17  9:27 UTC (permalink / raw)
  To: Justin He
  Cc: Borislav Petkov, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Peter Zijlstra, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam, Jan Luebbe, Khuong Dinh,
	Kani Toshi, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi,
	kernel test robot

Hi Justin,

On Mon, 17 Oct 2022 at 10:47, Justin He <Justin.He@arm.com> wrote:
>
> Hi Ard
>
> > -----Original Message-----
> > Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> >
> > On Fri, 14 Oct 2022 at 17:11, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Fri, Oct 14, 2022 at 04:31:37PM +0200, Ard Biesheuvel wrote:
> > > > +       if (slot != -1) {
> > > > +               /*
> > > > +                * Use release semantics to ensure that
> > ghes_estatus_cached()
> > > > +                * running on another CPU will see the updated cache
> > fields if
> > > > +                * it can see the new value of the pointer.
> > > > +                */
> > > > +               victim = xchg_release(ghes_estatus_caches + slot,
> > > > +
> > RCU_INITIALIZER(new_cache));
> > > > +
> > > > +               /*
> > > > +                * At this point, victim may point to a cached item
> > different
> > > > +                * from the one based on which we selected the slot.
> > Instead of
> > > > +                * going to the loop again to pick another slot, let's
> > just
> > > > +                * drop the other item anyway: this may cause a false
> > cache
> > > > +                * miss later on, but that won't cause any problems.
> > > > +                */
> > > > +               if (victim) {
> > > > +                       call_rcu(&rcu_dereference(victim)->rcu,
> > > > +                                ghes_estatus_cache_rcu_free);
> > >                 }
> > >
> > > I think you can use unrcu_pointer() here instead, there should not be
> > > a data dependency since the ->rcu member itself should be otherwise
> > > unused (and if it were, we wouldn't care about its previous content anyway).
> > >
> > > But only Alpha cares about that distinction anyway, so *shrug*.
> > >
> >
> > Ah yeah good point - and we are not actually dereferencing the pointer at all
> > here, just adding an offset to get at the address of the rcu member.
> >
> > So we can take this block out of the rcu_read_lock() section as well.
> >
> >
> > > While I much like the xchg() variant; I still don't really fancy the
> > > verbage the sparse nonsense makes us do.
> > >
> > >                 victim = xchg_release(&ghes_estatus_caches[slot],
> > new_cache);
> > >                 if (victim)
> > >                         call_rcu(&victim->rcu,
> > > ghes_estatus_cache_rcu_free);
> > >
> > > is much nicer code.
> > >
> > > Over all; I'd simply ignore sparse (I often do).
> > >
> >
> > No disagreement there.
>
> What do you think of the updated patch:
>
> apei/ghes: Use xchg() for updating new cache slot instead of
>  cmpxchg()
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> ghes_estatus_cache_add() selects a slot, and either succeeds in
> replacing its contents with a pointer to a new cached item, or it just
> gives up and frees the new item again, without attempting to select
> another slot even if one might be available.
>
> Since only inserting new items is needed, the race can only cause a failure
> if the selected slot was updated with another new item concurrently,
> which means that it is arbitrary which of those two items gets
> dropped. This means the cmpxchg() and the special case are not necessary,
> and hence just drop the existing item unconditionally. Note that this
> does not result in loss of error events, it simply means we might
> cause a false cache miss, and report the same event one additional
> time in quick succession even if the cache should have prevented that.
>

Please add a line here

Co-developed-by: Jia He <justin.he@arm.com>

> Signed-off-by: Jia He <justin.he@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> [Justin]: I removed __rcu annotation of victim, removed the RCU_INITIALIZER
> cast and added the unptr for xchg.
>
> drivers/acpi/apei/ghes.c | 44 ++++++++++++++++++++--------------------
>  1 file changed, 22 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 27c72b175e4b..5fc8a135450b 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -150,7 +150,7 @@ struct ghes_vendor_record_entry {
>  static struct gen_pool *ghes_estatus_pool;
>  static unsigned long ghes_estatus_pool_size_request;
>
> -static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> +static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
>  static atomic_t ghes_estatus_cache_alloced;
>
>  static int ghes_panic_timeout __read_mostly = 30;
> @@ -785,31 +785,26 @@ static struct ghes_estatus_cache *ghes_estatus_cache_alloc(
>         return cache;
>  }
>
> -static void ghes_estatus_cache_free(struct ghes_estatus_cache *cache)
> +static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
>  {
> +       struct ghes_estatus_cache *cache;
>         u32 len;
>
> +       cache = container_of(head, struct ghes_estatus_cache, rcu);
>         len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
>         len = GHES_ESTATUS_CACHE_LEN(len);
>         gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
>         atomic_dec(&ghes_estatus_cache_alloced);
>  }
>
> -static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
> -{
> -       struct ghes_estatus_cache *cache;
> -
> -       cache = container_of(head, struct ghes_estatus_cache, rcu);
> -       ghes_estatus_cache_free(cache);
> -}
> -
>  static void ghes_estatus_cache_add(
>         struct acpi_hest_generic *generic,
>         struct acpi_hest_generic_status *estatus)
>  {
>         int i, slot = -1, count;
>         unsigned long long now, duration, period, max_period = 0;
> -       struct ghes_estatus_cache *cache, *slot_cache = NULL, *new_cache;
> +       struct ghes_estatus_cache *cache, *new_cache;
> +       struct ghes_estatus_cache *victim;
>
>         new_cache = ghes_estatus_cache_alloc(generic, estatus);
>         if (new_cache == NULL)
> @@ -820,13 +815,11 @@ static void ghes_estatus_cache_add(
>                 cache = rcu_dereference(ghes_estatus_caches[i]);
>                 if (cache == NULL) {
>                         slot = i;
> -                       slot_cache = NULL;
>                         break;
>                 }
>                 duration = now - cache->time_in;
>                 if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
>                         slot = i;
> -                       slot_cache = cache;
>                         break;
>                 }
>                 count = atomic_read(&cache->count);
> @@ -835,17 +828,24 @@ static void ghes_estatus_cache_add(
>                 if (period > max_period) {
>                         max_period = period;
>                         slot = i;
> -                       slot_cache = cache;
>                 }
>         }
> -       /* new_cache must be put into array after its contents are written */
> -       smp_wmb();
> -       if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
> -                                 slot_cache, new_cache) == slot_cache) {
> -               if (slot_cache)
> -                       call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
> -       } else
> -               ghes_estatus_cache_free(new_cache);
> +       if (slot != -1) {
> +               /*
> +                * Use release semantics to ensure that ghes_estatus_cached()
> +                * running on another CPU will see the updated cache fields if
> +                * it can see the new value of the pointer.

Please move the comment back where it was. 'At this point' is now
ambiguous because victim has not been assigned yet.

> +                * At this point, victim may point to a cached item different
> +                * from the one based on which we selected the slot. Instead of
> +                * going to the loop again to pick another slot, let's just
> +                * drop the other item anyway: this may cause a false cache
> +                * miss later on, but that won't cause any problems.
> +                */
> +               victim = unrcu_pointer(xchg_release(&ghes_estatus_caches[slot],
> +                                       new_cache));

Doesn't this still trigger the sparse warning on x86?

> +               if (victim)
> +                       call_rcu(&victim->rcu, ghes_estatus_cache_rcu_free);

I think it is better to add back the __rcu annotation to 'victim', and
change this line to

call_rcu(&unrcu_pointer(victim)->rcu, ghes_estatus_cache_rcu_free);

> +       }
>         rcu_read_unlock();

This can now be moved before the if()

>  }
>
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Please get rid of this footer.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
  2022-10-17  9:27                         ` Ard Biesheuvel
@ 2022-10-17 11:57                           ` Justin He
  0 siblings, 0 replies; 28+ messages in thread
From: Justin He @ 2022-10-17 11:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Len Brown, James Morse, Tony Luck,
	Mauro Carvalho Chehab, Peter Zijlstra, Robert Richter,
	Robert Moore, Qiuxu Zhuo, Yazen Ghannam, Jan Luebbe, Khuong Dinh,
	Kani Toshi, linux-acpi, linux-kernel, linux-edac, devel,
	Rafael J . Wysocki, Shuai Xue, Jarkko Sakkinen, linux-efi,
	kernel test robot, nd

Hi Ard

> -----Original Message-----
> From: Ard Biesheuvel <ardb@kernel.org>
> Sent: Monday, October 17, 2022 5:27 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Borislav Petkov <bp@alien8.de>; Len Brown <lenb@kernel.org>; James
> Morse <James.Morse@arm.com>; Tony Luck <tony.luck@intel.com>; Mauro
> Carvalho Chehab <mchehab@kernel.org>; Peter Zijlstra
> <peterz@infradead.org>; Robert Richter <rric@kernel.org>; Robert Moore
> <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; Yazen
> Ghannam <yazen.ghannam@amd.com>; Jan Luebbe <jlu@pengutronix.de>;
> Khuong Dinh <khuong@os.amperecomputing.com>; Kani Toshi
> <toshi.kani@hpe.com>; linux-acpi@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; devel@acpica.org;
> Rafael J . Wysocki <rafael@kernel.org>; Shuai Xue
> <xueshuai@linux.alibaba.com>; Jarkko Sakkinen <jarkko@kernel.org>;
> linux-efi@vger.kernel.org; kernel test robot <lkp@intel.com>
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> 
> Hi Justin,
> 
> On Mon, 17 Oct 2022 at 10:47, Justin He <Justin.He@arm.com> wrote:
> >
> > Hi Ard
> >
> > > -----Original Message-----
> > > Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> > >
> > > On Fri, 14 Oct 2022 at 17:11, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Fri, Oct 14, 2022 at 04:31:37PM +0200, Ard Biesheuvel wrote:
> > > > > +       if (slot != -1) {
> > > > > +               /*
> > > > > +                * Use release semantics to ensure that
> > > ghes_estatus_cached()
> > > > > +                * running on another CPU will see the updated
> > > > > + cache
> > > fields if
> > > > > +                * it can see the new value of the pointer.
> > > > > +                */
> > > > > +               victim = xchg_release(ghes_estatus_caches +
> > > > > + slot,
> > > > > +
> > > RCU_INITIALIZER(new_cache));
> > > > > +
> > > > > +               /*
> > > > > +                * At this point, victim may point to a cached
> > > > > + item
> > > different
> > > > > +                * from the one based on which we selected the slot.
> > > Instead of
> > > > > +                * going to the loop again to pick another slot,
> > > > > + let's
> > > just
> > > > > +                * drop the other item anyway: this may cause a
> > > > > + false
> > > cache
> > > > > +                * miss later on, but that won't cause any problems.
> > > > > +                */
> > > > > +               if (victim) {
> > > > > +                       call_rcu(&rcu_dereference(victim)->rcu,
> > > > > +                                ghes_estatus_cache_rcu_free);
> > > >                 }
> > > >
> > > > I think you can use unrcu_pointer() here instead, there should not
> > > > be a data dependency since the ->rcu member itself should be
> > > > otherwise unused (and if it were, we wouldn't care about its previous
> content anyway).
> > > >
> > > > But only Alpha cares about that distinction anyway, so *shrug*.
> > > >
> > >
> > > Ah yeah good point - and we are not actually dereferencing the
> > > pointer at all here, just adding an offset to get at the address of the rcu
> member.
> > >
> > > So we can take this block out of the rcu_read_lock() section as well.
> > >
> > >
> > > > While I much like the xchg() variant; I still don't really fancy
> > > > the verbage the sparse nonsense makes us do.
> > > >
> > > >                 victim = xchg_release(&ghes_estatus_caches[slot],
> > > new_cache);
> > > >                 if (victim)
> > > >                         call_rcu(&victim->rcu,
> > > > ghes_estatus_cache_rcu_free);
> > > >
> > > > is much nicer code.
> > > >
> > > > Over all; I'd simply ignore sparse (I often do).
> > > >
> > >
> > > No disagreement there.
> >
> > What do you think of the updated patch:
> >
> > apei/ghes: Use xchg() for updating new cache slot instead of
> >  cmpxchg()
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > ghes_estatus_cache_add() selects a slot, and either succeeds in
> > replacing its contents with a pointer to a new cached item, or it just
> > gives up and frees the new item again, without attempting to select
> > another slot even if one might be available.
> >
> > Since only inserting new items is needed, the race can only cause a
> > failure if the selected slot was updated with another new item
> > concurrently, which means that it is arbitrary which of those two
> > items gets dropped. This means the cmpxchg() and the special case are
> > not necessary, and hence just drop the existing item unconditionally.
> > Note that this does not result in loss of error events, it simply
> > means we might cause a false cache miss, and report the same event one
> > additional time in quick succession even if the cache should have prevented
> that.
> >
> 
> Please add a line here
> 
> Co-developed-by: Jia He <justin.he@arm.com>
> 
> > Signed-off-by: Jia He <justin.he@arm.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > [Justin]: I removed __rcu annotation of victim, removed the
> > RCU_INITIALIZER cast and added the unptr for xchg.
> >
> > drivers/acpi/apei/ghes.c | 44 ++++++++++++++++++++--------------------
> >  1 file changed, 22 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
> > 27c72b175e4b..5fc8a135450b 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -150,7 +150,7 @@ struct ghes_vendor_record_entry {  static struct
> > gen_pool *ghes_estatus_pool;  static unsigned long
> > ghes_estatus_pool_size_request;
> >
> > -static struct ghes_estatus_cache
> > *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> > +static struct ghes_estatus_cache __rcu
> > +*ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> >  static atomic_t ghes_estatus_cache_alloced;
> >
> >  static int ghes_panic_timeout __read_mostly = 30; @@ -785,31 +785,26
> > @@ static struct ghes_estatus_cache *ghes_estatus_cache_alloc(
> >         return cache;
> >  }
> >
> > -static void ghes_estatus_cache_free(struct ghes_estatus_cache *cache)
> > +static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
> >  {
> > +       struct ghes_estatus_cache *cache;
> >         u32 len;
> >
> > +       cache = container_of(head, struct ghes_estatus_cache, rcu);
> >         len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
> >         len = GHES_ESTATUS_CACHE_LEN(len);
> >         gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
> >         atomic_dec(&ghes_estatus_cache_alloced);
> >  }
> >
> > -static void ghes_estatus_cache_rcu_free(struct rcu_head *head) -{
> > -       struct ghes_estatus_cache *cache;
> > -
> > -       cache = container_of(head, struct ghes_estatus_cache, rcu);
> > -       ghes_estatus_cache_free(cache);
> > -}
> > -
> >  static void ghes_estatus_cache_add(
> >         struct acpi_hest_generic *generic,
> >         struct acpi_hest_generic_status *estatus)  {
> >         int i, slot = -1, count;
> >         unsigned long long now, duration, period, max_period = 0;
> > -       struct ghes_estatus_cache *cache, *slot_cache = NULL,
> *new_cache;
> > +       struct ghes_estatus_cache *cache, *new_cache;
> > +       struct ghes_estatus_cache *victim;
> >
> >         new_cache = ghes_estatus_cache_alloc(generic, estatus);
> >         if (new_cache == NULL)
> > @@ -820,13 +815,11 @@ static void ghes_estatus_cache_add(
> >                 cache = rcu_dereference(ghes_estatus_caches[i]);
> >                 if (cache == NULL) {
> >                         slot = i;
> > -                       slot_cache = NULL;
> >                         break;
> >                 }
> >                 duration = now - cache->time_in;
> >                 if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
> >                         slot = i;
> > -                       slot_cache = cache;
> >                         break;
> >                 }
> >                 count = atomic_read(&cache->count); @@ -835,17
> +828,24
> > @@ static void ghes_estatus_cache_add(
> >                 if (period > max_period) {
> >                         max_period = period;
> >                         slot = i;
> > -                       slot_cache = cache;
> >                 }
> >         }
> > -       /* new_cache must be put into array after its contents are written
> */
> > -       smp_wmb();
> > -       if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
> > -                                 slot_cache, new_cache) == slot_cache)
> {
> > -               if (slot_cache)
> > -                       call_rcu(&slot_cache->rcu,
> ghes_estatus_cache_rcu_free);
> > -       } else
> > -               ghes_estatus_cache_free(new_cache);
> > +       if (slot != -1) {
> > +               /*
> > +                * Use release semantics to ensure that
> ghes_estatus_cached()
> > +                * running on another CPU will see the updated cache
> fields if
> > +                * it can see the new value of the pointer.
> 
> Please move the comment back where it was. 'At this point' is now ambiguous
> because victim has not been assigned yet.
> 
> > +                * At this point, victim may point to a cached item
> different
> > +                * from the one based on which we selected the slot.
> Instead of
> > +                * going to the loop again to pick another slot, let's just
> > +                * drop the other item anyway: this may cause a false
> cache
> > +                * miss later on, but that won't cause any problems.
> > +                */
> > +               victim =
> unrcu_pointer(xchg_release(&ghes_estatus_caches[slot],
> > +                                       new_cache));
> 
> Doesn't this still trigger the sparse warning on x86?
Yes,
I will add the RCU_INITIALIZER back if Peter doesn't object.


--
Cheers,
Justin (Jia He)



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-10-17 11:58 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-10  2:35 [PATCH v8 0/7] Make ghes_edac a proper module Jia He
2022-10-10  2:35 ` [PATCH v8 1/7] efi/cper: export several helpers for ghes_edac to use Jia He
2022-10-10  2:35 ` [PATCH v8 2/7] EDAC/ghes: Add a notifier for reporting memory errors Jia He
2022-10-10  2:35 ` [PATCH v8 3/7] EDAC/ghes: Prepare to make ghes_edac a proper module Jia He
2022-10-10  2:35 ` [PATCH v8 4/7] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes Jia He
2022-10-10  2:35 ` [PATCH v8 5/7] EDAC: Add the ghes_get_devices() check for chipset-specific edac drivers Jia He
2022-10-10  2:35 ` [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg Jia He
2022-10-11 10:33   ` Borislav Petkov
2022-10-11 14:32     ` Justin He
2022-10-11 14:45       ` Borislav Petkov
2022-10-12  4:35         ` Justin He
2022-10-12 12:04         ` Justin He
2022-10-13 13:37           ` Borislav Petkov
2022-10-13 15:41             ` Ard Biesheuvel
2022-10-13 16:37               ` Borislav Petkov
2022-10-13 16:45               ` Peter Zijlstra
2022-10-13 17:42                 ` Borislav Petkov
2022-10-14  9:40                   ` Ard Biesheuvel
2022-10-14 19:40                     ` Borislav Petkov
2022-10-14 12:00               ` Justin He
2022-10-14 14:31                 ` Ard Biesheuvel
2022-10-14 15:10                   ` Peter Zijlstra
2022-10-14 15:24                     ` Ard Biesheuvel
2022-10-17  8:47                       ` Justin He
2022-10-17  9:27                         ` Ard Biesheuvel
2022-10-17 11:57                           ` Justin He
2022-10-13 16:41           ` Peter Zijlstra
2022-10-10  2:35 ` [PATCH v8 7/7] EDAC/igen6: Return consistent errno when another edac driver is enabled Jia He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).