All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] enable ghes_edac on selected platforms
@ 2017-08-03 21:57 Toshi Kani
  2017-08-03 21:57   ` [v2,1/7] " Toshi Kani
                   ` (6 more replies)
  0 siblings, 7 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp; +Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

The ghes_edac driver was introduced in 2013 [1], but it has not been
enabled by any distro yet.  This is because the driver obtains error
info from firmware interfaces, which are not properly implemented on
many platforms.

To get out from this situation, add a platform check to selectively
enable the driver on the platforms that are known to have proper
firmware implementation.  Platform vendors can add their platforms to
the list when they support ghes_edac.

Patch 1-2 introduces a common function for platform check.
Patch 3 improves GHES check for ghes_edac.
Patch 4 optimizes ghes_edac init code.
Patch 5 introduces platform check to ghes_edac.
Patch 6-7 optimizes regular edac driver's init code when ghes_edac is used.

v2:
 - Address review comments (patch 1).
 - Add OSC APEI check (patch 3).
 - Avoid multiple dmi_walk (patch 4).
 - Add EDAC MC owner check (patch 6,7)

---
Toshi Kani (7):
 1/7 ACPI / blacklist: add acpi_match_oemlist() interface
 2/7 intel_pstate: convert to use acpi_match_oemlist()
 3/7 ACPI / APEI: add OSC APEI bit check for ghes_edac
 4/7 ghes_edac: avoid multiple calls to dmi_walk()
 5/7 ghes_edac: add platform check to enable ghes_edac
 6/7 EDAC: add edac_check_mc_owner() to check MC owner
 7/7 edac drivers: add MC owner check in init

---
 drivers/acpi/apei/ghes.c       |  8 ++--
 drivers/acpi/blacklist.c       | 83 +++++++-----------------------------------
 drivers/acpi/utils.c           | 40 ++++++++++++++++++++
 drivers/cpufreq/intel_pstate.c | 64 +++++++++++++-------------------
 drivers/edac/amd64_edac.c      |  3 ++
 drivers/edac/edac_mc.c         | 15 +++++++-
 drivers/edac/edac_mc.h         | 12 ++++++
 drivers/edac/ghes_edac.c       | 36 ++++++++++++++----
 drivers/edac/pnd2_edac.c       |  7 +++-
 drivers/edac/sb_edac.c         |  7 +++-
 drivers/edac/skx_edac.c        |  6 ++-
 include/linux/acpi.h           | 19 ++++++++++
 12 files changed, 176 insertions(+), 124 deletions(-)


^ permalink raw reply	[flat|nested] 130+ messages in thread

* [PATCH v2 1/7] ACPI / blacklist: add acpi_match_oemlist() interface
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

ACPI OEM ID / OEM Table ID / Revision can be used to identify
a platform based on ACPI firmware info.  acpi_blacklisted(),
intel_pstate_platform_pwr_mgmt_exists(), and some other funcs,
have been using similar check to detect a list of platforms
that require special handlings.

Move the platform check in acpi_blacklisted() to a new common
utility function, acpi_match_oemlist(), so that other drivers
do not have to implement their own version.

There is no change in functionality.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Borislav Petkov <bp@alien8.de>
---
 drivers/acpi/blacklist.c |   83 ++++++++--------------------------------------
 drivers/acpi/utils.c     |   40 ++++++++++++++++++++++
 include/linux/acpi.h     |   19 +++++++++++
 3 files changed, 73 insertions(+), 69 deletions(-)

diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
index bb542ac..bf44573 100644
--- a/drivers/acpi/blacklist.c
+++ b/drivers/acpi/blacklist.c
@@ -30,30 +30,13 @@
 
 #include "internal.h"
 
-enum acpi_blacklist_predicates {
-	all_versions,
-	less_than_or_equal,
-	equal,
-	greater_than_or_equal,
-};
-
-struct acpi_blacklist_item {
-	char oem_id[7];
-	char oem_table_id[9];
-	u32 oem_revision;
-	char *table;
-	enum acpi_blacklist_predicates oem_revision_predicate;
-	char *reason;
-	u32 is_critical_error;
-};
-
 static struct dmi_system_id acpi_rev_dmi_table[] __initdata;
 
 /*
  * POLICY: If *anything* doesn't work, put it on the blacklist.
  *	   If they are critical errors, mark it critical, and abort driver load.
  */
-static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
+static struct acpi_oemlist acpi_blacklist[] __initdata = {
 	/* Compaq Presario 1700 */
 	{"PTLTD ", "  DSDT  ", 0x06040000, ACPI_SIG_DSDT, less_than_or_equal,
 	 "Multiple problems", 1},
@@ -67,65 +50,27 @@ static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
 	{"IBM   ", "TP600E  ", 0x00000105, ACPI_SIG_DSDT, less_than_or_equal,
 	 "Incorrect _ADR", 1},
 
-	{""}
+	{ }
 };
 
 int __init acpi_blacklisted(void)
 {
-	int i = 0;
+	int i;
 	int blacklisted = 0;
-	struct acpi_table_header table_header;
-
-	while (acpi_blacklist[i].oem_id[0] != '\0') {
-		if (acpi_get_table_header(acpi_blacklist[i].table, 0, &table_header)) {
-			i++;
-			continue;
-		}
-
-		if (strncmp(acpi_blacklist[i].oem_id, table_header.oem_id, 6)) {
-			i++;
-			continue;
-		}
-
-		if (strncmp
-		    (acpi_blacklist[i].oem_table_id, table_header.oem_table_id,
-		     8)) {
-			i++;
-			continue;
-		}
-
-		if ((acpi_blacklist[i].oem_revision_predicate == all_versions)
-		    || (acpi_blacklist[i].oem_revision_predicate ==
-			less_than_or_equal
-			&& table_header.oem_revision <=
-			acpi_blacklist[i].oem_revision)
-		    || (acpi_blacklist[i].oem_revision_predicate ==
-			greater_than_or_equal
-			&& table_header.oem_revision >=
-			acpi_blacklist[i].oem_revision)
-		    || (acpi_blacklist[i].oem_revision_predicate == equal
-			&& table_header.oem_revision ==
-			acpi_blacklist[i].oem_revision)) {
 
-			printk(KERN_ERR PREFIX
-			       "Vendor \"%6.6s\" System \"%8.8s\" "
-			       "Revision 0x%x has a known ACPI BIOS problem.\n",
-			       acpi_blacklist[i].oem_id,
-			       acpi_blacklist[i].oem_table_id,
-			       acpi_blacklist[i].oem_revision);
+	i = acpi_match_oemlist(acpi_blacklist);
+	if (i >= 0) {
+		pr_err(PREFIX "Vendor \"%6.6s\" System \"%8.8s\" Revision 0x%x has a known ACPI BIOS problem.\n",
+		       acpi_blacklist[i].oem_id,
+		       acpi_blacklist[i].oem_table_id,
+		       acpi_blacklist[i].oem_revision);
 
-			printk(KERN_ERR PREFIX
-			       "Reason: %s. This is a %s error\n",
-			       acpi_blacklist[i].reason,
-			       (acpi_blacklist[i].
-				is_critical_error ? "non-recoverable" :
-				"recoverable"));
+		pr_err(PREFIX "Reason: %s. This is a %s error\n",
+		       acpi_blacklist[i].reason,
+		       (acpi_blacklist[i].data ?
+			"non-recoverable" : "recoverable"));
 
-			blacklisted = acpi_blacklist[i].is_critical_error;
-			break;
-		} else {
-			i++;
-		}
+		blacklisted = acpi_blacklist[i].data;
 	}
 
 	(void)early_acpi_osi_init();
diff --git a/drivers/acpi/utils.c b/drivers/acpi/utils.c
index b9d956c..78b9422 100644
--- a/drivers/acpi/utils.c
+++ b/drivers/acpi/utils.c
@@ -816,3 +816,43 @@ static int __init acpi_backlight(char *str)
 	return 1;
 }
 __setup("acpi_backlight=", acpi_backlight);
+
+/**
+ * acpi_match_oemlist - Check if the system matches with an oem list
+ * @oem: pointer to acpi_oemlist table terminated by a NULL entry
+ *
+ * Return the matched index if the system is found in the oem list.
+ * Otherwise, return a negative error code.
+ */
+int acpi_match_oemlist(const struct acpi_oemlist *oem)
+{
+	struct acpi_table_header hdr;
+	int idx = 0;
+
+	if (acpi_disabled)
+		return -ENODEV;
+
+	for (; oem->oem_id[0]; oem++, idx++) {
+		if (ACPI_FAILURE(acpi_get_table_header(oem->table, 0, &hdr)))
+			continue;
+
+		if (strncmp(oem->oem_id, hdr.oem_id, ACPI_OEM_ID_SIZE))
+			continue;
+
+		if (strncmp(oem->oem_table_id, hdr.oem_table_id,
+			    ACPI_OEM_TABLE_ID_SIZE))
+			continue;
+
+		if ((oem->pred == all_versions) ||
+		    (oem->pred == less_than_or_equal
+			&& hdr.oem_revision <= oem->oem_revision) ||
+		    (oem->pred == greater_than_or_equal
+			&& hdr.oem_revision >= oem->oem_revision) ||
+		    (oem->pred == equal
+			&& hdr.oem_revision == oem->oem_revision))
+			return idx;
+	}
+
+	return -ENODEV;
+}
+EXPORT_SYMBOL(acpi_match_oemlist);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c749eef..6053acb 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -556,6 +556,25 @@ extern acpi_status acpi_pci_osc_control_set(acpi_handle handle,
 #define ACPI_OST_SC_DRIVER_LOAD_FAILURE		0x81
 #define ACPI_OST_SC_INSERT_NOT_SUPPORTED	0x82
 
+enum acpi_predicate {
+	all_versions,
+	less_than_or_equal,
+	equal,
+	greater_than_or_equal,
+};
+
+/* Table must be terminted by a NULL entry */
+struct acpi_oemlist {
+	char	oem_id[ACPI_OEM_ID_SIZE];
+	char	oem_table_id[ACPI_OEM_TABLE_ID_SIZE];
+	u32	oem_revision;
+	char	*table;
+	enum acpi_predicate pred;
+	char	*reason;
+	u32	data;
+};
+int acpi_match_oemlist(const struct acpi_oemlist *oem);
+
 extern void acpi_early_init(void);
 extern void acpi_subsystem_init(void);
 

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,1/7] ACPI / blacklist: add acpi_match_oemlist() interface
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

ACPI OEM ID / OEM Table ID / Revision can be used to identify
a platform based on ACPI firmware info.  acpi_blacklisted(),
intel_pstate_platform_pwr_mgmt_exists(), and some other funcs,
have been using similar check to detect a list of platforms
that require special handlings.

Move the platform check in acpi_blacklisted() to a new common
utility function, acpi_match_oemlist(), so that other drivers
do not have to implement their own version.

There is no change in functionality.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Borislav Petkov <bp@alien8.de>
---
 drivers/acpi/blacklist.c |   83 ++++++++--------------------------------------
 drivers/acpi/utils.c     |   40 ++++++++++++++++++++++
 include/linux/acpi.h     |   19 +++++++++++
 3 files changed, 73 insertions(+), 69 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
index bb542ac..bf44573 100644
--- a/drivers/acpi/blacklist.c
+++ b/drivers/acpi/blacklist.c
@@ -30,30 +30,13 @@
 
 #include "internal.h"
 
-enum acpi_blacklist_predicates {
-	all_versions,
-	less_than_or_equal,
-	equal,
-	greater_than_or_equal,
-};
-
-struct acpi_blacklist_item {
-	char oem_id[7];
-	char oem_table_id[9];
-	u32 oem_revision;
-	char *table;
-	enum acpi_blacklist_predicates oem_revision_predicate;
-	char *reason;
-	u32 is_critical_error;
-};
-
 static struct dmi_system_id acpi_rev_dmi_table[] __initdata;
 
 /*
  * POLICY: If *anything* doesn't work, put it on the blacklist.
  *	   If they are critical errors, mark it critical, and abort driver load.
  */
-static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
+static struct acpi_oemlist acpi_blacklist[] __initdata = {
 	/* Compaq Presario 1700 */
 	{"PTLTD ", "  DSDT  ", 0x06040000, ACPI_SIG_DSDT, less_than_or_equal,
 	 "Multiple problems", 1},
@@ -67,65 +50,27 @@ static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
 	{"IBM   ", "TP600E  ", 0x00000105, ACPI_SIG_DSDT, less_than_or_equal,
 	 "Incorrect _ADR", 1},
 
-	{""}
+	{ }
 };
 
 int __init acpi_blacklisted(void)
 {
-	int i = 0;
+	int i;
 	int blacklisted = 0;
-	struct acpi_table_header table_header;
-
-	while (acpi_blacklist[i].oem_id[0] != '\0') {
-		if (acpi_get_table_header(acpi_blacklist[i].table, 0, &table_header)) {
-			i++;
-			continue;
-		}
-
-		if (strncmp(acpi_blacklist[i].oem_id, table_header.oem_id, 6)) {
-			i++;
-			continue;
-		}
-
-		if (strncmp
-		    (acpi_blacklist[i].oem_table_id, table_header.oem_table_id,
-		     8)) {
-			i++;
-			continue;
-		}
-
-		if ((acpi_blacklist[i].oem_revision_predicate == all_versions)
-		    || (acpi_blacklist[i].oem_revision_predicate ==
-			less_than_or_equal
-			&& table_header.oem_revision <=
-			acpi_blacklist[i].oem_revision)
-		    || (acpi_blacklist[i].oem_revision_predicate ==
-			greater_than_or_equal
-			&& table_header.oem_revision >=
-			acpi_blacklist[i].oem_revision)
-		    || (acpi_blacklist[i].oem_revision_predicate == equal
-			&& table_header.oem_revision ==
-			acpi_blacklist[i].oem_revision)) {
 
-			printk(KERN_ERR PREFIX
-			       "Vendor \"%6.6s\" System \"%8.8s\" "
-			       "Revision 0x%x has a known ACPI BIOS problem.\n",
-			       acpi_blacklist[i].oem_id,
-			       acpi_blacklist[i].oem_table_id,
-			       acpi_blacklist[i].oem_revision);
+	i = acpi_match_oemlist(acpi_blacklist);
+	if (i >= 0) {
+		pr_err(PREFIX "Vendor \"%6.6s\" System \"%8.8s\" Revision 0x%x has a known ACPI BIOS problem.\n",
+		       acpi_blacklist[i].oem_id,
+		       acpi_blacklist[i].oem_table_id,
+		       acpi_blacklist[i].oem_revision);
 
-			printk(KERN_ERR PREFIX
-			       "Reason: %s. This is a %s error\n",
-			       acpi_blacklist[i].reason,
-			       (acpi_blacklist[i].
-				is_critical_error ? "non-recoverable" :
-				"recoverable"));
+		pr_err(PREFIX "Reason: %s. This is a %s error\n",
+		       acpi_blacklist[i].reason,
+		       (acpi_blacklist[i].data ?
+			"non-recoverable" : "recoverable"));
 
-			blacklisted = acpi_blacklist[i].is_critical_error;
-			break;
-		} else {
-			i++;
-		}
+		blacklisted = acpi_blacklist[i].data;
 	}
 
 	(void)early_acpi_osi_init();
diff --git a/drivers/acpi/utils.c b/drivers/acpi/utils.c
index b9d956c..78b9422 100644
--- a/drivers/acpi/utils.c
+++ b/drivers/acpi/utils.c
@@ -816,3 +816,43 @@ static int __init acpi_backlight(char *str)
 	return 1;
 }
 __setup("acpi_backlight=", acpi_backlight);
+
+/**
+ * acpi_match_oemlist - Check if the system matches with an oem list
+ * @oem: pointer to acpi_oemlist table terminated by a NULL entry
+ *
+ * Return the matched index if the system is found in the oem list.
+ * Otherwise, return a negative error code.
+ */
+int acpi_match_oemlist(const struct acpi_oemlist *oem)
+{
+	struct acpi_table_header hdr;
+	int idx = 0;
+
+	if (acpi_disabled)
+		return -ENODEV;
+
+	for (; oem->oem_id[0]; oem++, idx++) {
+		if (ACPI_FAILURE(acpi_get_table_header(oem->table, 0, &hdr)))
+			continue;
+
+		if (strncmp(oem->oem_id, hdr.oem_id, ACPI_OEM_ID_SIZE))
+			continue;
+
+		if (strncmp(oem->oem_table_id, hdr.oem_table_id,
+			    ACPI_OEM_TABLE_ID_SIZE))
+			continue;
+
+		if ((oem->pred == all_versions) ||
+		    (oem->pred == less_than_or_equal
+			&& hdr.oem_revision <= oem->oem_revision) ||
+		    (oem->pred == greater_than_or_equal
+			&& hdr.oem_revision >= oem->oem_revision) ||
+		    (oem->pred == equal
+			&& hdr.oem_revision == oem->oem_revision))
+			return idx;
+	}
+
+	return -ENODEV;
+}
+EXPORT_SYMBOL(acpi_match_oemlist);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c749eef..6053acb 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -556,6 +556,25 @@ extern acpi_status acpi_pci_osc_control_set(acpi_handle handle,
 #define ACPI_OST_SC_DRIVER_LOAD_FAILURE		0x81
 #define ACPI_OST_SC_INSERT_NOT_SUPPORTED	0x82
 
+enum acpi_predicate {
+	all_versions,
+	less_than_or_equal,
+	equal,
+	greater_than_or_equal,
+};
+
+/* Table must be terminted by a NULL entry */
+struct acpi_oemlist {
+	char	oem_id[ACPI_OEM_ID_SIZE];
+	char	oem_table_id[ACPI_OEM_TABLE_ID_SIZE];
+	u32	oem_revision;
+	char	*table;
+	enum acpi_predicate pred;
+	char	*reason;
+	u32	data;
+};
+int acpi_match_oemlist(const struct acpi_oemlist *oem);
+
 extern void acpi_early_init(void);
 extern void acpi_subsystem_init(void);
 

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v2 2/7] intel_pstate: convert to use acpi_match_oemlist()
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani, Srinivas Pandruvada

Convert to use acpi_match_oemlist() for the platform check.
There is no change in functionality.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
---
 drivers/cpufreq/intel_pstate.c |   64 ++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 39 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 6cd5035..7944723 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2475,39 +2475,31 @@ enum {
 	PPC,
 };
 
-struct hw_vendor_info {
-	u16  valid;
-	char oem_id[ACPI_OEM_ID_SIZE];
-	char oem_table_id[ACPI_OEM_TABLE_ID_SIZE];
-	int  oem_pwr_table;
-};
-
 /* Hardware vendor-specific info that has its own power management modes */
-static struct hw_vendor_info vendor_info[] __initdata = {
-	{1, "HP    ", "ProLiant", PSS},
-	{1, "ORACLE", "X4-2    ", PPC},
-	{1, "ORACLE", "X4-2L   ", PPC},
-	{1, "ORACLE", "X4-2B   ", PPC},
-	{1, "ORACLE", "X3-2    ", PPC},
-	{1, "ORACLE", "X3-2L   ", PPC},
-	{1, "ORACLE", "X3-2B   ", PPC},
-	{1, "ORACLE", "X4470M2 ", PPC},
-	{1, "ORACLE", "X4270M3 ", PPC},
-	{1, "ORACLE", "X4270M2 ", PPC},
-	{1, "ORACLE", "X4170M2 ", PPC},
-	{1, "ORACLE", "X4170 M3", PPC},
-	{1, "ORACLE", "X4275 M3", PPC},
-	{1, "ORACLE", "X6-2    ", PPC},
-	{1, "ORACLE", "Sudbury ", PPC},
-	{0, "", ""},
+static struct acpi_oemlist oemlist[] __initdata = {
+	{"HP    ", "ProLiant", 0, ACPI_SIG_FADT, all_versions, 0, PSS},
+	{"ORACLE", "X4-2    ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4-2L   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4-2B   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X3-2    ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X3-2L   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X3-2B   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4470M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4270M3 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4270M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4170M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4170 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4275 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X6-2    ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "Sudbury ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{ } /* End */
 };
 
 static bool __init intel_pstate_platform_pwr_mgmt_exists(void)
 {
-	struct acpi_table_header hdr;
-	struct hw_vendor_info *v_info;
 	const struct x86_cpu_id *id;
 	u64 misc_pwr;
+	int idx;
 
 	id = x86_match_cpu(intel_pstate_cpu_oob_ids);
 	if (id) {
@@ -2516,21 +2508,15 @@ static bool __init intel_pstate_platform_pwr_mgmt_exists(void)
 			return true;
 	}
 
-	if (acpi_disabled ||
-	    ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_FADT, 0, &hdr)))
+	idx = acpi_match_oemlist(oemlist);
+	if (idx < 0)
 		return false;
 
-	for (v_info = vendor_info; v_info->valid; v_info++) {
-		if (!strncmp(hdr.oem_id, v_info->oem_id, ACPI_OEM_ID_SIZE) &&
-			!strncmp(hdr.oem_table_id, v_info->oem_table_id,
-						ACPI_OEM_TABLE_ID_SIZE))
-			switch (v_info->oem_pwr_table) {
-			case PSS:
-				return intel_pstate_no_acpi_pss();
-			case PPC:
-				return intel_pstate_has_acpi_ppc() &&
-					(!force_load);
-			}
+	switch (oemlist[idx].data) {
+	case PSS:
+		return intel_pstate_no_acpi_pss();
+	case PPC:
+		return intel_pstate_has_acpi_ppc() && !force_load;
 	}
 
 	return false;

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,2/7] intel_pstate: convert to use acpi_match_oemlist()
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani, Srinivas Pandruvada

Convert to use acpi_match_oemlist() for the platform check.
There is no change in functionality.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
---
 drivers/cpufreq/intel_pstate.c |   64 ++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 39 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 6cd5035..7944723 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2475,39 +2475,31 @@ enum {
 	PPC,
 };
 
-struct hw_vendor_info {
-	u16  valid;
-	char oem_id[ACPI_OEM_ID_SIZE];
-	char oem_table_id[ACPI_OEM_TABLE_ID_SIZE];
-	int  oem_pwr_table;
-};
-
 /* Hardware vendor-specific info that has its own power management modes */
-static struct hw_vendor_info vendor_info[] __initdata = {
-	{1, "HP    ", "ProLiant", PSS},
-	{1, "ORACLE", "X4-2    ", PPC},
-	{1, "ORACLE", "X4-2L   ", PPC},
-	{1, "ORACLE", "X4-2B   ", PPC},
-	{1, "ORACLE", "X3-2    ", PPC},
-	{1, "ORACLE", "X3-2L   ", PPC},
-	{1, "ORACLE", "X3-2B   ", PPC},
-	{1, "ORACLE", "X4470M2 ", PPC},
-	{1, "ORACLE", "X4270M3 ", PPC},
-	{1, "ORACLE", "X4270M2 ", PPC},
-	{1, "ORACLE", "X4170M2 ", PPC},
-	{1, "ORACLE", "X4170 M3", PPC},
-	{1, "ORACLE", "X4275 M3", PPC},
-	{1, "ORACLE", "X6-2    ", PPC},
-	{1, "ORACLE", "Sudbury ", PPC},
-	{0, "", ""},
+static struct acpi_oemlist oemlist[] __initdata = {
+	{"HP    ", "ProLiant", 0, ACPI_SIG_FADT, all_versions, 0, PSS},
+	{"ORACLE", "X4-2    ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4-2L   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4-2B   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X3-2    ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X3-2L   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X3-2B   ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4470M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4270M3 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4270M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4170M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4170 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X4275 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "X6-2    ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{"ORACLE", "Sudbury ", 0, ACPI_SIG_FADT, all_versions, 0, PPC},
+	{ } /* End */
 };
 
 static bool __init intel_pstate_platform_pwr_mgmt_exists(void)
 {
-	struct acpi_table_header hdr;
-	struct hw_vendor_info *v_info;
 	const struct x86_cpu_id *id;
 	u64 misc_pwr;
+	int idx;
 
 	id = x86_match_cpu(intel_pstate_cpu_oob_ids);
 	if (id) {
@@ -2516,21 +2508,15 @@ static bool __init intel_pstate_platform_pwr_mgmt_exists(void)
 			return true;
 	}
 
-	if (acpi_disabled ||
-	    ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_FADT, 0, &hdr)))
+	idx = acpi_match_oemlist(oemlist);
+	if (idx < 0)
 		return false;
 
-	for (v_info = vendor_info; v_info->valid; v_info++) {
-		if (!strncmp(hdr.oem_id, v_info->oem_id, ACPI_OEM_ID_SIZE) &&
-			!strncmp(hdr.oem_table_id, v_info->oem_table_id,
-						ACPI_OEM_TABLE_ID_SIZE))
-			switch (v_info->oem_pwr_table) {
-			case PSS:
-				return intel_pstate_no_acpi_pss();
-			case PPC:
-				return intel_pstate_has_acpi_ppc() &&
-					(!force_load);
-			}
+	switch (oemlist[idx].data) {
+	case PSS:
+		return intel_pstate_no_acpi_pss();
+	case PPC:
+		return intel_pstate_has_acpi_ppc() && !force_load;
 	}
 
 	return false;

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v2 3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

When 'osc_sb_apei_support_acked' is set, it indicates that
the platform supports APEI, firmware-first mode, as ACPI _OSC
capability bit 4, APEI Support, was set in query.  While _OSC
is an optional method, platforms with APEI support should
implement it to inform its capability to the OS properly.

Add check to 'osc_sb_apei_support_acked' before calling
ghes_edac_register() to restrict that ghes_edac is enabled
on platforms with APEI support capability set in _OSC.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d661d45..49d75eb 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1139,9 +1139,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		goto err;
 	}
 
-	rc = ghes_edac_register(ghes, &ghes_dev->dev);
-	if (rc < 0)
-		goto err;
+	if (osc_sb_apei_support_acked) {
+		rc = ghes_edac_register(ghes, &ghes_dev->dev);
+		if (rc < 0)
+			goto err;
+	}
 
 	switch (generic->notify.type) {
 	case ACPI_HEST_NOTIFY_POLLED:

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

When 'osc_sb_apei_support_acked' is set, it indicates that
the platform supports APEI, firmware-first mode, as ACPI _OSC
capability bit 4, APEI Support, was set in query.  While _OSC
is an optional method, platforms with APEI support should
implement it to inform its capability to the OS properly.

Add check to 'osc_sb_apei_support_acked' before calling
ghes_edac_register() to restrict that ghes_edac is enabled
on platforms with APEI support capability set in _OSC.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d661d45..49d75eb 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1139,9 +1139,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		goto err;
 	}
 
-	rc = ghes_edac_register(ghes, &ghes_dev->dev);
-	if (rc < 0)
-		goto err;
+	if (osc_sb_apei_support_acked) {
+		rc = ghes_edac_register(ghes, &ghes_dev->dev);
+		if (rc < 0)
+			goto err;
+	}
 
 	switch (generic->notify.type) {
 	case ACPI_HEST_NOTIFY_POLLED:

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

ghes_edac_register() is called for each GHES platform device
instantiated per a GHES entry in ACPI HEST table.  dmi_walk()
counts the number of DIMMs on the system, and there is no need
to call it multiple times.

Change ghes_edac_register() to call dmi_walk() only when
'num_dimm' is uninitialized.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
---
 drivers/edac/ghes_edac.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4e61a62..2e9ce9c 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -407,15 +407,18 @@ EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
-	bool fake = false;
-	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
+	int rc;
+
+	static int num_dimm;
+	static bool fake;
 
 	/* Get the number of DIMMs */
-	dmi_walk(ghes_edac_count_dimms, &num_dimm);
+	if (num_dimm == 0)
+		dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
 	/* Check if we've got a bogus BIOS */
 	if (num_dimm == 0) {

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

ghes_edac_register() is called for each GHES platform device
instantiated per a GHES entry in ACPI HEST table.  dmi_walk()
counts the number of DIMMs on the system, and there is no need
to call it multiple times.

Change ghes_edac_register() to call dmi_walk() only when
'num_dimm' is uninitialized.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
---
 drivers/edac/ghes_edac.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4e61a62..2e9ce9c 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -407,15 +407,18 @@ EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
-	bool fake = false;
-	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
+	int rc;
+
+	static int num_dimm;
+	static bool fake;
 
 	/* Get the number of DIMMs */
-	dmi_walk(ghes_edac_count_dimms, &num_dimm);
+	if (num_dimm == 0)
+		dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
 	/* Check if we've got a bogus BIOS */
 	if (num_dimm == 0) {

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v2 5/7] ghes_edac: add platform check to enable ghes_edac
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

The ghes_edac driver was introduced in 2013 [1], but it has not
been enabled by any distro yet.  This driver obtains error info
from firmware interfaces, which are not properly implemented on
many platforms, as the driver always emits the messages below:

 This EDAC driver relies on BIOS to enumerate memory and get error reports.
 Unfortunately, not all BIOSes reflect the memory layout correctly
 So, the end result of using this driver varies from vendor to vendor
 If you find incorrect reports, please contact your hardware vendor
 to correct its BIOS.

To get out from this situation, add a platform check to selectively
enable the driver on the platforms that are known to have proper
firmware implementation.  Platform vendors can add their platforms
to the list when they support ghes_edac.

"ghes_edac.any_oem=1" skips this platform check.

[1]: https://lwn.net/Articles/538438/
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
---
 drivers/edac/ghes_edac.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 2e9ce9c..72bab02 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -34,6 +34,9 @@ static LIST_HEAD(ghes_reglist);
 static DEFINE_MUTEX(ghes_edac_lock);
 static int ghes_edac_mc_num;
 
+/* Set 1 to skip the platform check */
+static bool __read_mostly ghes_edac_any_oem;
+module_param_named(any_oem, ghes_edac_any_oem, bool, 0);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -405,17 +408,31 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
+/*
+ * Known systems that are safe to enable this module.
+ * "ghes_edac.any_oem=1" skips this check if necessary.
+ */
+static struct acpi_oemlist oemlist[] = {
+	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+	{ } /* End */
+};
+
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
-	int rc;
+	int rc, idx;
 
 	static int num_dimm;
 	static bool fake;
 
+	/* Check if safe to enable on this system */
+	idx = acpi_match_oemlist(oemlist);
+	if (!ghes_edac_any_oem && idx < 0)
+		return 0;
+
 	/* Get the number of DIMMs */
 	if (num_dimm == 0)
 		dmi_walk(ghes_edac_count_dimms, &num_dimm);
@@ -459,7 +476,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->dev_name = "ghes";
 
 	if (!ghes_edac_mc_num) {
-		if (!fake) {
+		if (fake) {
+			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+			pr_info("work on such system. Use this driver with caution\n");
+		} else if (idx < 0) {
 			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
 			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
 			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
@@ -467,10 +488,6 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 			pr_info("to correct its BIOS.\n");
 			pr_info("This system has %d DIMM sockets.\n",
 				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
 		}
 	}
 

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,5/7] ghes_edac: add platform check to enable ghes_edac
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

The ghes_edac driver was introduced in 2013 [1], but it has not
been enabled by any distro yet.  This driver obtains error info
from firmware interfaces, which are not properly implemented on
many platforms, as the driver always emits the messages below:

 This EDAC driver relies on BIOS to enumerate memory and get error reports.
 Unfortunately, not all BIOSes reflect the memory layout correctly
 So, the end result of using this driver varies from vendor to vendor
 If you find incorrect reports, please contact your hardware vendor
 to correct its BIOS.

To get out from this situation, add a platform check to selectively
enable the driver on the platforms that are known to have proper
firmware implementation.  Platform vendors can add their platforms
to the list when they support ghes_edac.

"ghes_edac.any_oem=1" skips this platform check.

[1]: https://lwn.net/Articles/538438/
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
---
 drivers/edac/ghes_edac.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 2e9ce9c..72bab02 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -34,6 +34,9 @@ static LIST_HEAD(ghes_reglist);
 static DEFINE_MUTEX(ghes_edac_lock);
 static int ghes_edac_mc_num;
 
+/* Set 1 to skip the platform check */
+static bool __read_mostly ghes_edac_any_oem;
+module_param_named(any_oem, ghes_edac_any_oem, bool, 0);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -405,17 +408,31 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
+/*
+ * Known systems that are safe to enable this module.
+ * "ghes_edac.any_oem=1" skips this check if necessary.
+ */
+static struct acpi_oemlist oemlist[] = {
+	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+	{ } /* End */
+};
+
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
-	int rc;
+	int rc, idx;
 
 	static int num_dimm;
 	static bool fake;
 
+	/* Check if safe to enable on this system */
+	idx = acpi_match_oemlist(oemlist);
+	if (!ghes_edac_any_oem && idx < 0)
+		return 0;
+
 	/* Get the number of DIMMs */
 	if (num_dimm == 0)
 		dmi_walk(ghes_edac_count_dimms, &num_dimm);
@@ -459,7 +476,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->dev_name = "ghes";
 
 	if (!ghes_edac_mc_num) {
-		if (!fake) {
+		if (fake) {
+			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+			pr_info("work on such system. Use this driver with caution\n");
+		} else if (idx < 0) {
 			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
 			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
 			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
@@ -467,10 +488,6 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 			pr_info("to correct its BIOS.\n");
 			pr_info("This system has %d DIMM sockets.\n",
 				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
 		}
 	}
 

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v2 6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

Only a single edac driver can be enabled for EDAC MC.  When ghes_edac
is enabled, a regular edac driver for the CPU type / platform still
attempts to register itself and fails in edac_mc_add_mc().

Add edac_check_mc_owner() so that regular edac drivers can check
the owner of EDAC MC at the beginning of initialization.

Also change the owner check to string comparison from address check.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
---
 drivers/edac/edac_mc.c |   15 ++++++++++++++-
 drivers/edac/edac_mc.h |   12 ++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 4800721..52d8d5e 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -701,6 +701,19 @@ struct mem_ctl_info *edac_mc_find(int idx)
 }
 EXPORT_SYMBOL(edac_mc_find);
 
+/*
+ * Returns:
+ *	1 when EDAC MC is free or owned by the module name
+ *	0 when EDAC MC is owned by other module
+ */
+int edac_check_mc_owner(const char *mod_name)
+{
+	if (edac_mc_owner && strcmp(edac_mc_owner, mod_name))
+		return 0;
+
+	return 1;
+}
+EXPORT_SYMBOL(edac_check_mc_owner);
 
 /* FIXME - should a warning be printed if no error detection? correction? */
 int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
@@ -742,7 +755,7 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
 #endif
 	mutex_lock(&mem_ctls_mutex);
 
-	if (edac_mc_owner && edac_mc_owner != mci->mod_name) {
+	if (!edac_check_mc_owner(mci->mod_name)) {
 		ret = -EPERM;
 		goto fail0;
 	}
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 5357800..0e95eba 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -128,6 +128,18 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
 				   unsigned sz_pvt);
 
 /**
+ *
+ * edac_check_mc_owner - Check the owner of EDAC MC
+ *
+ * @mod_name: pointer to the module name
+ *
+ * Returns:
+ *	1 when EDAC MC is free or owned by the module name
+ *	0 when EDAC MC is owned by other module
+ */
+extern int edac_check_mc_owner(const char *mod_name);
+
+/*
  * edac_mc_add_mc_with_groups() - Insert the @mci structure into the mci
  *	global list and create sysfs entries associated with @mci structure.
  *

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

Only a single edac driver can be enabled for EDAC MC.  When ghes_edac
is enabled, a regular edac driver for the CPU type / platform still
attempts to register itself and fails in edac_mc_add_mc().

Add edac_check_mc_owner() so that regular edac drivers can check
the owner of EDAC MC at the beginning of initialization.

Also change the owner check to string comparison from address check.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
---
 drivers/edac/edac_mc.c |   15 ++++++++++++++-
 drivers/edac/edac_mc.h |   12 ++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 4800721..52d8d5e 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -701,6 +701,19 @@ struct mem_ctl_info *edac_mc_find(int idx)
 }
 EXPORT_SYMBOL(edac_mc_find);
 
+/*
+ * Returns:
+ *	1 when EDAC MC is free or owned by the module name
+ *	0 when EDAC MC is owned by other module
+ */
+int edac_check_mc_owner(const char *mod_name)
+{
+	if (edac_mc_owner && strcmp(edac_mc_owner, mod_name))
+		return 0;
+
+	return 1;
+}
+EXPORT_SYMBOL(edac_check_mc_owner);
 
 /* FIXME - should a warning be printed if no error detection? correction? */
 int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
@@ -742,7 +755,7 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
 #endif
 	mutex_lock(&mem_ctls_mutex);
 
-	if (edac_mc_owner && edac_mc_owner != mci->mod_name) {
+	if (!edac_check_mc_owner(mci->mod_name)) {
 		ret = -EPERM;
 		goto fail0;
 	}
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 5357800..0e95eba 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -128,6 +128,18 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
 				   unsigned sz_pvt);
 
 /**
+ *
+ * edac_check_mc_owner - Check the owner of EDAC MC
+ *
+ * @mod_name: pointer to the module name
+ *
+ * Returns:
+ *	1 when EDAC MC is free or owned by the module name
+ *	0 when EDAC MC is owned by other module
+ */
+extern int edac_check_mc_owner(const char *mod_name);
+
+/*
  * edac_mc_add_mc_with_groups() - Insert the @mci structure into the mci
  *	global list and create sysfs entries associated with @mci structure.
  *

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v2 7/7] edac drivers: add MC owner check in init
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

Change generic x86 edac drivers, which probe CPU type with
x86_match_cpu(), to call edac_check_mc_owner() in their
module init functions.  This allows them to fail their init
at the beginning when ghes_edac is enabled.  Similar change
can be made to other edac drivers as necessary.

This is an optimization and there is no functional change.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
---
 drivers/edac/amd64_edac.c |    3 +++
 drivers/edac/pnd2_edac.c  |    7 ++++++-
 drivers/edac/sb_edac.c    |    7 +++++--
 drivers/edac/skx_edac.c   |    6 +++++-
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 3aea556..cdb40d6 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3444,6 +3444,9 @@ static int __init amd64_edac_init(void)
 	if (amd_cache_northbridges() < 0)
 		return -ENODEV;
 
+	if (!edac_check_mc_owner(EDAC_MOD_STR))
+		return -EBUSY;
+
 	opstate_init();
 
 	err = -ENOMEM;
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index 8e59949..a5b7855 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -45,6 +45,8 @@
 #include "edac_module.h"
 #include "pnd2_edac.h"
 
+#define PND2_MOD_NAME		"pnd2_edac.c"
+
 #define APL_NUM_CHANNELS	4
 #define DNV_NUM_CHANNELS	2
 #define DNV_MAX_DIMMS		2 /* Max DIMMs per channel */
@@ -1313,7 +1315,7 @@ static int pnd2_register_mci(struct mem_ctl_info **ppmci)
 	pvt = mci->pvt_info;
 	memset(pvt, 0, sizeof(*pvt));
 
-	mci->mod_name = "pnd2_edac.c";
+	mci->mod_name = PND2_MOD_NAME;
 	mci->dev_name = ops->name;
 	mci->ctl_name = "Pondicherry2";
 
@@ -1513,6 +1515,9 @@ static int __init pnd2_init(void)
 	if (!id)
 		return -ENODEV;
 
+	if (!edac_check_mc_owner(PND2_MOD_NAME))
+		return -EBUSY;
+
 	ops = (struct dunit_ops *)id->driver_data;
 
 	/* Ensure that the OPSTATE is set correctly for POLL or NMI */
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 80d860c..71bd66e 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -36,7 +36,7 @@ static LIST_HEAD(sbridge_edac_list);
  * Alter this version for the module when modifications are made
  */
 #define SBRIDGE_REVISION    " Ver: 1.1.2 "
-#define EDAC_MOD_STR      "sbridge_edac"
+#define SBRIDGE_MOD_NAME    "sb_edac.c"
 
 /*
  * Debug macros
@@ -3124,7 +3124,7 @@ static int sbridge_register_mci(struct sbridge_dev *sbridge_dev, enum type type)
 		MEM_FLAG_DDR4 : MEM_FLAG_DDR3;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
-	mci->mod_name = "sb_edac.c";
+	mci->mod_name = SBRIDGE_MOD_NAME;
 	mci->mod_ver = SBRIDGE_REVISION;
 	mci->dev_name = pci_name(pdev);
 	mci->ctl_page_to_phys = NULL;
@@ -3380,6 +3380,9 @@ static int __init sbridge_init(void)
 	if (!id)
 		return -ENODEV;
 
+	if (!edac_check_mc_owner(SBRIDGE_MOD_NAME))
+		return -EBUSY;
+
 	/* Ensure that the OPSTATE is set correctly for POLL or NMI */
 	opstate_init();
 
diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
index 64bef6c9..8347969 100644
--- a/drivers/edac/skx_edac.c
+++ b/drivers/edac/skx_edac.c
@@ -31,6 +31,7 @@
 
 #include "edac_module.h"
 
+#define SKX_MOD_NAME    "skx_edac.c"
 #define SKX_REVISION    " Ver: 1.0 "
 
 /*
@@ -471,7 +472,7 @@ static int skx_register_mci(struct skx_imc *imc)
 	mci->mtype_cap = MEM_FLAG_DDR4;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
-	mci->mod_name = "skx_edac.c";
+	mci->mod_name = SKX_MOD_NAME;
 	mci->dev_name = pci_name(imc->chan[0].cdev);
 	mci->mod_ver = SKX_REVISION;
 	mci->ctl_page_to_phys = NULL;
@@ -1052,6 +1053,9 @@ static int __init skx_init(void)
 	if (!id)
 		return -ENODEV;
 
+	if (!edac_check_mc_owner(SKX_MOD_NAME))
+		return -EBUSY;
+
 	rc = skx_get_hi_lo();
 	if (rc)
 		return rc;

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,7/7] edac drivers: add MC owner check in init
@ 2017-08-03 21:57   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-03 21:57 UTC (permalink / raw)
  To: rjw, bp
  Cc: mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel,
	Toshi Kani

Change generic x86 edac drivers, which probe CPU type with
x86_match_cpu(), to call edac_check_mc_owner() in their
module init functions.  This allows them to fail their init
at the beginning when ghes_edac is enabled.  Similar change
can be made to other edac drivers as necessary.

This is an optimization and there is no functional change.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
---
 drivers/edac/amd64_edac.c |    3 +++
 drivers/edac/pnd2_edac.c  |    7 ++++++-
 drivers/edac/sb_edac.c    |    7 +++++--
 drivers/edac/skx_edac.c   |    6 +++++-
 4 files changed, 19 insertions(+), 4 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 3aea556..cdb40d6 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3444,6 +3444,9 @@ static int __init amd64_edac_init(void)
 	if (amd_cache_northbridges() < 0)
 		return -ENODEV;
 
+	if (!edac_check_mc_owner(EDAC_MOD_STR))
+		return -EBUSY;
+
 	opstate_init();
 
 	err = -ENOMEM;
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index 8e59949..a5b7855 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -45,6 +45,8 @@
 #include "edac_module.h"
 #include "pnd2_edac.h"
 
+#define PND2_MOD_NAME		"pnd2_edac.c"
+
 #define APL_NUM_CHANNELS	4
 #define DNV_NUM_CHANNELS	2
 #define DNV_MAX_DIMMS		2 /* Max DIMMs per channel */
@@ -1313,7 +1315,7 @@ static int pnd2_register_mci(struct mem_ctl_info **ppmci)
 	pvt = mci->pvt_info;
 	memset(pvt, 0, sizeof(*pvt));
 
-	mci->mod_name = "pnd2_edac.c";
+	mci->mod_name = PND2_MOD_NAME;
 	mci->dev_name = ops->name;
 	mci->ctl_name = "Pondicherry2";
 
@@ -1513,6 +1515,9 @@ static int __init pnd2_init(void)
 	if (!id)
 		return -ENODEV;
 
+	if (!edac_check_mc_owner(PND2_MOD_NAME))
+		return -EBUSY;
+
 	ops = (struct dunit_ops *)id->driver_data;
 
 	/* Ensure that the OPSTATE is set correctly for POLL or NMI */
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 80d860c..71bd66e 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -36,7 +36,7 @@ static LIST_HEAD(sbridge_edac_list);
  * Alter this version for the module when modifications are made
  */
 #define SBRIDGE_REVISION    " Ver: 1.1.2 "
-#define EDAC_MOD_STR      "sbridge_edac"
+#define SBRIDGE_MOD_NAME    "sb_edac.c"
 
 /*
  * Debug macros
@@ -3124,7 +3124,7 @@ static int sbridge_register_mci(struct sbridge_dev *sbridge_dev, enum type type)
 		MEM_FLAG_DDR4 : MEM_FLAG_DDR3;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
-	mci->mod_name = "sb_edac.c";
+	mci->mod_name = SBRIDGE_MOD_NAME;
 	mci->mod_ver = SBRIDGE_REVISION;
 	mci->dev_name = pci_name(pdev);
 	mci->ctl_page_to_phys = NULL;
@@ -3380,6 +3380,9 @@ static int __init sbridge_init(void)
 	if (!id)
 		return -ENODEV;
 
+	if (!edac_check_mc_owner(SBRIDGE_MOD_NAME))
+		return -EBUSY;
+
 	/* Ensure that the OPSTATE is set correctly for POLL or NMI */
 	opstate_init();
 
diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
index 64bef6c9..8347969 100644
--- a/drivers/edac/skx_edac.c
+++ b/drivers/edac/skx_edac.c
@@ -31,6 +31,7 @@
 
 #include "edac_module.h"
 
+#define SKX_MOD_NAME    "skx_edac.c"
 #define SKX_REVISION    " Ver: 1.0 "
 
 /*
@@ -471,7 +472,7 @@ static int skx_register_mci(struct skx_imc *imc)
 	mci->mtype_cap = MEM_FLAG_DDR4;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
-	mci->mod_name = "skx_edac.c";
+	mci->mod_name = SKX_MOD_NAME;
 	mci->dev_name = pci_name(imc->chan[0].cdev);
 	mci->mod_ver = SKX_REVISION;
 	mci->ctl_page_to_phys = NULL;
@@ -1052,6 +1053,9 @@ static int __init skx_init(void)
 	if (!id)
 		return -ENODEV;
 
+	if (!edac_check_mc_owner(SKX_MOD_NAME))
+		return -EBUSY;
+
 	rc = skx_get_hi_lo();
 	if (rc)
 		return rc;

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 1/7] ACPI / blacklist: add acpi_match_oemlist() interface
@ 2017-08-04  3:42     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  3:42 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:47PM -0600, Toshi Kani wrote:
> ACPI OEM ID / OEM Table ID / Revision can be used to identify
> a platform based on ACPI firmware info.  acpi_blacklisted(),
> intel_pstate_platform_pwr_mgmt_exists(), and some other funcs,
> have been using similar check to detect a list of platforms
> that require special handlings.
> 
> Move the platform check in acpi_blacklisted() to a new common
> utility function, acpi_match_oemlist(), so that other drivers
> do not have to implement their own version.
> 
> There is no change in functionality.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Borislav Petkov <bp@alien8.de>
> ---
>  drivers/acpi/blacklist.c |   83 ++++++++--------------------------------------
>  drivers/acpi/utils.c     |   40 ++++++++++++++++++++++
>  include/linux/acpi.h     |   19 +++++++++++
>  3 files changed, 73 insertions(+), 69 deletions(-)
> 
> diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
> index bb542ac..bf44573 100644
> --- a/drivers/acpi/blacklist.c
> +++ b/drivers/acpi/blacklist.c
> @@ -30,30 +30,13 @@
>  
>  #include "internal.h"
>  
> -enum acpi_blacklist_predicates {
> -	all_versions,
> -	less_than_or_equal,
> -	equal,
> -	greater_than_or_equal,
> -};
> -
> -struct acpi_blacklist_item {
> -	char oem_id[7];
> -	char oem_table_id[9];
> -	u32 oem_revision;
> -	char *table;
> -	enum acpi_blacklist_predicates oem_revision_predicate;
> -	char *reason;
> -	u32 is_critical_error;
> -};
> -
>  static struct dmi_system_id acpi_rev_dmi_table[] __initdata;
>  
>  /*
>   * POLICY: If *anything* doesn't work, put it on the blacklist.
>   *	   If they are critical errors, mark it critical, and abort driver load.
>   */
> -static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
> +static struct acpi_oemlist acpi_blacklist[] __initdata = {

All that wasted energy to try to explain to you that "oemlist" is wrong
and that whole rename is pointless, went for nothing.

So NAK.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,1/7] ACPI / blacklist: add acpi_match_oemlist() interface
@ 2017-08-04  3:42     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  3:42 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:47PM -0600, Toshi Kani wrote:
> ACPI OEM ID / OEM Table ID / Revision can be used to identify
> a platform based on ACPI firmware info.  acpi_blacklisted(),
> intel_pstate_platform_pwr_mgmt_exists(), and some other funcs,
> have been using similar check to detect a list of platforms
> that require special handlings.
> 
> Move the platform check in acpi_blacklisted() to a new common
> utility function, acpi_match_oemlist(), so that other drivers
> do not have to implement their own version.
> 
> There is no change in functionality.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Borislav Petkov <bp@alien8.de>
> ---
>  drivers/acpi/blacklist.c |   83 ++++++++--------------------------------------
>  drivers/acpi/utils.c     |   40 ++++++++++++++++++++++
>  include/linux/acpi.h     |   19 +++++++++++
>  3 files changed, 73 insertions(+), 69 deletions(-)
> 
> diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
> index bb542ac..bf44573 100644
> --- a/drivers/acpi/blacklist.c
> +++ b/drivers/acpi/blacklist.c
> @@ -30,30 +30,13 @@
>  
>  #include "internal.h"
>  
> -enum acpi_blacklist_predicates {
> -	all_versions,
> -	less_than_or_equal,
> -	equal,
> -	greater_than_or_equal,
> -};
> -
> -struct acpi_blacklist_item {
> -	char oem_id[7];
> -	char oem_table_id[9];
> -	u32 oem_revision;
> -	char *table;
> -	enum acpi_blacklist_predicates oem_revision_predicate;
> -	char *reason;
> -	u32 is_critical_error;
> -};
> -
>  static struct dmi_system_id acpi_rev_dmi_table[] __initdata;
>  
>  /*
>   * POLICY: If *anything* doesn't work, put it on the blacklist.
>   *	   If they are critical errors, mark it critical, and abort driver load.
>   */
> -static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
> +static struct acpi_oemlist acpi_blacklist[] __initdata = {

All that wasted energy to try to explain to you that "oemlist" is wrong
and that whole rename is pointless, went for nothing.

So NAK.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac
@ 2017-08-04  3:44     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  3:44 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:49PM -0600, Toshi Kani wrote:
> When 'osc_sb_apei_support_acked' is set, it indicates that
> the platform supports APEI, firmware-first mode, as ACPI _OSC
> capability bit 4, APEI Support, was set in query.  While _OSC
> is an optional method, platforms with APEI support should
> implement it to inform its capability to the OS properly.
> 
> Add check to 'osc_sb_apei_support_acked' before calling
> ghes_edac_register() to restrict that ghes_edac is enabled
> on platforms with APEI support capability set in _OSC.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/acpi/apei/ghes.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index d661d45..49d75eb 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -1139,9 +1139,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		goto err;
>  	}
>  
> -	rc = ghes_edac_register(ghes, &ghes_dev->dev);
> -	if (rc < 0)
> -		goto err;
> +	if (osc_sb_apei_support_acked) {

What for when we have the whitelist?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac
@ 2017-08-04  3:44     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  3:44 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:49PM -0600, Toshi Kani wrote:
> When 'osc_sb_apei_support_acked' is set, it indicates that
> the platform supports APEI, firmware-first mode, as ACPI _OSC
> capability bit 4, APEI Support, was set in query.  While _OSC
> is an optional method, platforms with APEI support should
> implement it to inform its capability to the OS properly.
> 
> Add check to 'osc_sb_apei_support_acked' before calling
> ghes_edac_register() to restrict that ghes_edac is enabled
> on platforms with APEI support capability set in _OSC.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/acpi/apei/ghes.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index d661d45..49d75eb 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -1139,9 +1139,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		goto err;
>  	}
>  
> -	rc = ghes_edac_register(ghes, &ghes_dev->dev);
> -	if (rc < 0)
> -		goto err;
> +	if (osc_sb_apei_support_acked) {

What for when we have the whitelist?

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-04  4:05     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  4:05 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:50PM -0600, Toshi Kani wrote:
> ghes_edac_register() is called for each GHES platform device
> instantiated per a GHES entry in ACPI HEST table.  dmi_walk()
> counts the number of DIMMs on the system, and there is no need
> to call it multiple times.
> 
> Change ghes_edac_register() to call dmi_walk() only when
> 'num_dimm' is uninitialized.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
>  drivers/edac/ghes_edac.c |    9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 4e61a62..2e9ce9c 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -407,15 +407,18 @@ EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
>  
>  int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  {
> -	bool fake = false;
> -	int rc, num_dimm = 0;
>  	struct mem_ctl_info *mci;
>  	struct edac_mc_layer layers[1];
>  	struct ghes_edac_pvt *pvt;
>  	struct ghes_edac_dimm_fill dimm_fill;
> +	int rc;
> +
> +	static int num_dimm;
> +	static bool fake;
>  
>  	/* Get the number of DIMMs */
> -	dmi_walk(ghes_edac_count_dimms, &num_dimm);
> +	if (num_dimm == 0)
> +		dmi_walk(ghes_edac_count_dimms, &num_dimm);

So the problem is that ghes_edac_register() gets called multiple times
depending on how many GHES platform devices are on the system. But yet
they all scan *all* DIMMs. So instead you should return if the DIMMs
have been counted already and not register a second time.

Which makes that whole mc counting kinda useless. So you could rip that
out too.

Unless I'm missing something...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-04  4:05     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  4:05 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:50PM -0600, Toshi Kani wrote:
> ghes_edac_register() is called for each GHES platform device
> instantiated per a GHES entry in ACPI HEST table.  dmi_walk()
> counts the number of DIMMs on the system, and there is no need
> to call it multiple times.
> 
> Change ghes_edac_register() to call dmi_walk() only when
> 'num_dimm' is uninitialized.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
>  drivers/edac/ghes_edac.c |    9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 4e61a62..2e9ce9c 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -407,15 +407,18 @@ EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
>  
>  int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  {
> -	bool fake = false;
> -	int rc, num_dimm = 0;
>  	struct mem_ctl_info *mci;
>  	struct edac_mc_layer layers[1];
>  	struct ghes_edac_pvt *pvt;
>  	struct ghes_edac_dimm_fill dimm_fill;
> +	int rc;
> +
> +	static int num_dimm;
> +	static bool fake;
>  
>  	/* Get the number of DIMMs */
> -	dmi_walk(ghes_edac_count_dimms, &num_dimm);
> +	if (num_dimm == 0)
> +		dmi_walk(ghes_edac_count_dimms, &num_dimm);

So the problem is that ghes_edac_register() gets called multiple times
depending on how many GHES platform devices are on the system. But yet
they all scan *all* DIMMs. So instead you should return if the DIMMs
have been counted already and not register a second time.

Which makes that whole mc counting kinda useless. So you could rip that
out too.

Unless I'm missing something...

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-04  8:30     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  8:30 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:52PM -0600, Toshi Kani wrote:
> Only a single edac driver can be enabled for EDAC MC.  When ghes_edac
> is enabled, a regular edac driver for the CPU type / platform still
> attempts to register itself and fails in edac_mc_add_mc().
> 
> Add edac_check_mc_owner() so that regular edac drivers can check
> the owner of EDAC MC at the beginning of initialization.
> 
> Also change the owner check to string comparison from address check.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/edac/edac_mc.c |   15 ++++++++++++++-
>  drivers/edac/edac_mc.h |   12 ++++++++++++
>  2 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 4800721..52d8d5e 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -701,6 +701,19 @@ struct mem_ctl_info *edac_mc_find(int idx)
>  }
>  EXPORT_SYMBOL(edac_mc_find);
>  
> +/*
> + * Returns:
> + *	1 when EDAC MC is free or owned by the module name

1 for free OR owned?!? WTF?

> + *	0 when EDAC MC is owned by other module
> + */
> +int edac_check_mc_owner(const char *mod_name)
> +{
> +	if (edac_mc_owner && strcmp(edac_mc_owner, mod_name))

strncmp() of course, with sensible maximal string length.

> +		return 0;
> +
> +	return 1;
> +}
> +EXPORT_SYMBOL(edac_check_mc_owner);

EXPORT_SYMBOL_GPL

>  
>  /* FIXME - should a warning be printed if no error detection? correction? */
>  int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
> @@ -742,7 +755,7 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
>  #endif
>  	mutex_lock(&mem_ctls_mutex);
>  
> -	if (edac_mc_owner && edac_mc_owner != mci->mod_name) {
> +	if (!edac_check_mc_owner(mci->mod_name)) {
>  		ret = -EPERM;
>  		goto fail0;
>  	}
> diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
> index 5357800..0e95eba 100644
> --- a/drivers/edac/edac_mc.h
> +++ b/drivers/edac/edac_mc.h
> @@ -128,6 +128,18 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
>  				   unsigned sz_pvt);
>  
>  /**
> + *
> + * edac_check_mc_owner - Check the owner of EDAC MC
> + *
> + * @mod_name: pointer to the module name
> + *
> + * Returns:
> + *	1 when EDAC MC is free or owned by the module name
> + *	0 when EDAC MC is owned by other module
> + */

Documenting that function only once is enough.

> +extern int edac_check_mc_owner(const char *mod_name);
> +
> +/*
>   * edac_mc_add_mc_with_groups() - Insert the @mci structure into the mci
>   *	global list and create sysfs entries associated with @mci structure.
>   *

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-04  8:30     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  8:30 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:52PM -0600, Toshi Kani wrote:
> Only a single edac driver can be enabled for EDAC MC.  When ghes_edac
> is enabled, a regular edac driver for the CPU type / platform still
> attempts to register itself and fails in edac_mc_add_mc().
> 
> Add edac_check_mc_owner() so that regular edac drivers can check
> the owner of EDAC MC at the beginning of initialization.
> 
> Also change the owner check to string comparison from address check.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/edac/edac_mc.c |   15 ++++++++++++++-
>  drivers/edac/edac_mc.h |   12 ++++++++++++
>  2 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 4800721..52d8d5e 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -701,6 +701,19 @@ struct mem_ctl_info *edac_mc_find(int idx)
>  }
>  EXPORT_SYMBOL(edac_mc_find);
>  
> +/*
> + * Returns:
> + *	1 when EDAC MC is free or owned by the module name

1 for free OR owned?!? WTF?

> + *	0 when EDAC MC is owned by other module
> + */
> +int edac_check_mc_owner(const char *mod_name)
> +{
> +	if (edac_mc_owner && strcmp(edac_mc_owner, mod_name))

strncmp() of course, with sensible maximal string length.

> +		return 0;
> +
> +	return 1;
> +}
> +EXPORT_SYMBOL(edac_check_mc_owner);

EXPORT_SYMBOL_GPL

>  
>  /* FIXME - should a warning be printed if no error detection? correction? */
>  int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
> @@ -742,7 +755,7 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
>  #endif
>  	mutex_lock(&mem_ctls_mutex);
>  
> -	if (edac_mc_owner && edac_mc_owner != mci->mod_name) {
> +	if (!edac_check_mc_owner(mci->mod_name)) {
>  		ret = -EPERM;
>  		goto fail0;
>  	}
> diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
> index 5357800..0e95eba 100644
> --- a/drivers/edac/edac_mc.h
> +++ b/drivers/edac/edac_mc.h
> @@ -128,6 +128,18 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
>  				   unsigned sz_pvt);
>  
>  /**
> + *
> + * edac_check_mc_owner - Check the owner of EDAC MC
> + *
> + * @mod_name: pointer to the module name
> + *
> + * Returns:
> + *	1 when EDAC MC is free or owned by the module name
> + *	0 when EDAC MC is owned by other module
> + */

Documenting that function only once is enough.

> +extern int edac_check_mc_owner(const char *mod_name);
> +
> +/*
>   * edac_mc_add_mc_with_groups() - Insert the @mci structure into the mci
>   *	global list and create sysfs entries associated with @mci structure.
>   *

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 5/7] ghes_edac: add platform check to enable ghes_edac
@ 2017-08-04  8:31     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  8:31 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:51PM -0600, Toshi Kani wrote:
> The ghes_edac driver was introduced in 2013 [1], but it has not
> been enabled by any distro yet.  This driver obtains error info
> from firmware interfaces, which are not properly implemented on
> many platforms, as the driver always emits the messages below:
> 
>  This EDAC driver relies on BIOS to enumerate memory and get error reports.
>  Unfortunately, not all BIOSes reflect the memory layout correctly
>  So, the end result of using this driver varies from vendor to vendor
>  If you find incorrect reports, please contact your hardware vendor
>  to correct its BIOS.
> 
> To get out from this situation, add a platform check to selectively
> enable the driver on the platforms that are known to have proper
> firmware implementation.  Platform vendors can add their platforms
> to the list when they support ghes_edac.
> 
> "ghes_edac.any_oem=1" skips this platform check.

There's that non-generic OEM thing again. NAK!

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,5/7] ghes_edac: add platform check to enable ghes_edac
@ 2017-08-04  8:31     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  8:31 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:51PM -0600, Toshi Kani wrote:
> The ghes_edac driver was introduced in 2013 [1], but it has not
> been enabled by any distro yet.  This driver obtains error info
> from firmware interfaces, which are not properly implemented on
> many platforms, as the driver always emits the messages below:
> 
>  This EDAC driver relies on BIOS to enumerate memory and get error reports.
>  Unfortunately, not all BIOSes reflect the memory layout correctly
>  So, the end result of using this driver varies from vendor to vendor
>  If you find incorrect reports, please contact your hardware vendor
>  to correct its BIOS.
> 
> To get out from this situation, add a platform check to selectively
> enable the driver on the platforms that are known to have proper
> firmware implementation.  Platform vendors can add their platforms
> to the list when they support ghes_edac.
> 
> "ghes_edac.any_oem=1" skips this platform check.

There's that non-generic OEM thing again. NAK!

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 7/7] edac drivers: add MC owner check in init
@ 2017-08-04  8:39     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  8:39 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:53PM -0600, Toshi Kani wrote:
> Change generic x86 edac drivers, which probe CPU type with
> x86_match_cpu(), to call edac_check_mc_owner() in their
> module init functions.  This allows them to fail their init
> at the beginning when ghes_edac is enabled.  Similar change
> can be made to other edac drivers as necessary.
> 
> This is an optimization and there is no functional change.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/edac/amd64_edac.c |    3 +++
>  drivers/edac/pnd2_edac.c  |    7 ++++++-
>  drivers/edac/sb_edac.c    |    7 +++++--
>  drivers/edac/skx_edac.c   |    6 +++++-
>  4 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 3aea556..cdb40d6 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3444,6 +3444,9 @@ static int __init amd64_edac_init(void)
>  	if (amd_cache_northbridges() < 0)
>  		return -ENODEV;
>  
> +	if (!edac_check_mc_owner(EDAC_MOD_STR))
> +		return -EBUSY;
> +

That needs to happen first in the init function.

>  	opstate_init();
>  
>  	err = -ENOMEM;
> diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
> index 8e59949..a5b7855 100644
> --- a/drivers/edac/pnd2_edac.c
> +++ b/drivers/edac/pnd2_edac.c
> @@ -45,6 +45,8 @@
>  #include "edac_module.h"
>  #include "pnd2_edac.h"
>  
> +#define PND2_MOD_NAME		"pnd2_edac.c"

EDAC_MOD_STR and look how the other drivers define it, i.e., without the ".c"

> +
>  #define APL_NUM_CHANNELS	4
>  #define DNV_NUM_CHANNELS	2
>  #define DNV_MAX_DIMMS		2 /* Max DIMMs per channel */
> @@ -1313,7 +1315,7 @@ static int pnd2_register_mci(struct mem_ctl_info **ppmci)
>  	pvt = mci->pvt_info;
>  	memset(pvt, 0, sizeof(*pvt));
>  
> -	mci->mod_name = "pnd2_edac.c";
> +	mci->mod_name = PND2_MOD_NAME;
>  	mci->dev_name = ops->name;
>  	mci->ctl_name = "Pondicherry2";
>  
> @@ -1513,6 +1515,9 @@ static int __init pnd2_init(void)
>  	if (!id)
>  		return -ENODEV;
>  
> +	if (!edac_check_mc_owner(PND2_MOD_NAME))
> +		return -EBUSY;

Also first thing to do in the function.

> +
>  	ops = (struct dunit_ops *)id->driver_data;
>  
>  	/* Ensure that the OPSTATE is set correctly for POLL or NMI */
> diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
> index 80d860c..71bd66e 100644
> --- a/drivers/edac/sb_edac.c
> +++ b/drivers/edac/sb_edac.c
> @@ -36,7 +36,7 @@ static LIST_HEAD(sbridge_edac_list);
>   * Alter this version for the module when modifications are made
>   */
>  #define SBRIDGE_REVISION    " Ver: 1.1.2 "
> -#define EDAC_MOD_STR      "sbridge_edac"
> +#define SBRIDGE_MOD_NAME    "sb_edac.c"

Why? EDAC_MOD_STR is just fine.

>  /*
>   * Debug macros
> @@ -3124,7 +3124,7 @@ static int sbridge_register_mci(struct sbridge_dev *sbridge_dev, enum type type)
>  		MEM_FLAG_DDR4 : MEM_FLAG_DDR3;
>  	mci->edac_ctl_cap = EDAC_FLAG_NONE;
>  	mci->edac_cap = EDAC_FLAG_NONE;
> -	mci->mod_name = "sb_edac.c";
> +	mci->mod_name = SBRIDGE_MOD_NAME;
>  	mci->mod_ver = SBRIDGE_REVISION;
>  	mci->dev_name = pci_name(pdev);
>  	mci->ctl_page_to_phys = NULL;
> @@ -3380,6 +3380,9 @@ static int __init sbridge_init(void)
>  	if (!id)
>  		return -ENODEV;
>  
> +	if (!edac_check_mc_owner(SBRIDGE_MOD_NAME))
> +		return -EBUSY;

See above.

> +
>  	/* Ensure that the OPSTATE is set correctly for POLL or NMI */
>  	opstate_init();
>  
> diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
> index 64bef6c9..8347969 100644
> --- a/drivers/edac/skx_edac.c
> +++ b/drivers/edac/skx_edac.c
> @@ -31,6 +31,7 @@
>  
>  #include "edac_module.h"
>  
> +#define SKX_MOD_NAME    "skx_edac.c"
>  #define SKX_REVISION    " Ver: 1.0 "

Ditto.

>  
>  /*
> @@ -471,7 +472,7 @@ static int skx_register_mci(struct skx_imc *imc)
>  	mci->mtype_cap = MEM_FLAG_DDR4;
>  	mci->edac_ctl_cap = EDAC_FLAG_NONE;
>  	mci->edac_cap = EDAC_FLAG_NONE;
> -	mci->mod_name = "skx_edac.c";
> +	mci->mod_name = SKX_MOD_NAME;
>  	mci->dev_name = pci_name(imc->chan[0].cdev);
>  	mci->mod_ver = SKX_REVISION;
>  	mci->ctl_page_to_phys = NULL;
> @@ -1052,6 +1053,9 @@ static int __init skx_init(void)
>  	if (!id)
>  		return -ENODEV;
>  
> +	if (!edac_check_mc_owner(SKX_MOD_NAME))
> +		return -EBUSY;
> +

Ditto.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,7/7] edac drivers: add MC owner check in init
@ 2017-08-04  8:39     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-04  8:39 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, mchehab, tony.luck, lenb, linux-acpi, linux-edac, linux-kernel

On Thu, Aug 03, 2017 at 03:57:53PM -0600, Toshi Kani wrote:
> Change generic x86 edac drivers, which probe CPU type with
> x86_match_cpu(), to call edac_check_mc_owner() in their
> module init functions.  This allows them to fail their init
> at the beginning when ghes_edac is enabled.  Similar change
> can be made to other edac drivers as necessary.
> 
> This is an optimization and there is no functional change.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/edac/amd64_edac.c |    3 +++
>  drivers/edac/pnd2_edac.c  |    7 ++++++-
>  drivers/edac/sb_edac.c    |    7 +++++--
>  drivers/edac/skx_edac.c   |    6 +++++-
>  4 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 3aea556..cdb40d6 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3444,6 +3444,9 @@ static int __init amd64_edac_init(void)
>  	if (amd_cache_northbridges() < 0)
>  		return -ENODEV;
>  
> +	if (!edac_check_mc_owner(EDAC_MOD_STR))
> +		return -EBUSY;
> +

That needs to happen first in the init function.

>  	opstate_init();
>  
>  	err = -ENOMEM;
> diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
> index 8e59949..a5b7855 100644
> --- a/drivers/edac/pnd2_edac.c
> +++ b/drivers/edac/pnd2_edac.c
> @@ -45,6 +45,8 @@
>  #include "edac_module.h"
>  #include "pnd2_edac.h"
>  
> +#define PND2_MOD_NAME		"pnd2_edac.c"

EDAC_MOD_STR and look how the other drivers define it, i.e., without the ".c"

> +
>  #define APL_NUM_CHANNELS	4
>  #define DNV_NUM_CHANNELS	2
>  #define DNV_MAX_DIMMS		2 /* Max DIMMs per channel */
> @@ -1313,7 +1315,7 @@ static int pnd2_register_mci(struct mem_ctl_info **ppmci)
>  	pvt = mci->pvt_info;
>  	memset(pvt, 0, sizeof(*pvt));
>  
> -	mci->mod_name = "pnd2_edac.c";
> +	mci->mod_name = PND2_MOD_NAME;
>  	mci->dev_name = ops->name;
>  	mci->ctl_name = "Pondicherry2";
>  
> @@ -1513,6 +1515,9 @@ static int __init pnd2_init(void)
>  	if (!id)
>  		return -ENODEV;
>  
> +	if (!edac_check_mc_owner(PND2_MOD_NAME))
> +		return -EBUSY;

Also first thing to do in the function.

> +
>  	ops = (struct dunit_ops *)id->driver_data;
>  
>  	/* Ensure that the OPSTATE is set correctly for POLL or NMI */
> diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
> index 80d860c..71bd66e 100644
> --- a/drivers/edac/sb_edac.c
> +++ b/drivers/edac/sb_edac.c
> @@ -36,7 +36,7 @@ static LIST_HEAD(sbridge_edac_list);
>   * Alter this version for the module when modifications are made
>   */
>  #define SBRIDGE_REVISION    " Ver: 1.1.2 "
> -#define EDAC_MOD_STR      "sbridge_edac"
> +#define SBRIDGE_MOD_NAME    "sb_edac.c"

Why? EDAC_MOD_STR is just fine.

>  /*
>   * Debug macros
> @@ -3124,7 +3124,7 @@ static int sbridge_register_mci(struct sbridge_dev *sbridge_dev, enum type type)
>  		MEM_FLAG_DDR4 : MEM_FLAG_DDR3;
>  	mci->edac_ctl_cap = EDAC_FLAG_NONE;
>  	mci->edac_cap = EDAC_FLAG_NONE;
> -	mci->mod_name = "sb_edac.c";
> +	mci->mod_name = SBRIDGE_MOD_NAME;
>  	mci->mod_ver = SBRIDGE_REVISION;
>  	mci->dev_name = pci_name(pdev);
>  	mci->ctl_page_to_phys = NULL;
> @@ -3380,6 +3380,9 @@ static int __init sbridge_init(void)
>  	if (!id)
>  		return -ENODEV;
>  
> +	if (!edac_check_mc_owner(SBRIDGE_MOD_NAME))
> +		return -EBUSY;

See above.

> +
>  	/* Ensure that the OPSTATE is set correctly for POLL or NMI */
>  	opstate_init();
>  
> diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
> index 64bef6c9..8347969 100644
> --- a/drivers/edac/skx_edac.c
> +++ b/drivers/edac/skx_edac.c
> @@ -31,6 +31,7 @@
>  
>  #include "edac_module.h"
>  
> +#define SKX_MOD_NAME    "skx_edac.c"
>  #define SKX_REVISION    " Ver: 1.0 "

Ditto.

>  
>  /*
> @@ -471,7 +472,7 @@ static int skx_register_mci(struct skx_imc *imc)
>  	mci->mtype_cap = MEM_FLAG_DDR4;
>  	mci->edac_ctl_cap = EDAC_FLAG_NONE;
>  	mci->edac_cap = EDAC_FLAG_NONE;
> -	mci->mod_name = "skx_edac.c";
> +	mci->mod_name = SKX_MOD_NAME;
>  	mci->dev_name = pci_name(imc->chan[0].cdev);
>  	mci->mod_ver = SKX_REVISION;
>  	mci->ctl_page_to_phys = NULL;
> @@ -1052,6 +1053,9 @@ static int __init skx_init(void)
>  	if (!id)
>  		return -ENODEV;
>  
> +	if (!edac_check_mc_owner(SKX_MOD_NAME))
> +		return -EBUSY;
> +

Ditto.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 6/7] EDAC: add edac_check_mc_owner() to check MC owner
  2017-08-03 21:57   ` [v2,6/7] " Toshi Kani
  (?)
@ 2017-08-04 13:06     ` kbuild test robot
  -1 siblings, 0 replies; 130+ messages in thread
From: kbuild test robot @ 2017-08-04 13:06 UTC (permalink / raw)
  Cc: kbuild-all, rjw, bp, mchehab, tony.luck, lenb, linux-acpi,
	linux-edac, linux-kernel, Toshi Kani

[-- Attachment #1: Type: text/plain, Size: 12192 bytes --]

Hi Toshi,

[auto build test WARNING on pm/linux-next]
[also build test WARNING on v4.13-rc3]
[cannot apply to edac/linux_next bp/for-next next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Toshi-Kani/enable-ghes_edac-on-selected-platforms/20170804-190846
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   include/linux/init.h:1: warning: no structured comments found
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_major' description in 'fsl_mc_device_id'
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_minor' description in 'fsl_mc_device_id'
   include/linux/kthread.h:26: warning: Excess function parameter '...' description in 'kthread_create'
   kernel/sys.c:1: warning: no structured comments found
   include/linux/device.h:968: warning: No description found for parameter 'dma_ops'
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
>> drivers/edac/edac_mc.h:131: warning: Cannot understand  *
    on line 131 - I thought it was a doc line
   include/linux/iio/iio.h:603: warning: No description found for parameter 'trig_readonly'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'indio_dev'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'trig'
   include/linux/device.h:969: warning: No description found for parameter 'dma_ops'
   drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
   drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
   drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
   drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   arch/s390/include/asm/cmb.h:1: warning: no structured comments found
   drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq'
   drivers/scsi/constants.c:1: warning: no structured comments found
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'claimed'
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'enabled'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_altset_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_stall_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_zlp_not_supp'
   fs/inode.c:1666: warning: No description found for parameter 'rcu'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_next_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_list'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_vfs_inode'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_flags'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_rsv_handle'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_reserved'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_type'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_line_no'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_start_jiffies'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_requested_credits'
   include/linux/jbd2.h:497: warning: No description found for parameter 'saved_alloc_context'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chkpt_bhs'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_devname'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_average_commit_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_min_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_max_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_commit_callback'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_failed_commit'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chksum_driver'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_csum_seed'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'type'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'line_no'
   fs/jbd2/transaction.c:641: warning: No description found for parameter 'gfp_mask'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'set_busid'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'debugfs_init'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_open_object'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_close_object'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'prime_handle_to_fd'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'prime_fd_to_handle'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_export'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_import'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_pin'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_unpin'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_res_obj'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_get_sg_table'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_import_sg_table'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_vmap'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_vunmap'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_mmap'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_vm_ops'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'major'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'minor'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'patchlevel'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'name'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'desc'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'date'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'driver_features'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'ioctls'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'num_ioctls'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'fops'
   include/drm/drm_color_mgmt.h:1: warning: no structured comments found
   drivers/gpu/drm/drm_syncobj.c:341: warning: Excess function parameter 'dev' description in 'drm_syncobj_open'
   drivers/gpu/drm/drm_syncobj.c:366: warning: Excess function parameter 'dev' description in 'drm_syncobj_release'
   include/drm/drm_syncobj.h:1: warning: no structured comments found
   drivers/gpu/drm/drm_syncobj.c:342: warning: Excess function parameter 'dev' description in 'drm_syncobj_open'
   drivers/gpu/drm/drm_syncobj.c:367: warning: Excess function parameter 'dev' description in 'drm_syncobj_release'
   drivers/gpu/host1x/bus.c:50: warning: Excess function parameter 'driver' description in 'host1x_subdev_add'
   Documentation/doc-guide/sphinx.rst:121: ERROR: Unknown target name: "sphinx c domain".
   kernel/sched/fair.c:7584: WARNING: Inline emphasis start-string without end-string.
   kernel/time/timer.c:1200: ERROR: Unexpected indentation.
   kernel/time/timer.c:1202: ERROR: Unexpected indentation.
   kernel/time/timer.c:1203: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:108: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:111: ERROR: Unexpected indentation.
   include/linux/wait.h:113: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/time/hrtimer.c:991: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/signal.c:323: WARNING: Inline literal start-string without end-string.
   kernel/rcu/tree.c:3187: ERROR: Unexpected indentation.
   kernel/rcu/tree.c:3214: ERROR: Unexpected indentation.
   kernel/rcu/tree.c:3215: WARNING: Bullet list ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:219: ERROR: Unexpected indentation.
   include/linux/iio/iio.h:220: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:226: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/iio/industrialio-core.c:633: ERROR: Unknown target name: "iio_val".
   drivers/iio/industrialio-core.c:640: ERROR: Unknown target name: "iio_val".
   drivers/ata/libata-core.c:5906: ERROR: Unknown target name: "hw".
   drivers/message/fusion/mptbase.c:5051: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/tty/serial/serial_core.c:1897: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/pci/pci.c:3470: ERROR: Unexpected indentation.

vim +131 drivers/edac/edac_mc.h

    97	
    98	/**
    99	 * edac_mc_alloc() - Allocate and partially fill a struct &mem_ctl_info.
   100	 *
   101	 * @mc_num:		Memory controller number
   102	 * @n_layers:		Number of MC hierarchy layers
   103	 * @layers:		Describes each layer as seen by the Memory Controller
   104	 * @sz_pvt:		size of private storage needed
   105	 *
   106	 *
   107	 * Everything is kmalloc'ed as one big chunk - more efficient.
   108	 * Only can be used if all structures have the same lifetime - otherwise
   109	 * you have to allocate and initialize your own structures.
   110	 *
   111	 * Use edac_mc_free() to free mc structures allocated by this function.
   112	 *
   113	 * .. note::
   114	 *
   115	 *   drivers handle multi-rank memories in different ways: in some
   116	 *   drivers, one multi-rank memory stick is mapped as one entry, while, in
   117	 *   others, a single multi-rank memory stick would be mapped into several
   118	 *   entries. Currently, this function will allocate multiple struct dimm_info
   119	 *   on such scenarios, as grouping the multiple ranks require drivers change.
   120	 *
   121	 * Returns:
   122	 *	On success, return a pointer to struct mem_ctl_info pointer;
   123	 *	%NULL otherwise
   124	 */
   125	struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
   126					   unsigned n_layers,
   127					   struct edac_mc_layer *layers,
   128					   unsigned sz_pvt);
   129	
   130	/**
 > 131	 *
   132	 * edac_check_mc_owner - Check the owner of EDAC MC
   133	 *
   134	 * @mod_name: pointer to the module name
   135	 *
   136	 * Returns:
   137	 *	1 when EDAC MC is free or owned by the module name
   138	 *	0 when EDAC MC is owned by other module
   139	 */
   140	extern int edac_check_mc_owner(const char *mod_name);
   141	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6735 bytes --]

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-04 13:06     ` kbuild test robot
  0 siblings, 0 replies; 130+ messages in thread
From: kbuild test robot @ 2017-08-04 13:06 UTC (permalink / raw)
  To: Toshi Kani
  Cc: kbuild-all, rjw, bp, mchehab, tony.luck, lenb, linux-acpi,
	linux-edac, linux-kernel, Toshi Kani

[-- Attachment #1: Type: text/plain, Size: 12192 bytes --]

Hi Toshi,

[auto build test WARNING on pm/linux-next]
[also build test WARNING on v4.13-rc3]
[cannot apply to edac/linux_next bp/for-next next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Toshi-Kani/enable-ghes_edac-on-selected-platforms/20170804-190846
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   include/linux/init.h:1: warning: no structured comments found
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_major' description in 'fsl_mc_device_id'
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_minor' description in 'fsl_mc_device_id'
   include/linux/kthread.h:26: warning: Excess function parameter '...' description in 'kthread_create'
   kernel/sys.c:1: warning: no structured comments found
   include/linux/device.h:968: warning: No description found for parameter 'dma_ops'
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
>> drivers/edac/edac_mc.h:131: warning: Cannot understand  *
    on line 131 - I thought it was a doc line
   include/linux/iio/iio.h:603: warning: No description found for parameter 'trig_readonly'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'indio_dev'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'trig'
   include/linux/device.h:969: warning: No description found for parameter 'dma_ops'
   drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
   drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
   drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
   drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   arch/s390/include/asm/cmb.h:1: warning: no structured comments found
   drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq'
   drivers/scsi/constants.c:1: warning: no structured comments found
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'claimed'
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'enabled'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_altset_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_stall_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_zlp_not_supp'
   fs/inode.c:1666: warning: No description found for parameter 'rcu'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_next_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_list'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_vfs_inode'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_flags'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_rsv_handle'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_reserved'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_type'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_line_no'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_start_jiffies'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_requested_credits'
   include/linux/jbd2.h:497: warning: No description found for parameter 'saved_alloc_context'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chkpt_bhs'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_devname'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_average_commit_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_min_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_max_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_commit_callback'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_failed_commit'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chksum_driver'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_csum_seed'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'type'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'line_no'
   fs/jbd2/transaction.c:641: warning: No description found for parameter 'gfp_mask'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'set_busid'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'debugfs_init'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_open_object'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_close_object'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'prime_handle_to_fd'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'prime_fd_to_handle'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_export'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_import'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_pin'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_unpin'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_res_obj'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_get_sg_table'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_import_sg_table'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_vmap'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_vunmap'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_mmap'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_vm_ops'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'major'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'minor'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'patchlevel'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'name'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'desc'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'date'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'driver_features'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'ioctls'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'num_ioctls'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'fops'
   include/drm/drm_color_mgmt.h:1: warning: no structured comments found
   drivers/gpu/drm/drm_syncobj.c:341: warning: Excess function parameter 'dev' description in 'drm_syncobj_open'
   drivers/gpu/drm/drm_syncobj.c:366: warning: Excess function parameter 'dev' description in 'drm_syncobj_release'
   include/drm/drm_syncobj.h:1: warning: no structured comments found
   drivers/gpu/drm/drm_syncobj.c:342: warning: Excess function parameter 'dev' description in 'drm_syncobj_open'
   drivers/gpu/drm/drm_syncobj.c:367: warning: Excess function parameter 'dev' description in 'drm_syncobj_release'
   drivers/gpu/host1x/bus.c:50: warning: Excess function parameter 'driver' description in 'host1x_subdev_add'
   Documentation/doc-guide/sphinx.rst:121: ERROR: Unknown target name: "sphinx c domain".
   kernel/sched/fair.c:7584: WARNING: Inline emphasis start-string without end-string.
   kernel/time/timer.c:1200: ERROR: Unexpected indentation.
   kernel/time/timer.c:1202: ERROR: Unexpected indentation.
   kernel/time/timer.c:1203: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:108: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:111: ERROR: Unexpected indentation.
   include/linux/wait.h:113: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/time/hrtimer.c:991: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/signal.c:323: WARNING: Inline literal start-string without end-string.
   kernel/rcu/tree.c:3187: ERROR: Unexpected indentation.
   kernel/rcu/tree.c:3214: ERROR: Unexpected indentation.
   kernel/rcu/tree.c:3215: WARNING: Bullet list ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:219: ERROR: Unexpected indentation.
   include/linux/iio/iio.h:220: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:226: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/iio/industrialio-core.c:633: ERROR: Unknown target name: "iio_val".
   drivers/iio/industrialio-core.c:640: ERROR: Unknown target name: "iio_val".
   drivers/ata/libata-core.c:5906: ERROR: Unknown target name: "hw".
   drivers/message/fusion/mptbase.c:5051: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/tty/serial/serial_core.c:1897: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/pci/pci.c:3470: ERROR: Unexpected indentation.

vim +131 drivers/edac/edac_mc.h

    97	
    98	/**
    99	 * edac_mc_alloc() - Allocate and partially fill a struct &mem_ctl_info.
   100	 *
   101	 * @mc_num:		Memory controller number
   102	 * @n_layers:		Number of MC hierarchy layers
   103	 * @layers:		Describes each layer as seen by the Memory Controller
   104	 * @sz_pvt:		size of private storage needed
   105	 *
   106	 *
   107	 * Everything is kmalloc'ed as one big chunk - more efficient.
   108	 * Only can be used if all structures have the same lifetime - otherwise
   109	 * you have to allocate and initialize your own structures.
   110	 *
   111	 * Use edac_mc_free() to free mc structures allocated by this function.
   112	 *
   113	 * .. note::
   114	 *
   115	 *   drivers handle multi-rank memories in different ways: in some
   116	 *   drivers, one multi-rank memory stick is mapped as one entry, while, in
   117	 *   others, a single multi-rank memory stick would be mapped into several
   118	 *   entries. Currently, this function will allocate multiple struct dimm_info
   119	 *   on such scenarios, as grouping the multiple ranks require drivers change.
   120	 *
   121	 * Returns:
   122	 *	On success, return a pointer to struct mem_ctl_info pointer;
   123	 *	%NULL otherwise
   124	 */
   125	struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
   126					   unsigned n_layers,
   127					   struct edac_mc_layer *layers,
   128					   unsigned sz_pvt);
   129	
   130	/**
 > 131	 *
   132	 * edac_check_mc_owner - Check the owner of EDAC MC
   133	 *
   134	 * @mod_name: pointer to the module name
   135	 *
   136	 * Returns:
   137	 *	1 when EDAC MC is free or owned by the module name
   138	 *	0 when EDAC MC is owned by other module
   139	 */
   140	extern int edac_check_mc_owner(const char *mod_name);
   141	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6735 bytes --]

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-04 13:06     ` kbuild test robot
  0 siblings, 0 replies; 130+ messages in thread
From: kbuild test robot @ 2017-08-04 13:06 UTC (permalink / raw)
  To: Toshi Kani
  Cc: kbuild-all, rjw, bp, mchehab, tony.luck, lenb, linux-acpi,
	linux-edac, linux-kernel

Hi Toshi,

[auto build test WARNING on pm/linux-next]
[also build test WARNING on v4.13-rc3]
[cannot apply to edac/linux_next bp/for-next next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Toshi-Kani/enable-ghes_edac-on-selected-platforms/20170804-190846
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   include/linux/init.h:1: warning: no structured comments found
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_major' description in 'fsl_mc_device_id'
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_minor' description in 'fsl_mc_device_id'
   include/linux/kthread.h:26: warning: Excess function parameter '...' description in 'kthread_create'
   kernel/sys.c:1: warning: no structured comments found
   include/linux/device.h:968: warning: No description found for parameter 'dma_ops'
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
>> drivers/edac/edac_mc.h:131: warning: Cannot understand  *
    on line 131 - I thought it was a doc line
   include/linux/iio/iio.h:603: warning: No description found for parameter 'trig_readonly'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'indio_dev'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'trig'
   include/linux/device.h:969: warning: No description found for parameter 'dma_ops'
   drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
   drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
   drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
   drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   arch/s390/include/asm/cmb.h:1: warning: no structured comments found
   drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq'
   drivers/scsi/constants.c:1: warning: no structured comments found
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'claimed'
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'enabled'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_altset_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_stall_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_zlp_not_supp'
   fs/inode.c:1666: warning: No description found for parameter 'rcu'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_next_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_list'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_vfs_inode'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_flags'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_rsv_handle'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_reserved'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_type'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_line_no'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_start_jiffies'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_requested_credits'
   include/linux/jbd2.h:497: warning: No description found for parameter 'saved_alloc_context'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chkpt_bhs'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_devname'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_average_commit_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_min_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_max_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_commit_callback'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_failed_commit'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chksum_driver'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_csum_seed'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'type'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'line_no'
   fs/jbd2/transaction.c:641: warning: No description found for parameter 'gfp_mask'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'set_busid'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'debugfs_init'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_open_object'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_close_object'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'prime_handle_to_fd'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'prime_fd_to_handle'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_export'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_import'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_pin'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_unpin'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_res_obj'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_get_sg_table'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_import_sg_table'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_vmap'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_vunmap'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_prime_mmap'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'gem_vm_ops'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'major'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'minor'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'patchlevel'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'name'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'desc'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'date'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'driver_features'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'ioctls'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'num_ioctls'
   include/drm/drm_drv.h:537: warning: No description found for parameter 'fops'
   include/drm/drm_color_mgmt.h:1: warning: no structured comments found
   drivers/gpu/drm/drm_syncobj.c:341: warning: Excess function parameter 'dev' description in 'drm_syncobj_open'
   drivers/gpu/drm/drm_syncobj.c:366: warning: Excess function parameter 'dev' description in 'drm_syncobj_release'
   include/drm/drm_syncobj.h:1: warning: no structured comments found
   drivers/gpu/drm/drm_syncobj.c:342: warning: Excess function parameter 'dev' description in 'drm_syncobj_open'
   drivers/gpu/drm/drm_syncobj.c:367: warning: Excess function parameter 'dev' description in 'drm_syncobj_release'
   drivers/gpu/host1x/bus.c:50: warning: Excess function parameter 'driver' description in 'host1x_subdev_add'
   Documentation/doc-guide/sphinx.rst:121: ERROR: Unknown target name: "sphinx c domain".
   kernel/sched/fair.c:7584: WARNING: Inline emphasis start-string without end-string.
   kernel/time/timer.c:1200: ERROR: Unexpected indentation.
   kernel/time/timer.c:1202: ERROR: Unexpected indentation.
   kernel/time/timer.c:1203: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:108: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:111: ERROR: Unexpected indentation.
   include/linux/wait.h:113: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/time/hrtimer.c:991: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/signal.c:323: WARNING: Inline literal start-string without end-string.
   kernel/rcu/tree.c:3187: ERROR: Unexpected indentation.
   kernel/rcu/tree.c:3214: ERROR: Unexpected indentation.
   kernel/rcu/tree.c:3215: WARNING: Bullet list ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:219: ERROR: Unexpected indentation.
   include/linux/iio/iio.h:220: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:226: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/iio/industrialio-core.c:633: ERROR: Unknown target name: "iio_val".
   drivers/iio/industrialio-core.c:640: ERROR: Unknown target name: "iio_val".
   drivers/ata/libata-core.c:5906: ERROR: Unknown target name: "hw".
   drivers/message/fusion/mptbase.c:5051: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/tty/serial/serial_core.c:1897: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/pci/pci.c:3470: ERROR: Unexpected indentation.

vim +131 drivers/edac/edac_mc.h

    97	
    98	/**
    99	 * edac_mc_alloc() - Allocate and partially fill a struct &mem_ctl_info.
   100	 *
   101	 * @mc_num:		Memory controller number
   102	 * @n_layers:		Number of MC hierarchy layers
   103	 * @layers:		Describes each layer as seen by the Memory Controller
   104	 * @sz_pvt:		size of private storage needed
   105	 *
   106	 *
   107	 * Everything is kmalloc'ed as one big chunk - more efficient.
   108	 * Only can be used if all structures have the same lifetime - otherwise
   109	 * you have to allocate and initialize your own structures.
   110	 *
   111	 * Use edac_mc_free() to free mc structures allocated by this function.
   112	 *
   113	 * .. note::
   114	 *
   115	 *   drivers handle multi-rank memories in different ways: in some
   116	 *   drivers, one multi-rank memory stick is mapped as one entry, while, in
   117	 *   others, a single multi-rank memory stick would be mapped into several
   118	 *   entries. Currently, this function will allocate multiple struct dimm_info
   119	 *   on such scenarios, as grouping the multiple ranks require drivers change.
   120	 *
   121	 * Returns:
   122	 *	On success, return a pointer to struct mem_ctl_info pointer;
   123	 *	%NULL otherwise
   124	 */
   125	struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
   126					   unsigned n_layers,
   127					   struct edac_mc_layer *layers,
   128					   unsigned sz_pvt);
   129	
   130	/**
 > 131	 *
   132	 * edac_check_mc_owner - Check the owner of EDAC MC
   133	 *
   134	 * @mod_name: pointer to the module name
   135	 *
   136	 * Returns:
   137	 *	1 when EDAC MC is free or owned by the module name
   138	 *	0 when EDAC MC is owned by other module
   139	 */
   140	extern int edac_check_mc_owner(const char *mod_name);
   141
---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-04 15:21       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-04 15:21 UTC (permalink / raw)
  To: lkp
  Cc: linux-kernel, mchehab, rjw, kbuild-all, bp, tony.luck, lenb,
	linux-acpi, linux-edac

On Fri, 2017-08-04 at 21:06 +0800, kbuild test robot wrote:
> Hi Toshi,
> 
> [auto build test WARNING on pm/linux-next]
> [also build test WARNING on v4.13-rc3]
> [cannot apply to edac/linux_next bp/for-next next-20170804]
> [if your patch is applied to the wrong git tree, please drop us a
> note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Toshi-Kani/enable-gh
> es_edac-on-selected-platforms/20170804-190846
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-
> pm.git linux-next
> reproduce: make htmldocs
> 
> All warnings (new ones prefixed by >>):
 :
> 	   unsigned sz_pvt);
>    129	
>    130	/**
>  > 131	 *

Removed this extra commented line.

Thanks!
-Toshi


>    132	 * edac_check_mc_owner - Check the owner of EDAC MC
>    133	 *
>    134	 * @mod_name: pointer to the module name
>    135	 *
>    136	 * Returns:
>    137	 *	1 when EDAC MC is free or owned by the module
> name
>    138	 *	0 when EDAC MC is owned by other module
>    139	 */
>    140	extern int edac_check_mc_owner(const char *mod_name);
>    141	

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-04 15:21       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-04 15:21 UTC (permalink / raw)
  To: lkp
  Cc: linux-kernel, mchehab, rjw, kbuild-all, bp, tony.luck, lenb,
	linux-acpi, linux-edac

On Fri, 2017-08-04 at 21:06 +0800, kbuild test robot wrote:
> Hi Toshi,
> 
> [auto build test WARNING on pm/linux-next]
> [also build test WARNING on v4.13-rc3]
> [cannot apply to edac/linux_next bp/for-next next-20170804]
> [if your patch is applied to the wrong git tree, please drop us a
> note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Toshi-Kani/enable-gh
> es_edac-on-selected-platforms/20170804-190846
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-
> pm.git linux-next
> reproduce: make htmldocs
> 
> All warnings (new ones prefixed by >>):
 :
> 	   unsigned sz_pvt);
>    129	
>    130	/**
>  > 131	 *

Removed this extra commented line.

Thanks!
-Toshi


>    132	 * edac_check_mc_owner - Check the owner of EDAC MC
>    133	 *
>    134	 * @mod_name: pointer to the module name
>    135	 *
>    136	 * Returns:
>    137	 *	1 when EDAC MC is free or owned by the module
> name
>    138	 *	0 when EDAC MC is owned by other module
>    139	 */
>    140	extern int edac_check_mc_owner(const char *mod_name);
>    141

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 1/7] ACPI / blacklist: add acpi_match_oemlist() interface
@ 2017-08-04 20:39       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-04 20:39 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, 2017-08-04 at 05:42 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:47PM -0600, Toshi Kani wrote:
> > ACPI OEM ID / OEM Table ID / Revision can be used to identify
> > a platform based on ACPI firmware info.  acpi_blacklisted(),
> > intel_pstate_platform_pwr_mgmt_exists(), and some other funcs,
> > have been using similar check to detect a list of platforms
> > that require special handlings.
> > 
> > Move the platform check in acpi_blacklisted() to a new common
> > utility function, acpi_match_oemlist(), so that other drivers
> > do not have to implement their own version.
> > 
> > There is no change in functionality.
 :
> >  /*
> >   * POLICY: If *anything* doesn't work, put it on the blacklist.
> >   *	   If they are critical errors, mark it critical, and
> > abort driver load.
> >   */
> > -static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
> > +static struct acpi_oemlist acpi_blacklist[] __initdata = {
> 
> All that wasted energy to try to explain to you that "oemlist" is
> wrong and that whole rename is pointless, went for nothing.
>
> So NAK.

Well, we did talk a lot about your suggested name, "acpi_blacklist",
and I explained that it did not work since it'd be used for both black
and white-list.  We've agreed on that.

You then suggested it's "platform", not "OEM".  Since this is an ACPI
structure defined in "acpi.h", its terminology generally follows ACPI
spec, which I did.

I understand that you feel strongly against "OEM" (sorry about that). 
How about "acpi_platform_list"?  Does it work for you?

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,1/7] ACPI / blacklist: add acpi_match_oemlist() interface
@ 2017-08-04 20:39       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-04 20:39 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, 2017-08-04 at 05:42 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:47PM -0600, Toshi Kani wrote:
> > ACPI OEM ID / OEM Table ID / Revision can be used to identify
> > a platform based on ACPI firmware info.  acpi_blacklisted(),
> > intel_pstate_platform_pwr_mgmt_exists(), and some other funcs,
> > have been using similar check to detect a list of platforms
> > that require special handlings.
> > 
> > Move the platform check in acpi_blacklisted() to a new common
> > utility function, acpi_match_oemlist(), so that other drivers
> > do not have to implement their own version.
> > 
> > There is no change in functionality.
 :
> >  /*
> >   * POLICY: If *anything* doesn't work, put it on the blacklist.
> >   *	   If they are critical errors, mark it critical, and
> > abort driver load.
> >   */
> > -static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
> > +static struct acpi_oemlist acpi_blacklist[] __initdata = {
> 
> All that wasted energy to try to explain to you that "oemlist" is
> wrong and that whole rename is pointless, went for nothing.
>
> So NAK.

Well, we did talk a lot about your suggested name, "acpi_blacklist",
and I explained that it did not work since it'd be used for both black
and white-list.  We've agreed on that.

You then suggested it's "platform", not "OEM".  Since this is an ACPI
structure defined in "acpi.h", its terminology generally follows ACPI
spec, which I did.

I understand that you feel strongly against "OEM" (sorry about that). 
How about "acpi_platform_list"?  Does it work for you?

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac
@ 2017-08-04 20:49       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-04 20:49 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, 2017-08-04 at 05:44 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:49PM -0600, Toshi Kani wrote:
> > When 'osc_sb_apei_support_acked' is set, it indicates that
> > the platform supports APEI, firmware-first mode, as ACPI _OSC
> > capability bit 4, APEI Support, was set in query.  While _OSC
> > is an optional method, platforms with APEI support should
> > implement it to inform its capability to the OS properly.
> > 
> > Add check to 'osc_sb_apei_support_acked' before calling
> > ghes_edac_register() to restrict that ghes_edac is enabled
> > on platforms with APEI support capability set in _OSC.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Tony Luck <tony.luck@intel.com>
> > ---
> >  drivers/acpi/apei/ghes.c |    8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > index d661d45..49d75eb 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -1139,9 +1139,11 @@ static int ghes_probe(struct platform_device
> > *ghes_dev)
> >  		goto err;
> >  	}
> >  
> > -	rc = ghes_edac_register(ghes, &ghes_dev->dev);
> > -	if (rc < 0)
> > -		goto err;
> > +	if (osc_sb_apei_support_acked) {
> 
> What for when we have the whitelist?

Some firmware features can be enabled / disabled in BIOS.  While HPE
firmware does not allow to disable FF, it's possible that other vendors
might allow such and still have the same OEM ID info.  So, this check
makes sure that FF is in fact enabled in firmware.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac
@ 2017-08-04 20:49       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-04 20:49 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, 2017-08-04 at 05:44 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:49PM -0600, Toshi Kani wrote:
> > When 'osc_sb_apei_support_acked' is set, it indicates that
> > the platform supports APEI, firmware-first mode, as ACPI _OSC
> > capability bit 4, APEI Support, was set in query.  While _OSC
> > is an optional method, platforms with APEI support should
> > implement it to inform its capability to the OS properly.
> > 
> > Add check to 'osc_sb_apei_support_acked' before calling
> > ghes_edac_register() to restrict that ghes_edac is enabled
> > on platforms with APEI support capability set in _OSC.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Tony Luck <tony.luck@intel.com>
> > ---
> >  drivers/acpi/apei/ghes.c |    8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > index d661d45..49d75eb 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -1139,9 +1139,11 @@ static int ghes_probe(struct platform_device
> > *ghes_dev)
> >  		goto err;
> >  	}
> >  
> > -	rc = ghes_edac_register(ghes, &ghes_dev->dev);
> > -	if (rc < 0)
> > -		goto err;
> > +	if (osc_sb_apei_support_acked) {
> 
> What for when we have the whitelist?

Some firmware features can be enabled / disabled in BIOS.  While HPE
firmware does not allow to disable FF, it's possible that other vendors
might allow such and still have the same OEM ID info.  So, this check
makes sure that FF is in fact enabled in firmware.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-04 21:02       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-04 21:02 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, 2017-08-04 at 06:05 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:50PM -0600, Toshi Kani wrote:
> > ghes_edac_register() is called for each GHES platform device
> > instantiated per a GHES entry in ACPI HEST table.  dmi_walk()
> > counts the number of DIMMs on the system, and there is no need
> > to call it multiple times.
> > 
> > Change ghes_edac_register() to call dmi_walk() only when
> > 'num_dimm' is uninitialized.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> > Suggested-by: Borislav Petkov <bp@alien8.de>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > ---
> >  drivers/edac/ghes_edac.c |    9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index 4e61a62..2e9ce9c 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -407,15 +407,18 @@
> > EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
> >  
> >  int ghes_edac_register(struct ghes *ghes, struct device *dev)
> >  {
> > -	bool fake = false;
> > -	int rc, num_dimm = 0;
> >  	struct mem_ctl_info *mci;
> >  	struct edac_mc_layer layers[1];
> >  	struct ghes_edac_pvt *pvt;
> >  	struct ghes_edac_dimm_fill dimm_fill;
> > +	int rc;
> > +
> > +	static int num_dimm;
> > +	static bool fake;
> >  
> >  	/* Get the number of DIMMs */
> > -	dmi_walk(ghes_edac_count_dimms, &num_dimm);
> > +	if (num_dimm == 0)
> > +		dmi_walk(ghes_edac_count_dimms, &num_dimm);
> 
> So the problem is that ghes_edac_register() gets called multiple
> times depending on how many GHES platform devices are on the system.
> But yet they all scan *all* DIMMs. So instead you should return if
> the DIMMs have been counted already and not register a second time.
> 
> Which makes that whole mc counting kinda useless. So you could rip
> that out too.
> 
> Unless I'm missing something...

GHES platform devices correspond to GHES entries, which define firmware
interfaces to report generic memory errors to the OS, such as NMI and
SCI.  These devices are associated with all DIMMs, not a particular
DIMM.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-04 21:02       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-04 21:02 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, 2017-08-04 at 06:05 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:50PM -0600, Toshi Kani wrote:
> > ghes_edac_register() is called for each GHES platform device
> > instantiated per a GHES entry in ACPI HEST table.  dmi_walk()
> > counts the number of DIMMs on the system, and there is no need
> > to call it multiple times.
> > 
> > Change ghes_edac_register() to call dmi_walk() only when
> > 'num_dimm' is uninitialized.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> > Suggested-by: Borislav Petkov <bp@alien8.de>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > ---
> >  drivers/edac/ghes_edac.c |    9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index 4e61a62..2e9ce9c 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -407,15 +407,18 @@
> > EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
> >  
> >  int ghes_edac_register(struct ghes *ghes, struct device *dev)
> >  {
> > -	bool fake = false;
> > -	int rc, num_dimm = 0;
> >  	struct mem_ctl_info *mci;
> >  	struct edac_mc_layer layers[1];
> >  	struct ghes_edac_pvt *pvt;
> >  	struct ghes_edac_dimm_fill dimm_fill;
> > +	int rc;
> > +
> > +	static int num_dimm;
> > +	static bool fake;
> >  
> >  	/* Get the number of DIMMs */
> > -	dmi_walk(ghes_edac_count_dimms, &num_dimm);
> > +	if (num_dimm == 0)
> > +		dmi_walk(ghes_edac_count_dimms, &num_dimm);
> 
> So the problem is that ghes_edac_register() gets called multiple
> times depending on how many GHES platform devices are on the system.
> But yet they all scan *all* DIMMs. So instead you should return if
> the DIMMs have been counted already and not register a second time.
> 
> Which makes that whole mc counting kinda useless. So you could rip
> that out too.
> 
> Unless I'm missing something...

GHES platform devices correspond to GHES entries, which define firmware
interfaces to report generic memory errors to the OS, such as NMI and
SCI.  These devices are associated with all DIMMs, not a particular
DIMM.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 5/7] ghes_edac: add platform check to enable ghes_edac
@ 2017-08-04 21:06       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-04 21:06 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, 2017-08-04 at 10:31 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:51PM -0600, Toshi Kani wrote:
> > The ghes_edac driver was introduced in 2013 [1], but it has not
> > been enabled by any distro yet.  This driver obtains error info
> > from firmware interfaces, which are not properly implemented on
> > many platforms, as the driver always emits the messages below:
> > 
> >  This EDAC driver relies on BIOS to enumerate memory and get error
> > reports.  Unfortunately, not all BIOSes reflect the memory layout
> > correctly  So, the end result of using this driver varies from
> > vendor to vendor  If you find incorrect reports, please contact
> > your hardware vendor to correct its BIOS.
> > 
> > To get out from this situation, add a platform check to selectively
> > enable the driver on the platforms that are known to have proper
> > firmware implementation.  Platform vendors can add their platforms
> > to the list when they support ghes_edac.
> > 
> > "ghes_edac.any_oem=1" skips this platform check.
> 
> There's that non-generic OEM thing again. NAK!

How about "ghes_edac.any_platform"?

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,5/7] ghes_edac: add platform check to enable ghes_edac
@ 2017-08-04 21:06       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-04 21:06 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, 2017-08-04 at 10:31 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:51PM -0600, Toshi Kani wrote:
> > The ghes_edac driver was introduced in 2013 [1], but it has not
> > been enabled by any distro yet.  This driver obtains error info
> > from firmware interfaces, which are not properly implemented on
> > many platforms, as the driver always emits the messages below:
> > 
> >  This EDAC driver relies on BIOS to enumerate memory and get error
> > reports.  Unfortunately, not all BIOSes reflect the memory layout
> > correctly  So, the end result of using this driver varies from
> > vendor to vendor  If you find incorrect reports, please contact
> > your hardware vendor to correct its BIOS.
> > 
> > To get out from this situation, add a platform check to selectively
> > enable the driver on the platforms that are known to have proper
> > firmware implementation.  Platform vendors can add their platforms
> > to the list when they support ghes_edac.
> > 
> > "ghes_edac.any_oem=1" skips this platform check.
> 
> There's that non-generic OEM thing again. NAK!

How about "ghes_edac.any_platform"?

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-04 21:35       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-04 21:35 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, 2017-08-04 at 10:30 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:52PM -0600, Toshi Kani wrote:
> > Only a single edac driver can be enabled for EDAC MC.  When
> > ghes_edac is enabled, a regular edac driver for the CPU type /
> > platform still attempts to register itself and fails in
> > edac_mc_add_mc().
> > 
> > Add edac_check_mc_owner() so that regular edac drivers can check
> > the owner of EDAC MC at the beginning of initialization.
> > 
> > Also change the owner check to string comparison from address
> > check.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Tony Luck <tony.luck@intel.com>
> > ---
> >  drivers/edac/edac_mc.c |   15 ++++++++++++++-
> >  drivers/edac/edac_mc.h |   12 ++++++++++++
> >  2 files changed, 26 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> > index 4800721..52d8d5e 100644
> > --- a/drivers/edac/edac_mc.c
> > +++ b/drivers/edac/edac_mc.c
> > @@ -701,6 +701,19 @@ struct mem_ctl_info *edac_mc_find(int idx)
> >  }
> >  EXPORT_SYMBOL(edac_mc_find);
> >  
> > +/*
> > + * Returns:
> > + *	1 when EDAC MC is free or owned by the module name
> 
> 1 for free OR owned?!? WTF?

1 means the caller's init function can continue its initialization --
such conditions are free or owned by itself.

> > + *	0 when EDAC MC is owned by other module
> > + */
> > +int edac_check_mc_owner(const char *mod_name)
> > +{
> > +	if (edac_mc_owner && strcmp(edac_mc_owner, mod_name))
> 
> strncmp() of course, with sensible maximal string length.

strcmp() breaks when either edac_mc_owner or mod_name string ends. 
strncmp() assumes mod_name string is valid for a given length.  Hence,
the caller needs to supply the length to this function by adding a new
arg 'length', which does not make it any safer.  I think using
strncmp() would require all edac drivers to declare a pre-defined
length of module name...

> > +		return 0;
> > +
> > +	return 1;
> > +}
> > +EXPORT_SYMBOL(edac_check_mc_owner);
> 
> EXPORT_SYMBOL_GPL

Will do.

> >  
> >  /* FIXME - should a warning be printed if no error detection?
> > correction? */
> >  int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
> > @@ -742,7 +755,7 @@ int edac_mc_add_mc_with_groups(struct
> > mem_ctl_info *mci,
> >  #endif
> >  	mutex_lock(&mem_ctls_mutex);
> >  
> > -	if (edac_mc_owner && edac_mc_owner != mci->mod_name) {
> > +	if (!edac_check_mc_owner(mci->mod_name)) {
> >  		ret = -EPERM;
> >  		goto fail0;
> >  	}
> > diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
> > index 5357800..0e95eba 100644
> > --- a/drivers/edac/edac_mc.h
> > +++ b/drivers/edac/edac_mc.h
> > @@ -128,6 +128,18 @@ struct mem_ctl_info *edac_mc_alloc(unsigned
> > mc_num,
> >  				   unsigned sz_pvt);
> >  
> >  /**
> > + *
> > + * edac_check_mc_owner - Check the owner of EDAC MC
> > + *
> > + * @mod_name: pointer to the module name
> > + *
> > + * Returns:
> > + *	1 when EDAC MC is free or owned by the module name
> > + *	0 when EDAC MC is owned by other module
> > + */
> 
> Documenting that function only once is enough.

I personally prefer to document in .c, but since other funcs documented
in dac_mc.h, I will keep the same style.  Will remove the document in
edac_mc.c. 

> > +extern int edac_check_mc_owner(const char *mod_name);
> > +
> > +/*
> >   * edac_mc_add_mc_with_groups() - Insert the @mci structure into
> > the mci
> >   *	global list and create sysfs entries associated with
> > @mci structure.
> >   *

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-04 21:35       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-04 21:35 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

T24gRnJpLCAyMDE3LTA4LTA0IGF0IDEwOjMwICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIFRodSwgQXVnIDAzLCAyMDE3IGF0IDAzOjU3OjUyUE0gLTA2MDAsIFRvc2hpIEthbmkg
d3JvdGU6DQo+ID4gT25seSBhIHNpbmdsZSBlZGFjIGRyaXZlciBjYW4gYmUgZW5hYmxlZCBmb3Ig
RURBQyBNQy7CoMKgV2hlbg0KPiA+IGdoZXNfZWRhYyBpcyBlbmFibGVkLCBhIHJlZ3VsYXIgZWRh
YyBkcml2ZXIgZm9yIHRoZSBDUFUgdHlwZSAvDQo+ID4gcGxhdGZvcm0gc3RpbGwgYXR0ZW1wdHMg
dG8gcmVnaXN0ZXIgaXRzZWxmIGFuZCBmYWlscyBpbg0KPiA+IGVkYWNfbWNfYWRkX21jKCkuDQo+
ID4gDQo+ID4gQWRkIGVkYWNfY2hlY2tfbWNfb3duZXIoKSBzbyB0aGF0IHJlZ3VsYXIgZWRhYyBk
cml2ZXJzIGNhbiBjaGVjaw0KPiA+IHRoZSBvd25lciBvZiBFREFDIE1DIGF0IHRoZSBiZWdpbm5p
bmcgb2YgaW5pdGlhbGl6YXRpb24uDQo+ID4gDQo+ID4gQWxzbyBjaGFuZ2UgdGhlIG93bmVyIGNo
ZWNrIHRvIHN0cmluZyBjb21wYXJpc29uIGZyb20gYWRkcmVzcw0KPiA+IGNoZWNrLg0KPiA+IA0K
PiA+IFNpZ25lZC1vZmYtYnk6IFRvc2hpIEthbmkgPHRvc2hpLmthbmlAaHBlLmNvbT4NCj4gPiBD
YzogQm9yaXNsYXYgUGV0a292IDxicEBhbGllbjguZGU+DQo+ID4gQ2M6IE1hdXJvIENhcnZhbGhv
IENoZWhhYiA8bWNoZWhhYkBrZXJuZWwub3JnPg0KPiA+IENjOiBUb255IEx1Y2sgPHRvbnkubHVj
a0BpbnRlbC5jb20+DQo+ID4gLS0tDQo+ID4gwqBkcml2ZXJzL2VkYWMvZWRhY19tYy5jIHzCoMKg
wqAxNSArKysrKysrKysrKysrKy0NCj4gPiDCoGRyaXZlcnMvZWRhYy9lZGFjX21jLmggfMKgwqDC
oDEyICsrKysrKysrKysrKw0KPiA+IMKgMiBmaWxlcyBjaGFuZ2VkLCAyNiBpbnNlcnRpb25zKCsp
LCAxIGRlbGV0aW9uKC0pDQo+ID4gDQo+ID4gZGlmZiAtLWdpdCBhL2RyaXZlcnMvZWRhYy9lZGFj
X21jLmMgYi9kcml2ZXJzL2VkYWMvZWRhY19tYy5jDQo+ID4gaW5kZXggNDgwMDcyMS4uNTJkOGQ1
ZSAxMDA2NDQNCj4gPiAtLS0gYS9kcml2ZXJzL2VkYWMvZWRhY19tYy5jDQo+ID4gKysrIGIvZHJp
dmVycy9lZGFjL2VkYWNfbWMuYw0KPiA+IEBAIC03MDEsNiArNzAxLDE5IEBAIHN0cnVjdCBtZW1f
Y3RsX2luZm8gKmVkYWNfbWNfZmluZChpbnQgaWR4KQ0KPiA+IMKgfQ0KPiA+IMKgRVhQT1JUX1NZ
TUJPTChlZGFjX21jX2ZpbmQpOw0KPiA+IMKgDQo+ID4gKy8qDQo+ID4gKyAqIFJldHVybnM6DQo+
ID4gKyAqCTEgd2hlbiBFREFDIE1DIGlzIGZyZWUgb3Igb3duZWQgYnkgdGhlIG1vZHVsZSBuYW1l
DQo+IA0KPiAxIGZvciBmcmVlIE9SIG93bmVkPyE/IFdURj8NCg0KMSBtZWFucyB0aGUgY2FsbGVy
J3MgaW5pdCBmdW5jdGlvbiBjYW4gY29udGludWUgaXRzIGluaXRpYWxpemF0aW9uIC0tDQpzdWNo
IGNvbmRpdGlvbnMgYXJlIGZyZWUgb3Igb3duZWQgYnkgaXRzZWxmLg0KDQo+ID4gKyAqCTAgd2hl
biBFREFDIE1DIGlzIG93bmVkIGJ5IG90aGVyIG1vZHVsZQ0KPiA+ICsgKi8NCj4gPiAraW50IGVk
YWNfY2hlY2tfbWNfb3duZXIoY29uc3QgY2hhciAqbW9kX25hbWUpDQo+ID4gK3sNCj4gPiArCWlm
IChlZGFjX21jX293bmVyICYmIHN0cmNtcChlZGFjX21jX293bmVyLCBtb2RfbmFtZSkpDQo+IA0K
PiBzdHJuY21wKCkgb2YgY291cnNlLCB3aXRoIHNlbnNpYmxlIG1heGltYWwgc3RyaW5nIGxlbmd0
aC4NCg0Kc3RyY21wKCkgYnJlYWtzIHdoZW4gZWl0aGVyIGVkYWNfbWNfb3duZXIgb3IgbW9kX25h
bWUgc3RyaW5nIGVuZHMuIA0Kc3RybmNtcCgpIGFzc3VtZXMgbW9kX25hbWUgc3RyaW5nIGlzIHZh
bGlkIGZvciBhIGdpdmVuIGxlbmd0aC4gIEhlbmNlLA0KdGhlIGNhbGxlciBuZWVkcyB0byBzdXBw
bHkgdGhlIGxlbmd0aCB0byB0aGlzIGZ1bmN0aW9uIGJ5IGFkZGluZyBhIG5ldw0KYXJnICdsZW5n
dGgnLCB3aGljaCBkb2VzIG5vdCBtYWtlIGl0IGFueSBzYWZlci4gIEkgdGhpbmsgdXNpbmcNCnN0
cm5jbXAoKSB3b3VsZCByZXF1aXJlIGFsbCBlZGFjIGRyaXZlcnMgdG8gZGVjbGFyZSBhIHByZS1k
ZWZpbmVkDQpsZW5ndGggb2YgbW9kdWxlIG5hbWUuLi4NCg0KPiA+ICsJCXJldHVybiAwOw0KPiA+
ICsNCj4gPiArCXJldHVybiAxOw0KPiA+ICt9DQo+ID4gK0VYUE9SVF9TWU1CT0woZWRhY19jaGVj
a19tY19vd25lcik7DQo+IA0KPiBFWFBPUlRfU1lNQk9MX0dQTA0KDQpXaWxsIGRvLg0KDQo+ID4g
wqANCj4gPiDCoC8qIEZJWE1FIC0gc2hvdWxkIGEgd2FybmluZyBiZSBwcmludGVkIGlmIG5vIGVy
cm9yIGRldGVjdGlvbj8NCj4gPiBjb3JyZWN0aW9uPyAqLw0KPiA+IMKgaW50IGVkYWNfbWNfYWRk
X21jX3dpdGhfZ3JvdXBzKHN0cnVjdCBtZW1fY3RsX2luZm8gKm1jaSwNCj4gPiBAQCAtNzQyLDcg
Kzc1NSw3IEBAIGludCBlZGFjX21jX2FkZF9tY193aXRoX2dyb3VwcyhzdHJ1Y3QNCj4gPiBtZW1f
Y3RsX2luZm8gKm1jaSwNCj4gPiDCoCNlbmRpZg0KPiA+IMKgCW11dGV4X2xvY2soJm1lbV9jdGxz
X211dGV4KTsNCj4gPiDCoA0KPiA+IC0JaWYgKGVkYWNfbWNfb3duZXIgJiYgZWRhY19tY19vd25l
ciAhPSBtY2ktPm1vZF9uYW1lKSB7DQo+ID4gKwlpZiAoIWVkYWNfY2hlY2tfbWNfb3duZXIobWNp
LT5tb2RfbmFtZSkpIHsNCj4gPiDCoAkJcmV0ID0gLUVQRVJNOw0KPiA+IMKgCQlnb3RvIGZhaWww
Ow0KPiA+IMKgCX0NCj4gPiBkaWZmIC0tZ2l0IGEvZHJpdmVycy9lZGFjL2VkYWNfbWMuaCBiL2Ry
aXZlcnMvZWRhYy9lZGFjX21jLmgNCj4gPiBpbmRleCA1MzU3ODAwLi4wZTk1ZWJhIDEwMDY0NA0K
PiA+IC0tLSBhL2RyaXZlcnMvZWRhYy9lZGFjX21jLmgNCj4gPiArKysgYi9kcml2ZXJzL2VkYWMv
ZWRhY19tYy5oDQo+ID4gQEAgLTEyOCw2ICsxMjgsMTggQEAgc3RydWN0IG1lbV9jdGxfaW5mbyAq
ZWRhY19tY19hbGxvYyh1bnNpZ25lZA0KPiA+IG1jX251bSwNCj4gPiDCoAkJCQnCoMKgwqB1bnNp
Z25lZCBzel9wdnQpOw0KPiA+IMKgDQo+ID4gwqAvKioNCj4gPiArICoNCj4gPiArICogZWRhY19j
aGVja19tY19vd25lciAtIENoZWNrIHRoZSBvd25lciBvZiBFREFDIE1DDQo+ID4gKyAqDQo+ID4g
KyAqIEBtb2RfbmFtZTogcG9pbnRlciB0byB0aGUgbW9kdWxlIG5hbWUNCj4gPiArICoNCj4gPiAr
ICogUmV0dXJuczoNCj4gPiArICoJMSB3aGVuIEVEQUMgTUMgaXMgZnJlZSBvciBvd25lZCBieSB0
aGUgbW9kdWxlIG5hbWUNCj4gPiArICoJMCB3aGVuIEVEQUMgTUMgaXMgb3duZWQgYnkgb3RoZXIg
bW9kdWxlDQo+ID4gKyAqLw0KPiANCj4gRG9jdW1lbnRpbmcgdGhhdCBmdW5jdGlvbiBvbmx5IG9u
Y2UgaXMgZW5vdWdoLg0KDQpJIHBlcnNvbmFsbHkgcHJlZmVyIHRvIGRvY3VtZW50IGluIC5jLCBi
dXQgc2luY2Ugb3RoZXIgZnVuY3MgZG9jdW1lbnRlZA0KaW4gZGFjX21jLmgsIEkgd2lsbCBrZWVw
IHRoZSBzYW1lIHN0eWxlLiAgV2lsbCByZW1vdmUgdGhlIGRvY3VtZW50IGluDQplZGFjX21jLmMu
IA0KDQo+ID4gK2V4dGVybiBpbnQgZWRhY19jaGVja19tY19vd25lcihjb25zdCBjaGFyICptb2Rf
bmFtZSk7DQo+ID4gKw0KPiA+ICsvKg0KPiA+IMKgICogZWRhY19tY19hZGRfbWNfd2l0aF9ncm91
cHMoKSAtIEluc2VydCB0aGUgQG1jaSBzdHJ1Y3R1cmUgaW50bw0KPiA+IHRoZSBtY2kNCj4gPiDC
oCAqCWdsb2JhbCBsaXN0IGFuZCBjcmVhdGUgc3lzZnMgZW50cmllcyBhc3NvY2lhdGVkIHdpdGgN
Cj4gPiBAbWNpIHN0cnVjdHVyZS4NCj4gPiDCoCAqDQoNClRoYW5rcywNCi1Ub3NoaQ0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 7/7] edac drivers: add MC owner check in init
@ 2017-08-04 21:48       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-04 21:48 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, 2017-08-04 at 10:39 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:53PM -0600, Toshi Kani wrote:
> > Change generic x86 edac drivers, which probe CPU type with
> > x86_match_cpu(), to call edac_check_mc_owner() in their
> > module init functions.  This allows them to fail their init
> > at the beginning when ghes_edac is enabled.  Similar change
> > can be made to other edac drivers as necessary.
> > 
> > This is an optimization and there is no functional change.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> > Suggested-by: Borislav Petkov <bp@alien8.de>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Tony Luck <tony.luck@intel.com>
> > ---
> >  drivers/edac/amd64_edac.c |    3 +++
> >  drivers/edac/pnd2_edac.c  |    7 ++++++-
> >  drivers/edac/sb_edac.c    |    7 +++++--
> >  drivers/edac/skx_edac.c   |    6 +++++-
> >  4 files changed, 19 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> > index 3aea556..cdb40d6 100644
> > --- a/drivers/edac/amd64_edac.c
> > +++ b/drivers/edac/amd64_edac.c
> > @@ -3444,6 +3444,9 @@ static int __init amd64_edac_init(void)
> >  	if (amd_cache_northbridges() < 0)
> >  		return -ENODEV;
> >  
> > +	if (!edac_check_mc_owner(EDAC_MOD_STR))
> > +		return -EBUSY;
> > +
> 
> That needs to happen first in the init function.

Not sure if anyone cares, but I thought it should return with -ENODEV
when this modules found no target, and -EBUSY when it found a target
but it's busy.  Hence, this ordering.

> >  	opstate_init();
> >  
> >  	err = -ENOMEM;
> > diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
> > index 8e59949..a5b7855 100644
> > --- a/drivers/edac/pnd2_edac.c
> > +++ b/drivers/edac/pnd2_edac.c
> > @@ -45,6 +45,8 @@
> >  #include "edac_module.h"
> >  #include "pnd2_edac.h"
> >  
> > +#define PND2_MOD_NAME		"pnd2_edac.c"
> 
> EDAC_MOD_STR and look how the other drivers define it, i.e., without
> the ".c"

OK, I will change to use EDAC_MOD_STR and remove ".c".

> > +
> >  #define APL_NUM_CHANNELS	4
> >  #define DNV_NUM_CHANNELS	2
> >  #define DNV_MAX_DIMMS		2 /* Max DIMMs per channel */
> > @@ -1313,7 +1315,7 @@ static int pnd2_register_mci(struct
> > mem_ctl_info **ppmci)
> >  	pvt = mci->pvt_info;
> >  	memset(pvt, 0, sizeof(*pvt));
> >  
> > -	mci->mod_name = "pnd2_edac.c";
> > +	mci->mod_name = PND2_MOD_NAME;
> >  	mci->dev_name = ops->name;
> >  	mci->ctl_name = "Pondicherry2";
> >  
> > @@ -1513,6 +1515,9 @@ static int __init pnd2_init(void)
> >  	if (!id)
> >  		return -ENODEV;
> >  
> > +	if (!edac_check_mc_owner(PND2_MOD_NAME))
> > +		return -EBUSY;
> 
> Also first thing to do in the function.
>
> > +
> >  	ops = (struct dunit_ops *)id->driver_data;
> >  
> >  	/* Ensure that the OPSTATE is set correctly for POLL or
> > NMI */
> > diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
> > index 80d860c..71bd66e 100644
> > --- a/drivers/edac/sb_edac.c
> > +++ b/drivers/edac/sb_edac.c
> > @@ -36,7 +36,7 @@ static LIST_HEAD(sbridge_edac_list);
> >   * Alter this version for the module when modifications are made
> >   */
> >  #define SBRIDGE_REVISION    " Ver: 1.1.2 "
> > -#define EDAC_MOD_STR      "sbridge_edac"
> > +#define SBRIDGE_MOD_NAME    "sb_edac.c"
> 
> Why? EDAC_MOD_STR is just fine.

I simply kept what it has today.  Will remove ".c".

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,7/7] edac drivers: add MC owner check in init
@ 2017-08-04 21:48       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-04 21:48 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

T24gRnJpLCAyMDE3LTA4LTA0IGF0IDEwOjM5ICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIFRodSwgQXVnIDAzLCAyMDE3IGF0IDAzOjU3OjUzUE0gLTA2MDAsIFRvc2hpIEthbmkg
d3JvdGU6DQo+ID4gQ2hhbmdlIGdlbmVyaWMgeDg2IGVkYWMgZHJpdmVycywgd2hpY2ggcHJvYmUg
Q1BVIHR5cGUgd2l0aA0KPiA+IHg4Nl9tYXRjaF9jcHUoKSwgdG8gY2FsbCBlZGFjX2NoZWNrX21j
X293bmVyKCkgaW4gdGhlaXINCj4gPiBtb2R1bGUgaW5pdCBmdW5jdGlvbnMuwqDCoFRoaXMgYWxs
b3dzIHRoZW0gdG8gZmFpbCB0aGVpciBpbml0DQo+ID4gYXQgdGhlIGJlZ2lubmluZyB3aGVuIGdo
ZXNfZWRhYyBpcyBlbmFibGVkLsKgwqBTaW1pbGFyIGNoYW5nZQ0KPiA+IGNhbiBiZSBtYWRlIHRv
IG90aGVyIGVkYWMgZHJpdmVycyBhcyBuZWNlc3NhcnkuDQo+ID4gDQo+ID4gVGhpcyBpcyBhbiBv
cHRpbWl6YXRpb24gYW5kIHRoZXJlIGlzIG5vIGZ1bmN0aW9uYWwgY2hhbmdlLg0KPiA+IA0KPiA+
IFNpZ25lZC1vZmYtYnk6IFRvc2hpIEthbmkgPHRvc2hpLmthbmlAaHBlLmNvbT4NCj4gPiBTdWdn
ZXN0ZWQtYnk6IEJvcmlzbGF2IFBldGtvdiA8YnBAYWxpZW44LmRlPg0KPiA+IENjOiBCb3Jpc2xh
diBQZXRrb3YgPGJwQGFsaWVuOC5kZT4NCj4gPiBDYzogTWF1cm8gQ2FydmFsaG8gQ2hlaGFiIDxt
Y2hlaGFiQGtlcm5lbC5vcmc+DQo+ID4gQ2M6IFRvbnkgTHVjayA8dG9ueS5sdWNrQGludGVsLmNv
bT4NCj4gPiAtLS0NCj4gPiDCoGRyaXZlcnMvZWRhYy9hbWQ2NF9lZGFjLmMgfMKgwqDCoMKgMyAr
KysNCj4gPiDCoGRyaXZlcnMvZWRhYy9wbmQyX2VkYWMuY8KgwqB8wqDCoMKgwqA3ICsrKysrKy0N
Cj4gPiDCoGRyaXZlcnMvZWRhYy9zYl9lZGFjLmPCoMKgwqDCoHzCoMKgwqDCoDcgKysrKystLQ0K
PiA+IMKgZHJpdmVycy9lZGFjL3NreF9lZGFjLmPCoMKgwqB8wqDCoMKgwqA2ICsrKysrLQ0KPiA+
IMKgNCBmaWxlcyBjaGFuZ2VkLCAxOSBpbnNlcnRpb25zKCspLCA0IGRlbGV0aW9ucygtKQ0KPiA+
IA0KPiA+IGRpZmYgLS1naXQgYS9kcml2ZXJzL2VkYWMvYW1kNjRfZWRhYy5jIGIvZHJpdmVycy9l
ZGFjL2FtZDY0X2VkYWMuYw0KPiA+IGluZGV4IDNhZWE1NTYuLmNkYjQwZDYgMTAwNjQ0DQo+ID4g
LS0tIGEvZHJpdmVycy9lZGFjL2FtZDY0X2VkYWMuYw0KPiA+ICsrKyBiL2RyaXZlcnMvZWRhYy9h
bWQ2NF9lZGFjLmMNCj4gPiBAQCAtMzQ0NCw2ICszNDQ0LDkgQEAgc3RhdGljIGludCBfX2luaXQg
YW1kNjRfZWRhY19pbml0KHZvaWQpDQo+ID4gwqAJaWYgKGFtZF9jYWNoZV9ub3J0aGJyaWRnZXMo
KSA8IDApDQo+ID4gwqAJCXJldHVybiAtRU5PREVWOw0KPiA+IMKgDQo+ID4gKwlpZiAoIWVkYWNf
Y2hlY2tfbWNfb3duZXIoRURBQ19NT0RfU1RSKSkNCj4gPiArCQlyZXR1cm4gLUVCVVNZOw0KPiA+
ICsNCj4gDQo+IFRoYXQgbmVlZHMgdG8gaGFwcGVuIGZpcnN0IGluIHRoZSBpbml0IGZ1bmN0aW9u
Lg0KDQpOb3Qgc3VyZSBpZiBhbnlvbmUgY2FyZXMsIGJ1dCBJIHRob3VnaHQgaXQgc2hvdWxkIHJl
dHVybiB3aXRoIC1FTk9ERVYNCndoZW4gdGhpcyBtb2R1bGVzIGZvdW5kIG5vIHRhcmdldCwgYW5k
IC1FQlVTWSB3aGVuIGl0IGZvdW5kIGEgdGFyZ2V0DQpidXQgaXQncyBidXN5LiAgSGVuY2UsIHRo
aXMgb3JkZXJpbmcuDQoNCj4gPiDCoAlvcHN0YXRlX2luaXQoKTsNCj4gPiDCoA0KPiA+IMKgCWVy
ciA9IC1FTk9NRU07DQo+ID4gZGlmZiAtLWdpdCBhL2RyaXZlcnMvZWRhYy9wbmQyX2VkYWMuYyBi
L2RyaXZlcnMvZWRhYy9wbmQyX2VkYWMuYw0KPiA+IGluZGV4IDhlNTk5NDkuLmE1Yjc4NTUgMTAw
NjQ0DQo+ID4gLS0tIGEvZHJpdmVycy9lZGFjL3BuZDJfZWRhYy5jDQo+ID4gKysrIGIvZHJpdmVy
cy9lZGFjL3BuZDJfZWRhYy5jDQo+ID4gQEAgLTQ1LDYgKzQ1LDggQEANCj4gPiDCoCNpbmNsdWRl
ICJlZGFjX21vZHVsZS5oIg0KPiA+IMKgI2luY2x1ZGUgInBuZDJfZWRhYy5oIg0KPiA+IMKgDQo+
ID4gKyNkZWZpbmUgUE5EMl9NT0RfTkFNRQkJInBuZDJfZWRhYy5jIg0KPiANCj4gRURBQ19NT0Rf
U1RSIGFuZCBsb29rIGhvdyB0aGUgb3RoZXIgZHJpdmVycyBkZWZpbmUgaXQsIGkuZS4sIHdpdGhv
dXQNCj4gdGhlICIuYyINCg0KT0ssIEkgd2lsbCBjaGFuZ2UgdG8gdXNlIEVEQUNfTU9EX1NUUiBh
bmQgcmVtb3ZlICIuYyIuDQoNCj4gPiArDQo+ID4gwqAjZGVmaW5lIEFQTF9OVU1fQ0hBTk5FTFMJ
NA0KPiA+IMKgI2RlZmluZSBETlZfTlVNX0NIQU5ORUxTCTINCj4gPiDCoCNkZWZpbmUgRE5WX01B
WF9ESU1NUwkJMiAvKiBNYXggRElNTXMgcGVyIGNoYW5uZWwgKi8NCj4gPiBAQCAtMTMxMyw3ICsx
MzE1LDcgQEAgc3RhdGljIGludCBwbmQyX3JlZ2lzdGVyX21jaShzdHJ1Y3QNCj4gPiBtZW1fY3Rs
X2luZm8gKipwcG1jaSkNCj4gPiDCoAlwdnQgPSBtY2ktPnB2dF9pbmZvOw0KPiA+IMKgCW1lbXNl
dChwdnQsIDAsIHNpemVvZigqcHZ0KSk7DQo+ID4gwqANCj4gPiAtCW1jaS0+bW9kX25hbWUgPSAi
cG5kMl9lZGFjLmMiOw0KPiA+ICsJbWNpLT5tb2RfbmFtZSA9IFBORDJfTU9EX05BTUU7DQo+ID4g
wqAJbWNpLT5kZXZfbmFtZSA9IG9wcy0+bmFtZTsNCj4gPiDCoAltY2ktPmN0bF9uYW1lID0gIlBv
bmRpY2hlcnJ5MiI7DQo+ID4gwqANCj4gPiBAQCAtMTUxMyw2ICsxNTE1LDkgQEAgc3RhdGljIGlu
dCBfX2luaXQgcG5kMl9pbml0KHZvaWQpDQo+ID4gwqAJaWYgKCFpZCkNCj4gPiDCoAkJcmV0dXJu
IC1FTk9ERVY7DQo+ID4gwqANCj4gPiArCWlmICghZWRhY19jaGVja19tY19vd25lcihQTkQyX01P
RF9OQU1FKSkNCj4gPiArCQlyZXR1cm4gLUVCVVNZOw0KPiANCj4gQWxzbyBmaXJzdCB0aGluZyB0
byBkbyBpbiB0aGUgZnVuY3Rpb24uDQo+DQo+ID4gKw0KPiA+IMKgCW9wcyA9IChzdHJ1Y3QgZHVu
aXRfb3BzICopaWQtPmRyaXZlcl9kYXRhOw0KPiA+IMKgDQo+ID4gwqAJLyogRW5zdXJlIHRoYXQg
dGhlIE9QU1RBVEUgaXMgc2V0IGNvcnJlY3RseSBmb3IgUE9MTCBvcg0KPiA+IE5NSSAqLw0KPiA+
IGRpZmYgLS1naXQgYS9kcml2ZXJzL2VkYWMvc2JfZWRhYy5jIGIvZHJpdmVycy9lZGFjL3NiX2Vk
YWMuYw0KPiA+IGluZGV4IDgwZDg2MGMuLjcxYmQ2NmUgMTAwNjQ0DQo+ID4gLS0tIGEvZHJpdmVy
cy9lZGFjL3NiX2VkYWMuYw0KPiA+ICsrKyBiL2RyaXZlcnMvZWRhYy9zYl9lZGFjLmMNCj4gPiBA
QCAtMzYsNyArMzYsNyBAQCBzdGF0aWMgTElTVF9IRUFEKHNicmlkZ2VfZWRhY19saXN0KTsNCj4g
PiDCoCAqIEFsdGVyIHRoaXMgdmVyc2lvbiBmb3IgdGhlIG1vZHVsZSB3aGVuIG1vZGlmaWNhdGlv
bnMgYXJlIG1hZGUNCj4gPiDCoCAqLw0KPiA+IMKgI2RlZmluZSBTQlJJREdFX1JFVklTSU9OwqDC
oMKgwqAiIFZlcjogMS4xLjIgIg0KPiA+IC0jZGVmaW5lIEVEQUNfTU9EX1NUUsKgwqDCoMKgwqDC
oCJzYnJpZGdlX2VkYWMiDQo+ID4gKyNkZWZpbmUgU0JSSURHRV9NT0RfTkFNRcKgwqDCoMKgInNi
X2VkYWMuYyINCj4gDQo+IFdoeT8gRURBQ19NT0RfU1RSIGlzIGp1c3QgZmluZS4NCg0KSSBzaW1w
bHkga2VwdCB3aGF0IGl0IGhhcyB0b2RheS4gIFdpbGwgcmVtb3ZlICIuYyIuDQoNClRoYW5rcywN
Ci1Ub3NoaQ0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 1/7] ACPI / blacklist: add acpi_match_oemlist() interface
@ 2017-08-05  5:12         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:12 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 08:39:35PM +0000, Kani, Toshimitsu wrote:
> Well, we did talk a lot about your suggested name, "acpi_blacklist",
> and I explained that it did not work since it'd be used for both black
> and white-list.  We've agreed on that.
> 
> You then suggested it's "platform", not "OEM".  Since this is an ACPI
> structure defined in "acpi.h", its terminology generally follows ACPI
> spec, which I did.
> 
> I understand that you feel strongly against "OEM" (sorry about that). 
> How about "acpi_platform_list"?  Does it work for you?

Yes, I thought we agreed on all that.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,1/7] ACPI / blacklist: add acpi_match_oemlist() interface
@ 2017-08-05  5:12         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:12 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 08:39:35PM +0000, Kani, Toshimitsu wrote:
> Well, we did talk a lot about your suggested name, "acpi_blacklist",
> and I explained that it did not work since it'd be used for both black
> and white-list.  We've agreed on that.
> 
> You then suggested it's "platform", not "OEM".  Since this is an ACPI
> structure defined in "acpi.h", its terminology generally follows ACPI
> spec, which I did.
> 
> I understand that you feel strongly against "OEM" (sorry about that). 
> How about "acpi_platform_list"?  Does it work for you?

Yes, I thought we agreed on all that.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac
@ 2017-08-05  5:14         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:14 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 08:49:51PM +0000, Kani, Toshimitsu wrote:
> Some firmware features can be enabled / disabled in BIOS.  While HPE
> firmware does not allow to disable FF, it's possible that other vendors
> might allow such and still have the same OEM ID info.

Yeah, we don't protect ourselves against "might allow". Either you have
an *actual* example for such platform or you don't and then you don't
need this check.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac
@ 2017-08-05  5:14         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:14 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 08:49:51PM +0000, Kani, Toshimitsu wrote:
> Some firmware features can be enabled / disabled in BIOS.  While HPE
> firmware does not allow to disable FF, it's possible that other vendors
> might allow such and still have the same OEM ID info.

Yeah, we don't protect ourselves against "might allow". Either you have
an *actual* example for such platform or you don't and then you don't
need this check.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-05  5:16         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:16 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 09:02:17PM +0000, Kani, Toshimitsu wrote:
> GHES platform devices correspond to GHES entries, which define firmware
> interfaces to report generic memory errors to the OS, such as NMI and
> SCI.  These devices are associated with all DIMMs, not a particular
> DIMM.

And? Stating the obvious brings you what exactly?

IOW, you still haven't answered my question.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-05  5:16         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:16 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 09:02:17PM +0000, Kani, Toshimitsu wrote:
> GHES platform devices correspond to GHES entries, which define firmware
> interfaces to report generic memory errors to the OS, such as NMI and
> SCI.  These devices are associated with all DIMMs, not a particular
> DIMM.

And? Stating the obvious brings you what exactly?

IOW, you still haven't answered my question.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 5/7] ghes_edac: add platform check to enable ghes_edac
@ 2017-08-05  5:37         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:37 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 09:06:27PM +0000, Kani, Toshimitsu wrote:
> How about "ghes_edac.any_platform"?

ghes_edac.force_load

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,5/7] ghes_edac: add platform check to enable ghes_edac
@ 2017-08-05  5:37         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:37 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 09:06:27PM +0000, Kani, Toshimitsu wrote:
> How about "ghes_edac.any_platform"?

ghes_edac.force_load

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-05  5:44         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:44 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 09:35:05PM +0000, Kani, Toshimitsu wrote:
> 1 means the caller's init function can continue its initialization --
> such conditions are free or owned by itself.

Make that:

	edac_get_owner(void)

to return the owner string or NULL if there's no owner.

The caller can then do the string checking and do strncmp() to limit the
string length being checked.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-05  5:44         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:44 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 09:35:05PM +0000, Kani, Toshimitsu wrote:
> 1 means the caller's init function can continue its initialization --
> such conditions are free or owned by itself.

Make that:

	edac_get_owner(void)

to return the owner string or NULL if there's no owner.

The caller can then do the string checking and do strncmp() to limit the
string length being checked.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 7/7] edac drivers: add MC owner check in init
@ 2017-08-05  5:49         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:49 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 09:48:23PM +0000, Kani, Toshimitsu wrote:
> Not sure if anyone cares, but I thought it should return with -ENODEV
> when this modules found no target, and -EBUSY when it found a target
> but it's busy.  Hence, this ordering.

You can still return -EBUSY. Just do the owner check first because if
we're going to return due to ghes_edac being loaded already, we might
just as well skip the other checks as they're a pointless waste of
cycles and energy.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,7/7] edac drivers: add MC owner check in init
@ 2017-08-05  5:49         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-05  5:49 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Fri, Aug 04, 2017 at 09:48:23PM +0000, Kani, Toshimitsu wrote:
> Not sure if anyone cares, but I thought it should return with -ENODEV
> when this modules found no target, and -EBUSY when it found a target
> but it's busy.  Hence, this ordering.

You can still return -EBUSY. Just do the owner check first because if
we're going to return due to ghes_edac being loaded already, we might
just as well skip the other checks as they're a pointless waste of
cycles and energy.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 1/7] ACPI / blacklist: add acpi_match_oemlist() interface
@ 2017-08-07 14:49           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-07 14:49 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Sat, 2017-08-05 at 07:12 +0200, Borislav Petkov wrote:
> On Fri, Aug 04, 2017 at 08:39:35PM +0000, Kani, Toshimitsu wrote:
> > Well, we did talk a lot about your suggested name,
> > "acpi_blacklist", and I explained that it did not work since it'd
> > be used for both black and white-list.  We've agreed on that.
> > 
> > You then suggested it's "platform", not "OEM".  Since this is an
> > ACPI structure defined in "acpi.h", its terminology generally
> > follows ACPI spec, which I did.
> > 
> > I understand that you feel strongly against "OEM" (sorry about
> > that). How about "acpi_platform_list"?  Does it work for you?
> 
> Yes, I thought we agreed on all that.

Sounds good.  Will change to "acpi_platform_list".

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,1/7] ACPI / blacklist: add acpi_match_oemlist() interface
@ 2017-08-07 14:49           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-07 14:49 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Sat, 2017-08-05 at 07:12 +0200, Borislav Petkov wrote:
> On Fri, Aug 04, 2017 at 08:39:35PM +0000, Kani, Toshimitsu wrote:
> > Well, we did talk a lot about your suggested name,
> > "acpi_blacklist", and I explained that it did not work since it'd
> > be used for both black and white-list.  We've agreed on that.
> > 
> > You then suggested it's "platform", not "OEM".  Since this is an
> > ACPI structure defined in "acpi.h", its terminology generally
> > follows ACPI spec, which I did.
> > 
> > I understand that you feel strongly against "OEM" (sorry about
> > that). How about "acpi_platform_list"?  Does it work for you?
> 
> Yes, I thought we agreed on all that.

Sounds good.  Will change to "acpi_platform_list".

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac
@ 2017-08-07 14:50           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-07 14:50 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Sat, 2017-08-05 at 07:14 +0200, Borislav Petkov wrote:
> On Fri, Aug 04, 2017 at 08:49:51PM +0000, Kani, Toshimitsu wrote:
> > Some firmware features can be enabled / disabled in BIOS.  While
> > HPE firmware does not allow to disable FF, it's possible that other
> > vendors might allow such and still have the same OEM ID info.
> 
> Yeah, we don't protect ourselves against "might allow". Either you
> have an *actual* example for such platform or you don't and then you
> don't need this check.

No, I do not any actual example.  Will drop this patch.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac
@ 2017-08-07 14:50           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-07 14:50 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Sat, 2017-08-05 at 07:14 +0200, Borislav Petkov wrote:
> On Fri, Aug 04, 2017 at 08:49:51PM +0000, Kani, Toshimitsu wrote:
> > Some firmware features can be enabled / disabled in BIOS.  While
> > HPE firmware does not allow to disable FF, it's possible that other
> > vendors might allow such and still have the same OEM ID info.
> 
> Yeah, we don't protect ourselves against "might allow". Either you
> have an *actual* example for such platform or you don't and then you
> don't need this check.

No, I do not any actual example.  Will drop this patch.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 5/7] ghes_edac: add platform check to enable ghes_edac
@ 2017-08-07 14:54           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-07 14:54 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Sat, 2017-08-05 at 07:37 +0200, Borislav Petkov wrote:
> On Fri, Aug 04, 2017 at 09:06:27PM +0000, Kani, Toshimitsu wrote:
> > How about "ghes_edac.any_platform"?
> 
> ghes_edac.force_load

Sounds good.  Will do.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,5/7] ghes_edac: add platform check to enable ghes_edac
@ 2017-08-07 14:54           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-07 14:54 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

T24gU2F0LCAyMDE3LTA4LTA1IGF0IDA3OjM3ICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIEZyaSwgQXVnIDA0LCAyMDE3IGF0IDA5OjA2OjI3UE0gKzAwMDAsIEthbmksIFRvc2hp
bWl0c3Ugd3JvdGU6DQo+ID4gSG93IGFib3V0ICJnaGVzX2VkYWMuYW55X3BsYXRmb3JtIj8NCj4g
DQo+IGdoZXNfZWRhYy5mb3JjZV9sb2FkDQoNClNvdW5kcyBnb29kLiAgV2lsbCBkby4NCg0KVGhh
bmtzLA0KLVRvc2hpDQoNCg0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-07 14:55           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-07 14:55 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Sat, 2017-08-05 at 07:44 +0200, Borislav Petkov wrote:
> On Fri, Aug 04, 2017 at 09:35:05PM +0000, Kani, Toshimitsu wrote:
> > 1 means the caller's init function can continue its initialization
> > --
> > such conditions are free or owned by itself.
> 
> Make that:
> 
> 	edac_get_owner(void)
> 
> to return the owner string or NULL if there's no owner.
> 
> The caller can then do the string checking and do strncmp() to limit
> the string length being checked.

Good idea!  Will do.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,6/7] EDAC: add edac_check_mc_owner() to check MC owner
@ 2017-08-07 14:55           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-07 14:55 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

T24gU2F0LCAyMDE3LTA4LTA1IGF0IDA3OjQ0ICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIEZyaSwgQXVnIDA0LCAyMDE3IGF0IDA5OjM1OjA1UE0gKzAwMDAsIEthbmksIFRvc2hp
bWl0c3Ugd3JvdGU6DQo+ID4gMSBtZWFucyB0aGUgY2FsbGVyJ3MgaW5pdCBmdW5jdGlvbiBjYW4g
Y29udGludWUgaXRzIGluaXRpYWxpemF0aW9uDQo+ID4gLS0NCj4gPiBzdWNoIGNvbmRpdGlvbnMg
YXJlIGZyZWUgb3Igb3duZWQgYnkgaXRzZWxmLg0KPiANCj4gTWFrZSB0aGF0Og0KPiANCj4gCWVk
YWNfZ2V0X293bmVyKHZvaWQpDQo+IA0KPiB0byByZXR1cm4gdGhlIG93bmVyIHN0cmluZyBvciBO
VUxMIGlmIHRoZXJlJ3Mgbm8gb3duZXIuDQo+IA0KPiBUaGUgY2FsbGVyIGNhbiB0aGVuIGRvIHRo
ZSBzdHJpbmcgY2hlY2tpbmcgYW5kIGRvIHN0cm5jbXAoKSB0byBsaW1pdA0KPiB0aGUgc3RyaW5n
IGxlbmd0aCBiZWluZyBjaGVja2VkLg0KDQpHb29kIGlkZWEhICBXaWxsIGRvLg0KDQpUaGFua3Ms
DQotVG9zaGkNCg0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 7/7] edac drivers: add MC owner check in init
@ 2017-08-07 14:57           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-07 14:57 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Sat, 2017-08-05 at 07:49 +0200, Borislav Petkov wrote:
> On Fri, Aug 04, 2017 at 09:48:23PM +0000, Kani, Toshimitsu wrote:
> > Not sure if anyone cares, but I thought it should return with
> > -ENODEV when this modules found no target, and -EBUSY when it found
> > a target but it's busy.  Hence, this ordering.
> 
> You can still return -EBUSY. Just do the owner check first because if
> we're going to return due to ghes_edac being loaded already, we might
> just as well skip the other checks as they're a pointless waste of
> cycles and energy.

Got it.  Will move up the check.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,7/7] edac drivers: add MC owner check in init
@ 2017-08-07 14:57           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-07 14:57 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

T24gU2F0LCAyMDE3LTA4LTA1IGF0IDA3OjQ5ICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIEZyaSwgQXVnIDA0LCAyMDE3IGF0IDA5OjQ4OjIzUE0gKzAwMDAsIEthbmksIFRvc2hp
bWl0c3Ugd3JvdGU6DQo+ID4gTm90IHN1cmUgaWYgYW55b25lIGNhcmVzLCBidXQgSSB0aG91Z2h0
IGl0IHNob3VsZCByZXR1cm4gd2l0aA0KPiA+IC1FTk9ERVYgd2hlbiB0aGlzIG1vZHVsZXMgZm91
bmQgbm8gdGFyZ2V0LCBhbmQgLUVCVVNZIHdoZW4gaXQgZm91bmQNCj4gPiBhIHRhcmdldCBidXQg
aXQncyBidXN5LsKgwqBIZW5jZSwgdGhpcyBvcmRlcmluZy4NCj4gDQo+IFlvdSBjYW4gc3RpbGwg
cmV0dXJuIC1FQlVTWS4gSnVzdCBkbyB0aGUgb3duZXIgY2hlY2sgZmlyc3QgYmVjYXVzZSBpZg0K
PiB3ZSdyZSBnb2luZyB0byByZXR1cm4gZHVlIHRvIGdoZXNfZWRhYyBiZWluZyBsb2FkZWQgYWxy
ZWFkeSwgd2UgbWlnaHQNCj4ganVzdCBhcyB3ZWxsIHNraXAgdGhlIG90aGVyIGNoZWNrcyBhcyB0
aGV5J3JlIGEgcG9pbnRsZXNzIHdhc3RlIG9mDQo+IGN5Y2xlcyBhbmQgZW5lcmd5Lg0KDQpHb3Qg
aXQuICBXaWxsIG1vdmUgdXAgdGhlIGNoZWNrLg0KDQpUaGFua3MsDQotVG9zaGkNCg0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-07 17:59           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-07 17:59 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Sat, 2017-08-05 at 07:16 +0200, Borislav Petkov wrote:
> On Fri, Aug 04, 2017 at 09:02:17PM +0000, Kani, Toshimitsu wrote:
> > GHES platform devices correspond to GHES entries, which define
> > firmware interfaces to report generic memory errors to the OS, such
> > as NMI and SCI.  These devices are associated with all DIMMs, not a
> > particular DIMM.
> 
> And? Stating the obvious brings you what exactly?
> 
> IOW, you still haven't answered my question.

Sorry about that.  Let me copy your original comments to make sure I
answer your questions this time.

> So the problem is that ghes_edac_register() gets called multiple
> times depending on how many GHES platform devices are on the system.
> But yet they all scan *all* DIMMs. 

Right, and this patch changes ghes_edac_register() to scan all DIMMs at
the first time and do not scan them next times.

> So instead you should return if
> the DIMMs have been counted already and not register a second time.
> Which makes that whole mc counting kinda useless. So you could rip
>
that out too.

Oh, I see.  So, ghes_edac_register() should return no-op a second time,
and does not call edac_mc_add_mc() to register with a separate mci.

I think we should keep the current scheme, which registers an mci for
each GHES entry.  ghes_edac_report_mem_error() expects that error-
reporting is serialized per a GHES entry.  Sharing a single mci among
all GHES entries / error interfaces might lead to a race condition.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-07 17:59           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-07 17:59 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Sat, 2017-08-05 at 07:16 +0200, Borislav Petkov wrote:
> On Fri, Aug 04, 2017 at 09:02:17PM +0000, Kani, Toshimitsu wrote:
> > GHES platform devices correspond to GHES entries, which define
> > firmware interfaces to report generic memory errors to the OS, such
> > as NMI and SCI.  These devices are associated with all DIMMs, not a
> > particular DIMM.
> 
> And? Stating the obvious brings you what exactly?
> 
> IOW, you still haven't answered my question.

Sorry about that.  Let me copy your original comments to make sure I
answer your questions this time.

> So the problem is that ghes_edac_register() gets called multiple
> times depending on how many GHES platform devices are on the system.
> But yet they all scan *all* DIMMs. 

Right, and this patch changes ghes_edac_register() to scan all DIMMs at
the first time and do not scan them next times.

> So instead you should return if
> the DIMMs have been counted already and not register a second time.
> Which makes that whole mc counting kinda useless. So you could rip
>
that out too.

Oh, I see.  So, ghes_edac_register() should return no-op a second time,
and does not call edac_mc_add_mc() to register with a separate mci.

I think we should keep the current scheme, which registers an mci for
each GHES entry.  ghes_edac_report_mem_error() expects that error-
reporting is serialized per a GHES entry.  Sharing a single mci among
all GHES entries / error interfaces might lead to a race condition.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-11  9:04             ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-11  9:04 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Mon, Aug 07, 2017 at 05:59:15PM +0000, Kani, Toshimitsu wrote:
> I think we should keep the current scheme, which registers an mci for

No we shouldn't.

> each GHES entry.  ghes_edac_report_mem_error() expects that error-
> reporting is serialized per a GHES entry.  Sharing a single mci among
> all GHES entries / error interfaces might lead to a race condition.

See how I solved it in my patchset and feel free to reuse it.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-11  9:04             ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-11  9:04 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Mon, Aug 07, 2017 at 05:59:15PM +0000, Kani, Toshimitsu wrote:
> I think we should keep the current scheme, which registers an mci for

No we shouldn't.

> each GHES entry.  ghes_edac_report_mem_error() expects that error-
> reporting is serialized per a GHES entry.  Sharing a single mci among
> all GHES entries / error interfaces might lead to a race condition.

See how I solved it in my patchset and feel free to reuse it.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 15:57               ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-14 15:57 UTC (permalink / raw)
  To: bp; +Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

On Fri, 2017-08-11 at 11:04 +0200, Borislav Petkov wrote:
> On Mon, Aug 07, 2017 at 05:59:15PM +0000, Kani, Toshimitsu wrote:
> > I think we should keep the current scheme, which registers an mci
> > for
> 
> No we shouldn't.
> 
> > each GHES entry.  ghes_edac_report_mem_error() expects that error-
> > reporting is serialized per a GHES entry.  Sharing a single mci
> > among all GHES entries / error interfaces might lead to a race
> > condition.
> 
> See how I solved it in my patchset and feel free to reuse it.

Hmm... Sorry, I failed to see how your patchset solved it.  Would you
mind to explain how it is done?

Thanks!
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 15:57               ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-14 15:57 UTC (permalink / raw)
  To: bp; +Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

T24gRnJpLCAyMDE3LTA4LTExIGF0IDExOjA0ICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIE1vbiwgQXVnIDA3LCAyMDE3IGF0IDA1OjU5OjE1UE0gKzAwMDAsIEthbmksIFRvc2hp
bWl0c3Ugd3JvdGU6DQo+ID4gSSB0aGluayB3ZSBzaG91bGQga2VlcCB0aGUgY3VycmVudCBzY2hl
bWUsIHdoaWNoIHJlZ2lzdGVycyBhbiBtY2kNCj4gPiBmb3INCj4gDQo+IE5vIHdlIHNob3VsZG4n
dC4NCj4gDQo+ID4gZWFjaCBHSEVTIGVudHJ5LsKgwqBnaGVzX2VkYWNfcmVwb3J0X21lbV9lcnJv
cigpIGV4cGVjdHMgdGhhdCBlcnJvci0NCj4gPiByZXBvcnRpbmcgaXMgc2VyaWFsaXplZCBwZXIg
YSBHSEVTIGVudHJ5LsKgwqBTaGFyaW5nIGEgc2luZ2xlIG1jaQ0KPiA+IGFtb25nIGFsbCBHSEVT
IGVudHJpZXMgLyBlcnJvciBpbnRlcmZhY2VzIG1pZ2h0IGxlYWQgdG8gYSByYWNlDQo+ID4gY29u
ZGl0aW9uLg0KPiANCj4gU2VlIGhvdyBJIHNvbHZlZCBpdCBpbiBteSBwYXRjaHNldCBhbmQgZmVl
bCBmcmVlIHRvIHJldXNlIGl0Lg0KDQpIbW0uLi4gU29ycnksIEkgZmFpbGVkIHRvIHNlZSBob3cg
eW91ciBwYXRjaHNldCBzb2x2ZWQgaXQuICBXb3VsZCB5b3UNCm1pbmQgdG/CoGV4cGxhaW4gaG93
IGl0IGlzIGRvbmU/DQoNClRoYW5rcyENCi1Ub3NoaQ0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 16:24                 ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 16:24 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

On Mon, Aug 14, 2017 at 03:57:35PM +0000, Kani, Toshimitsu wrote:
> Hmm... Sorry, I failed to see how your patchset solved it.  Would you
> mind to explain how it is done?

+static int __init ghes_edac_register(void)
 {
+       struct ghes_edac_pvt *pvt = ghes_pvt;

Only one local ghes_pvt structure.

And you handle multiple calls into ghes_edac_register() by exiting all
those which are not the first one as the first one already did all the
init.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 16:24                 ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 16:24 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

On Mon, Aug 14, 2017 at 03:57:35PM +0000, Kani, Toshimitsu wrote:
> Hmm... Sorry, I failed to see how your patchset solved it.  Would you
> mind to explain how it is done?

+static int __init ghes_edac_register(void)
 {
+       struct ghes_edac_pvt *pvt = ghes_pvt;

Only one local ghes_pvt structure.

And you handle multiple calls into ghes_edac_register() by exiting all
those which are not the first one as the first one already did all the
init.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 16:48                   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-14 16:48 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Mon, 2017-08-14 at 18:24 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 03:57:35PM +0000, Kani, Toshimitsu wrote:
> > Hmm... Sorry, I failed to see how your patchset solved it.  Would
> > you mind to explain how it is done?
> 
> +static int __init ghes_edac_register(void)
>  {
> +       struct ghes_edac_pvt *pvt = ghes_pvt;
> 
> Only one local ghes_pvt structure.
> 
> And you handle multiple calls into ghes_edac_register() by exiting
> all those which are not the first one as the first one already did
> all the init.

Right, but the issue is how [ghes_edac_]report_mem_error() protects
from possible concurrent calls from multiple GHES sources when there is
only a single mci.

Thanks,
-Toshi
 

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 16:48                   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-14 16:48 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

T24gTW9uLCAyMDE3LTA4LTE0IGF0IDE4OjI0ICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIE1vbiwgQXVnIDE0LCAyMDE3IGF0IDAzOjU3OjM1UE0gKzAwMDAsIEthbmksIFRvc2hp
bWl0c3Ugd3JvdGU6DQo+ID4gSG1tLi4uIFNvcnJ5LCBJIGZhaWxlZCB0byBzZWUgaG93IHlvdXIg
cGF0Y2hzZXQgc29sdmVkIGl0LsKgwqBXb3VsZA0KPiA+IHlvdSBtaW5kIHRvwqBleHBsYWluIGhv
dyBpdCBpcyBkb25lPw0KPiANCj4gK3N0YXRpYyBpbnQgX19pbml0IGdoZXNfZWRhY19yZWdpc3Rl
cih2b2lkKQ0KPiDCoHsNCj4gK8KgwqDCoMKgwqDCoMKgc3RydWN0IGdoZXNfZWRhY19wdnQgKnB2
dCA9IGdoZXNfcHZ0Ow0KPiANCj4gT25seSBvbmUgbG9jYWwgZ2hlc19wdnQgc3RydWN0dXJlLg0K
PiANCj4gQW5kIHlvdSBoYW5kbGUgbXVsdGlwbGUgY2FsbHMgaW50byBnaGVzX2VkYWNfcmVnaXN0
ZXIoKSBieSBleGl0aW5nDQo+IGFsbCB0aG9zZSB3aGljaCBhcmUgbm90IHRoZSBmaXJzdCBvbmUg
YXMgdGhlIGZpcnN0IG9uZSBhbHJlYWR5IGRpZA0KPiBhbGwgdGhlIGluaXQuDQoNClJpZ2h0LCBi
dXQgdGhlIGlzc3VlIGlzIGhvdyBbZ2hlc19lZGFjX11yZXBvcnRfbWVtX2Vycm9yKCkgcHJvdGVj
dHMNCmZyb20gcG9zc2libGUgY29uY3VycmVudCBjYWxscyBmcm9tIG11bHRpcGxlIEdIRVMgc291
cmNlcyB3aGVuIHRoZXJlIGlzDQpvbmx5IGEgc2luZ2xlIG1jaS4NCg0KVGhhbmtzLA0KLVRvc2hp
DQog
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 17:05                     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 17:05 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Mon, Aug 14, 2017 at 04:48:57PM +0000, Kani, Toshimitsu wrote:
> Right, but the issue is how [ghes_edac_]report_mem_error() protects
> from possible concurrent calls from multiple GHES sources when there is
> only a single mci.

Do you know of an actual firmware reporting multiple errors concurrently?

GHES v2 even needs to ACK the current error first before it can read the
next one.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 17:05                     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 17:05 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Mon, Aug 14, 2017 at 04:48:57PM +0000, Kani, Toshimitsu wrote:
> Right, but the issue is how [ghes_edac_]report_mem_error() protects
> from possible concurrent calls from multiple GHES sources when there is
> only a single mci.

Do you know of an actual firmware reporting multiple errors concurrently?

GHES v2 even needs to ACK the current error first before it can read the
next one.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 17:52                       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-14 17:52 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Mon, 2017-08-14 at 19:05 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 04:48:57PM +0000, Kani, Toshimitsu wrote:
> > Right, but the issue is how [ghes_edac_]report_mem_error() protects
> > from possible concurrent calls from multiple GHES sources when
> > there is only a single mci.
> 
> Do you know of an actual firmware reporting multiple errors
> concurrently?

I do not know.  We have multiple GHES entries, but they all use SCI. 
Since ACPICA uses a single threaded workqueue for notify handlers, they
are serialized among SCIs.

ACPI 6.2 defines multiple notification types in Table 18-383, and
ghes_proc() can be called from ghes_poll_func(), ghes_irq_func(), and
ghes_notify_sci().  So, I think it is safe to operate per an entry
basis.

> GHES v2 even needs to ACK the current error first before it can read
> the next one.

Yes, but this ACK is done per a GHES entry as well.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 17:52                       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-14 17:52 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Mon, 2017-08-14 at 19:05 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 04:48:57PM +0000, Kani, Toshimitsu wrote:
> > Right, but the issue is how [ghes_edac_]report_mem_error() protects
> > from possible concurrent calls from multiple GHES sources when
> > there is only a single mci.
> 
> Do you know of an actual firmware reporting multiple errors
> concurrently?

I do not know.  We have multiple GHES entries, but they all use SCI. 
Since ACPICA uses a single threaded workqueue for notify handlers, they
are serialized among SCIs.

ACPI 6.2 defines multiple notification types in Table 18-383, and
ghes_proc() can be called from ghes_poll_func(), ghes_irq_func(), and
ghes_notify_sci().  So, I think it is safe to operate per an entry
basis.

> GHES v2 even needs to ACK the current error first before it can read
> the next one.

Yes, but this ACK is done per a GHES entry as well.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 18:05                         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 18:05 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Mon, Aug 14, 2017 at 05:52:25PM +0000, Kani, Toshimitsu wrote:
> Yes, but this ACK is done per a GHES entry as well.

So is the ghes_edac_report_mem_error() call.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 18:05                         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 18:05 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Mon, Aug 14, 2017 at 05:52:25PM +0000, Kani, Toshimitsu wrote:
> Yes, but this ACK is done per a GHES entry as well.

So is the ghes_edac_report_mem_error() call.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 18:17                           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-14 18:17 UTC (permalink / raw)
  To: bp; +Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

On Mon, 2017-08-14 at 20:05 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 05:52:25PM +0000, Kani, Toshimitsu wrote:
> > Yes, but this ACK is done per a GHES entry as well.
> 
> So is the ghes_edac_report_mem_error() call.

Right, ghes_edac_report_mem_error() gets serialized per a GHES entry,
but not globally.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 18:17                           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-14 18:17 UTC (permalink / raw)
  To: bp; +Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

T24gTW9uLCAyMDE3LTA4LTE0IGF0IDIwOjA1ICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIE1vbiwgQXVnIDE0LCAyMDE3IGF0IDA1OjUyOjI1UE0gKzAwMDAsIEthbmksIFRvc2hp
bWl0c3Ugd3JvdGU6DQo+ID4gWWVzLCBidXQgdGhpcyBBQ0sgaXMgZG9uZSBwZXIgYSBHSEVTIGVu
dHJ5IGFzIHdlbGwuDQo+IA0KPiBTbyBpcyB0aGUgZ2hlc19lZGFjX3JlcG9ydF9tZW1fZXJyb3Io
KSBjYWxsLg0KDQpSaWdodCwgZ2hlc19lZGFjX3JlcG9ydF9tZW1fZXJyb3IoKSBnZXRzIHNlcmlh
bGl6ZWQgcGVyIGEgR0hFUyBlbnRyeSwNCmJ1dCBub3QgZ2xvYmFsbHkuDQoNClRoYW5rcywNCi1U
b3NoaQ0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 18:35                             ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 18:35 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

On Mon, Aug 14, 2017 at 06:17:47PM +0000, Kani, Toshimitsu wrote:
> Right, ghes_edac_report_mem_error() gets serialized per a GHES entry,
> but not globally.

Globally what?

What is the actual potential scenario for concurrency issues you see?
Example pls.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 18:35                             ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 18:35 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

On Mon, Aug 14, 2017 at 06:17:47PM +0000, Kani, Toshimitsu wrote:
> Right, ghes_edac_report_mem_error() gets serialized per a GHES entry,
> but not globally.

Globally what?

What is the actual potential scenario for concurrency issues you see?
Example pls.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 19:02                               ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-14 19:02 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Mon, 2017-08-14 at 20:35 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 06:17:47PM +0000, Kani, Toshimitsu wrote:
> > Right, ghes_edac_report_mem_error() gets serialized per a GHES
> > entry, but not globally.
> 
> Globally what?

GHES v2's ACK is not a global lock.  So, it does not guarantee that
ghes_edac_report_mem_error() never gets called concurrently.

> What is the actual potential scenario for concurrency issues you see?
> Example pls.

ghes_probe() supports multiple sources defined in
acpi_hest_notify_types.  Say, there are two entries for memory errors,
one with ACPI_HEST_NOTIFY_EXTERNAL and the other with
ACPI_HEST_NOTIFY_SCI.  They may report errors independently.  While
ghes_edac_report_mem_error() is being called from the SCI, it can be
called from the ext interrupt at a same time.

I do not know how likely we see such case, but the code should be
written according to the spec.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 19:02                               ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-14 19:02 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Mon, 2017-08-14 at 20:35 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 06:17:47PM +0000, Kani, Toshimitsu wrote:
> > Right, ghes_edac_report_mem_error() gets serialized per a GHES
> > entry, but not globally.
> 
> Globally what?

GHES v2's ACK is not a global lock.  So, it does not guarantee that
ghes_edac_report_mem_error() never gets called concurrently.

> What is the actual potential scenario for concurrency issues you see?
> Example pls.

ghes_probe() supports multiple sources defined in
acpi_hest_notify_types.  Say, there are two entries for memory errors,
one with ACPI_HEST_NOTIFY_EXTERNAL and the other with
ACPI_HEST_NOTIFY_SCI.  They may report errors independently.  While
ghes_edac_report_mem_error() is being called from the SCI, it can be
called from the ext interrupt at a same time.

I do not know how likely we see such case, but the code should be
written according to the spec.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 19:34                                 ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 19:34 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Mon, Aug 14, 2017 at 07:02:15PM +0000, Kani, Toshimitsu wrote:
> I do not know how likely we see such case, but the code should be
> written according to the spec.

Well, then you'll have to make ghes_edac_report_mem_error() reentrant.
Which doesn't look that hard as the only thing it really needs from
struct ghes_edac_pvt are those string buffers. I guess you can try to do
the simplest thing first and allocate them on the stack.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 19:34                                 ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 19:34 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Mon, Aug 14, 2017 at 07:02:15PM +0000, Kani, Toshimitsu wrote:
> I do not know how likely we see such case, but the code should be
> written according to the spec.

Well, then you'll have to make ghes_edac_report_mem_error() reentrant.
Which doesn't look that hard as the only thing it really needs from
struct ghes_edac_pvt are those string buffers. I guess you can try to do
the simplest thing first and allocate them on the stack.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 20:17                                   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-14 20:17 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Mon, 2017-08-14 at 21:34 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 07:02:15PM +0000, Kani, Toshimitsu wrote:
> > I do not know how likely we see such case, but the code should be
> > written according to the spec.
> 
> Well, then you'll have to make ghes_edac_report_mem_error()
> reentrant. Which doesn't look that hard as the only thing it really
> needs from struct ghes_edac_pvt are those string buffers. I guess you
> can try to do the simplest thing first and allocate them on the
> stack.

ghes_edac_report_mem_error() is reentrant as it is now.  I think the
current code design of allocating mci & ghes_edac_pvt for each GHES
source entry makes sense.  edac_raw_mc_handle_error() also has the same
  expectation that the call is serialized per mci.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 20:17                                   ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-14 20:17 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Mon, 2017-08-14 at 21:34 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 07:02:15PM +0000, Kani, Toshimitsu wrote:
> > I do not know how likely we see such case, but the code should be
> > written according to the spec.
> 
> Well, then you'll have to make ghes_edac_report_mem_error()
> reentrant. Which doesn't look that hard as the only thing it really
> needs from struct ghes_edac_pvt are those string buffers. I guess you
> can try to do the simplest thing first and allocate them on the
> stack.

ghes_edac_report_mem_error() is reentrant as it is now.  I think the
current code design of allocating mci & ghes_edac_pvt for each GHES
source entry makes sense.  edac_raw_mc_handle_error() also has the same
  expectation that the call is serialized per mci.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 20:39                                     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 20:39 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Mon, Aug 14, 2017 at 08:17:54PM +0000, Kani, Toshimitsu wrote:
> I think the current code design of allocating mci & ghes_edac_pvt for
> each GHES source entry makes sense.

And I don't.

> edac_raw_mc_handle_error() also has the same expectation that the call
> is serialized per mci.

There's no such thing as "per mci" if the driver scans *all DIMMs* per
register call. If it does it this way, then it is only one mci.

It is actually wrong right now because if you register more than one
mci and you do edac_inc_ce_error()/edac_inc_ue_error(), potentially
different counters get incremented for the same errors. Exactly because
each instance registered is *wrongly* responsible for all DIMMs on the
system.

So you either need to partition the DIMMs per mci (which I can't imagine
how it would work) or introduce locking when incrementing the mci->
counters.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-14 20:39                                     ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-14 20:39 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-acpi, lenb, mchehab, tony.luck, linux-kernel, linux-edac, rjw

On Mon, Aug 14, 2017 at 08:17:54PM +0000, Kani, Toshimitsu wrote:
> I think the current code design of allocating mci & ghes_edac_pvt for
> each GHES source entry makes sense.

And I don't.

> edac_raw_mc_handle_error() also has the same expectation that the call
> is serialized per mci.

There's no such thing as "per mci" if the driver scans *all DIMMs* per
register call. If it does it this way, then it is only one mci.

It is actually wrong right now because if you register more than one
mci and you do edac_inc_ce_error()/edac_inc_ue_error(), potentially
different counters get incremented for the same errors. Exactly because
each instance registered is *wrongly* responsible for all DIMMs on the
system.

So you either need to partition the DIMMs per mci (which I can't imagine
how it would work) or introduce locking when incrementing the mci->
counters.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-15 15:35                                       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-15 15:35 UTC (permalink / raw)
  To: bp; +Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

On Mon, 2017-08-14 at 22:39 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 08:17:54PM +0000, Kani, Toshimitsu wrote:
> > I think the current code design of allocating mci & ghes_edac_pvt
> > for each GHES source entry makes sense.
> 
> And I don't.
> 
> > edac_raw_mc_handle_error() also has the same expectation that the
> > call is serialized per mci.
> 
> There's no such thing as "per mci" if the driver scans *all DIMMs*
> per register call. If it does it this way, then it is only one mci.

ghes_edac instantiates an mci as a pseudo device representing a GHES
error source.  Each error source associates with all DIMMs, and may
report errors independently.  As ghes_edac is an GHES error-reporting
wrapper to edac, this abstraction makes sense.

> It is actually wrong right now because if you register more than one
> mci and you do edac_inc_ce_error()/edac_inc_ue_error(), potentially
> different counters get incremented for the same errors. Exactly
> because each instance registered is *wrongly* responsible for all
> DIMMs on the system.

I do not see a problem in having counters for each GHES error source. 
This is just statistics info, and ghes_edac does not expect any OS
action from the counters.

> So you either need to partition the DIMMs per mci (which I can't
> imagine how it would work) or introduce locking when incrementing the
> mci->counters.

I do not think changing the calling convention to edac library
interfaces is a good idea for a special case like ghes_edac.  Such
changes can be a burden for us going forward.  I think ghes_edac just
needs to work with the current prerequisite.

User apps like ras-mc-ctl works as expected for a given (not-so-great)
DIMM info from SMBIOS as well.  I do not see a probelm from user
perspective, either.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-15 15:35                                       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-15 15:35 UTC (permalink / raw)
  To: bp; +Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

On Mon, 2017-08-14 at 22:39 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 08:17:54PM +0000, Kani, Toshimitsu wrote:
> > I think the current code design of allocating mci & ghes_edac_pvt
> > for each GHES source entry makes sense.
> 
> And I don't.
> 
> > edac_raw_mc_handle_error() also has the same expectation that the
> > call is serialized per mci.
> 
> There's no such thing as "per mci" if the driver scans *all DIMMs*
> per register call. If it does it this way, then it is only one mci.

ghes_edac instantiates an mci as a pseudo device representing a GHES
error source.  Each error source associates with all DIMMs, and may
report errors independently.  As ghes_edac is an GHES error-reporting
wrapper to edac, this abstraction makes sense.

> It is actually wrong right now because if you register more than one
> mci and you do edac_inc_ce_error()/edac_inc_ue_error(), potentially
> different counters get incremented for the same errors. Exactly
> because each instance registered is *wrongly* responsible for all
> DIMMs on the system.

I do not see a problem in having counters for each GHES error source. 
This is just statistics info, and ghes_edac does not expect any OS
action from the counters.

> So you either need to partition the DIMMs per mci (which I can't
> imagine how it would work) or introduce locking when incrementing the
> mci->counters.

I do not think changing the calling convention to edac library
interfaces is a good idea for a special case like ghes_edac.  Such
changes can be a burden for us going forward.  I think ghes_edac just
needs to work with the current prerequisite.

User apps like ras-mc-ctl works as expected for a given (not-so-great)
DIMM info from SMBIOS as well.  I do not see a probelm from user
perspective, either.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-15 15:48                                         ` Luck, Tony
  0 siblings, 0 replies; 130+ messages in thread
From: Luck, Tony @ 2017-08-15 15:48 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: bp, rjw, lenb, mchehab, linux-kernel, linux-acpi, linux-edac

On Tue, Aug 15, 2017 at 08:35:51AM -0700, Kani, Toshimitsu wrote:
> User apps like ras-mc-ctl works as expected for a given (not-so-great)
> DIMM info from SMBIOS as well.  I do not see a probelm from user
> perspective, either.

Won't the user see all their DIMMs reported for each memory controller
under /sys/devices/system/edac/mc/mc*/dimm* ?

That sounds confusing.

-Tony

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-15 15:48                                         ` Luck, Tony
  0 siblings, 0 replies; 130+ messages in thread
From: Luck, Tony @ 2017-08-15 15:48 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: bp, rjw, lenb, mchehab, linux-kernel, linux-acpi, linux-edac

On Tue, Aug 15, 2017 at 08:35:51AM -0700, Kani, Toshimitsu wrote:
> User apps like ras-mc-ctl works as expected for a given (not-so-great)
> DIMM info from SMBIOS as well.  I do not see a probelm from user
> perspective, either.

Won't the user see all their DIMMs reported for each memory controller
under /sys/devices/system/edac/mc/mc*/dimm* ?

That sounds confusing.

-Tony
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-15 15:50                                         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-15 15:50 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

On Tue, Aug 15, 2017 at 03:35:51PM +0000, Kani, Toshimitsu wrote:
> ghes_edac instantiates an mci as a pseudo device representing a GHES
> error source.  Each error source associates with all DIMMs, and may
> report errors independently.  As ghes_edac is an GHES error-reporting
> wrapper to edac, this abstraction makes sense.

Bullshit.

An MCI is a memory controller descriptor. That doesn't fit the GHES
platform devices that get probed. GHES platform device != MCI. How many
times do I need to say this for it to get through to you?

> I do not see a problem in having counters for each GHES error source.

And the error counters of that "simulated" mci get incremented depending
on which pointer gets passed in from GHES? More bullshit.

> This is just statistics info, and ghes_edac does not expect any OS
> action from the counters.

So let me know if you don't want to do it and rather would prefer to
pointlessly debate. I certainly don't want to waste my time debating.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-15 15:50                                         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-15 15:50 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: rjw, lenb, mchehab, tony.luck, linux-kernel, linux-acpi, linux-edac

On Tue, Aug 15, 2017 at 03:35:51PM +0000, Kani, Toshimitsu wrote:
> ghes_edac instantiates an mci as a pseudo device representing a GHES
> error source.  Each error source associates with all DIMMs, and may
> report errors independently.  As ghes_edac is an GHES error-reporting
> wrapper to edac, this abstraction makes sense.

Bullshit.

An MCI is a memory controller descriptor. That doesn't fit the GHES
platform devices that get probed. GHES platform device != MCI. How many
times do I need to say this for it to get through to you?

> I do not see a problem in having counters for each GHES error source.

And the error counters of that "simulated" mci get incremented depending
on which pointer gets passed in from GHES? More bullshit.

> This is just statistics info, and ghes_edac does not expect any OS
> action from the counters.

So let me know if you don't want to do it and rather would prefer to
pointlessly debate. I certainly don't want to waste my time debating.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-15 15:53                                           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-15 15:53 UTC (permalink / raw)
  To: tony.luck; +Cc: linux-acpi, lenb, mchehab, linux-edac, linux-kernel, bp, rjw

On Tue, 2017-08-15 at 08:48 -0700, Luck, Tony wrote:
> On Tue, Aug 15, 2017 at 08:35:51AM -0700, Kani, Toshimitsu wrote:
> > User apps like ras-mc-ctl works as expected for a given (not-so-
> > great) DIMM info from SMBIOS as well.  I do not see a probelm from
> > user perspective, either.
> 
> Won't the user see all their DIMMs reported for each memory
> controller under /sys/devices/system/edac/mc/mc*/dimm* ?
> 
> That sounds confusing.

ghes_edac only fills dimm_info to the first mci.  So, users do not see
duplicated info.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-15 15:53                                           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-15 15:53 UTC (permalink / raw)
  To: tony.luck; +Cc: linux-acpi, lenb, mchehab, linux-edac, linux-kernel, bp, rjw

On Tue, 2017-08-15 at 08:48 -0700, Luck, Tony wrote:
> On Tue, Aug 15, 2017 at 08:35:51AM -0700, Kani, Toshimitsu wrote:
> > User apps like ras-mc-ctl works as expected for a given (not-so-
> > great) DIMM info from SMBIOS as well.  I do not see a probelm from
> > user perspective, either.
> 
> Won't the user see all their DIMMs reported for each memory
> controller under /sys/devices/system/edac/mc/mc*/dimm* ?
> 
> That sounds confusing.

ghes_edac only fills dimm_info to the first mci.  So, users do not see
duplicated info.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-15 16:19                                           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-15 16:19 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Tue, 2017-08-15 at 17:50 +0200, Borislav Petkov wrote:
> On Tue, Aug 15, 2017 at 03:35:51PM +0000, Kani, Toshimitsu wrote:
> > ghes_edac instantiates an mci as a pseudo device representing a
> > GHES error source.  Each error source associates with all DIMMs,
> > and may report errors independently.  As ghes_edac is an GHES
> > error-reporting wrapper to edac, this abstraction makes sense.
> 
> Bullshit.
> 
> An MCI is a memory controller descriptor. That doesn't fit the GHES
> platform devices that get probed. GHES platform device != MCI. How
> many times do I need to say this for it to get through to you?

Right, but it has to be a "pseudo" device for ghes_edac.  There is no
memory controller info available.  A single mci does not make it a real
memory controller, either.

> > I do not see a problem in having counters for each GHES error
> > source.
> 
> And the error counters of that "simulated" mci get incremented
> depending on which pointer gets passed in from GHES? More bullshit.
>
> > This is just statistics info, and ghes_edac does not expect any OS
> > action from the counters.
> 
> So let me know if you don't want to do it and rather would prefer to
> pointlessly debate. I certainly don't want to waste my time debating.

Yes, ghes_edac refactoring like this should be considered a separate
item.  My patchset is aimed to introduce a platform-check to attach
ghes_edac on supported platforms.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-15 16:19                                           ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-15 16:19 UTC (permalink / raw)
  To: bp; +Cc: linux-edac, lenb, mchehab, tony.luck, linux-kernel, rjw, linux-acpi

On Tue, 2017-08-15 at 17:50 +0200, Borislav Petkov wrote:
> On Tue, Aug 15, 2017 at 03:35:51PM +0000, Kani, Toshimitsu wrote:
> > ghes_edac instantiates an mci as a pseudo device representing a
> > GHES error source.  Each error source associates with all DIMMs,
> > and may report errors independently.  As ghes_edac is an GHES
> > error-reporting wrapper to edac, this abstraction makes sense.
> 
> Bullshit.
> 
> An MCI is a memory controller descriptor. That doesn't fit the GHES
> platform devices that get probed. GHES platform device != MCI. How
> many times do I need to say this for it to get through to you?

Right, but it has to be a "pseudo" device for ghes_edac.  There is no
memory controller info available.  A single mci does not make it a real
memory controller, either.

> > I do not see a problem in having counters for each GHES error
> > source.
> 
> And the error counters of that "simulated" mci get incremented
> depending on which pointer gets passed in from GHES? More bullshit.
>
> > This is just statistics info, and ghes_edac does not expect any OS
> > action from the counters.
> 
> So let me know if you don't want to do it and rather would prefer to
> pointlessly debate. I certainly don't want to waste my time debating.

Yes, ghes_edac refactoring like this should be considered a separate
item.  My patchset is aimed to introduce a platform-check to attach
ghes_edac on supported platforms.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16  8:29                                           ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16  8:29 UTC (permalink / raw)
  To: Luck, Tony, Steven Rostedt
  Cc: Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel, linux-acpi,
	linux-edac

On Tue, Aug 15, 2017 at 08:48:16AM -0700, Luck, Tony wrote:
> Won't the user see all their DIMMs reported for each memory controller
> under /sys/devices/system/edac/mc/mc*/dimm* ?
> 
> That sounds confusing.

Right, and adding the locking was really easy. If only people would
debate less and actually try to do what they're being advised to.
But not really: if you wanna have something done, you have to do it
yourself.

Anyway, I think I have a box to run it to, lemme go find it.

Steve, pls check my locking. It looks straightforward to me but I might
be missing some corner case.

Thx.

---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..a22fabef4791 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -28,10 +28,14 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
+static struct ghes_edac_pvt *ghes_pvt;
 
+/*
+ * Sync with other, potentially concurrent callers of
+ * ghes_edac_report_mem_error(). We don't know what the
+ * "inventive" firmware would do.
+ */
+static DEFINE_SPINLOCK(ghes_lock);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -169,14 +173,11 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
+	unsigned long flags;
 	char *p;
 	u8 grain_bits;
 
-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
 		pr_err("Internal error: Can't find EDAC structure\n");
 		return;
@@ -398,8 +399,16 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
 		       grain_bits, e->syndrome, pvt->detail_location);
 
-	/* Report the error via EDAC API */
+	/*
+	 * We can do the locking below because GHES defers error processing
+	 * from NMI to IRQ context. Whenever that changes, we'd at least
+	 * know.
+	 */
+	WARN_ON_ONCE(in_nmi());
+
+	spin_lock_irqsave(&ghes_lock, flags);
 	edac_raw_mc_handle_error(type, mci, e);
+	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
@@ -409,9 +418,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
 
+	/*
+	 * We have only one logical memory controller to which all DIMMs belong.
+	 */
+	if (ghes_pvt)
+		return 0;
+
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
@@ -425,26 +439,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
-	/*
-	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
-	 * to avoid duplicated memory controller numbers
-	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
 		return -ENOMEM;
 	}
 
-	pvt = mci->pvt_info;
-	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
-	pvt->mci  = mci;
-	mci->pdev = dev;
+	ghes_pvt	= mci->pvt_info;
+	ghes_pvt->ghes	= ghes;
+	ghes_pvt->mci	= mci;
 
+	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
@@ -452,36 +457,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
+	if (!fake) {
+		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
+		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
+		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
+		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
+		pr_info("to correct its BIOS.\n");
+		pr_info("This system has %d DIMM sockets.\n", num_dimm);
+	} else {
+		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+		pr_info("work on such system. Use this driver with caution\n");
 	}
 
 	if (!fake) {
-		/*
-		 * Fill DIMM info from DMI for the memory controller #0
-		 *
-		 * Keep it in blank for the other memory controllers, as
-		 * there's no reliable way to properly credit each DIMM to
-		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
-		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -497,12 +489,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}
-
-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(ghes_edac_register);
@@ -510,15 +498,9 @@ EXPORT_SYMBOL_GPL(ghes_edac_register);
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	mci = ghes_pvt->mci;
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_unregister);

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16  8:29                                           ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16  8:29 UTC (permalink / raw)
  To: Luck, Tony, Steven Rostedt
  Cc: Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel, linux-acpi,
	linux-edac

On Tue, Aug 15, 2017 at 08:48:16AM -0700, Luck, Tony wrote:
> Won't the user see all their DIMMs reported for each memory controller
> under /sys/devices/system/edac/mc/mc*/dimm* ?
> 
> That sounds confusing.

Right, and adding the locking was really easy. If only people would
debate less and actually try to do what they're being advised to.
But not really: if you wanna have something done, you have to do it
yourself.

Anyway, I think I have a box to run it to, lemme go find it.

Steve, pls check my locking. It looks straightforward to me but I might
be missing some corner case.

Thx.

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..a22fabef4791 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -28,10 +28,14 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
+static struct ghes_edac_pvt *ghes_pvt;
 
+/*
+ * Sync with other, potentially concurrent callers of
+ * ghes_edac_report_mem_error(). We don't know what the
+ * "inventive" firmware would do.
+ */
+static DEFINE_SPINLOCK(ghes_lock);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -169,14 +173,11 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
+	unsigned long flags;
 	char *p;
 	u8 grain_bits;
 
-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
 		pr_err("Internal error: Can't find EDAC structure\n");
 		return;
@@ -398,8 +399,16 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
 		       grain_bits, e->syndrome, pvt->detail_location);
 
-	/* Report the error via EDAC API */
+	/*
+	 * We can do the locking below because GHES defers error processing
+	 * from NMI to IRQ context. Whenever that changes, we'd at least
+	 * know.
+	 */
+	WARN_ON_ONCE(in_nmi());
+
+	spin_lock_irqsave(&ghes_lock, flags);
 	edac_raw_mc_handle_error(type, mci, e);
+	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
@@ -409,9 +418,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
 
+	/*
+	 * We have only one logical memory controller to which all DIMMs belong.
+	 */
+	if (ghes_pvt)
+		return 0;
+
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
@@ -425,26 +439,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
-	/*
-	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
-	 * to avoid duplicated memory controller numbers
-	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
 		return -ENOMEM;
 	}
 
-	pvt = mci->pvt_info;
-	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
-	pvt->mci  = mci;
-	mci->pdev = dev;
+	ghes_pvt	= mci->pvt_info;
+	ghes_pvt->ghes	= ghes;
+	ghes_pvt->mci	= mci;
 
+	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
@@ -452,36 +457,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
+	if (!fake) {
+		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
+		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
+		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
+		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
+		pr_info("to correct its BIOS.\n");
+		pr_info("This system has %d DIMM sockets.\n", num_dimm);
+	} else {
+		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+		pr_info("work on such system. Use this driver with caution\n");
 	}
 
 	if (!fake) {
-		/*
-		 * Fill DIMM info from DMI for the memory controller #0
-		 *
-		 * Keep it in blank for the other memory controllers, as
-		 * there's no reliable way to properly credit each DIMM to
-		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
-		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -497,12 +489,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}
-
-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(ghes_edac_register);
@@ -510,15 +498,9 @@ EXPORT_SYMBOL_GPL(ghes_edac_register);
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	mci = ghes_pvt->mci;
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_unregister);

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 11:29                                             ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16 11:29 UTC (permalink / raw)
  To: Luck, Tony, Steven Rostedt
  Cc: Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel, linux-acpi,
	linux-edac

On Wed, Aug 16, 2017 at 10:29:31AM +0200, Borislav Petkov wrote:
> Anyway, I think I have a box to run it to, lemme go find it.

Seems to boot.

It's a whole another story whether it actually works. :-)

Need some EINJ capabilities urgently but finding a box where it works
reliably is like finding gold.

[    7.784960] ERST: Failed to get Error Log Address Range.
[    7.795905] EDAC DEBUG: edac_mc_alloc: allocating 2444 bytes for mci data (16 dimms, 16 csrows/ch
annels)
[    7.815266] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
[    7.833372] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
[    7.850093] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
[    7.867322] ghes_edac: If you find incorrect reports, please contact your hardware vendor
[    7.884034] ghes_edac: to correct its BIOS.
[    7.892599] ghes_edac: This system has 16 DIMM sockets.
[    7.903274] EDAC DEBUG: ghes_edac_dmidecode: DIMM8: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.920337] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    7.936532] EDAC DEBUG: ghes_edac_dmidecode: DIMM9: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.953591] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    7.969778] EDAC DEBUG: ghes_edac_dmidecode: DIMM10: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.987010] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.003199] EDAC DEBUG: ghes_edac_dmidecode: DIMM11: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.020432] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.036618] EDAC DEBUG: ghes_edac_dmidecode: DIMM12: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.053848] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.070038] EDAC DEBUG: ghes_edac_dmidecode: DIMM13: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.087268] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.103456] EDAC DEBUG: ghes_edac_dmidecode: DIMM14: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.120687] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.128053] tsc: Refined TSC clocksource calibration: 2099.999 MHz
[    8.128173] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1e452fc488e, max_idle_ns: 44
0795307124 ns
[    8.169800] EDAC DEBUG: ghes_edac_dmidecode: DIMM15: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.187043] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.203230] EDAC DEBUG: edac_mc_add_mc_with_groups: 
[    8.213363] EDAC DEBUG: edac_create_sysfs_mci_device: creating bus mc1
[    8.226643] EDAC DEBUG: edac_create_sysfs_mci_device: creating device mc1
[    8.240450] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm8, located at memory 8 
[    8.257362] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm8
[    8.272506] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm9, located at memory 9 
[    8.289409] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm9
[    8.304556] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm10, located at memory 10 
[    8.321808] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm10
[    8.337126] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm11, located at memory 11 
[    8.354377] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm11
[    8.369697] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm12, located at memory 12 
[    8.386946] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm12
[    8.402266] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm13, located at memory 13 
[    8.419516] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm13
[    8.434835] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm14, located at memory 14 
[    8.452094] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm14
[    8.467423] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm15, located at memory 15 
[    8.484674] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm15
[    8.499994] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow8
[    8.516215] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow9
[    8.532433] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow10
[    8.548824] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow11
[    8.565203] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow12
[    8.581585] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow13
[    8.597964] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow14
[    8.614347] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow15
[    8.630744] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[    8.650073] [Firmware Warn]: GHES: Poll interval is 0 for generic hardware error source: 1, disabled.
[    8.669012] GHES: APEI firmware first mode is enabled by WHEA _OSC.
[    8.681940] Serial: 8250/16550 driver, 32 ports, IRQ sharing disabled

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 11:29                                             ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16 11:29 UTC (permalink / raw)
  To: Luck, Tony, Steven Rostedt
  Cc: Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel, linux-acpi,
	linux-edac

On Wed, Aug 16, 2017 at 10:29:31AM +0200, Borislav Petkov wrote:
> Anyway, I think I have a box to run it to, lemme go find it.

Seems to boot.

It's a whole another story whether it actually works. :-)

Need some EINJ capabilities urgently but finding a box where it works
reliably is like finding gold.

[    7.784960] ERST: Failed to get Error Log Address Range.
[    7.795905] EDAC DEBUG: edac_mc_alloc: allocating 2444 bytes for mci data (16 dimms, 16 csrows/ch
annels)
[    7.815266] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
[    7.833372] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
[    7.850093] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
[    7.867322] ghes_edac: If you find incorrect reports, please contact your hardware vendor
[    7.884034] ghes_edac: to correct its BIOS.
[    7.892599] ghes_edac: This system has 16 DIMM sockets.
[    7.903274] EDAC DEBUG: ghes_edac_dmidecode: DIMM8: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.920337] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    7.936532] EDAC DEBUG: ghes_edac_dmidecode: DIMM9: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.953591] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    7.969778] EDAC DEBUG: ghes_edac_dmidecode: DIMM10: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.987010] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.003199] EDAC DEBUG: ghes_edac_dmidecode: DIMM11: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.020432] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.036618] EDAC DEBUG: ghes_edac_dmidecode: DIMM12: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.053848] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.070038] EDAC DEBUG: ghes_edac_dmidecode: DIMM13: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.087268] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.103456] EDAC DEBUG: ghes_edac_dmidecode: DIMM14: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.120687] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.128053] tsc: Refined TSC clocksource calibration: 2099.999 MHz
[    8.128173] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1e452fc488e, max_idle_ns: 44
0795307124 ns
[    8.169800] EDAC DEBUG: ghes_edac_dmidecode: DIMM15: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.187043] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.203230] EDAC DEBUG: edac_mc_add_mc_with_groups: 
[    8.213363] EDAC DEBUG: edac_create_sysfs_mci_device: creating bus mc1
[    8.226643] EDAC DEBUG: edac_create_sysfs_mci_device: creating device mc1
[    8.240450] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm8, located at memory 8 
[    8.257362] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm8
[    8.272506] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm9, located at memory 9 
[    8.289409] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm9
[    8.304556] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm10, located at memory 10 
[    8.321808] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm10
[    8.337126] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm11, located at memory 11 
[    8.354377] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm11
[    8.369697] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm12, located at memory 12 
[    8.386946] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm12
[    8.402266] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm13, located at memory 13 
[    8.419516] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm13
[    8.434835] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm14, located at memory 14 
[    8.452094] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm14
[    8.467423] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm15, located at memory 15 
[    8.484674] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm15
[    8.499994] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow8
[    8.516215] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow9
[    8.532433] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow10
[    8.548824] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow11
[    8.565203] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow12
[    8.581585] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow13
[    8.597964] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow14
[    8.614347] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow15
[    8.630744] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[    8.650073] [Firmware Warn]: GHES: Poll interval is 0 for generic hardware error source: 1, disabled.
[    8.669012] GHES: APEI firmware first mode is enabled by WHEA _OSC.
[    8.681940] Serial: 8250/16550 driver, 32 ports, IRQ sharing disabled

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 13:59                                             ` Steven Rostedt
  0 siblings, 0 replies; 130+ messages in thread
From: Steven Rostedt @ 2017-08-16 13:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel,
	linux-acpi, linux-edac

On Wed, 16 Aug 2017 10:29:31 +0200
Borislav Petkov <bp@alien8.de> wrote:


> ---
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 6f80eb65c26c..a22fabef4791 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -28,10 +28,14 @@ struct ghes_edac_pvt {
>  	char msg[80];
>  };
>  
> -static LIST_HEAD(ghes_reglist);
> -static DEFINE_MUTEX(ghes_edac_lock);
> -static int ghes_edac_mc_num;
> +static struct ghes_edac_pvt *ghes_pvt;
>  
> +/*
> + * Sync with other, potentially concurrent callers of
> + * ghes_edac_report_mem_error(). We don't know what the
> + * "inventive" firmware would do.
> + */
> +static DEFINE_SPINLOCK(ghes_lock);
>  
>  /* Memory Device - Type 17 of SMBIOS spec */
>  struct memdev_dmi_entry {
> @@ -169,14 +173,11 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
>  	enum hw_event_mc_err_type type;
>  	struct edac_raw_error_desc *e;
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt = NULL;
> +	struct ghes_edac_pvt *pvt = ghes_pvt;
> +	unsigned long flags;
>  	char *p;
>  	u8 grain_bits;
>  
> -	list_for_each_entry(pvt, &ghes_reglist, list) {
> -		if (ghes == pvt->ghes)
> -			break;
> -	}
>  	if (!pvt) {
>  		pr_err("Internal error: Can't find EDAC structure\n");
>  		return;
> @@ -398,8 +399,16 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
>  		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
>  		       grain_bits, e->syndrome, pvt->detail_location);
>  
> -	/* Report the error via EDAC API */
> +	/*
> +	 * We can do the locking below because GHES defers error processing
> +	 * from NMI to IRQ context. Whenever that changes, we'd at least
> +	 * know.
> +	 */
> +	WARN_ON_ONCE(in_nmi());

Should the above be:

	if (WARN_ON_ONCE(in_nmi()))
		return;

To prevent a deadlock? Or do we not care?

> +
> +	spin_lock_irqsave(&ghes_lock, flags);
>  	edac_raw_mc_handle_error(type, mci, e);
> +	spin_unlock_irqrestore(&ghes_lock, flags);

The above looks fine, as long as there's nothing before it that needs
synchronization.

>  }
>  EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
>  
> @@ -409,9 +418,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	int rc, num_dimm = 0;
>  	struct mem_ctl_info *mci;
>  	struct edac_mc_layer layers[1];
> -	struct ghes_edac_pvt *pvt;
>  	struct ghes_edac_dimm_fill dimm_fill;
>  
> +	/*
> +	 * We have only one logical memory controller to which all DIMMs belong.
> +	 */
> +	if (ghes_pvt)
> +		return 0;

What's the likelihood of two calls to ghes_edac_register being done
simultaneously?  Because two calls at the same time will get past this.

-- Steve


> +
>  	/* Get the number of DIMMs */
>  	dmi_walk(ghes_edac_count_dimms, &num_dimm);
>  
> @@ -425,26 +439,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	layers[0].size = num_dimm;
>  	layers[0].is_virt_csrow = true;
>  
> -	/*
> -	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
> -	 * to avoid duplicated memory controller numbers
> -	 */
> -	mutex_lock(&ghes_edac_lock);
> -	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
> -			    sizeof(*pvt));
> +	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
>  	if (!mci) {
>  		pr_info("Can't allocate memory for EDAC data\n");
> -		mutex_unlock(&ghes_edac_lock);
>  		return -ENOMEM;
>  	}
>  
> -	pvt = mci->pvt_info;
> -	memset(pvt, 0, sizeof(*pvt));
> -	list_add_tail(&pvt->list, &ghes_reglist);
> -	pvt->ghes = ghes;
> -	pvt->mci  = mci;
> -	mci->pdev = dev;
> +	ghes_pvt	= mci->pvt_info;
> +	ghes_pvt->ghes	= ghes;
> +	ghes_pvt->mci	= mci;
>  
> +	mci->pdev = dev;
>  	mci->mtype_cap = MEM_FLAG_EMPTY;
>  	mci->edac_ctl_cap = EDAC_FLAG_NONE;
>  	mci->edac_cap = EDAC_FLAG_NONE;
> @@ -452,36 +457,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	mci->ctl_name = "ghes_edac";
>  	mci->dev_name = "ghes";
>  
> -	if (!ghes_edac_mc_num) {
> -		if (!fake) {
> -			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
> -			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
> -			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
> -			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
> -			pr_info("to correct its BIOS.\n");
> -			pr_info("This system has %d DIMM sockets.\n",
> -				num_dimm);
> -		} else {
> -			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
> -			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
> -			pr_info("work on such system. Use this driver with caution\n");
> -		}
> +	if (!fake) {
> +		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
> +		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
> +		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
> +		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
> +		pr_info("to correct its BIOS.\n");
> +		pr_info("This system has %d DIMM sockets.\n", num_dimm);
> +	} else {
> +		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
> +		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
> +		pr_info("work on such system. Use this driver with caution\n");
>  	}
>  
>  	if (!fake) {
> -		/*
> -		 * Fill DIMM info from DMI for the memory controller #0
> -		 *
> -		 * Keep it in blank for the other memory controllers, as
> -		 * there's no reliable way to properly credit each DIMM to
> -		 * the memory controller, as different BIOSes fill the
> -		 * DMI bank location fields on different ways
> -		 */
> -		if (!ghes_edac_mc_num) {
> -			dimm_fill.count = 0;
> -			dimm_fill.mci = mci;
> -			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
> -		}
> +		dimm_fill.count = 0;
> +		dimm_fill.mci = mci;
> +		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
>  	} else {
>  		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
>  						       mci->n_layers, 0, 0, 0);
> @@ -497,12 +489,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	if (rc < 0) {
>  		pr_info("Can't register at EDAC core\n");
>  		edac_mc_free(mci);
> -		mutex_unlock(&ghes_edac_lock);
>  		return -ENODEV;
>  	}
> -
> -	ghes_edac_mc_num++;
> -	mutex_unlock(&ghes_edac_lock);
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(ghes_edac_register);
> @@ -510,15 +498,9 @@ EXPORT_SYMBOL_GPL(ghes_edac_register);
>  void ghes_edac_unregister(struct ghes *ghes)
>  {
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt, *tmp;
> -
> -	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
> -		if (ghes == pvt->ghes) {
> -			mci = pvt->mci;
> -			edac_mc_del_mc(mci->pdev);
> -			edac_mc_free(mci);
> -			list_del(&pvt->list);
> -		}
> -	}
> +
> +	mci = ghes_pvt->mci;
> +	edac_mc_del_mc(mci->pdev);
> +	edac_mc_free(mci);
>  }
>  EXPORT_SYMBOL_GPL(ghes_edac_unregister);
> 

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 13:59                                             ` Steven Rostedt
  0 siblings, 0 replies; 130+ messages in thread
From: Steven Rostedt @ 2017-08-16 13:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel,
	linux-acpi, linux-edac

On Wed, 16 Aug 2017 10:29:31 +0200
Borislav Petkov <bp@alien8.de> wrote:


> ---
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 6f80eb65c26c..a22fabef4791 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -28,10 +28,14 @@ struct ghes_edac_pvt {
>  	char msg[80];
>  };
>  
> -static LIST_HEAD(ghes_reglist);
> -static DEFINE_MUTEX(ghes_edac_lock);
> -static int ghes_edac_mc_num;
> +static struct ghes_edac_pvt *ghes_pvt;
>  
> +/*
> + * Sync with other, potentially concurrent callers of
> + * ghes_edac_report_mem_error(). We don't know what the
> + * "inventive" firmware would do.
> + */
> +static DEFINE_SPINLOCK(ghes_lock);
>  
>  /* Memory Device - Type 17 of SMBIOS spec */
>  struct memdev_dmi_entry {
> @@ -169,14 +173,11 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
>  	enum hw_event_mc_err_type type;
>  	struct edac_raw_error_desc *e;
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt = NULL;
> +	struct ghes_edac_pvt *pvt = ghes_pvt;
> +	unsigned long flags;
>  	char *p;
>  	u8 grain_bits;
>  
> -	list_for_each_entry(pvt, &ghes_reglist, list) {
> -		if (ghes == pvt->ghes)
> -			break;
> -	}
>  	if (!pvt) {
>  		pr_err("Internal error: Can't find EDAC structure\n");
>  		return;
> @@ -398,8 +399,16 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
>  		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
>  		       grain_bits, e->syndrome, pvt->detail_location);
>  
> -	/* Report the error via EDAC API */
> +	/*
> +	 * We can do the locking below because GHES defers error processing
> +	 * from NMI to IRQ context. Whenever that changes, we'd at least
> +	 * know.
> +	 */
> +	WARN_ON_ONCE(in_nmi());

Should the above be:

	if (WARN_ON_ONCE(in_nmi()))
		return;

To prevent a deadlock? Or do we not care?

> +
> +	spin_lock_irqsave(&ghes_lock, flags);
>  	edac_raw_mc_handle_error(type, mci, e);
> +	spin_unlock_irqrestore(&ghes_lock, flags);

The above looks fine, as long as there's nothing before it that needs
synchronization.

>  }
>  EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
>  
> @@ -409,9 +418,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	int rc, num_dimm = 0;
>  	struct mem_ctl_info *mci;
>  	struct edac_mc_layer layers[1];
> -	struct ghes_edac_pvt *pvt;
>  	struct ghes_edac_dimm_fill dimm_fill;
>  
> +	/*
> +	 * We have only one logical memory controller to which all DIMMs belong.
> +	 */
> +	if (ghes_pvt)
> +		return 0;

What's the likelihood of two calls to ghes_edac_register being done
simultaneously?  Because two calls at the same time will get past this.

-- Steve


> +
>  	/* Get the number of DIMMs */
>  	dmi_walk(ghes_edac_count_dimms, &num_dimm);
>  
> @@ -425,26 +439,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	layers[0].size = num_dimm;
>  	layers[0].is_virt_csrow = true;
>  
> -	/*
> -	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
> -	 * to avoid duplicated memory controller numbers
> -	 */
> -	mutex_lock(&ghes_edac_lock);
> -	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
> -			    sizeof(*pvt));
> +	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
>  	if (!mci) {
>  		pr_info("Can't allocate memory for EDAC data\n");
> -		mutex_unlock(&ghes_edac_lock);
>  		return -ENOMEM;
>  	}
>  
> -	pvt = mci->pvt_info;
> -	memset(pvt, 0, sizeof(*pvt));
> -	list_add_tail(&pvt->list, &ghes_reglist);
> -	pvt->ghes = ghes;
> -	pvt->mci  = mci;
> -	mci->pdev = dev;
> +	ghes_pvt	= mci->pvt_info;
> +	ghes_pvt->ghes	= ghes;
> +	ghes_pvt->mci	= mci;
>  
> +	mci->pdev = dev;
>  	mci->mtype_cap = MEM_FLAG_EMPTY;
>  	mci->edac_ctl_cap = EDAC_FLAG_NONE;
>  	mci->edac_cap = EDAC_FLAG_NONE;
> @@ -452,36 +457,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	mci->ctl_name = "ghes_edac";
>  	mci->dev_name = "ghes";
>  
> -	if (!ghes_edac_mc_num) {
> -		if (!fake) {
> -			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
> -			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
> -			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
> -			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
> -			pr_info("to correct its BIOS.\n");
> -			pr_info("This system has %d DIMM sockets.\n",
> -				num_dimm);
> -		} else {
> -			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
> -			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
> -			pr_info("work on such system. Use this driver with caution\n");
> -		}
> +	if (!fake) {
> +		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
> +		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
> +		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
> +		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
> +		pr_info("to correct its BIOS.\n");
> +		pr_info("This system has %d DIMM sockets.\n", num_dimm);
> +	} else {
> +		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
> +		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
> +		pr_info("work on such system. Use this driver with caution\n");
>  	}
>  
>  	if (!fake) {
> -		/*
> -		 * Fill DIMM info from DMI for the memory controller #0
> -		 *
> -		 * Keep it in blank for the other memory controllers, as
> -		 * there's no reliable way to properly credit each DIMM to
> -		 * the memory controller, as different BIOSes fill the
> -		 * DMI bank location fields on different ways
> -		 */
> -		if (!ghes_edac_mc_num) {
> -			dimm_fill.count = 0;
> -			dimm_fill.mci = mci;
> -			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
> -		}
> +		dimm_fill.count = 0;
> +		dimm_fill.mci = mci;
> +		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
>  	} else {
>  		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
>  						       mci->n_layers, 0, 0, 0);
> @@ -497,12 +489,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	if (rc < 0) {
>  		pr_info("Can't register at EDAC core\n");
>  		edac_mc_free(mci);
> -		mutex_unlock(&ghes_edac_lock);
>  		return -ENODEV;
>  	}
> -
> -	ghes_edac_mc_num++;
> -	mutex_unlock(&ghes_edac_lock);
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(ghes_edac_register);
> @@ -510,15 +498,9 @@ EXPORT_SYMBOL_GPL(ghes_edac_register);
>  void ghes_edac_unregister(struct ghes *ghes)
>  {
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt, *tmp;
> -
> -	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
> -		if (ghes == pvt->ghes) {
> -			mci = pvt->mci;
> -			edac_mc_del_mc(mci->pdev);
> -			edac_mc_free(mci);
> -			list_del(&pvt->list);
> -		}
> -	}
> +
> +	mci = ghes_pvt->mci;
> +	edac_mc_del_mc(mci->pdev);
> +	edac_mc_free(mci);
>  }
>  EXPORT_SYMBOL_GPL(ghes_edac_unregister);
>
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 14:03                                               ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16 14:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Luck, Tony, Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel,
	linux-acpi, linux-edac

On Wed, Aug 16, 2017 at 09:59:01AM -0400, Steven Rostedt wrote:
> Should the above be:
> 
> 	if (WARN_ON_ONCE(in_nmi()))
> 		return;
> 
> To prevent a deadlock? Or do we not care?

Yeah, better this way.

> What's the likelihood of two calls to ghes_edac_register being done
> simultaneously?  Because two calls at the same time will get past this.

Well, that thing gets called per GHES platform device and last time I
checked they do get probed back-to-back but I'll check that again.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 14:03                                               ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16 14:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Luck, Tony, Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel,
	linux-acpi, linux-edac

On Wed, Aug 16, 2017 at 09:59:01AM -0400, Steven Rostedt wrote:
> Should the above be:
> 
> 	if (WARN_ON_ONCE(in_nmi()))
> 		return;
> 
> To prevent a deadlock? Or do we not care?

Yeah, better this way.

> What's the likelihood of two calls to ghes_edac_register being done
> simultaneously?  Because two calls at the same time will get past this.

Well, that thing gets called per GHES platform device and last time I
checked they do get probed back-to-back but I'll check that again.

Thanks.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 14:22                                                 ` Steven Rostedt
  0 siblings, 0 replies; 130+ messages in thread
From: Steven Rostedt @ 2017-08-16 14:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel,
	linux-acpi, linux-edac

On Wed, 16 Aug 2017 16:03:50 +0200
Borislav Petkov <bp@alien8.de> wrote:

> > What's the likelihood of two calls to ghes_edac_register being done
> > simultaneously?  Because two calls at the same time will get past this.  
> 
> Well, that thing gets called per GHES platform device and last time I
> checked they do get probed back-to-back but I'll check that again.

Maybe keep that original mutex just in case.

-- Steve


^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 14:22                                                 ` Steven Rostedt
  0 siblings, 0 replies; 130+ messages in thread
From: Steven Rostedt @ 2017-08-16 14:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel,
	linux-acpi, linux-edac

On Wed, 16 Aug 2017 16:03:50 +0200
Borislav Petkov <bp@alien8.de> wrote:

> > What's the likelihood of two calls to ghes_edac_register being done
> > simultaneously?  Because two calls at the same time will get past this.  
> 
> Well, that thing gets called per GHES platform device and last time I
> checked they do get probed back-to-back but I'll check that again.

Maybe keep that original mutex just in case.

-- Steve
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 15:26                                             ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-16 15:26 UTC (permalink / raw)
  To: tony.luck, rostedt, bp
  Cc: linux-edac, lenb, mchehab, linux-kernel, rjw, linux-acpi

On Wed, 2017-08-16 at 10:29 +0200, Borislav Petkov wrote:
> On Tue, Aug 15, 2017 at 08:48:16AM -0700, Luck, Tony wrote:
> > Won't the user see all their DIMMs reported for each memory
> > controller
> > under /sys/devices/system/edac/mc/mc*/dimm* ?
> > 
> > That sounds confusing.
> 
> Right, and adding the locking was really easy. If only people would
> debate less and actually try to do what they're being advised to.
> But not really: if you wanna have something done, you have to do it
> yourself.

Sorry, but I did not agree on allowing concurrent accesses to mci...

 /* Memory Device - Type 17 of SMBIOS spec */
>  struct memdev_dmi_entry {
> @@ -169,14 +173,11 @@ void ghes_edac_report_mem_error(struct ghes
> *ghes, int sev,
>  	enum hw_event_mc_err_type type;
>  	struct edac_raw_error_desc *e;
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt = NULL;
> +	struct ghes_edac_pvt *pvt = ghes_pvt;
> +	unsigned long flags;
>  	char *p;
>  	u8 grain_bits;

I believe you now need to protect from a race condition that a single
mci and pvt can be initialized / consumed from multiple threads.  This
protection is missing in your patch.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 15:26                                             ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-16 15:26 UTC (permalink / raw)
  To: tony.luck, rostedt, bp
  Cc: linux-edac, lenb, mchehab, linux-kernel, rjw, linux-acpi

On Wed, 2017-08-16 at 10:29 +0200, Borislav Petkov wrote:
> On Tue, Aug 15, 2017 at 08:48:16AM -0700, Luck, Tony wrote:
> > Won't the user see all their DIMMs reported for each memory
> > controller
> > under /sys/devices/system/edac/mc/mc*/dimm* ?
> > 
> > That sounds confusing.
> 
> Right, and adding the locking was really easy. If only people would
> debate less and actually try to do what they're being advised to.
> But not really: if you wanna have something done, you have to do it
> yourself.

Sorry, but I did not agree on allowing concurrent accesses to mci...

 /* Memory Device - Type 17 of SMBIOS spec */
>  struct memdev_dmi_entry {
> @@ -169,14 +173,11 @@ void ghes_edac_report_mem_error(struct ghes
> *ghes, int sev,
>  	enum hw_event_mc_err_type type;
>  	struct edac_raw_error_desc *e;
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt = NULL;
> +	struct ghes_edac_pvt *pvt = ghes_pvt;
> +	unsigned long flags;
>  	char *p;
>  	u8 grain_bits;

I believe you now need to protect from a race condition that a single
mci and pvt can be initialized / consumed from multiple threads.  This
protection is missing in your patch.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 16:42                                               ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16 16:42 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: tony.luck, rostedt, linux-edac, lenb, mchehab, linux-kernel, rjw,
	linux-acpi

On Wed, Aug 16, 2017 at 03:26:04PM +0000, Kani, Toshimitsu wrote:
> I believe you now need to protect from a race condition that a single
> mci and pvt can be initialized / consumed from multiple threads.  This
> protection is missing in your patch.

Easy. Done.

---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..d4089c2980ef 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -28,10 +28,14 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
+static struct ghes_edac_pvt *ghes_pvt;
 
+/*
+ * Sync with other, potentially concurrent callers of
+ * ghes_edac_report_mem_error(). We don't know what the
+ * "inventive" firmware would do.
+ */
+static DEFINE_SPINLOCK(ghes_lock);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -169,18 +173,26 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
+	unsigned long flags;
 	char *p;
 	u8 grain_bits;
 
-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
 		pr_err("Internal error: Can't find EDAC structure\n");
 		return;
 	}
+
+	/*
+	 * We can do the locking below because GHES defers error processing
+	 * from NMI to IRQ context. Whenever that changes, we'd at least
+	 * know.
+	 */
+	if (WARN_ON_ONCE(in_nmi()))
+		return;
+
+	spin_lock_irqsave(&ghes_lock, flags);
+
 	mci = pvt->mci;
 	e = &mci->error_desc;
 
@@ -398,8 +410,8 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
 		       grain_bits, e->syndrome, pvt->detail_location);
 
-	/* Report the error via EDAC API */
 	edac_raw_mc_handle_error(type, mci, e);
+	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
@@ -409,9 +421,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
 
+	/*
+	 * We have only one logical memory controller to which all DIMMs belong.
+	 */
+	if (ghes_pvt)
+		return 0;
+
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
@@ -425,26 +442,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
-	/*
-	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
-	 * to avoid duplicated memory controller numbers
-	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
 		return -ENOMEM;
 	}
 
-	pvt = mci->pvt_info;
-	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
-	pvt->mci  = mci;
-	mci->pdev = dev;
+	ghes_pvt	= mci->pvt_info;
+	ghes_pvt->ghes	= ghes;
+	ghes_pvt->mci	= mci;
 
+	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
@@ -452,36 +460,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
+	if (!fake) {
+		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
+		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
+		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
+		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
+		pr_info("to correct its BIOS.\n");
+		pr_info("This system has %d DIMM sockets.\n", num_dimm);
+	} else {
+		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+		pr_info("work on such system. Use this driver with caution\n");
 	}
 
 	if (!fake) {
-		/*
-		 * Fill DIMM info from DMI for the memory controller #0
-		 *
-		 * Keep it in blank for the other memory controllers, as
-		 * there's no reliable way to properly credit each DIMM to
-		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
-		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -497,12 +492,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}
-
-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(ghes_edac_register);
@@ -510,15 +501,9 @@ EXPORT_SYMBOL_GPL(ghes_edac_register);
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	mci = ghes_pvt->mci;
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_unregister);
-- 
2.13.0


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 16:42                                               ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16 16:42 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: tony.luck, rostedt, linux-edac, lenb, mchehab, linux-kernel, rjw,
	linux-acpi

On Wed, Aug 16, 2017 at 03:26:04PM +0000, Kani, Toshimitsu wrote:
> I believe you now need to protect from a race condition that a single
> mci and pvt can be initialized / consumed from multiple threads.  This
> protection is missing in your patch.

Easy. Done.

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..d4089c2980ef 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -28,10 +28,14 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
+static struct ghes_edac_pvt *ghes_pvt;
 
+/*
+ * Sync with other, potentially concurrent callers of
+ * ghes_edac_report_mem_error(). We don't know what the
+ * "inventive" firmware would do.
+ */
+static DEFINE_SPINLOCK(ghes_lock);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -169,18 +173,26 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
+	unsigned long flags;
 	char *p;
 	u8 grain_bits;
 
-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
 		pr_err("Internal error: Can't find EDAC structure\n");
 		return;
 	}
+
+	/*
+	 * We can do the locking below because GHES defers error processing
+	 * from NMI to IRQ context. Whenever that changes, we'd at least
+	 * know.
+	 */
+	if (WARN_ON_ONCE(in_nmi()))
+		return;
+
+	spin_lock_irqsave(&ghes_lock, flags);
+
 	mci = pvt->mci;
 	e = &mci->error_desc;
 
@@ -398,8 +410,8 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
 		       grain_bits, e->syndrome, pvt->detail_location);
 
-	/* Report the error via EDAC API */
 	edac_raw_mc_handle_error(type, mci, e);
+	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
@@ -409,9 +421,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
 
+	/*
+	 * We have only one logical memory controller to which all DIMMs belong.
+	 */
+	if (ghes_pvt)
+		return 0;
+
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
@@ -425,26 +442,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
-	/*
-	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
-	 * to avoid duplicated memory controller numbers
-	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
 		return -ENOMEM;
 	}
 
-	pvt = mci->pvt_info;
-	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
-	pvt->mci  = mci;
-	mci->pdev = dev;
+	ghes_pvt	= mci->pvt_info;
+	ghes_pvt->ghes	= ghes;
+	ghes_pvt->mci	= mci;
 
+	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
@@ -452,36 +460,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
+	if (!fake) {
+		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
+		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
+		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
+		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
+		pr_info("to correct its BIOS.\n");
+		pr_info("This system has %d DIMM sockets.\n", num_dimm);
+	} else {
+		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+		pr_info("work on such system. Use this driver with caution\n");
 	}
 
 	if (!fake) {
-		/*
-		 * Fill DIMM info from DMI for the memory controller #0
-		 *
-		 * Keep it in blank for the other memory controllers, as
-		 * there's no reliable way to properly credit each DIMM to
-		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
-		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -497,12 +492,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}
-
-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(ghes_edac_register);
@@ -510,15 +501,9 @@ EXPORT_SYMBOL_GPL(ghes_edac_register);
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	mci = ghes_pvt->mci;
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_unregister);

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 17:28                                                 ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-16 17:28 UTC (permalink / raw)
  To: bp
  Cc: linux-kernel, rostedt, mchehab, rjw, tony.luck, lenb, linux-acpi,
	linux-edac

On Wed, 2017-08-16 at 18:42 +0200, Borislav Petkov wrote:
> On Wed, Aug 16, 2017 at 03:26:04PM +0000, Kani, Toshimitsu wrote:
> > I believe you now need to protect from a race condition that a
> > single mci and pvt can be initialized / consumed from multiple
> > threads.  This protection is missing in your patch.
> 
> Easy. Done.

Assuming this big spinlock works, yes, this addresses my concern that
it does not allow concurrent accesses to mci. :-)

I will test the patch with an SCI when I got a chance.  I won't be able
to test other notification types or race conditions, though.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 17:28                                                 ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-16 17:28 UTC (permalink / raw)
  To: bp
  Cc: linux-kernel, rostedt, mchehab, rjw, tony.luck, lenb, linux-acpi,
	linux-edac

On Wed, 2017-08-16 at 18:42 +0200, Borislav Petkov wrote:
> On Wed, Aug 16, 2017 at 03:26:04PM +0000, Kani, Toshimitsu wrote:
> > I believe you now need to protect from a race condition that a
> > single mci and pvt can be initialized / consumed from multiple
> > threads.  This protection is missing in your patch.
> 
> Easy. Done.

Assuming this big spinlock works, yes, this addresses my concern that
it does not allow concurrent accesses to mci. :-)

I will test the patch with an SCI when I got a chance.  I won't be able
to test other notification types or race conditions, though.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 17:31                                                   ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16 17:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Luck, Tony, Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel,
	linux-acpi, linux-edac

On Wed, Aug 16, 2017 at 10:22:49AM -0400, Steven Rostedt wrote:
> Maybe keep that original mutex just in case.

Let's do the elegant thing:

---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index d4089c2980ef..386e04a7bda0 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -28,6 +28,7 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
+static atomic_t ghes_init = ATOMIC_INIT(0);
 static struct ghes_edac_pvt *ghes_pvt;
 
 /*
@@ -426,7 +427,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	/*
 	 * We have only one logical memory controller to which all DIMMs belong.
 	 */
-	if (ghes_pvt)
+	if (atomic_inc_return(&ghes_init) > 1)
 		return 0;
 
 	/* Get the number of DIMMs */

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 17:31                                                   ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16 17:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Luck, Tony, Kani, Toshimitsu, rjw, lenb, mchehab, linux-kernel,
	linux-acpi, linux-edac

On Wed, Aug 16, 2017 at 10:22:49AM -0400, Steven Rostedt wrote:
> Maybe keep that original mutex just in case.

Let's do the elegant thing:

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index d4089c2980ef..386e04a7bda0 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -28,6 +28,7 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
+static atomic_t ghes_init = ATOMIC_INIT(0);
 static struct ghes_edac_pvt *ghes_pvt;
 
 /*
@@ -426,7 +427,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	/*
 	 * We have only one logical memory controller to which all DIMMs belong.
 	 */
-	if (ghes_pvt)
+	if (atomic_inc_return(&ghes_init) > 1)
 		return 0;
 
 	/* Get the number of DIMMs */

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 17:40                                                   ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16 17:40 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-kernel, rostedt, mchehab, rjw, tony.luck, lenb, linux-acpi,
	linux-edac

On Wed, Aug 16, 2017 at 05:28:50PM +0000, Kani, Toshimitsu wrote:
> Assuming this big spinlock works, yes, this addresses my concern that

You mean, lengthy locked section. We can always switch to on-stack
buffers if there are issues or even to something more fancy like
genpool.

> I will test the patch with an SCI when I got a chance.  I won't be able
> to test other notification types or race conditions, though.

Thanks, here's the latest version with the atomic registration too.

---
 drivers/edac/ghes_edac.c | 116 +++++++++++++++++++++--------------------------
 1 file changed, 51 insertions(+), 65 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..386e04a7bda0 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -28,10 +28,15 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
+static atomic_t ghes_init = ATOMIC_INIT(0);
+static struct ghes_edac_pvt *ghes_pvt;
 
+/*
+ * Sync with other, potentially concurrent callers of
+ * ghes_edac_report_mem_error(). We don't know what the
+ * "inventive" firmware would do.
+ */
+static DEFINE_SPINLOCK(ghes_lock);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -169,18 +174,26 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
+	unsigned long flags;
 	char *p;
 	u8 grain_bits;
 
-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
 		pr_err("Internal error: Can't find EDAC structure\n");
 		return;
 	}
+
+	/*
+	 * We can do the locking below because GHES defers error processing
+	 * from NMI to IRQ context. Whenever that changes, we'd at least
+	 * know.
+	 */
+	if (WARN_ON_ONCE(in_nmi()))
+		return;
+
+	spin_lock_irqsave(&ghes_lock, flags);
+
 	mci = pvt->mci;
 	e = &mci->error_desc;
 
@@ -398,8 +411,8 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
 		       grain_bits, e->syndrome, pvt->detail_location);
 
-	/* Report the error via EDAC API */
 	edac_raw_mc_handle_error(type, mci, e);
+	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
@@ -409,9 +422,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
 
+	/*
+	 * We have only one logical memory controller to which all DIMMs belong.
+	 */
+	if (atomic_inc_return(&ghes_init) > 1)
+		return 0;
+
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
@@ -425,26 +443,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
-	/*
-	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
-	 * to avoid duplicated memory controller numbers
-	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
 		return -ENOMEM;
 	}
 
-	pvt = mci->pvt_info;
-	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
-	pvt->mci  = mci;
-	mci->pdev = dev;
+	ghes_pvt	= mci->pvt_info;
+	ghes_pvt->ghes	= ghes;
+	ghes_pvt->mci	= mci;
 
+	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
@@ -452,36 +461,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
+	if (!fake) {
+		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
+		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
+		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
+		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
+		pr_info("to correct its BIOS.\n");
+		pr_info("This system has %d DIMM sockets.\n", num_dimm);
+	} else {
+		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+		pr_info("work on such system. Use this driver with caution\n");
 	}
 
 	if (!fake) {
-		/*
-		 * Fill DIMM info from DMI for the memory controller #0
-		 *
-		 * Keep it in blank for the other memory controllers, as
-		 * there's no reliable way to properly credit each DIMM to
-		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
-		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -497,12 +493,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}
-
-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(ghes_edac_register);
@@ -510,15 +502,9 @@ EXPORT_SYMBOL_GPL(ghes_edac_register);
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	mci = ghes_pvt->mci;
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_unregister);
-- 
2.13.0

Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 17:40                                                   ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-16 17:40 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-kernel, rostedt, mchehab, rjw, tony.luck, lenb, linux-acpi,
	linux-edac

On Wed, Aug 16, 2017 at 05:28:50PM +0000, Kani, Toshimitsu wrote:
> Assuming this big spinlock works, yes, this addresses my concern that

You mean, lengthy locked section. We can always switch to on-stack
buffers if there are issues or even to something more fancy like
genpool.

> I will test the patch with an SCI when I got a chance.  I won't be able
> to test other notification types or race conditions, though.

Thanks, here's the latest version with the atomic registration too.
---
 drivers/edac/ghes_edac.c | 116 +++++++++++++++++++++--------------------------
 1 file changed, 51 insertions(+), 65 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..386e04a7bda0 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -28,10 +28,15 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
+static atomic_t ghes_init = ATOMIC_INIT(0);
+static struct ghes_edac_pvt *ghes_pvt;
 
+/*
+ * Sync with other, potentially concurrent callers of
+ * ghes_edac_report_mem_error(). We don't know what the
+ * "inventive" firmware would do.
+ */
+static DEFINE_SPINLOCK(ghes_lock);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -169,18 +174,26 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
+	unsigned long flags;
 	char *p;
 	u8 grain_bits;
 
-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
 		pr_err("Internal error: Can't find EDAC structure\n");
 		return;
 	}
+
+	/*
+	 * We can do the locking below because GHES defers error processing
+	 * from NMI to IRQ context. Whenever that changes, we'd at least
+	 * know.
+	 */
+	if (WARN_ON_ONCE(in_nmi()))
+		return;
+
+	spin_lock_irqsave(&ghes_lock, flags);
+
 	mci = pvt->mci;
 	e = &mci->error_desc;
 
@@ -398,8 +411,8 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
 		       grain_bits, e->syndrome, pvt->detail_location);
 
-	/* Report the error via EDAC API */
 	edac_raw_mc_handle_error(type, mci, e);
+	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
@@ -409,9 +422,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
 
+	/*
+	 * We have only one logical memory controller to which all DIMMs belong.
+	 */
+	if (atomic_inc_return(&ghes_init) > 1)
+		return 0;
+
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
@@ -425,26 +443,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
-	/*
-	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
-	 * to avoid duplicated memory controller numbers
-	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
 		return -ENOMEM;
 	}
 
-	pvt = mci->pvt_info;
-	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
-	pvt->mci  = mci;
-	mci->pdev = dev;
+	ghes_pvt	= mci->pvt_info;
+	ghes_pvt->ghes	= ghes;
+	ghes_pvt->mci	= mci;
 
+	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
@@ -452,36 +461,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
+	if (!fake) {
+		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
+		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
+		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
+		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
+		pr_info("to correct its BIOS.\n");
+		pr_info("This system has %d DIMM sockets.\n", num_dimm);
+	} else {
+		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+		pr_info("work on such system. Use this driver with caution\n");
 	}
 
 	if (!fake) {
-		/*
-		 * Fill DIMM info from DMI for the memory controller #0
-		 *
-		 * Keep it in blank for the other memory controllers, as
-		 * there's no reliable way to properly credit each DIMM to
-		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
-		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -497,12 +493,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}
-
-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(ghes_edac_register);
@@ -510,15 +502,9 @@ EXPORT_SYMBOL_GPL(ghes_edac_register);
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	mci = ghes_pvt->mci;
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_unregister);

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 18:01                                                     ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-16 18:01 UTC (permalink / raw)
  To: bp
  Cc: linux-kernel, rostedt, mchehab, rjw, tony.luck, lenb, linux-acpi,
	linux-edac

On Wed, 2017-08-16 at 19:40 +0200, Borislav Petkov wrote:
> On Wed, Aug 16, 2017 at 05:28:50PM +0000, Kani, Toshimitsu wrote:
> > Assuming this big spinlock works, yes, this addresses my concern
> > that
> 
> You mean, lengthy locked section. We can always switch to on-stack
> buffers if there are issues or even to something more fancy like
> genpool.

Yes, I meant the lengthy locked section.  I still think that multiple
mcis is a better approach for concurrency as it naturally addresses it.
But as long as edac func and error handlers are tolerant with this
spinlock and serialization, I am OK with that.

> > I will test the patch with an SCI when I got a chance.  I won't be
> > able to test other notification types or race conditions, though.
> 
> Thanks, here's the latest version with the atomic registration too.

Sure, I will test with this version.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-16 18:01                                                     ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-16 18:01 UTC (permalink / raw)
  To: bp
  Cc: linux-kernel, rostedt, mchehab, rjw, tony.luck, lenb, linux-acpi,
	linux-edac

On Wed, 2017-08-16 at 19:40 +0200, Borislav Petkov wrote:
> On Wed, Aug 16, 2017 at 05:28:50PM +0000, Kani, Toshimitsu wrote:
> > Assuming this big spinlock works, yes, this addresses my concern
> > that
> 
> You mean, lengthy locked section. We can always switch to on-stack
> buffers if there are issues or even to something more fancy like
> genpool.

Yes, I meant the lengthy locked section.  I still think that multiple
mcis is a better approach for concurrency as it naturally addresses it.
But as long as edac func and error handlers are tolerant with this
spinlock and serialization, I am OK with that.

> > I will test the patch with an SCI when I got a chance.  I won't be
> > able to test other notification types or race conditions, though.
> 
> Thanks, here's the latest version with the atomic registration too.

Sure, I will test with this version.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-17 21:08                                                       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Kani, Toshimitsu @ 2017-08-17 21:08 UTC (permalink / raw)
  To: bp
  Cc: linux-kernel, rostedt, mchehab, rjw, tony.luck, lenb, linux-acpi,
	linux-edac

On Wed, 2017-08-16 at 11:51 -0600, Toshi Kani wrote:
> On Wed, 2017-08-16 at 19:40 +0200, Borislav Petkov wrote:
> > On Wed, Aug 16, 2017 at 05:28:50PM +0000, Kani, Toshimitsu wrote:
 :
> > > I will test the patch with an SCI when I got a chance.  I won't
> > > be able to test other notification types or race conditions,
> > > though.
> > 
> > Thanks, here's the latest version with the atomic registration too.
> 
> Sure, I will test with this version.

I briefly tested the patch with an SCI. The error reporting part seems
to work fine.

There are two issues I noticed:

1. It creates mc0 and mc1.  
I think this is because you called edac_mc_alloc() with mc_num 1.

2. 'ras-mc-ctl --layout' does not show all DIMMs.

# ras-mc-ctl --layout
         +------------------------+
         |    mc0     |    mc1     |
---------+------------------------+
memory4: |     0 MB  |  16384 MB  |
memory3: |     0 MB  |     0 MB  |
memory2: |     0 MB  |     0 MB  |
memory1: |     0 MB  |     0 MB  |
memory0: |     0 MB  |  8192 MB  |
---------+-----------------------+

# ls /sys/bus/mc1/devices
csrow0   csrow12  csrow23  dimm0   dimm12  dimm23  mc1
csrow11  csrow19  csrow4   dimm11  dimm19  dimm4

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-17 21:08                                                       ` Toshi Kani
  0 siblings, 0 replies; 130+ messages in thread
From: Toshi Kani @ 2017-08-17 21:08 UTC (permalink / raw)
  To: bp
  Cc: linux-kernel, rostedt, mchehab, rjw, tony.luck, lenb, linux-acpi,
	linux-edac

On Wed, 2017-08-16 at 11:51 -0600, Toshi Kani wrote:
> On Wed, 2017-08-16 at 19:40 +0200, Borislav Petkov wrote:
> > On Wed, Aug 16, 2017 at 05:28:50PM +0000, Kani, Toshimitsu wrote:
 :
> > > I will test the patch with an SCI when I got a chance.  I won't
> > > be able to test other notification types or race conditions,
> > > though.
> > 
> > Thanks, here's the latest version with the atomic registration too.
> 
> Sure, I will test with this version.

I briefly tested the patch with an SCI. The error reporting part seems
to work fine.

There are two issues I noticed:

1. It creates mc0 and mc1.  
I think this is because you called edac_mc_alloc() with mc_num 1.

2. 'ras-mc-ctl --layout' does not show all DIMMs.

# ras-mc-ctl --layout
         +------------------------+
         |    mc0     |    mc1     |
---------+------------------------+
memory4: |     0 MB  |  16384 MB  |
memory3: |     0 MB  |     0 MB  |
memory2: |     0 MB  |     0 MB  |
memory1: |     0 MB  |     0 MB  |
memory0: |     0 MB  |  8192 MB  |
---------+-----------------------+

# ls /sys/bus/mc1/devices
csrow0   csrow12  csrow23  dimm0   dimm12  dimm23  mc1
csrow11  csrow19  csrow4   dimm11  dimm19  dimm4

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-21  9:29                                                         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-21  9:29 UTC (permalink / raw)
  To: Kani, Toshimitsu, mchehab
  Cc: linux-kernel, rostedt, rjw, tony.luck, lenb, linux-acpi, linux-edac

On Thu, Aug 17, 2017 at 09:08:40PM +0000, Kani, Toshimitsu wrote:
> 1. It creates mc0 and mc1.  
> I think this is because you called edac_mc_alloc() with mc_num 1.

Fixed, see below.

> 
> 2. 'ras-mc-ctl --layout' does not show all DIMMs.

Yap, that's strange.

$ grep . /sys/devices/system/edac/mc/mc0/dimm*/size
/sys/devices/system/edac/mc/mc0/dimm10/size:2048
/sys/devices/system/edac/mc/mc0/dimm11/size:2048
/sys/devices/system/edac/mc/mc0/dimm12/size:2048
/sys/devices/system/edac/mc/mc0/dimm13/size:2048
/sys/devices/system/edac/mc/mc0/dimm14/size:2048
/sys/devices/system/edac/mc/mc0/dimm15/size:2048
/sys/devices/system/edac/mc/mc0/dimm8/size:2048
/sys/devices/system/edac/mc/mc0/dimm9/size:2048
$ ras-mc-ctl --layout
         +-----------+
         |    mc0    |
---------+-----------+
memory9: |  2048 MB  |
memory8: |  2048 MB  |
---------+-----------+
memory7: |     0 MB  |
memory6: |     0 MB  |
---------+-----------+
memory5: |     0 MB  |
memory4: |     0 MB  |
---------+-----------+
memory3: |     0 MB  |
memory2: |     0 MB  |
---------+-----------+
memory1: |     0 MB  |
memory0: |     0 MB  |
---------+-----------+

the driver detects them correctly though:

[    7.900694] ghes_edac: This system has 16 DIMM sockets.
[    7.911366] EDAC DEBUG: ghes_edac_dmidecode: DIMM8: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.928437] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    7.944628] EDAC DEBUG: ghes_edac_dmidecode: DIMM9: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.961683] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    7.977871] EDAC DEBUG: ghes_edac_dmidecode: DIMM10: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.995105] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.011291] EDAC DEBUG: ghes_edac_dmidecode: DIMM11: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.028524] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.044712] EDAC DEBUG: ghes_edac_dmidecode: DIMM12: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.061942] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.078129] EDAC DEBUG: ghes_edac_dmidecode: DIMM13: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.095360] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.111547] EDAC DEBUG: ghes_edac_dmidecode: DIMM14: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.161703] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.177904] EDAC DEBUG: ghes_edac_dmidecode: DIMM15: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.195132] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.211321] EDAC DEBUG: edac_mc_add_mc_with_groups: 
[    8.221456] EDAC DEBUG: edac_create_sysfs_mci_device: creating bus mc0
[    8.234736] EDAC DEBUG: edac_create_sysfs_mci_device: creating device mc0
[    8.248545] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm8, located at memory 8 
[    8.265457] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm8
[    8.280601] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm9, located at memory 9 
[    8.297503] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm9
[    8.312650] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm10, located at memory 10 
[    8.329900] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm10
[    8.345220] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm11, located at memory 11 
[    8.362470] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm11
[    8.377789] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm12, located at memory 12 
[    8.395039] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm12
[    8.410358] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm13, located at memory 13 
[    8.427608] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm13
[    8.442928] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm14, located at memory 14 
[    8.460194] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm14
[    8.475517] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm15, located at memory 15 
[    8.492768] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm15

Mauro?

---
From: Borislav Petkov <bp@suse.de>
Date: Wed, 16 Aug 2017 10:33:44 +0200
Subject: [PATCH] EDAC, ghes: Model a single, logical memory controller

We're enumerating the DIMMs through a DMI walk and since we can't get
any more detailed topological information about which DIMMs belong to
which memory controller, convert it to a single, logical controller
which contains all the DIMMs.

The error reporting path from GHES ghes_edac_report_mem_error() doesn't
get called in NMI context but add a warning about it to catch any
changes in the future as if so, our locking scheme will be insufficient
then.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/edac/ghes_edac.c | 116 +++++++++++++++++++++--------------------------
 1 file changed, 51 insertions(+), 65 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..e790d64b8edd 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -28,10 +28,15 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
+static atomic_t ghes_init = ATOMIC_INIT(0);
+static struct ghes_edac_pvt *ghes_pvt;
 
+/*
+ * Sync with other, potentially concurrent callers of
+ * ghes_edac_report_mem_error(). We don't know what the
+ * "inventive" firmware would do.
+ */
+static DEFINE_SPINLOCK(ghes_lock);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -169,18 +174,26 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
+	unsigned long flags;
 	char *p;
 	u8 grain_bits;
 
-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
 		pr_err("Internal error: Can't find EDAC structure\n");
 		return;
 	}
+
+	/*
+	 * We can do the locking below because GHES defers error processing
+	 * from NMI to IRQ context. Whenever that changes, we'd at least
+	 * know.
+	 */
+	if (WARN_ON_ONCE(in_nmi()))
+		return;
+
+	spin_lock_irqsave(&ghes_lock, flags);
+
 	mci = pvt->mci;
 	e = &mci->error_desc;
 
@@ -398,8 +411,8 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
 		       grain_bits, e->syndrome, pvt->detail_location);
 
-	/* Report the error via EDAC API */
 	edac_raw_mc_handle_error(type, mci, e);
+	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
@@ -409,9 +422,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
 
+	/*
+	 * We have only one logical memory controller to which all DIMMs belong.
+	 */
+	if (atomic_inc_return(&ghes_init) > 1)
+		return 0;
+
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
@@ -425,26 +443,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
-	/*
-	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
-	 * to avoid duplicated memory controller numbers
-	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
 		return -ENOMEM;
 	}
 
-	pvt = mci->pvt_info;
-	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
-	pvt->mci  = mci;
-	mci->pdev = dev;
+	ghes_pvt	= mci->pvt_info;
+	ghes_pvt->ghes	= ghes;
+	ghes_pvt->mci	= mci;
 
+	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
@@ -452,36 +461,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
+	if (!fake) {
+		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
+		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
+		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
+		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
+		pr_info("to correct its BIOS.\n");
+		pr_info("This system has %d DIMM sockets.\n", num_dimm);
+	} else {
+		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+		pr_info("work on such system. Use this driver with caution\n");
 	}
 
 	if (!fake) {
-		/*
-		 * Fill DIMM info from DMI for the memory controller #0
-		 *
-		 * Keep it in blank for the other memory controllers, as
-		 * there's no reliable way to properly credit each DIMM to
-		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
-		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -497,12 +493,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}
-
-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(ghes_edac_register);
@@ -510,15 +502,9 @@ EXPORT_SYMBOL_GPL(ghes_edac_register);
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	mci = ghes_pvt->mci;
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_unregister);
-- 
2.13.0


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [v2,4/7] ghes_edac: avoid multiple calls to dmi_walk()
@ 2017-08-21  9:29                                                         ` Borislav Petkov
  0 siblings, 0 replies; 130+ messages in thread
From: Borislav Petkov @ 2017-08-21  9:29 UTC (permalink / raw)
  To: Kani, Toshimitsu, mchehab
  Cc: linux-kernel, rostedt, rjw, tony.luck, lenb, linux-acpi, linux-edac

On Thu, Aug 17, 2017 at 09:08:40PM +0000, Kani, Toshimitsu wrote:
> 1. It creates mc0 and mc1.  
> I think this is because you called edac_mc_alloc() with mc_num 1.

Fixed, see below.

> 
> 2. 'ras-mc-ctl --layout' does not show all DIMMs.

Yap, that's strange.

$ grep . /sys/devices/system/edac/mc/mc0/dimm*/size
/sys/devices/system/edac/mc/mc0/dimm10/size:2048
/sys/devices/system/edac/mc/mc0/dimm11/size:2048
/sys/devices/system/edac/mc/mc0/dimm12/size:2048
/sys/devices/system/edac/mc/mc0/dimm13/size:2048
/sys/devices/system/edac/mc/mc0/dimm14/size:2048
/sys/devices/system/edac/mc/mc0/dimm15/size:2048
/sys/devices/system/edac/mc/mc0/dimm8/size:2048
/sys/devices/system/edac/mc/mc0/dimm9/size:2048
$ ras-mc-ctl --layout
         +-----------+
         |    mc0    |
---------+-----------+
memory9: |  2048 MB  |
memory8: |  2048 MB  |
---------+-----------+
memory7: |     0 MB  |
memory6: |     0 MB  |
---------+-----------+
memory5: |     0 MB  |
memory4: |     0 MB  |
---------+-----------+
memory3: |     0 MB  |
memory2: |     0 MB  |
---------+-----------+
memory1: |     0 MB  |
memory0: |     0 MB  |
---------+-----------+

the driver detects them correctly though:

[    7.900694] ghes_edac: This system has 16 DIMM sockets.
[    7.911366] EDAC DEBUG: ghes_edac_dmidecode: DIMM8: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.928437] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    7.944628] EDAC DEBUG: ghes_edac_dmidecode: DIMM9: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.961683] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    7.977871] EDAC DEBUG: ghes_edac_dmidecode: DIMM10: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    7.995105] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.011291] EDAC DEBUG: ghes_edac_dmidecode: DIMM11: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.028524] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.044712] EDAC DEBUG: ghes_edac_dmidecode: DIMM12: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.061942] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.078129] EDAC DEBUG: ghes_edac_dmidecode: DIMM13: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.095360] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.111547] EDAC DEBUG: ghes_edac_dmidecode: DIMM14: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.161703] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.177904] EDAC DEBUG: ghes_edac_dmidecode: DIMM15: Unbuffered DDR3 RAM size = 2048 MB(ECC)
[    8.195132] EDAC DEBUG: ghes_edac_dmidecode:         type 24, detail 0x80, width 72(total 64)
[    8.211321] EDAC DEBUG: edac_mc_add_mc_with_groups: 
[    8.221456] EDAC DEBUG: edac_create_sysfs_mci_device: creating bus mc0
[    8.234736] EDAC DEBUG: edac_create_sysfs_mci_device: creating device mc0
[    8.248545] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm8, located at memory 8 
[    8.265457] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm8
[    8.280601] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm9, located at memory 9 
[    8.297503] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm9
[    8.312650] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm10, located at memory 10 
[    8.329900] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm10
[    8.345220] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm11, located at memory 11 
[    8.362470] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm11
[    8.377789] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm12, located at memory 12 
[    8.395039] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm12
[    8.410358] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm13, located at memory 13 
[    8.427608] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm13
[    8.442928] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm14, located at memory 14 
[    8.460194] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm14
[    8.475517] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm15, located at memory 15 
[    8.492768] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device dimm15

Mauro?
---
From: Borislav Petkov <bp@suse.de>
Date: Wed, 16 Aug 2017 10:33:44 +0200
Subject: [PATCH] EDAC, ghes: Model a single, logical memory controller

We're enumerating the DIMMs through a DMI walk and since we can't get
any more detailed topological information about which DIMMs belong to
which memory controller, convert it to a single, logical controller
which contains all the DIMMs.

The error reporting path from GHES ghes_edac_report_mem_error() doesn't
get called in NMI context but add a warning about it to catch any
changes in the future as if so, our locking scheme will be insufficient
then.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/edac/ghes_edac.c | 116 +++++++++++++++++++++--------------------------
 1 file changed, 51 insertions(+), 65 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..e790d64b8edd 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -28,10 +28,15 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
+static atomic_t ghes_init = ATOMIC_INIT(0);
+static struct ghes_edac_pvt *ghes_pvt;
 
+/*
+ * Sync with other, potentially concurrent callers of
+ * ghes_edac_report_mem_error(). We don't know what the
+ * "inventive" firmware would do.
+ */
+static DEFINE_SPINLOCK(ghes_lock);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -169,18 +174,26 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
+	unsigned long flags;
 	char *p;
 	u8 grain_bits;
 
-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
 		pr_err("Internal error: Can't find EDAC structure\n");
 		return;
 	}
+
+	/*
+	 * We can do the locking below because GHES defers error processing
+	 * from NMI to IRQ context. Whenever that changes, we'd at least
+	 * know.
+	 */
+	if (WARN_ON_ONCE(in_nmi()))
+		return;
+
+	spin_lock_irqsave(&ghes_lock, flags);
+
 	mci = pvt->mci;
 	e = &mci->error_desc;
 
@@ -398,8 +411,8 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
 		       grain_bits, e->syndrome, pvt->detail_location);
 
-	/* Report the error via EDAC API */
 	edac_raw_mc_handle_error(type, mci, e);
+	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
@@ -409,9 +422,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
 
+	/*
+	 * We have only one logical memory controller to which all DIMMs belong.
+	 */
+	if (atomic_inc_return(&ghes_init) > 1)
+		return 0;
+
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
@@ -425,26 +443,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
-	/*
-	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
-	 * to avoid duplicated memory controller numbers
-	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
 		return -ENOMEM;
 	}
 
-	pvt = mci->pvt_info;
-	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
-	pvt->mci  = mci;
-	mci->pdev = dev;
+	ghes_pvt	= mci->pvt_info;
+	ghes_pvt->ghes	= ghes;
+	ghes_pvt->mci	= mci;
 
+	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
@@ -452,36 +461,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
+	if (!fake) {
+		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
+		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
+		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
+		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
+		pr_info("to correct its BIOS.\n");
+		pr_info("This system has %d DIMM sockets.\n", num_dimm);
+	} else {
+		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+		pr_info("work on such system. Use this driver with caution\n");
 	}
 
 	if (!fake) {
-		/*
-		 * Fill DIMM info from DMI for the memory controller #0
-		 *
-		 * Keep it in blank for the other memory controllers, as
-		 * there's no reliable way to properly credit each DIMM to
-		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
-		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -497,12 +493,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}
-
-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(ghes_edac_register);
@@ -510,15 +502,9 @@ EXPORT_SYMBOL_GPL(ghes_edac_register);
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	mci = ghes_pvt->mci;
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
 }
 EXPORT_SYMBOL_GPL(ghes_edac_unregister);

^ permalink raw reply related	[flat|nested] 130+ messages in thread

end of thread, other threads:[~2017-08-21  9:29 UTC | newest]

Thread overview: 130+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-03 21:57 [PATCH v2 0/7] enable ghes_edac on selected platforms Toshi Kani
2017-08-03 21:57 ` [PATCH v2 1/7] ACPI / blacklist: add acpi_match_oemlist() interface Toshi Kani
2017-08-03 21:57   ` [v2,1/7] " Toshi Kani
2017-08-04  3:42   ` [PATCH v2 1/7] " Borislav Petkov
2017-08-04  3:42     ` [v2,1/7] " Borislav Petkov
2017-08-04 20:39     ` [PATCH v2 1/7] " Kani, Toshimitsu
2017-08-04 20:39       ` [v2,1/7] " Toshi Kani
2017-08-05  5:12       ` [PATCH v2 1/7] " Borislav Petkov
2017-08-05  5:12         ` [v2,1/7] " Borislav Petkov
2017-08-07 14:49         ` [PATCH v2 1/7] " Kani, Toshimitsu
2017-08-07 14:49           ` [v2,1/7] " Toshi Kani
2017-08-03 21:57 ` [PATCH v2 2/7] intel_pstate: convert to use acpi_match_oemlist() Toshi Kani
2017-08-03 21:57   ` [v2,2/7] " Toshi Kani
2017-08-03 21:57 ` [PATCH v2 3/7] ACPI / APEI: add OSC APEI bit check for ghes_edac Toshi Kani
2017-08-03 21:57   ` [v2,3/7] " Toshi Kani
2017-08-04  3:44   ` [PATCH v2 3/7] " Borislav Petkov
2017-08-04  3:44     ` [v2,3/7] " Borislav Petkov
2017-08-04 20:49     ` [PATCH v2 3/7] " Kani, Toshimitsu
2017-08-04 20:49       ` [v2,3/7] " Toshi Kani
2017-08-05  5:14       ` [PATCH v2 3/7] " Borislav Petkov
2017-08-05  5:14         ` [v2,3/7] " Borislav Petkov
2017-08-07 14:50         ` [PATCH v2 3/7] " Kani, Toshimitsu
2017-08-07 14:50           ` [v2,3/7] " Toshi Kani
2017-08-03 21:57 ` [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk() Toshi Kani
2017-08-03 21:57   ` [v2,4/7] " Toshi Kani
2017-08-04  4:05   ` [PATCH v2 4/7] " Borislav Petkov
2017-08-04  4:05     ` [v2,4/7] " Borislav Petkov
2017-08-04 21:02     ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-04 21:02       ` [v2,4/7] " Toshi Kani
2017-08-05  5:16       ` [PATCH v2 4/7] " Borislav Petkov
2017-08-05  5:16         ` [v2,4/7] " Borislav Petkov
2017-08-07 17:59         ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-07 17:59           ` [v2,4/7] " Toshi Kani
2017-08-11  9:04           ` [PATCH v2 4/7] " Borislav Petkov
2017-08-11  9:04             ` [v2,4/7] " Borislav Petkov
2017-08-14 15:57             ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-14 15:57               ` [v2,4/7] " Toshi Kani
2017-08-14 16:24               ` [PATCH v2 4/7] " Borislav Petkov
2017-08-14 16:24                 ` [v2,4/7] " Borislav Petkov
2017-08-14 16:48                 ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-14 16:48                   ` [v2,4/7] " Toshi Kani
2017-08-14 17:05                   ` [PATCH v2 4/7] " Borislav Petkov
2017-08-14 17:05                     ` [v2,4/7] " Borislav Petkov
2017-08-14 17:52                     ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-14 17:52                       ` [v2,4/7] " Toshi Kani
2017-08-14 18:05                       ` [PATCH v2 4/7] " Borislav Petkov
2017-08-14 18:05                         ` [v2,4/7] " Borislav Petkov
2017-08-14 18:17                         ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-14 18:17                           ` [v2,4/7] " Toshi Kani
2017-08-14 18:35                           ` [PATCH v2 4/7] " Borislav Petkov
2017-08-14 18:35                             ` [v2,4/7] " Borislav Petkov
2017-08-14 19:02                             ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-14 19:02                               ` [v2,4/7] " Toshi Kani
2017-08-14 19:34                               ` [PATCH v2 4/7] " Borislav Petkov
2017-08-14 19:34                                 ` [v2,4/7] " Borislav Petkov
2017-08-14 20:17                                 ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-14 20:17                                   ` [v2,4/7] " Toshi Kani
2017-08-14 20:39                                   ` [PATCH v2 4/7] " Borislav Petkov
2017-08-14 20:39                                     ` [v2,4/7] " Borislav Petkov
2017-08-15 15:35                                     ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-15 15:35                                       ` [v2,4/7] " Toshi Kani
2017-08-15 15:48                                       ` [PATCH v2 4/7] " Luck, Tony
2017-08-15 15:48                                         ` [v2,4/7] " Luck, Tony
2017-08-15 15:53                                         ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-15 15:53                                           ` [v2,4/7] " Toshi Kani
2017-08-16  8:29                                         ` [PATCH v2 4/7] " Borislav Petkov
2017-08-16  8:29                                           ` [v2,4/7] " Borislav Petkov
2017-08-16 11:29                                           ` [PATCH v2 4/7] " Borislav Petkov
2017-08-16 11:29                                             ` [v2,4/7] " Borislav Petkov
2017-08-16 13:59                                           ` [PATCH v2 4/7] " Steven Rostedt
2017-08-16 13:59                                             ` [v2,4/7] " Steven Rostedt
2017-08-16 14:03                                             ` [PATCH v2 4/7] " Borislav Petkov
2017-08-16 14:03                                               ` [v2,4/7] " Borislav Petkov
2017-08-16 14:22                                               ` [PATCH v2 4/7] " Steven Rostedt
2017-08-16 14:22                                                 ` [v2,4/7] " Steven Rostedt
2017-08-16 17:31                                                 ` [PATCH v2 4/7] " Borislav Petkov
2017-08-16 17:31                                                   ` [v2,4/7] " Borislav Petkov
2017-08-16 15:26                                           ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-16 15:26                                             ` [v2,4/7] " Toshi Kani
2017-08-16 16:42                                             ` [PATCH v2 4/7] " Borislav Petkov
2017-08-16 16:42                                               ` [v2,4/7] " Borislav Petkov
2017-08-16 17:28                                               ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-16 17:28                                                 ` [v2,4/7] " Toshi Kani
2017-08-16 17:40                                                 ` [PATCH v2 4/7] " Borislav Petkov
2017-08-16 17:40                                                   ` [v2,4/7] " Borislav Petkov
2017-08-16 18:01                                                   ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-16 18:01                                                     ` [v2,4/7] " Toshi Kani
2017-08-17 21:08                                                     ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-17 21:08                                                       ` [v2,4/7] " Toshi Kani
2017-08-21  9:29                                                       ` [PATCH v2 4/7] " Borislav Petkov
2017-08-21  9:29                                                         ` [v2,4/7] " Borislav Petkov
2017-08-15 15:50                                       ` [PATCH v2 4/7] " Borislav Petkov
2017-08-15 15:50                                         ` [v2,4/7] " Borislav Petkov
2017-08-15 16:19                                         ` [PATCH v2 4/7] " Kani, Toshimitsu
2017-08-15 16:19                                           ` [v2,4/7] " Toshi Kani
2017-08-03 21:57 ` [PATCH v2 5/7] ghes_edac: add platform check to enable ghes_edac Toshi Kani
2017-08-03 21:57   ` [v2,5/7] " Toshi Kani
2017-08-04  8:31   ` [PATCH v2 5/7] " Borislav Petkov
2017-08-04  8:31     ` [v2,5/7] " Borislav Petkov
2017-08-04 21:06     ` [PATCH v2 5/7] " Kani, Toshimitsu
2017-08-04 21:06       ` [v2,5/7] " Toshi Kani
2017-08-05  5:37       ` [PATCH v2 5/7] " Borislav Petkov
2017-08-05  5:37         ` [v2,5/7] " Borislav Petkov
2017-08-07 14:54         ` [PATCH v2 5/7] " Kani, Toshimitsu
2017-08-07 14:54           ` [v2,5/7] " Toshi Kani
2017-08-03 21:57 ` [PATCH v2 6/7] EDAC: add edac_check_mc_owner() to check MC owner Toshi Kani
2017-08-03 21:57   ` [v2,6/7] " Toshi Kani
2017-08-04  8:30   ` [PATCH v2 6/7] " Borislav Petkov
2017-08-04  8:30     ` [v2,6/7] " Borislav Petkov
2017-08-04 21:35     ` [PATCH v2 6/7] " Kani, Toshimitsu
2017-08-04 21:35       ` [v2,6/7] " Toshi Kani
2017-08-05  5:44       ` [PATCH v2 6/7] " Borislav Petkov
2017-08-05  5:44         ` [v2,6/7] " Borislav Petkov
2017-08-07 14:55         ` [PATCH v2 6/7] " Kani, Toshimitsu
2017-08-07 14:55           ` [v2,6/7] " Toshi Kani
2017-08-04 13:06   ` [PATCH v2 6/7] " kbuild test robot
2017-08-04 13:06     ` [v2,6/7] " kbuild test robot
2017-08-04 13:06     ` [PATCH v2 6/7] " kbuild test robot
2017-08-04 15:21     ` Kani, Toshimitsu
2017-08-04 15:21       ` [v2,6/7] " Toshi Kani
2017-08-03 21:57 ` [PATCH v2 7/7] edac drivers: add MC owner check in init Toshi Kani
2017-08-03 21:57   ` [v2,7/7] " Toshi Kani
2017-08-04  8:39   ` [PATCH v2 7/7] " Borislav Petkov
2017-08-04  8:39     ` [v2,7/7] " Borislav Petkov
2017-08-04 21:48     ` [PATCH v2 7/7] " Kani, Toshimitsu
2017-08-04 21:48       ` [v2,7/7] " Toshi Kani
2017-08-05  5:49       ` [PATCH v2 7/7] " Borislav Petkov
2017-08-05  5:49         ` [v2,7/7] " Borislav Petkov
2017-08-07 14:57         ` [PATCH v2 7/7] " Kani, Toshimitsu
2017-08-07 14:57           ` [v2,7/7] " Toshi Kani

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.