Linux-EDAC Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting
@ 2020-03-06 15:13 Robert Richter
  2020-03-06 15:13 ` [PATCH 01/11] EDAC/mc: Use int type for parameters of edac_mc_alloc() Robert Richter
                   ` (11 more replies)
  0 siblings, 12 replies; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

This series contains a significant cleanup and rework of the ghes
driver and improves the memory reporting as follows:

 * fix of DIMM label in error reports (patch #2),

 * creation of multiple memory controllers to group DIMMs depending on
   the physical memory array (patches #9-#11). This should reflect the
   memory topology of a system in sysfs. Esp. multi-node systems show
   up with one memory controller per node now.

The changes base on the remaining patches that are a general cleanup
and rework:

 * small change to edac_mc, not really dependent on the rest of the
   series (patch #1),

 * general cleanup and rework of the ghes driver (patches #3-#8).

The implementation of multiple memory controllers bases on the
suggestion from James (see patch #11), thank you James for your
valuable input here. The patches are created newly from scratch and
obsolete the GHES part of my previous postings a while ago that have
not been accepted upstream:

 https://lore.kernel.org/patchwork/cover/1093488/

Tested on a Marvell/Cavium ThunderX2 Sabre (dual socket) system.


Robert Richter (11):
  EDAC/mc: Use int type for parameters of edac_mc_alloc()
  EDAC/ghes: Setup DIMM label from DMI and use it in error reports
  EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()
  EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to
    ghes_mci
  EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to
    ghes_dimm_fill
  EDAC/ghes: Carve out MC device handling into separate functions
  EDAC/ghes: Have a separate code path for creating the fake MC
  EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}()
  EDAC/ghes: Implement DIMM mapping table for SMBIOS handles
  EDAC/ghes: Create an own device for each mci
  EDAC/ghes: Create one memory controller per physical memory array

 drivers/edac/edac_mc.c   |   7 +-
 drivers/edac/edac_mc.h   |   6 +-
 drivers/edac/ghes_edac.c | 514 +++++++++++++++++++++++++++++----------
 include/linux/edac.h     |   2 -
 4 files changed, 390 insertions(+), 139 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 01/11] EDAC/mc: Use int type for parameters of edac_mc_alloc()
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-04-06 11:38   ` Matthias Brugger
  2020-03-06 15:13 ` [PATCH 02/11] EDAC/ghes: Setup DIMM label from DMI and use it in error reports Robert Richter
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

Most iterators use int type as index. mci->mc_idx is also type int. So
use int type for parameters of edac_mc_alloc(). Extend the range check
to check for negative values. There is a type cast now when assigning
variable n_layers to mci->n_layer, it is safe due to the range check.

While at it, rename the users of edac_mc_alloc() to mc_idx as this
fits better here.

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/edac_mc.c | 7 +++----
 drivers/edac/edac_mc.h | 6 +++---
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 75ede27bdf6a..8bd3d00b4385 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -444,8 +444,7 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
 	return 0;
 }
 
-struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
-				   unsigned int n_layers,
+struct mem_ctl_info *edac_mc_alloc(int mc_idx, int n_layers,
 				   struct edac_mc_layer *layers,
 				   unsigned int sz_pvt)
 {
@@ -456,7 +455,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
 	void *pvt, *ptr = NULL;
 	bool per_rank = false;
 
-	if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
+	if (WARN_ON(mc_idx < 0 || n_layers < 1 || n_layers > EDAC_MAX_LAYERS))
 		return NULL;
 
 	/*
@@ -505,7 +504,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
 	pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;
 
 	/* setup index and various internal pointers */
-	mci->mc_idx = mc_num;
+	mci->mc_idx = mc_idx;
 	mci->tot_dimms = tot_dimms;
 	mci->pvt_info = pvt;
 	mci->n_layers = n_layers;
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 881b00eadf7a..4815f50afea0 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -98,7 +98,7 @@ do {									\
 /**
  * edac_mc_alloc() - Allocate and partially fill a struct &mem_ctl_info.
  *
- * @mc_num:		Memory controller number
+ * @mc_idx:		Memory controller number
  * @n_layers:		Number of MC hierarchy layers
  * @layers:		Describes each layer as seen by the Memory Controller
  * @sz_pvt:		size of private storage needed
@@ -122,8 +122,8 @@ do {									\
  *	On success, return a pointer to struct mem_ctl_info pointer;
  *	%NULL otherwise
  */
-struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
-				   unsigned int n_layers,
+struct mem_ctl_info *edac_mc_alloc(int mc_idx,
+				   int n_layers,
 				   struct edac_mc_layer *layers,
 				   unsigned int sz_pvt);
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 02/11] EDAC/ghes: Setup DIMM label from DMI and use it in error reports
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
  2020-03-06 15:13 ` [PATCH 01/11] EDAC/mc: Use int type for parameters of edac_mc_alloc() Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-03-06 15:13 ` [PATCH 03/11] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode() Robert Richter
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

The ghes driver reports errors with 'unknown label' even if the actual
DIMM label is known, e.g.:

 EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0
   module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0
   page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location:
   node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM
   location:N0 DIMM_A0 status(0x0000000000000400): Storage error in
   DRAM memory)

Fix this by using struct dimm_info's label string in error reports:

 EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0
   rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0
   page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location:
   node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM
   location:N0 DIMM_A0 status(0x0000000000000400): Storage error in
   DRAM memory)

The labels are initialized by reading the bank and device strings from
DMI. Now, the label information can also read from sysfs. E.g. a
ThunderX2 system will show the following:

 /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
 /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0
 /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0
 /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0
 /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0
 /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0
 /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0
 /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0
 /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0

Since dimm_labels can be rewritten, that label will be used in a later
error report:

 # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label
 # # some error injection here
 # dmesg | grep foobar
 [ 2119.784489] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0
 module:0 rank:0 bank:769 col:1 bit_pos:16 DIMM location:foobar
 page:0x94d027 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0
 card:0 module:0 rank:0 bank:769 col:1 bit_pos:16 DIMM location:foobar
 status(0x0000000000000400): Storage error in DRAM memory)

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 44 +++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index cb3dab56a875..07fa3867cba1 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -87,16 +87,32 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
 		(*num_dimm)++;
 }
 
-static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
+static struct dimm_info *find_dimm_by_handle(struct mem_ctl_info *mci,
+					u16 handle)
 {
 	struct dimm_info *dimm;
 
 	mci_for_each_dimm(mci, dimm) {
 		if (dimm->smbios_handle == handle)
-			return dimm->idx;
+			return dimm;
 	}
 
-	return -1;
+	return NULL;
+}
+
+static void ghes_dimm_setup_label(struct dimm_info *dimm, u16 handle)
+{
+	const char *bank = NULL, *device = NULL;
+
+	dmi_memdev_name(handle, &bank, &device);
+
+	/* both strings must be non-zero */
+	if (bank && *bank && device && *device)
+		snprintf(dimm->label, sizeof(dimm->label),
+			"%s %s", bank, device);
+	else
+		snprintf(dimm->label, sizeof(dimm->label),
+			"unknown memory (handle: 0x%.4x)", handle);
 }
 
 static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
@@ -179,9 +195,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 		dimm->dtype = DEV_UNKNOWN;
 		dimm->grain = 128;		/* Likely, worse case */
 
-		/*
-		 * FIXME: It shouldn't be hard to also fill the DIMM labels
-		 */
+		ghes_dimm_setup_label(dimm, entry->handle);
 
 		if (dimm->nr_pages) {
 			edac_dbg(1, "DIMM%i: %s size = %d MB%s\n",
@@ -344,19 +358,17 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
 		p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
 	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
-		const char *bank = NULL, *device = NULL;
-		int index = -1;
+		struct dimm_info *dimm;
 
-		dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
-		if (bank != NULL && device != NULL)
-			p += sprintf(p, "DIMM location:%s %s ", bank, device);
-		else
+		dimm = find_dimm_by_handle(mci, mem_err->mem_dev_handle);
+		if (dimm) {
+			e->top_layer = dimm->idx;
+			strcpy(e->label, dimm->label);
+			p += sprintf(p, "DIMM location:%s ", dimm->label);
+		} else {
 			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
 				     mem_err->mem_dev_handle);
-
-		index = get_dimm_smbios_index(mci, mem_err->mem_dev_handle);
-		if (index >= 0)
-			e->top_layer = index;
+		}
 	}
 	if (p > e->location)
 		*(p - 1) = '\0';
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 03/11] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
  2020-03-06 15:13 ` [PATCH 01/11] EDAC/mc: Use int type for parameters of edac_mc_alloc() Robert Richter
  2020-03-06 15:13 ` [PATCH 02/11] EDAC/ghes: Setup DIMM label from DMI and use it in error reports Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-04-06 11:51   ` Matthias Brugger
  2020-03-06 15:13 ` [PATCH 04/11] EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_mci Robert Richter
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

The local variable rdr_mask serves as a static constant here. It hides
what the code is doing. Remove it and replace it with the actual logic
that checks some bits.

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 07fa3867cba1..fce53893731a 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -123,7 +123,6 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 	if (dh->type == DMI_ENTRY_MEM_DEVICE) {
 		struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
 		struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count, 0, 0);
-		u16 rdr_mask = BIT(7) | BIT(13);
 
 		if (entry->size == 0xffff) {
 			pr_info("Can't get DIMM%i size\n",
@@ -173,7 +172,8 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 		default:
 			if (entry->type_detail & BIT(6))
 				dimm->mtype = MEM_RMBS;
-			else if ((entry->type_detail & rdr_mask) == rdr_mask)
+			else if ((entry->type_detail & BIT(7)) &&
+				 (entry->type_detail & BIT(13)))
 				dimm->mtype = MEM_RDR;
 			else if (entry->type_detail & BIT(7))
 				dimm->mtype = MEM_SDR;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 04/11] EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_mci
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
                   ` (2 preceding siblings ...)
  2020-03-06 15:13 ` [PATCH 03/11] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode() Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-03-06 15:13 ` [PATCH 05/11] EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to ghes_dimm_fill Robert Richter
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

The struct members list and ghes of struct ghes_edac_pvt are unused,
remove them. On that occasion, rename it to struct ghes_mci. This is
shorter and aligns better with the current naming scheme.

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index fce53893731a..438972dfea09 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -15,9 +15,7 @@
 #include "edac_module.h"
 #include <ras/ras_event.h>
 
-struct ghes_edac_pvt {
-	struct list_head list;
-	struct ghes *ghes;
+struct ghes_mci {
 	struct mem_ctl_info *mci;
 
 	/* Buffers for the error handling routine */
@@ -32,7 +30,7 @@ static refcount_t ghes_refcount = REFCOUNT_INIT(0);
  * also provides the necessary (implicit) memory barrier for the SMP
  * case to make the pointer visible on another CPU.
  */
-static struct ghes_edac_pvt *ghes_pvt;
+static struct ghes_mci *ghes_pvt;
 
 /* GHES registration mutex */
 static DEFINE_MUTEX(ghes_reg_mutex);
@@ -217,7 +215,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 {
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt;
+	struct ghes_mci *pvt;
 	unsigned long flags;
 	char *p;
 
@@ -469,7 +467,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	bool fake = false;
 	int rc = 0, num_dimm = 0;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt;
+	struct ghes_mci *pvt;
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_dimm_fill dimm_fill;
 	unsigned long flags;
@@ -506,7 +504,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
-	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(*pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
 		rc = -ENOMEM;
@@ -514,7 +512,6 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	}
 
 	pvt		= mci->pvt_info;
-	pvt->ghes	= ghes;
 	pvt->mci	= mci;
 
 	mci->pdev = dev;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 05/11] EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to ghes_dimm_fill
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
                   ` (3 preceding siblings ...)
  2020-03-06 15:13 ` [PATCH 04/11] EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_mci Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-03-16  9:29   ` Borislav Petkov
  2020-03-06 15:13 ` [PATCH 06/11] EDAC/ghes: Carve out MC device handling into separate functions Robert Richter
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

The struct is used to store temporary data for the dmidecode callback.
Clean this up a bit:

 1) Rename it to something shorter and more reasonable.

 2) Rename member count to index since this is what it is used for.

 3) Move code close to ghes_edac_dmidecode() where it is used.

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 438972dfea09..358519e8c2e9 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -72,11 +72,6 @@ struct memdev_dmi_entry {
 	u16 conf_mem_clk_speed;
 } __attribute__((__packed__));
 
-struct ghes_edac_dimm_fill {
-	struct mem_ctl_info *mci;
-	unsigned int count;
-};
-
 static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
 {
 	int *num_dimm = arg;
@@ -113,18 +108,23 @@ static void ghes_dimm_setup_label(struct dimm_info *dimm, u16 handle)
 			"unknown memory (handle: 0x%.4x)", handle);
 }
 
+struct ghes_dimm_fill {
+	struct mem_ctl_info *mci;
+	int index;
+};
+
 static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 {
-	struct ghes_edac_dimm_fill *dimm_fill = arg;
+	struct ghes_dimm_fill *dimm_fill = arg;
 	struct mem_ctl_info *mci = dimm_fill->mci;
 
 	if (dh->type == DMI_ENTRY_MEM_DEVICE) {
 		struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
-		struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count, 0, 0);
+		struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->index, 0, 0);
 
 		if (entry->size == 0xffff) {
 			pr_info("Can't get DIMM%i size\n",
-				dimm_fill->count);
+				dimm_fill->index);
 			dimm->nr_pages = MiB_TO_PAGES(32);/* Unknown */
 		} else if (entry->size == 0x7fff) {
 			dimm->nr_pages = MiB_TO_PAGES(entry->extended_size);
@@ -197,7 +197,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 
 		if (dimm->nr_pages) {
 			edac_dbg(1, "DIMM%i: %s size = %d MB%s\n",
-				dimm_fill->count, edac_mem_types[dimm->mtype],
+				dimm_fill->index, edac_mem_types[dimm->mtype],
 				PAGES_TO_MiB(dimm->nr_pages),
 				(dimm->edac_mode != EDAC_NONE) ? "(ECC)" : "");
 			edac_dbg(2, "\ttype %d, detail 0x%02x, width %d(total %d)\n",
@@ -207,7 +207,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 
 		dimm->smbios_handle = entry->handle;
 
-		dimm_fill->count++;
+		dimm_fill->index++;
 	}
 }
 
@@ -469,7 +469,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	struct mem_ctl_info *mci;
 	struct ghes_mci *pvt;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_dimm_fill dimm_fill;
+	struct ghes_dimm_fill dimm_fill;
 	unsigned long flags;
 	int idx = -1;
 
@@ -536,7 +536,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	}
 
 	if (!fake) {
-		dimm_fill.count = 0;
+		dimm_fill.index = 0;
 		dimm_fill.mci = mci;
 		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 06/11] EDAC/ghes: Carve out MC device handling into separate functions
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
                   ` (4 preceding siblings ...)
  2020-03-06 15:13 ` [PATCH 05/11] EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to ghes_dimm_fill Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-03-16  9:31   ` Borislav Petkov
  2020-03-06 15:13 ` [PATCH 07/11] EDAC/ghes: Have a separate code path for creating the fake MC Robert Richter
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

The functions are too long, carve out code that handles MC devices
into the new functions ghes_mc_create(), ghes_mc_add_or_free() and
ghes_mc_free(). Apart from better code readability the functions can
be reused and the implementation of the error paths becomes easier.

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 133 +++++++++++++++++++++++----------------
 1 file changed, 79 insertions(+), 54 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 358519e8c2e9..5a4c9694bbff 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -462,16 +462,81 @@ static struct acpi_platform_list plat_list[] = {
 	{ } /* End */
 };
 
-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
+					int num_dimm)
 {
-	bool fake = false;
-	int rc = 0, num_dimm = 0;
+	struct edac_mc_layer layers[1];
 	struct mem_ctl_info *mci;
 	struct ghes_mci *pvt;
-	struct edac_mc_layer layers[1];
-	struct ghes_dimm_fill dimm_fill;
+
+	layers[0].type = EDAC_MC_LAYER_ALL_MEM;
+	layers[0].size = num_dimm;
+	layers[0].is_virt_csrow = true;
+
+	mci = edac_mc_alloc(mc_idx, ARRAY_SIZE(layers), layers, sizeof(*pvt));
+	if (!mci)
+		return NULL;
+
+	pvt		= mci->pvt_info;
+	pvt->mci	= mci;
+
+	mci->pdev = dev;
+	mci->mtype_cap = MEM_FLAG_EMPTY;
+	mci->edac_ctl_cap = EDAC_FLAG_NONE;
+	mci->edac_cap = EDAC_FLAG_NONE;
+	mci->mod_name = "ghes_edac.c";
+	mci->ctl_name = "ghes_edac";
+	mci->dev_name = "ghes";
+
+	return mci;
+}
+
+static int ghes_mc_add_or_free(struct mem_ctl_info *mci)
+{
 	unsigned long flags;
-	int idx = -1;
+	int rc;
+
+	rc = edac_mc_add_mc(mci);
+	if (rc < 0) {
+		edac_mc_free(mci);
+		return rc;
+	}
+
+	spin_lock_irqsave(&ghes_lock, flags);
+	ghes_pvt = mci->pvt_info;
+	spin_unlock_irqrestore(&ghes_lock, flags);
+
+	return 0;
+}
+
+static void ghes_mc_free(void)
+{
+	struct mem_ctl_info *mci;
+	unsigned long flags;
+
+	/*
+	 * Wait for the irq handler being finished.
+	 */
+	spin_lock_irqsave(&ghes_lock, flags);
+	mci = ghes_pvt ? ghes_pvt->mci : NULL;
+	ghes_pvt = NULL;
+	spin_unlock_irqrestore(&ghes_lock, flags);
+
+	if (!mci)
+		return;
+
+	mci = edac_mc_del_mc(mci->pdev);
+	if (mci)
+		edac_mc_free(mci);
+}
+
+int ghes_edac_register(struct ghes *ghes, struct device *dev)
+{
+	struct ghes_dimm_fill dimm_fill;
+	int rc = 0, num_dimm = 0;
+	struct mem_ctl_info *mci;
+	bool fake = false;
+	int idx;
 
 	if (IS_ENABLED(CONFIG_X86)) {
 		/* Check if safe to enable on this system */
@@ -500,28 +565,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		num_dimm = 1;
 	}
 
-	layers[0].type = EDAC_MC_LAYER_ALL_MEM;
-	layers[0].size = num_dimm;
-	layers[0].is_virt_csrow = true;
-
-	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(*pvt));
+	mci = ghes_mc_create(dev, 0, num_dimm);
 	if (!mci) {
-		pr_info("Can't allocate memory for EDAC data\n");
 		rc = -ENOMEM;
 		goto unlock;
 	}
 
-	pvt		= mci->pvt_info;
-	pvt->mci	= mci;
-
-	mci->pdev = dev;
-	mci->mtype_cap = MEM_FLAG_EMPTY;
-	mci->edac_ctl_cap = EDAC_FLAG_NONE;
-	mci->edac_cap = EDAC_FLAG_NONE;
-	mci->mod_name = "ghes_edac.c";
-	mci->ctl_name = "ghes_edac";
-	mci->dev_name = "ghes";
-
 	if (fake) {
 		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
 		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
@@ -549,22 +598,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		dimm->edac_mode = EDAC_SECDED;
 	}
 
-	rc = edac_mc_add_mc(mci);
-	if (rc < 0) {
-		pr_info("Can't register at EDAC core\n");
-		edac_mc_free(mci);
-		rc = -ENODEV;
+	rc = ghes_mc_add_or_free(mci);
+	if (rc)
 		goto unlock;
-	}
-
-	spin_lock_irqsave(&ghes_lock, flags);
-	ghes_pvt = pvt;
-	spin_unlock_irqrestore(&ghes_lock, flags);
 
 	/* only set on success */
 	refcount_set(&ghes_refcount, 1);
 
 unlock:
+	if (rc)
+		pr_err("Can't register at EDAC core: %d\n", rc);
+
 	mutex_unlock(&ghes_reg_mutex);
 
 	return rc;
@@ -572,29 +616,10 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 
 void ghes_edac_unregister(struct ghes *ghes)
 {
-	struct mem_ctl_info *mci;
-	unsigned long flags;
-
 	mutex_lock(&ghes_reg_mutex);
 
-	if (!refcount_dec_and_test(&ghes_refcount))
-		goto unlock;
-
-	/*
-	 * Wait for the irq handler being finished.
-	 */
-	spin_lock_irqsave(&ghes_lock, flags);
-	mci = ghes_pvt ? ghes_pvt->mci : NULL;
-	ghes_pvt = NULL;
-	spin_unlock_irqrestore(&ghes_lock, flags);
-
-	if (!mci)
-		goto unlock;
-
-	mci = edac_mc_del_mc(mci->pdev);
-	if (mci)
-		edac_mc_free(mci);
+	if (refcount_dec_and_test(&ghes_refcount))
+		ghes_mc_free();
 
-unlock:
 	mutex_unlock(&ghes_reg_mutex);
 }
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 07/11] EDAC/ghes: Have a separate code path for creating the fake MC
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
                   ` (5 preceding siblings ...)
  2020-03-06 15:13 ` [PATCH 06/11] EDAC/ghes: Carve out MC device handling into separate functions Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-03-06 15:13 ` [PATCH 08/11] EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}() Robert Richter
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

The code in ghes_edac_register() switches back and forth between
standard and fake controller creation. Do one thing only and separate
the code path that creates the fake MC.

Note: For better review the code is not yet carved out in separate
functions.

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 65 ++++++++++++++++++++++++----------------
 1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 5a4c9694bbff..7ead5667ed73 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -535,7 +535,6 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	struct ghes_dimm_fill dimm_fill;
 	int rc = 0, num_dimm = 0;
 	struct mem_ctl_info *mci;
-	bool fake = false;
 	int idx;
 
 	if (IS_ENABLED(CONFIG_X86)) {
@@ -559,23 +558,42 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
-	/* Check if we've got a bogus BIOS */
-	if (num_dimm == 0) {
-		fake = true;
-		num_dimm = 1;
-	}
+	if (!num_dimm) {
+		/*
+		 * Bogus BIOS: Ignore DMI topology and use a single MC
+		 * with only one DIMM for the whole address range to
+		 * catch all errros.
+		 */
+		struct dimm_info *dimm;
 
-	mci = ghes_mc_create(dev, 0, num_dimm);
-	if (!mci) {
-		rc = -ENOMEM;
-		goto unlock;
-	}
+		mci = ghes_mc_create(dev, 0, 1);
+		if (!mci) {
+			rc = -ENOMEM;
+			goto unlock;
+		}
+
+		dimm = edac_get_dimm_by_index(mci, 0);
+
+		dimm->nr_pages = 1;
+		dimm->grain = 128;
+		dimm->mtype = MEM_UNKNOWN;
+		dimm->dtype = DEV_UNKNOWN;
+		dimm->edac_mode = EDAC_SECDED;
+
+		snprintf(dimm->label, sizeof(dimm->label), "unknown memory");
+
+		rc = ghes_mc_add_or_free(mci);
+		if (rc)
+			goto unlock;
 
-	if (fake) {
 		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
 		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
 		pr_info("work on such system. Use this driver with caution\n");
-	} else if (idx < 0) {
+
+		goto out;
+	}
+
+	if (idx < 0) {
 		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
 		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
 		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
@@ -584,24 +602,21 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		pr_info("This system has %d DIMM sockets.\n", num_dimm);
 	}
 
-	if (!fake) {
-		dimm_fill.index = 0;
-		dimm_fill.mci = mci;
-		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-	} else {
-		struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
-
-		dimm->nr_pages = 1;
-		dimm->grain = 128;
-		dimm->mtype = MEM_UNKNOWN;
-		dimm->dtype = DEV_UNKNOWN;
-		dimm->edac_mode = EDAC_SECDED;
+	mci = ghes_mc_create(dev, 0, num_dimm);
+	if (!mci) {
+		rc = -ENOMEM;
+		goto unlock;
 	}
 
+	dimm_fill.index = 0;
+	dimm_fill.mci = mci;
+	dmi_walk(ghes_edac_dmidecode, &dimm_fill);
+
 	rc = ghes_mc_add_or_free(mci);
 	if (rc)
 		goto unlock;
 
+out:
 	/* only set on success */
 	refcount_set(&ghes_refcount, 1);
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 08/11] EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}()
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
                   ` (6 preceding siblings ...)
  2020-03-06 15:13 ` [PATCH 07/11] EDAC/ghes: Have a separate code path for creating the fake MC Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-03-06 15:13 ` [PATCH 09/11] EDAC/ghes: Implement DIMM mapping table for SMBIOS handles Robert Richter
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

Factor out code to register a memory controller including DIMMs. Do
this for standard and fake memory controller in the two functions
ghes_edac_register_one() and ghes_edac_register_fake().

Function ghes_edac_register_one() could be reused to register multiple
*mci structs.

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 72 ++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 32 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 7ead5667ed73..0af7f6a988a9 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -530,11 +530,47 @@ static void ghes_mc_free(void)
 		edac_mc_free(mci);
 }
 
-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static int ghes_edac_register_fake(struct device *dev)
+{
+	struct mem_ctl_info *mci;
+	struct dimm_info *dimm;
+
+	mci = ghes_mc_create(dev, 0, 1);
+	if (!mci)
+		return -ENOMEM;
+
+	dimm = edac_get_dimm_by_index(mci, 0);
+
+	dimm->nr_pages = 1;
+	dimm->grain = 128;
+	dimm->mtype = MEM_UNKNOWN;
+	dimm->dtype = DEV_UNKNOWN;
+	dimm->edac_mode = EDAC_SECDED;
+
+	snprintf(dimm->label, sizeof(dimm->label), "unknown memory");
+
+	return ghes_mc_add_or_free(mci);
+}
+
+static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)
 {
 	struct ghes_dimm_fill dimm_fill;
-	int rc = 0, num_dimm = 0;
 	struct mem_ctl_info *mci;
+
+	mci = ghes_mc_create(dev, mc_idx, num_dimm);
+	if (!mci)
+		return -ENOMEM;
+
+	dimm_fill.index = 0;
+	dimm_fill.mci = mci;
+	dmi_walk(ghes_edac_dmidecode, &dimm_fill);
+
+	return ghes_mc_add_or_free(mci);
+}
+
+int ghes_edac_register(struct ghes *ghes, struct device *dev)
+{
+	int rc = 0, num_dimm = 0;
 	int idx;
 
 	if (IS_ENABLED(CONFIG_X86)) {
@@ -564,25 +600,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		 * with only one DIMM for the whole address range to
 		 * catch all errros.
 		 */
-		struct dimm_info *dimm;
-
-		mci = ghes_mc_create(dev, 0, 1);
-		if (!mci) {
-			rc = -ENOMEM;
-			goto unlock;
-		}
-
-		dimm = edac_get_dimm_by_index(mci, 0);
-
-		dimm->nr_pages = 1;
-		dimm->grain = 128;
-		dimm->mtype = MEM_UNKNOWN;
-		dimm->dtype = DEV_UNKNOWN;
-		dimm->edac_mode = EDAC_SECDED;
-
-		snprintf(dimm->label, sizeof(dimm->label), "unknown memory");
-
-		rc = ghes_mc_add_or_free(mci);
+		rc = ghes_edac_register_fake(dev);
 		if (rc)
 			goto unlock;
 
@@ -602,17 +620,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		pr_info("This system has %d DIMM sockets.\n", num_dimm);
 	}
 
-	mci = ghes_mc_create(dev, 0, num_dimm);
-	if (!mci) {
-		rc = -ENOMEM;
-		goto unlock;
-	}
-
-	dimm_fill.index = 0;
-	dimm_fill.mci = mci;
-	dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-
-	rc = ghes_mc_add_or_free(mci);
+	rc = ghes_edac_register_one(dev, 0, num_dimm);
 	if (rc)
 		goto unlock;
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 09/11] EDAC/ghes: Implement DIMM mapping table for SMBIOS handles
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
                   ` (7 preceding siblings ...)
  2020-03-06 15:13 ` [PATCH 08/11] EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}() Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-03-16  9:40   ` Borislav Petkov
  2020-03-06 15:13 ` [PATCH 10/11] EDAC/ghes: Create an own device for each mci Robert Richter
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

GHES uses SMBIOS handles to identify the DIMM that was causing a hw
error. Currently this is stored in smbios_handle of struct dimm_info.
This implementation has several drawbacks. The information is private
to the ghes driver, but struct dimm_info is for general use. The DIMMs
are tied to a *mci struct, which makes a lockup inefficient. It is
hard to dynamically allocate DIMMs and also to meet locking
constraints when adding or removing them. This becomes even more
challenging when having multiple MC instances that group a set of
DIMMs, so the change is needed also to later support multiple MC
instances.

Add an own mapping table for the ghes driver to solve these issues.
The DIMM data is collected in struct ghes_dimm. A data array exists
for all DIMMs found in the system. Lists are used to maintain the DIMM
entries, during initialization they are all added to a pool of
available data structs. Using list entries makes locking easy as only
list operations need to be locked by either the ghes_reg_mutex or
ghes_lock.

There are a couple of helper functions to maintain the dimm list:

 ghes_dimm_init()/ghes_dimm_free():       create/remove the list
 ghes_dimm_alloc()/ghes_dimm_release():   add/remove list entries
 get_default_mci()/find_dimm_by_handle(): lockup a MC or DIMM

The lists are used to identify the corresponding *mci and *dimm_info
structs in the hw error interrupt handler. Esp. the *ghes_pvt pointer
is obsolete now, so *mci can be removed from struct ghes_mci.

The mapping table of the ghes driver makes smbios_handle in struct
dimm_info also obsolete, remove it too.

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 172 +++++++++++++++++++++++++++++++--------
 include/linux/edac.h     |   2 -
 2 files changed, 136 insertions(+), 38 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 0af7f6a988a9..cd61b8ae49f6 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -15,9 +15,13 @@
 #include "edac_module.h"
 #include <ras/ras_event.h>
 
-struct ghes_mci {
-	struct mem_ctl_info *mci;
+struct ghes_dimm {
+	struct list_head entry;
+	struct dimm_info *dimm;
+	u16 handle;
+};
 
+struct ghes_mci {
 	/* Buffers for the error handling routine */
 	char other_detail[400];
 	char msg[80];
@@ -26,22 +30,25 @@ struct ghes_mci {
 static refcount_t ghes_refcount = REFCOUNT_INIT(0);
 
 /*
- * Access to ghes_pvt must be protected by ghes_lock. The spinlock
- * also provides the necessary (implicit) memory barrier for the SMP
- * case to make the pointer visible on another CPU.
+ * GHES registration mutex: Serialize registration/unregistration
+ * code, protect ghes_dimm_pool and *dimms.
+ *
  */
-static struct ghes_mci *ghes_pvt;
-
-/* GHES registration mutex */
 static DEFINE_MUTEX(ghes_reg_mutex);
 
 /*
  * Sync with other, potentially concurrent callers of
  * ghes_edac_report_mem_error(). We don't know what the
  * "inventive" firmware would do.
+ *
+ * ghes_dimm_list must be protected by ghes_lock.
  */
 static DEFINE_SPINLOCK(ghes_lock);
 
+static struct ghes_dimm *dimms;
+static LIST_HEAD(ghes_dimm_list);
+static LIST_HEAD(ghes_dimm_pool);
+
 /* "ghes_edac.force_load=1" skips the platform check */
 static bool __read_mostly force_load;
 module_param(force_load, bool, 0);
@@ -72,6 +79,52 @@ struct memdev_dmi_entry {
 	u16 conf_mem_clk_speed;
 } __attribute__((__packed__));
 
+/* ghes_reg_mutex must be held. */
+static int ghes_dimm_init(int num_dimm)
+{
+	struct ghes_dimm *ghes_dimm;
+
+	dimms = kcalloc(num_dimm, sizeof(*dimms), GFP_KERNEL);
+	if (!dimms)
+		return -ENOMEM;
+
+	for (ghes_dimm = dimms; ghes_dimm < dimms + num_dimm; ghes_dimm++)
+		list_add(&ghes_dimm->entry, &ghes_dimm_pool);
+
+	return 0;
+}
+
+/* ghes_reg_mutex must be held. */
+static void ghes_dimm_free(void)
+{
+	INIT_LIST_HEAD(&ghes_dimm_pool);
+	kfree(dimms);
+}
+
+/* ghes_reg_mutex must be held. */
+static struct ghes_dimm *ghes_dimm_alloc(struct dimm_info *dimm, u16 handle)
+{
+	struct ghes_dimm *ghes_dimm;
+
+	ghes_dimm = list_first_entry_or_null(&ghes_dimm_pool,
+					struct ghes_dimm, entry);
+
+	/* should be always non-zero */
+	if (!WARN_ON_ONCE(!ghes_dimm)) {
+		ghes_dimm->dimm = dimm;
+		ghes_dimm->handle = handle;
+		list_del(&ghes_dimm->entry);
+	}
+
+	return ghes_dimm;
+}
+
+/* ghes_reg_mutex must be held. */
+static void ghes_dimm_release(struct list_head *dimms)
+{
+	list_splice(dimms, &ghes_dimm_pool);
+}
+
 static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
 {
 	int *num_dimm = arg;
@@ -80,14 +133,30 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
 		(*num_dimm)++;
 }
 
-static struct dimm_info *find_dimm_by_handle(struct mem_ctl_info *mci,
-					u16 handle)
+/*
+ * Make first DIMM's MC the default to report errors in case
+ * no DIMM for the mem device handle is found.
+ *
+ * ghes_lock must be held.
+ */
+static struct mem_ctl_info *get_default_mci(void)
 {
-	struct dimm_info *dimm;
+	struct ghes_dimm *ghes_dimm;
+
+	ghes_dimm = list_first_entry_or_null(&ghes_dimm_list,
+					struct ghes_dimm, entry);
+
+	return ghes_dimm ? ghes_dimm->dimm->mci : NULL;
+}
+
+/* ghes_lock must be held as long as struct ghes_info is used */
+static struct dimm_info *find_dimm_by_handle(u16 handle)
+{
+	struct ghes_dimm *ghes_dimm;
 
-	mci_for_each_dimm(mci, dimm) {
-		if (dimm->smbios_handle == handle)
-			return dimm;
+	list_for_each_entry(ghes_dimm, &ghes_dimm_list, entry) {
+		if (ghes_dimm->handle == handle)
+			return ghes_dimm->dimm;
 	}
 
 	return NULL;
@@ -109,6 +178,7 @@ static void ghes_dimm_setup_label(struct dimm_info *dimm, u16 handle)
 }
 
 struct ghes_dimm_fill {
+	struct list_head dimms;
 	struct mem_ctl_info *mci;
 	int index;
 };
@@ -121,6 +191,11 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 	if (dh->type == DMI_ENTRY_MEM_DEVICE) {
 		struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
 		struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->index, 0, 0);
+		struct ghes_dimm *ghes_dimm;
+
+		ghes_dimm = ghes_dimm_alloc(dimm, entry->handle);
+		if (ghes_dimm)
+			list_add_tail(&ghes_dimm->entry, &dimm_fill->dimms);
 
 		if (entry->size == 0xffff) {
 			pr_info("Can't get DIMM%i size\n",
@@ -205,8 +280,6 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 				entry->total_width, entry->data_width);
 		}
 
-		dimm->smbios_handle = entry->handle;
-
 		dimm_fill->index++;
 	}
 }
@@ -214,6 +287,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 {
 	struct edac_raw_error_desc *e;
+	struct dimm_info *dimm = NULL;
 	struct mem_ctl_info *mci;
 	struct ghes_mci *pvt;
 	unsigned long flags;
@@ -229,11 +303,14 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
-	pvt = ghes_pvt;
-	if (!pvt)
+	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE)
+		dimm = find_dimm_by_handle(mem_err->mem_dev_handle);
+
+	mci = dimm ? dimm->mci : get_default_mci();
+	if (!mci)
 		goto unlock;
 
-	mci = pvt->mci;
+	pvt = mci->pvt_info;
 	e = &mci->error_desc;
 
 	/* Cleans the error report buffer */
@@ -356,9 +433,6 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
 		p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
 	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
-		struct dimm_info *dimm;
-
-		dimm = find_dimm_by_handle(mci, mem_err->mem_dev_handle);
 		if (dimm) {
 			e->top_layer = dimm->idx;
 			strcpy(e->label, dimm->label);
@@ -477,8 +551,7 @@ static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
 	if (!mci)
 		return NULL;
 
-	pvt		= mci->pvt_info;
-	pvt->mci	= mci;
+	pvt = mci->pvt_info;
 
 	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
@@ -491,19 +564,21 @@ static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
 	return mci;
 }
 
-static int ghes_mc_add_or_free(struct mem_ctl_info *mci)
+static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
+			struct list_head *dimms)
 {
 	unsigned long flags;
 	int rc;
 
 	rc = edac_mc_add_mc(mci);
 	if (rc < 0) {
+		ghes_dimm_release(dimms);
 		edac_mc_free(mci);
 		return rc;
 	}
 
 	spin_lock_irqsave(&ghes_lock, flags);
-	ghes_pvt = mci->pvt_info;
+	list_splice_tail(dimms, &ghes_dimm_list);
 	spin_unlock_irqrestore(&ghes_lock, flags);
 
 	return 0;
@@ -511,17 +586,25 @@ static int ghes_mc_add_or_free(struct mem_ctl_info *mci)
 
 static void ghes_mc_free(void)
 {
-	struct mem_ctl_info *mci;
+	struct ghes_dimm *ghes_dimm, *tmp;
+	struct mem_ctl_info *mci = NULL;
+	LIST_HEAD(dimm_list);
 	unsigned long flags;
 
-	/*
-	 * Wait for the irq handler being finished.
-	 */
 	spin_lock_irqsave(&ghes_lock, flags);
-	mci = ghes_pvt ? ghes_pvt->mci : NULL;
-	ghes_pvt = NULL;
+
+	list_for_each_entry_safe(ghes_dimm, tmp, &ghes_dimm_list, entry) {
+		mci = mci ?: ghes_dimm->dimm->mci;
+		WARN_ON_ONCE(mci != ghes_dimm->dimm->mci);
+		list_move_tail(&ghes_dimm->entry, &dimm_list);
+	}
+
+	WARN_ON_ONCE(!list_empty(&ghes_dimm_list));
+
 	spin_unlock_irqrestore(&ghes_lock, flags);
 
+	ghes_dimm_release(&dimm_list);
+
 	if (!mci)
 		return;
 
@@ -532,8 +615,10 @@ static void ghes_mc_free(void)
 
 static int ghes_edac_register_fake(struct device *dev)
 {
+	struct ghes_dimm *ghes_dimm;
 	struct mem_ctl_info *mci;
 	struct dimm_info *dimm;
+	LIST_HEAD(dimm_list);
 
 	mci = ghes_mc_create(dev, 0, 1);
 	if (!mci)
@@ -549,7 +634,11 @@ static int ghes_edac_register_fake(struct device *dev)
 
 	snprintf(dimm->label, sizeof(dimm->label), "unknown memory");
 
-	return ghes_mc_add_or_free(mci);
+	ghes_dimm = ghes_dimm_alloc(dimm, 0xffff);
+	if (ghes_dimm)
+		list_add_tail(&ghes_dimm->entry, &dimm_list);
+
+	return ghes_mc_add_or_free(mci, &dimm_list);
 }
 
 static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)
@@ -563,9 +652,11 @@ static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)
 
 	dimm_fill.index = 0;
 	dimm_fill.mci = mci;
+	INIT_LIST_HEAD(&dimm_fill.dimms);
+
 	dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 
-	return ghes_mc_add_or_free(mci);
+	return ghes_mc_add_or_free(mci, &dimm_fill.dimms);
 }
 
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
@@ -594,6 +685,10 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
 
+	rc = ghes_dimm_init(num_dimm ?: 1);
+	if (rc)
+		goto unlock;
+
 	if (!num_dimm) {
 		/*
 		 * Bogus BIOS: Ignore DMI topology and use a single MC
@@ -629,8 +724,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	refcount_set(&ghes_refcount, 1);
 
 unlock:
-	if (rc)
+	if (rc) {
+		ghes_mc_free();
+		ghes_dimm_free();
 		pr_err("Can't register at EDAC core: %d\n", rc);
+	}
 
 	mutex_unlock(&ghes_reg_mutex);
 
@@ -641,8 +739,10 @@ void ghes_edac_unregister(struct ghes *ghes)
 {
 	mutex_lock(&ghes_reg_mutex);
 
-	if (refcount_dec_and_test(&ghes_refcount))
+	if (refcount_dec_and_test(&ghes_refcount)) {
 		ghes_mc_free();
+		ghes_dimm_free();
+	}
 
 	mutex_unlock(&ghes_reg_mutex);
 }
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 0f20b986b0ab..6b7f594782c0 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -382,8 +382,6 @@ struct dimm_info {
 
 	unsigned int csrow, cschannel;	/* Points to the old API data */
 
-	u16 smbios_handle;              /* Handle for SMBIOS type 17 */
-
 	u32 ce_count;
 	u32 ue_count;
 };
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 10/11] EDAC/ghes: Create an own device for each mci
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
                   ` (8 preceding siblings ...)
  2020-03-06 15:13 ` [PATCH 09/11] EDAC/ghes: Implement DIMM mapping table for SMBIOS handles Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-03-16  9:45   ` Borislav Petkov
  2020-03-06 15:13 ` [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array Robert Richter
  2020-03-10 20:18 ` [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Aristeu Rozanski
  11 siblings, 1 reply; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

Each edac mc must have a unique device for identification (see
add_mc_to_global_list()). This 1:1 mapping between parent device and
mci is a limitation for supporting multiple instances created by the
ghes driver. Solve this by creating an own device in between of the
ghes parent and the mci struct, this allows a 1:n mapping between
both.

Implement this using a platform device as this is the least possible
effort to create and free a device. It shows up nicely in sysfs:

 # find /sys/ -name ghes_mc*
 /sys/devices/platform/GHES.0/ghes_mc.1
 /sys/devices/platform/GHES.0/ghes_mc.0
 /sys/bus/platform/devices/ghes_mc.1
 /sys/bus/platform/devices/ghes_mc.0

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index cd61b8ae49f6..64220397296e 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -539,6 +539,8 @@ static struct acpi_platform_list plat_list[] = {
 static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
 					int num_dimm)
 {
+	struct platform_device_info pdevinfo;
+	struct platform_device *pdev;
 	struct edac_mc_layer layers[1];
 	struct mem_ctl_info *mci;
 	struct ghes_mci *pvt;
@@ -547,13 +549,23 @@ static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
+	pdevinfo	= (struct platform_device_info){
+		.parent	= dev,
+		.name	= "ghes_mc",
+		.id	= mc_idx,
+	};
+
+	pdev = platform_device_register_full(&pdevinfo);
+	if (IS_ERR(pdev))
+		goto fail;
+
 	mci = edac_mc_alloc(mc_idx, ARRAY_SIZE(layers), layers, sizeof(*pvt));
 	if (!mci)
-		return NULL;
+		goto fail;
 
 	pvt = mci->pvt_info;
 
-	mci->pdev = dev;
+	mci->pdev = &pdev->dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
@@ -562,6 +574,11 @@ static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
 	mci->dev_name = "ghes";
 
 	return mci;
+
+fail:
+	platform_device_unregister(pdev);
+
+	return NULL;
 }
 
 static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
@@ -573,6 +590,7 @@ static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
 	rc = edac_mc_add_mc(mci);
 	if (rc < 0) {
 		ghes_dimm_release(dimms);
+		platform_device_unregister(to_platform_device(mci->pdev));
 		edac_mc_free(mci);
 		return rc;
 	}
@@ -609,8 +627,10 @@ static void ghes_mc_free(void)
 		return;
 
 	mci = edac_mc_del_mc(mci->pdev);
-	if (mci)
+	if (mci) {
+		platform_device_unregister(to_platform_device(mci->pdev));
 		edac_mc_free(mci);
+	}
 }
 
 static int ghes_edac_register_fake(struct device *dev)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
                   ` (9 preceding siblings ...)
  2020-03-06 15:13 ` [PATCH 10/11] EDAC/ghes: Create an own device for each mci Robert Richter
@ 2020-03-06 15:13 ` Robert Richter
  2020-03-16  9:51   ` Borislav Petkov
  2020-03-10 20:18 ` [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Aristeu Rozanski
  11 siblings, 1 reply; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel

The ghes driver only creates one memory controller for the whole
system. This does not reflect memory topology especially in multi-node
systems. E.g. a Marvell ThunderX2 system shows:

 /sys/devices/system/edac/mc/mc0/dimm0
 /sys/devices/system/edac/mc/mc0/dimm1
 /sys/devices/system/edac/mc/mc0/dimm2
 /sys/devices/system/edac/mc/mc0/dimm3
 /sys/devices/system/edac/mc/mc0/dimm4
 /sys/devices/system/edac/mc/mc0/dimm5
 /sys/devices/system/edac/mc/mc0/dimm6
 /sys/devices/system/edac/mc/mc0/dimm7
 /sys/devices/system/edac/mc/mc0/dimm8
 /sys/devices/system/edac/mc/mc0/dimm9
 /sys/devices/system/edac/mc/mc0/dimm10
 /sys/devices/system/edac/mc/mc0/dimm11
 /sys/devices/system/edac/mc/mc0/dimm12
 /sys/devices/system/edac/mc/mc0/dimm13
 /sys/devices/system/edac/mc/mc0/dimm14
 /sys/devices/system/edac/mc/mc0/dimm15

The DIMMs 9-15 are located on the 2nd node of the system. On
comparable x86 systems there is one memory controller per node. The
ghes driver should also group DIMMs depending on the topology and
create one MC per node.

There are several options to detect the topology. ARM64 systems
retrieve the (NUMA) node information from the ACPI SRAT table (see
acpi_table_parse_srat()). The node id is later stored in the physical
address page. The pfn_to_nid() macro could be used for a DIMM after
determining its physical address. The drawback of this approach is
that there are too many subsystems involved it depends on. It could
easily break and makes the implementation complex. E.g. pfn_to_nid()
can only be reliable used on mapped address ranges which is not always
granted, there are various firmware instances involved which could be
broken, or results may vary depending on NUMA settings.

Another approach that was suggested by James' is to use the DIMM's
physical memory array handle to group DIMMs [1]. The advantage is to
only use the information on memory devices from the SMBIOS table that
contains a reference to the physical memory array it belongs too. This
information is mandatory same as the use of DIMM handle references by
GHES to provide the DIMM location of an error. There is only a single
table to parse which eases implementation. This patch uses this
approach for DIMM grouping.

Modify the DMI decoder to also detect the physical memory array a DIMM
is linked to and create one memory controller per array to group
DIMMs. With the change DIMMs are grouped, e.g. a ThunderX2 system
shows one MC per node now:

 # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
 /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
 /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0
 /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0
 /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0
 /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0
 /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0
 /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0
 /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0
 /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0

[1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com

Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++---------
 1 file changed, 107 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 64220397296e..35b38cccc6da 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -125,12 +125,44 @@ static void ghes_dimm_release(struct list_head *dimms)
 	list_splice(dimms, &ghes_dimm_pool);
 }
 
-static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
+struct ghes_mci_fill {
+	unsigned long *map;
+	int index;
+	int count;
+	int num_mc;
+	int num_dimm;
+	u16 handle;
+};
+
+static void ghes_edac_dmidecode_mci(const struct dmi_header *dh, void *arg)
 {
-	int *num_dimm = arg;
+	struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
+	struct ghes_mci_fill *mci_fill = arg;
+
+	if (dh->type != DMI_ENTRY_MEM_DEVICE)
+		return;
+
+	/* First run, no mapping, just count. */
+	if (!mci_fill->map) {
+		mci_fill->count++;
+		return;
+	}
+
+	if (mci_fill->index >= mci_fill->count)
+		goto out;
 
-	if (dh->type == DMI_ENTRY_MEM_DEVICE)
-		(*num_dimm)++;
+	if (test_bit(mci_fill->index, mci_fill->map))
+		goto out;
+
+	if (!mci_fill->num_dimm)
+		mci_fill->handle = entry->phys_mem_array_handle;
+	else if (mci_fill->handle != entry->phys_mem_array_handle)
+		goto out;
+
+	set_bit(mci_fill->index, mci_fill->map);
+	mci_fill->num_dimm++;
+out:
+	mci_fill->index++;
 }
 
 /*
@@ -181,17 +213,29 @@ struct ghes_dimm_fill {
 	struct list_head dimms;
 	struct mem_ctl_info *mci;
 	int index;
+	u16 link;
 };
 
-static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
+static void ghes_edac_dmidecode_dimm(const struct dmi_header *dh, void *arg)
 {
 	struct ghes_dimm_fill *dimm_fill = arg;
 	struct mem_ctl_info *mci = dimm_fill->mci;
+	struct memdev_dmi_entry *entry;
+	struct ghes_dimm *ghes_dimm;
+	struct dimm_info *dimm;
 
 	if (dh->type == DMI_ENTRY_MEM_DEVICE) {
-		struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
-		struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->index, 0, 0);
-		struct ghes_dimm *ghes_dimm;
+		entry = (struct memdev_dmi_entry *)dh;
+		if (entry->phys_mem_array_handle != dimm_fill->link)
+			return;
+
+		/*
+		 * Always returns non-zero as the mci should have
+		 * allocated the correct number of DIMMs.
+		 */
+		dimm = edac_get_dimm_by_index(mci, dimm_fill->index);
+		if (WARN_ON_ONCE(!dimm))
+			return;
 
 		ghes_dimm = ghes_dimm_alloc(dimm, entry->handle);
 		if (ghes_dimm)
@@ -605,29 +649,35 @@ static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
 static void ghes_mc_free(void)
 {
 	struct ghes_dimm *ghes_dimm, *tmp;
-	struct mem_ctl_info *mci = NULL;
+	struct mem_ctl_info *mci;
 	LIST_HEAD(dimm_list);
 	unsigned long flags;
 
-	spin_lock_irqsave(&ghes_lock, flags);
+	while (1) {
+		mci = NULL;
 
-	list_for_each_entry_safe(ghes_dimm, tmp, &ghes_dimm_list, entry) {
-		mci = mci ?: ghes_dimm->dimm->mci;
-		WARN_ON_ONCE(mci != ghes_dimm->dimm->mci);
-		list_move_tail(&ghes_dimm->entry, &dimm_list);
-	}
+		spin_lock_irqsave(&ghes_lock, flags);
 
-	WARN_ON_ONCE(!list_empty(&ghes_dimm_list));
+		list_for_each_entry_safe(ghes_dimm, tmp, &ghes_dimm_list, entry) {
+			mci = mci ?: ghes_dimm->dimm->mci;
+			if (mci != ghes_dimm->dimm->mci)
+				continue;
+			list_move_tail(&ghes_dimm->entry, &dimm_list);
+		}
 
-	spin_unlock_irqrestore(&ghes_lock, flags);
+		WARN_ON_ONCE(!mci && !list_empty(&ghes_dimm_list));
 
-	ghes_dimm_release(&dimm_list);
+		spin_unlock_irqrestore(&ghes_lock, flags);
 
-	if (!mci)
-		return;
+		ghes_dimm_release(&dimm_list);
+
+		if (!mci)
+			return;
+
+		mci = edac_mc_del_mc(mci->pdev);
+		if (!mci)
+			continue;
 
-	mci = edac_mc_del_mc(mci->pdev);
-	if (mci) {
 		platform_device_unregister(to_platform_device(mci->pdev));
 		edac_mc_free(mci);
 	}
@@ -661,7 +711,8 @@ static int ghes_edac_register_fake(struct device *dev)
 	return ghes_mc_add_or_free(mci, &dimm_list);
 }
 
-static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)
+static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm,
+				u16 handle)
 {
 	struct ghes_dimm_fill dimm_fill;
 	struct mem_ctl_info *mci;
@@ -672,16 +723,18 @@ static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)
 
 	dimm_fill.index = 0;
 	dimm_fill.mci = mci;
+	dimm_fill.link = handle;
 	INIT_LIST_HEAD(&dimm_fill.dimms);
 
-	dmi_walk(ghes_edac_dmidecode, &dimm_fill);
+	dmi_walk(ghes_edac_dmidecode_dimm, &dimm_fill);
 
 	return ghes_mc_add_or_free(mci, &dimm_fill.dimms);
 }
 
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
-	int rc = 0, num_dimm = 0;
+	struct ghes_mci_fill mci_fill = { };
+	int rc = 0;
 	int idx;
 
 	if (IS_ENABLED(CONFIG_X86)) {
@@ -703,13 +756,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		goto unlock;
 
 	/* Get the number of DIMMs */
-	dmi_walk(ghes_edac_count_dimms, &num_dimm);
+	dmi_walk(ghes_edac_dmidecode_mci, &mci_fill);
 
-	rc = ghes_dimm_init(num_dimm ?: 1);
+	rc = ghes_dimm_init(mci_fill.count ?: 1);
 	if (rc)
 		goto unlock;
 
-	if (!num_dimm) {
+	if (!mci_fill.count) {
 		/*
 		 * Bogus BIOS: Ignore DMI topology and use a single MC
 		 * with only one DIMM for the whole address range to
@@ -732,10 +785,34 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
 		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
 		pr_info("to correct its BIOS.\n");
-		pr_info("This system has %d DIMM sockets.\n", num_dimm);
+		pr_info("This system has %d DIMM sockets.\n", mci_fill.count);
 	}
 
-	rc = ghes_edac_register_one(dev, 0, num_dimm);
+	mci_fill.map = kcalloc(BITS_TO_LONGS(mci_fill.count),
+			sizeof(*mci_fill.map), GFP_KERNEL);
+
+	if (!mci_fill.map) {
+		rc = -ENOMEM;
+		goto unlock;
+	}
+
+	while (1) {
+		dmi_walk(ghes_edac_dmidecode_mci, &mci_fill);
+		if (!mci_fill.num_dimm)
+			break;
+
+		rc = ghes_edac_register_one(dev, mci_fill.num_mc,
+					mci_fill.num_dimm, mci_fill.handle);
+		if (rc)
+			break;
+
+		mci_fill.index    = 0;
+		mci_fill.num_dimm = 0;
+		mci_fill.num_mc++;
+	}
+
+	kfree(mci_fill.map);
+
 	if (rc)
 		goto unlock;
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting
  2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
                   ` (10 preceding siblings ...)
  2020-03-06 15:13 ` [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array Robert Richter
@ 2020-03-10 20:18 ` Aristeu Rozanski
  11 siblings, 0 replies; 27+ messages in thread
From: Aristeu Rozanski @ 2020-03-10 20:18 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck, James Morse,
	linux-edac, linux-kernel

On Fri, Mar 06, 2020 at 04:13:07PM +0100, Robert Richter wrote:
> This series contains a significant cleanup and rework of the ghes
> driver and improves the memory reporting as follows:
> 
>  * fix of DIMM label in error reports (patch #2),
> 
>  * creation of multiple memory controllers to group DIMMs depending on
>    the physical memory array (patches #9-#11). This should reflect the
>    memory topology of a system in sysfs. Esp. multi-node systems show
>    up with one memory controller per node now.
> 
> The changes base on the remaining patches that are a general cleanup
> and rework:
> 
>  * small change to edac_mc, not really dependent on the rest of the
>    series (patch #1),
> 
>  * general cleanup and rework of the ghes driver (patches #3-#8).
> 
> The implementation of multiple memory controllers bases on the
> suggestion from James (see patch #11), thank you James for your
> valuable input here. The patches are created newly from scratch and
> obsolete the GHES part of my previous postings a while ago that have
> not been accepted upstream:
> 
>  https://lore.kernel.org/patchwork/cover/1093488/
> 
> Tested on a Marvell/Cavium ThunderX2 Sabre (dual socket) system.

Acked-by: Aristeu Rozanski <aris@redhat.com>

-- 
Aristeu


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 05/11] EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to ghes_dimm_fill
  2020-03-06 15:13 ` [PATCH 05/11] EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to ghes_dimm_fill Robert Richter
@ 2020-03-16  9:29   ` Borislav Petkov
  0 siblings, 0 replies; 27+ messages in thread
From: Borislav Petkov @ 2020-03-16  9:29 UTC (permalink / raw)
  To: Robert Richter
  Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
	linux-edac, linux-kernel

On Fri, Mar 06, 2020 at 04:13:12PM +0100, Robert Richter wrote:
> The struct is used to store temporary data for the dmidecode callback.
> Clean this up a bit:
> 
>  1) Rename it to something shorter and more reasonable.
> 
>  2) Rename member count to index since this is what it is used for.
> 
>  3) Move code close to ghes_edac_dmidecode() where it is used.
> 
> Signed-off-by: Robert Richter <rrichter@marvell.com>
> ---
>  drivers/edac/ghes_edac.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 438972dfea09..358519e8c2e9 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -72,11 +72,6 @@ struct memdev_dmi_entry {
>  	u16 conf_mem_clk_speed;
>  } __attribute__((__packed__));
>  
> -struct ghes_edac_dimm_fill {
> -	struct mem_ctl_info *mci;
> -	unsigned int count;
> -};
> -
>  static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
>  {
>  	int *num_dimm = arg;
> @@ -113,18 +108,23 @@ static void ghes_dimm_setup_label(struct dimm_info *dimm, u16 handle)
>  			"unknown memory (handle: 0x%.4x)", handle);
>  }
>  
> +struct ghes_dimm_fill {

Locally used only so it can be just "dimm_fill" since you're shortening
it.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/11] EDAC/ghes: Carve out MC device handling into separate functions
  2020-03-06 15:13 ` [PATCH 06/11] EDAC/ghes: Carve out MC device handling into separate functions Robert Richter
@ 2020-03-16  9:31   ` Borislav Petkov
  2020-03-16 12:12     ` Robert Richter
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2020-03-16  9:31 UTC (permalink / raw)
  To: Robert Richter
  Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
	linux-edac, linux-kernel

On Fri, Mar 06, 2020 at 04:13:13PM +0100, Robert Richter wrote:
> The functions are too long, carve out code that handles MC devices
> into the new functions ghes_mc_create(), ghes_mc_add_or_free() and
> ghes_mc_free(). Apart from better code readability the functions can
> be reused and the implementation of the error paths becomes easier.
> 
> Signed-off-by: Robert Richter <rrichter@marvell.com>
> ---
>  drivers/edac/ghes_edac.c | 133 +++++++++++++++++++++++----------------
>  1 file changed, 79 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 358519e8c2e9..5a4c9694bbff 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -462,16 +462,81 @@ static struct acpi_platform_list plat_list[] = {
>  	{ } /* End */
>  };
>  
> -int ghes_edac_register(struct ghes *ghes, struct device *dev)
> +static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
> +					int num_dimm)
>  {
> -	bool fake = false;
> -	int rc = 0, num_dimm = 0;
> +	struct edac_mc_layer layers[1];
>  	struct mem_ctl_info *mci;
>  	struct ghes_mci *pvt;
> -	struct edac_mc_layer layers[1];
> -	struct ghes_dimm_fill dimm_fill;
> +
> +	layers[0].type = EDAC_MC_LAYER_ALL_MEM;
> +	layers[0].size = num_dimm;
> +	layers[0].is_virt_csrow = true;
> +
> +	mci = edac_mc_alloc(mc_idx, ARRAY_SIZE(layers), layers, sizeof(*pvt));
> +	if (!mci)
> +		return NULL;
> +
> +	pvt		= mci->pvt_info;
> +	pvt->mci	= mci;
> +
> +	mci->pdev = dev;
> +	mci->mtype_cap = MEM_FLAG_EMPTY;
> +	mci->edac_ctl_cap = EDAC_FLAG_NONE;
> +	mci->edac_cap = EDAC_FLAG_NONE;
> +	mci->mod_name = "ghes_edac.c";
> +	mci->ctl_name = "ghes_edac";
> +	mci->dev_name = "ghes";
> +
> +	return mci;
> +}
> +
> +static int ghes_mc_add_or_free(struct mem_ctl_info *mci)

ghes_mc_add() is good enough. The fact that the function has error
handling doesn't need to be in the name.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/11] EDAC/ghes: Implement DIMM mapping table for SMBIOS handles
  2020-03-06 15:13 ` [PATCH 09/11] EDAC/ghes: Implement DIMM mapping table for SMBIOS handles Robert Richter
@ 2020-03-16  9:40   ` Borislav Petkov
  0 siblings, 0 replies; 27+ messages in thread
From: Borislav Petkov @ 2020-03-16  9:40 UTC (permalink / raw)
  To: Robert Richter
  Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
	linux-edac, linux-kernel

On Fri, Mar 06, 2020 at 04:13:16PM +0100, Robert Richter wrote:
> GHES uses SMBIOS handles to identify the DIMM that was causing a hw
> error. Currently this is stored in smbios_handle of struct dimm_info.
> This implementation has several drawbacks. The information is private
> to the ghes driver, but struct dimm_info is for general use. The DIMMs
> are tied to a *mci struct, which makes a lockup inefficient. It is

"lookup"

> hard to dynamically allocate DIMMs and also to meet locking
> constraints when adding or removing them. This becomes even more
> challenging when having multiple MC instances that group a set of
> DIMMs, so the change is needed also to later support multiple MC
> instances.

Err, I don't understand: normally a bunch of DIMMs belong to a memory
controller and that gives you the hierarchy automatically. Why is that a
problem?

And normally you allocate the DIMM representations on MC init and
deallocate them when removing the MC.

So what is the problem here?

...

...
> @@ -72,6 +79,52 @@ struct memdev_dmi_entry {
>  	u16 conf_mem_clk_speed;
>  } __attribute__((__packed__));
>  
> +/* ghes_reg_mutex must be held. */

lockdep_assert_held()

...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/11] EDAC/ghes: Create an own device for each mci
  2020-03-06 15:13 ` [PATCH 10/11] EDAC/ghes: Create an own device for each mci Robert Richter
@ 2020-03-16  9:45   ` Borislav Petkov
  0 siblings, 0 replies; 27+ messages in thread
From: Borislav Petkov @ 2020-03-16  9:45 UTC (permalink / raw)
  To: Robert Richter
  Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
	linux-edac, linux-kernel

On Fri, Mar 06, 2020 at 04:13:17PM +0100, Robert Richter wrote:
> Each edac mc must have a unique device for identification (see
> add_mc_to_global_list()). This 1:1 mapping between parent device and
> mci is a limitation for supporting multiple instances created by the
> ghes driver. Solve this by creating an own device in between of the
> ghes parent and the mci struct, this allows a 1:n mapping between
> both.
> 
> Implement this using a platform device as this is the least possible
> effort to create and free a device. It shows up nicely in sysfs:
> 
>  # find /sys/ -name ghes_mc*
>  /sys/devices/platform/GHES.0/ghes_mc.1
>  /sys/devices/platform/GHES.0/ghes_mc.0
>  /sys/bus/platform/devices/ghes_mc.1
>  /sys/bus/platform/devices/ghes_mc.0
> 
> Signed-off-by: Robert Richter <rrichter@marvell.com>
> ---
>  drivers/edac/ghes_edac.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index cd61b8ae49f6..64220397296e 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -539,6 +539,8 @@ static struct acpi_platform_list plat_list[] = {
>  static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
>  					int num_dimm)
>  {
> +	struct platform_device_info pdevinfo;
> +	struct platform_device *pdev;
>  	struct edac_mc_layer layers[1];
>  	struct mem_ctl_info *mci;
>  	struct ghes_mci *pvt;
> @@ -547,13 +549,23 @@ static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
>  	layers[0].size = num_dimm;
>  	layers[0].is_virt_csrow = true;
>  
> +	pdevinfo	= (struct platform_device_info){
> +		.parent	= dev,
> +		.name	= "ghes_mc",
> +		.id	= mc_idx,
> +	};

You can statically allocate that one once at the top of the file and
assign ->parent and ->id each time before calling the function below.

> +
> +	pdev = platform_device_register_full(&pdevinfo);
> +	if (IS_ERR(pdev))
> +		goto fail;
> +

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
  2020-03-06 15:13 ` [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array Robert Richter
@ 2020-03-16  9:51   ` Borislav Petkov
  2020-03-17 16:34     ` John Garry
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2020-03-16  9:51 UTC (permalink / raw)
  To: Robert Richter
  Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
	linux-edac, linux-kernel, Toshi Kani, John Garry

On Fri, Mar 06, 2020 at 04:13:18PM +0100, Robert Richter wrote:
> The ghes driver only creates one memory controller for the whole
> system. This does not reflect memory topology especially in multi-node
> systems. E.g. a Marvell ThunderX2 system shows:
> 
>  /sys/devices/system/edac/mc/mc0/dimm0
>  /sys/devices/system/edac/mc/mc0/dimm1
>  /sys/devices/system/edac/mc/mc0/dimm2
>  /sys/devices/system/edac/mc/mc0/dimm3
>  /sys/devices/system/edac/mc/mc0/dimm4
>  /sys/devices/system/edac/mc/mc0/dimm5
>  /sys/devices/system/edac/mc/mc0/dimm6
>  /sys/devices/system/edac/mc/mc0/dimm7
>  /sys/devices/system/edac/mc/mc0/dimm8
>  /sys/devices/system/edac/mc/mc0/dimm9
>  /sys/devices/system/edac/mc/mc0/dimm10
>  /sys/devices/system/edac/mc/mc0/dimm11
>  /sys/devices/system/edac/mc/mc0/dimm12
>  /sys/devices/system/edac/mc/mc0/dimm13
>  /sys/devices/system/edac/mc/mc0/dimm14
>  /sys/devices/system/edac/mc/mc0/dimm15
> 
> The DIMMs 9-15 are located on the 2nd node of the system. On
> comparable x86 systems there is one memory controller per node. The
> ghes driver should also group DIMMs depending on the topology and
> create one MC per node.
> 
> There are several options to detect the topology. ARM64 systems
> retrieve the (NUMA) node information from the ACPI SRAT table (see
> acpi_table_parse_srat()). The node id is later stored in the physical
> address page. The pfn_to_nid() macro could be used for a DIMM after
> determining its physical address. The drawback of this approach is
> that there are too many subsystems involved it depends on. It could
> easily break and makes the implementation complex. E.g. pfn_to_nid()
> can only be reliable used on mapped address ranges which is not always
> granted, there are various firmware instances involved which could be
> broken, or results may vary depending on NUMA settings.
> 
> Another approach that was suggested by James' is to use the DIMM's
> physical memory array handle to group DIMMs [1]. The advantage is to
> only use the information on memory devices from the SMBIOS table that
> contains a reference to the physical memory array it belongs too. This
> information is mandatory same as the use of DIMM handle references by
> GHES to provide the DIMM location of an error. There is only a single
> table to parse which eases implementation. This patch uses this
> approach for DIMM grouping.
> 
> Modify the DMI decoder to also detect the physical memory array a DIMM
> is linked to and create one memory controller per array to group
> DIMMs. With the change DIMMs are grouped, e.g. a ThunderX2 system
> shows one MC per node now:
> 
>  # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
>  /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
>  /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
>  /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
>  /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
>  /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
>  /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
>  /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
>  /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
>  /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0
>  /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0
>  /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0
>  /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0
>  /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0
>  /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0
>  /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0
>  /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0
> 
> [1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com
> 
> Suggested-by: James Morse <james.morse@arm.com>
> Signed-off-by: Robert Richter <rrichter@marvell.com>
> ---
>  drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++---------
>  1 file changed, 107 insertions(+), 30 deletions(-)

This is all fine and good but that change affects the one x86 platform
we support so the whole patchset should be tested there too. Adding
Toshi.

As a matter of fact, the final version of this set should be tested on
all platforms which are using this thing. Adding John Garry too who
reported issues with this driver recently on his platform.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/11] EDAC/ghes: Carve out MC device handling into separate functions
  2020-03-16  9:31   ` Borislav Petkov
@ 2020-03-16 12:12     ` Robert Richter
  0 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2020-03-16 12:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
	linux-edac, linux-kernel

On 16.03.20 10:31:34, Borislav Petkov wrote:
> On Fri, Mar 06, 2020 at 04:13:13PM +0100, Robert Richter wrote:
> > The functions are too long, carve out code that handles MC devices
> > into the new functions ghes_mc_create(), ghes_mc_add_or_free() and
> > ghes_mc_free(). Apart from better code readability the functions can
> > be reused and the implementation of the error paths becomes easier.
> > 
> > Signed-off-by: Robert Richter <rrichter@marvell.com>
> > ---
> >  drivers/edac/ghes_edac.c | 133 +++++++++++++++++++++++----------------
> >  1 file changed, 79 insertions(+), 54 deletions(-)
> > 
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index 358519e8c2e9..5a4c9694bbff 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -462,16 +462,81 @@ static struct acpi_platform_list plat_list[] = {
> >  	{ } /* End */
> >  };
> >  
> > -int ghes_edac_register(struct ghes *ghes, struct device *dev)
> > +static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
> > +					int num_dimm)
> >  {
> > -	bool fake = false;
> > -	int rc = 0, num_dimm = 0;
> > +	struct edac_mc_layer layers[1];
> >  	struct mem_ctl_info *mci;
> >  	struct ghes_mci *pvt;
> > -	struct edac_mc_layer layers[1];
> > -	struct ghes_dimm_fill dimm_fill;
> > +
> > +	layers[0].type = EDAC_MC_LAYER_ALL_MEM;
> > +	layers[0].size = num_dimm;
> > +	layers[0].is_virt_csrow = true;
> > +
> > +	mci = edac_mc_alloc(mc_idx, ARRAY_SIZE(layers), layers, sizeof(*pvt));
> > +	if (!mci)
> > +		return NULL;
> > +
> > +	pvt		= mci->pvt_info;
> > +	pvt->mci	= mci;
> > +
> > +	mci->pdev = dev;
> > +	mci->mtype_cap = MEM_FLAG_EMPTY;
> > +	mci->edac_ctl_cap = EDAC_FLAG_NONE;
> > +	mci->edac_cap = EDAC_FLAG_NONE;
> > +	mci->mod_name = "ghes_edac.c";
> > +	mci->ctl_name = "ghes_edac";
> > +	mci->dev_name = "ghes";
> > +
> > +	return mci;
> > +}
> > +
> > +static int ghes_mc_add_or_free(struct mem_ctl_info *mci)
> 
> ghes_mc_add() is good enough. The fact that the function has error
> handling doesn't need to be in the name.

It's not just error handling here. I choose the name as the mci is
freed if it could not be added, which is different to just return an
error if it fails. Otherwise the flow would be something like:

	mci = ghes_mc_create(...);
	rc = ghes_mc_add(mci);
	if (rc) {
		ghes_mc_free(mci);
		/* do something */
	}

But we do not need mci any longer on error, so we can free it directly
which makes the code much easier, but in fact it does 2 things at a
time then:

	mci = ghes_mc_create(...);
	rc = ghes_mc_add_or_free(mci);
	if (rc)
        	/* do something */

I would rather prefer ghes_mc_add_or_free().

-Robert



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
  2020-03-16  9:51   ` Borislav Petkov
@ 2020-03-17 16:34     ` John Garry
  2020-03-17 22:14       ` Kani, Toshi
  2020-03-24 11:32       ` Xiaofei Tan
  0 siblings, 2 replies; 27+ messages in thread
From: John Garry @ 2020-03-17 16:34 UTC (permalink / raw)
  To: Borislav Petkov, Robert Richter
  Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
	linux-edac, linux-kernel, Toshi Kani, Xiaofei Tan, Shiju Jose

On 16/03/2020 09:51, Borislav Petkov wrote:
> On Fri, Mar 06, 2020 at 04:13:18PM +0100, Robert Richter wrote:
>> The ghes driver only creates one memory controller for the whole
>> system. This does not reflect memory topology especially in multi-node
>> systems. E.g. a Marvell ThunderX2 system shows:
>>
>>   /sys/devices/system/edac/mc/mc0/dimm0
>>   /sys/devices/system/edac/mc/mc0/dimm1
>>   /sys/devices/system/edac/mc/mc0/dimm2
>>   /sys/devices/system/edac/mc/mc0/dimm3
>>   /sys/devices/system/edac/mc/mc0/dimm4
>>   /sys/devices/system/edac/mc/mc0/dimm5
>>   /sys/devices/system/edac/mc/mc0/dimm6
>>   /sys/devices/system/edac/mc/mc0/dimm7
>>   /sys/devices/system/edac/mc/mc0/dimm8
>>   /sys/devices/system/edac/mc/mc0/dimm9
>>   /sys/devices/system/edac/mc/mc0/dimm10
>>   /sys/devices/system/edac/mc/mc0/dimm11
>>   /sys/devices/system/edac/mc/mc0/dimm12
>>   /sys/devices/system/edac/mc/mc0/dimm13
>>   /sys/devices/system/edac/mc/mc0/dimm14
>>   /sys/devices/system/edac/mc/mc0/dimm15
>>
>> The DIMMs 9-15 are located on the 2nd node of the system. On
>> comparable x86 systems there is one memory controller per node. The
>> ghes driver should also group DIMMs depending on the topology and
>> create one MC per node.
>>
>> There are several options to detect the topology. ARM64 systems
>> retrieve the (NUMA) node information from the ACPI SRAT table (see
>> acpi_table_parse_srat()). The node id is later stored in the physical
>> address page. The pfn_to_nid() macro could be used for a DIMM after
>> determining its physical address. The drawback of this approach is
>> that there are too many subsystems involved it depends on. It could
>> easily break and makes the implementation complex. E.g. pfn_to_nid()
>> can only be reliable used on mapped address ranges which is not always
>> granted, there are various firmware instances involved which could be
>> broken, or results may vary depending on NUMA settings.
>>
>> Another approach that was suggested by James' is to use the DIMM's
>> physical memory array handle to group DIMMs [1]. The advantage is to
>> only use the information on memory devices from the SMBIOS table that
>> contains a reference to the physical memory array it belongs too. This
>> information is mandatory same as the use of DIMM handle references by
>> GHES to provide the DIMM location of an error. There is only a single
>> table to parse which eases implementation. This patch uses this
>> approach for DIMM grouping.
>>
>> Modify the DMI decoder to also detect the physical memory array a DIMM
>> is linked to and create one memory controller per array to group
>> DIMMs. With the change DIMMs are grouped, e.g. a ThunderX2 system
>> shows one MC per node now:
>>
>>   # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
>>   /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
>>   /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
>>   /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
>>   /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
>>   /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
>>   /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
>>   /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
>>   /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
>>   /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0
>>   /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0
>>   /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0
>>   /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0
>>   /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0
>>   /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0
>>   /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0
>>   /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0
>>
>> [1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com
>>
>> Suggested-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Robert Richter <rrichter@marvell.com>
>> ---
>>   drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++---------
>>   1 file changed, 107 insertions(+), 30 deletions(-)
> 
> This is all fine and good but that change affects the one x86 platform
> we support so the whole patchset should be tested there too. Adding
> Toshi.
> 
> As a matter of fact, the final version of this set should be tested on
> all platforms which are using this thing. Adding John Garry too who
> reported issues with this driver recently on his platform.

Adding other RAS-centric guys for H.

Cheers,
John

> 
> Thx.
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
  2020-03-17 16:34     ` John Garry
@ 2020-03-17 22:14       ` Kani, Toshi
  2020-03-17 22:50         ` Borislav Petkov
  2020-03-24 11:32       ` Xiaofei Tan
  1 sibling, 1 reply; 27+ messages in thread
From: Kani, Toshi @ 2020-03-17 22:14 UTC (permalink / raw)
  To: John Garry, Borislav Petkov, Robert Richter
  Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
	linux-edac, linux-kernel, Xiaofei Tan, Shiju Jose

> > This is all fine and good but that change affects the one x86 platform
> > we support so the whole patchset should be tested there too. Adding
> > Toshi.

Thanks for the heads-up.  I do not have an access to a test system at
the moment, but am checking my colleague to do the test.

-Toshi



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
  2020-03-17 22:14       ` Kani, Toshi
@ 2020-03-17 22:50         ` Borislav Petkov
  2020-03-17 22:53           ` Kani, Toshi
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2020-03-17 22:50 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: John Garry, Robert Richter, Mauro Carvalho Chehab, Tony Luck,
	James Morse, Aristeu Rozanski, linux-edac, linux-kernel,
	Xiaofei Tan, Shiju Jose

On Tue, Mar 17, 2020 at 10:14:46PM +0000, Kani, Toshi wrote:
> > > This is all fine and good but that change affects the one x86 platform
> > > we support so the whole patchset should be tested there too. Adding
> > > Toshi.
> 
> Thanks for the heads-up.  I do not have an access to a test system at
> the moment, but am checking my colleague to do the test.

Thx but hold that off for now - Robert is reworking the set.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
  2020-03-17 22:50         ` Borislav Petkov
@ 2020-03-17 22:53           ` Kani, Toshi
  2020-03-18  0:10             ` Robert Richter
  0 siblings, 1 reply; 27+ messages in thread
From: Kani, Toshi @ 2020-03-17 22:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: John Garry, Robert Richter, Mauro Carvalho Chehab, Tony Luck,
	James Morse, Aristeu Rozanski, linux-edac, linux-kernel,
	Xiaofei Tan, Shiju Jose

> On Tue, Mar 17, 2020 at 10:14:46PM +0000, Kani, Toshi wrote:
> > > > This is all fine and good but that change affects the one x86
> > > > platform we support so the whole patchset should be tested there
> > > > too. Adding Toshi.
> >
> > Thanks for the heads-up.  I do not have an access to a test system at
> > the moment, but am checking my colleague to do the test.
> 
> Thx but hold that off for now - Robert is reworking the set.

Got it. Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
  2020-03-17 22:53           ` Kani, Toshi
@ 2020-03-18  0:10             ` Robert Richter
  0 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2020-03-18  0:10 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: Borislav Petkov, John Garry, Mauro Carvalho Chehab, Tony Luck,
	James Morse, Aristeu Rozanski, linux-edac, linux-kernel,
	Xiaofei Tan, Shiju Jose

On 17.03.20 22:53:09, Kani, Toshi wrote:
> > On Tue, Mar 17, 2020 at 10:14:46PM +0000, Kani, Toshi wrote:
> > > > > This is all fine and good but that change affects the one x86
> > > > > platform we support so the whole patchset should be tested there
> > > > > too. Adding Toshi.
> > >
> > > Thanks for the heads-up.  I do not have an access to a test system at
> > > the moment, but am checking my colleague to do the test.
> > 
> > Thx but hold that off for now - Robert is reworking the set.

It would still be good to have a test run as the general concept wont
change and we will then early see potential issues, especially wrt
SMBIOS/DMI.

It would esp. be interesting to see how the node mapping reflects the
memory topology:

# grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
/sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
/sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
/sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
/sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
/sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
/sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
/sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
/sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0
/sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0
/sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0
/sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0
/sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0
/sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0
/sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0
/sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0


Thanks,

-Robert

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
  2020-03-17 16:34     ` John Garry
  2020-03-17 22:14       ` Kani, Toshi
@ 2020-03-24 11:32       ` Xiaofei Tan
  1 sibling, 0 replies; 27+ messages in thread
From: Xiaofei Tan @ 2020-03-24 11:32 UTC (permalink / raw)
  To: John Garry, Borislav Petkov, Robert Richter
  Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
	linux-edac, linux-kernel, Toshi Kani, Shiju Jose


On 2020/3/18 0:34, John Garry wrote:
> On 16/03/2020 09:51, Borislav Petkov wrote:
>> On Fri, Mar 06, 2020 at 04:13:18PM +0100, Robert Richter wrote:
>>> The ghes driver only creates one memory controller for the whole
>>> system. This does not reflect memory topology especially in multi-node
>>> systems. E.g. a Marvell ThunderX2 system shows:
>>>
>>>   /sys/devices/system/edac/mc/mc0/dimm0
>>>   /sys/devices/system/edac/mc/mc0/dimm1
>>>   /sys/devices/system/edac/mc/mc0/dimm2
>>>   /sys/devices/system/edac/mc/mc0/dimm3
>>>   /sys/devices/system/edac/mc/mc0/dimm4
>>>   /sys/devices/system/edac/mc/mc0/dimm5
>>>   /sys/devices/system/edac/mc/mc0/dimm6
>>>   /sys/devices/system/edac/mc/mc0/dimm7
>>>   /sys/devices/system/edac/mc/mc0/dimm8
>>>   /sys/devices/system/edac/mc/mc0/dimm9
>>>   /sys/devices/system/edac/mc/mc0/dimm10
>>>   /sys/devices/system/edac/mc/mc0/dimm11
>>>   /sys/devices/system/edac/mc/mc0/dimm12
>>>   /sys/devices/system/edac/mc/mc0/dimm13
>>>   /sys/devices/system/edac/mc/mc0/dimm14
>>>   /sys/devices/system/edac/mc/mc0/dimm15
>>>
>>> The DIMMs 9-15 are located on the 2nd node of the system. On
>>> comparable x86 systems there is one memory controller per node. The
>>> ghes driver should also group DIMMs depending on the topology and
>>> create one MC per node.
>>>
>>> There are several options to detect the topology. ARM64 systems
>>> retrieve the (NUMA) node information from the ACPI SRAT table (see
>>> acpi_table_parse_srat()). The node id is later stored in the physical
>>> address page. The pfn_to_nid() macro could be used for a DIMM after
>>> determining its physical address. The drawback of this approach is
>>> that there are too many subsystems involved it depends on. It could
>>> easily break and makes the implementation complex. E.g. pfn_to_nid()
>>> can only be reliable used on mapped address ranges which is not always
>>> granted, there are various firmware instances involved which could be
>>> broken, or results may vary depending on NUMA settings.
>>>
>>> Another approach that was suggested by James' is to use the DIMM's
>>> physical memory array handle to group DIMMs [1]. The advantage is to
>>> only use the information on memory devices from the SMBIOS table that
>>> contains a reference to the physical memory array it belongs too. This
>>> information is mandatory same as the use of DIMM handle references by
>>> GHES to provide the DIMM location of an error. There is only a single
>>> table to parse which eases implementation. This patch uses this
>>> approach for DIMM grouping.
>>>
>>> Modify the DMI decoder to also detect the physical memory array a DIMM
>>> is linked to and create one memory controller per array to group
>>> DIMMs. With the change DIMMs are grouped, e.g. a ThunderX2 system
>>> shows one MC per node now:
>>>
>>>   # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
>>>   /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
>>>   /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
>>>   /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
>>>   /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
>>>   /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
>>>   /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
>>>   /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
>>>   /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
>>>   /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0
>>>   /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0
>>>   /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0
>>>   /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0
>>>   /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0
>>>   /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0
>>>   /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0
>>>   /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0
>>>
>>> [1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com
>>>
>>> Suggested-by: James Morse <james.morse@arm.com>
>>> Signed-off-by: Robert Richter <rrichter@marvell.com>
>>> ---
>>>   drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++---------
>>>   1 file changed, 107 insertions(+), 30 deletions(-)
>>
>> This is all fine and good but that change affects the one x86 platform
>> we support so the whole patchset should be tested there too. Adding
>> Toshi.
>>
>> As a matter of fact, the final version of this set should be tested on
>> all platforms which are using this thing. Adding John Garry too who
>> reported issues with this driver recently on his platform.
> 
> Adding other RAS-centric guys for H.

Hi John & Borislav & Robert
I have tested this patch set on our platform. Only one memory controller found when there is one DIMM on
each socket or node. Just like this:
estuary:/$ grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:SOCKET 0 CHANNEL 0 DIMM 0 DIMM0
/sys/devices/system/edac/mc/mc0/dimm20/dimm_label:SOCKET 1 CHANNEL 2 DIMM 0 DIMM1

It is not the problem of the patch set. Because our BIOS only defined one "Physical Memory Array Handle" in DMI table.
Just like this:
estuary:/$ dmidecode -t memory | grep "Array Handle"
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000
        Array Handle: 0x0000

BTW, i also test other function of edac driver our platform used. They're all good. :)
> 
> Cheers,
> John
> 
>>
>> Thx.
>>
> 
> 
> .
> 

-- 
 thanks
tanxiaofei


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/11] EDAC/mc: Use int type for parameters of edac_mc_alloc()
  2020-03-06 15:13 ` [PATCH 01/11] EDAC/mc: Use int type for parameters of edac_mc_alloc() Robert Richter
@ 2020-04-06 11:38   ` Matthias Brugger
  0 siblings, 0 replies; 27+ messages in thread
From: Matthias Brugger @ 2020-04-06 11:38 UTC (permalink / raw)
  To: Robert Richter, Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, linux-edac, linux-kernel

On 06/03/2020 16:13, Robert Richter wrote:
> Most iterators use int type as index. mci->mc_idx is also type int. So
> use int type for parameters of edac_mc_alloc(). Extend the range check
> to check for negative values. There is a type cast now when assigning
> variable n_layers to mci->n_layer, it is safe due to the range check.
> 
> While at it, rename the users of edac_mc_alloc() to mc_idx as this
> fits better here.
> 
> Signed-off-by: Robert Richter <rrichter@marvell.com>

Reviewed-by: Matthias Brugger <mbrugger@suse.com>

> ---
>  drivers/edac/edac_mc.c | 7 +++----
>  drivers/edac/edac_mc.h | 6 +++---
>  2 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 75ede27bdf6a..8bd3d00b4385 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -444,8 +444,7 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
>  	return 0;
>  }
>  
> -struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> -				   unsigned int n_layers,
> +struct mem_ctl_info *edac_mc_alloc(int mc_idx, int n_layers,
>  				   struct edac_mc_layer *layers,
>  				   unsigned int sz_pvt)
>  {
> @@ -456,7 +455,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
>  	void *pvt, *ptr = NULL;
>  	bool per_rank = false;
>  
> -	if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
> +	if (WARN_ON(mc_idx < 0 || n_layers < 1 || n_layers > EDAC_MAX_LAYERS))
>  		return NULL;
>  
>  	/*
> @@ -505,7 +504,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
>  	pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;
>  
>  	/* setup index and various internal pointers */
> -	mci->mc_idx = mc_num;
> +	mci->mc_idx = mc_idx;
>  	mci->tot_dimms = tot_dimms;
>  	mci->pvt_info = pvt;
>  	mci->n_layers = n_layers;
> diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
> index 881b00eadf7a..4815f50afea0 100644
> --- a/drivers/edac/edac_mc.h
> +++ b/drivers/edac/edac_mc.h
> @@ -98,7 +98,7 @@ do {									\
>  /**
>   * edac_mc_alloc() - Allocate and partially fill a struct &mem_ctl_info.
>   *
> - * @mc_num:		Memory controller number
> + * @mc_idx:		Memory controller number
>   * @n_layers:		Number of MC hierarchy layers
>   * @layers:		Describes each layer as seen by the Memory Controller
>   * @sz_pvt:		size of private storage needed
> @@ -122,8 +122,8 @@ do {									\
>   *	On success, return a pointer to struct mem_ctl_info pointer;
>   *	%NULL otherwise
>   */
> -struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> -				   unsigned int n_layers,
> +struct mem_ctl_info *edac_mc_alloc(int mc_idx,
> +				   int n_layers,
>  				   struct edac_mc_layer *layers,
>  				   unsigned int sz_pvt);
>  
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/11] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()
  2020-03-06 15:13 ` [PATCH 03/11] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode() Robert Richter
@ 2020-04-06 11:51   ` Matthias Brugger
  0 siblings, 0 replies; 27+ messages in thread
From: Matthias Brugger @ 2020-04-06 11:51 UTC (permalink / raw)
  To: Robert Richter, Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
  Cc: James Morse, Aristeu Rozanski, linux-edac, linux-kernel

On 06/03/2020 16:13, Robert Richter wrote:
> The local variable rdr_mask serves as a static constant here. It hides
> what the code is doing. Remove it and replace it with the actual logic
> that checks some bits.
> 
> Signed-off-by: Robert Richter <rrichter@marvell.com>

Reviewed-by: Matthias Brugger <mbrugger@suse.com>

> ---
>  drivers/edac/ghes_edac.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 07fa3867cba1..fce53893731a 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -123,7 +123,6 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
>  	if (dh->type == DMI_ENTRY_MEM_DEVICE) {
>  		struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
>  		struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count, 0, 0);
> -		u16 rdr_mask = BIT(7) | BIT(13);
>  
>  		if (entry->size == 0xffff) {
>  			pr_info("Can't get DIMM%i size\n",
> @@ -173,7 +172,8 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
>  		default:
>  			if (entry->type_detail & BIT(6))
>  				dimm->mtype = MEM_RMBS;
> -			else if ((entry->type_detail & rdr_mask) == rdr_mask)
> +			else if ((entry->type_detail & BIT(7)) &&
> +				 (entry->type_detail & BIT(13)))
>  				dimm->mtype = MEM_RDR;
>  			else if (entry->type_detail & BIT(7))
>  				dimm->mtype = MEM_SDR;
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, back to index

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-06 15:13 [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Robert Richter
2020-03-06 15:13 ` [PATCH 01/11] EDAC/mc: Use int type for parameters of edac_mc_alloc() Robert Richter
2020-04-06 11:38   ` Matthias Brugger
2020-03-06 15:13 ` [PATCH 02/11] EDAC/ghes: Setup DIMM label from DMI and use it in error reports Robert Richter
2020-03-06 15:13 ` [PATCH 03/11] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode() Robert Richter
2020-04-06 11:51   ` Matthias Brugger
2020-03-06 15:13 ` [PATCH 04/11] EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_mci Robert Richter
2020-03-06 15:13 ` [PATCH 05/11] EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to ghes_dimm_fill Robert Richter
2020-03-16  9:29   ` Borislav Petkov
2020-03-06 15:13 ` [PATCH 06/11] EDAC/ghes: Carve out MC device handling into separate functions Robert Richter
2020-03-16  9:31   ` Borislav Petkov
2020-03-16 12:12     ` Robert Richter
2020-03-06 15:13 ` [PATCH 07/11] EDAC/ghes: Have a separate code path for creating the fake MC Robert Richter
2020-03-06 15:13 ` [PATCH 08/11] EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}() Robert Richter
2020-03-06 15:13 ` [PATCH 09/11] EDAC/ghes: Implement DIMM mapping table for SMBIOS handles Robert Richter
2020-03-16  9:40   ` Borislav Petkov
2020-03-06 15:13 ` [PATCH 10/11] EDAC/ghes: Create an own device for each mci Robert Richter
2020-03-16  9:45   ` Borislav Petkov
2020-03-06 15:13 ` [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array Robert Richter
2020-03-16  9:51   ` Borislav Petkov
2020-03-17 16:34     ` John Garry
2020-03-17 22:14       ` Kani, Toshi
2020-03-17 22:50         ` Borislav Petkov
2020-03-17 22:53           ` Kani, Toshi
2020-03-18  0:10             ` Robert Richter
2020-03-24 11:32       ` Xiaofei Tan
2020-03-10 20:18 ` [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting Aristeu Rozanski

Linux-EDAC Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-edac/0 linux-edac/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-edac linux-edac/ https://lore.kernel.org/linux-edac \
		linux-edac@vger.kernel.org
	public-inbox-index linux-edac

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-edac


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git