Linux-EDAC Archive on lore.kernel.org
From: Robert Richter <rrichter@marvell.com>
To: Borislav Petkov <bp@alien8.de>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Tony Luck <tony.luck@intel.com>
Cc: James Morse <james.morse@arm.com>,
	Aristeu Rozanski <aris@redhat.com>,
	Robert Richter <rrichter@marvell.com>,
	<linux-edac@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
Date: Fri,  6 Mar 2020 16:13:18 +0100
Message-ID: <20200306151318.17422-12-rrichter@marvell.com>
In-Reply-To: <20200306151318.17422-1-rrichter@marvell.com>

The ghes driver creates only one memory controller for the whole
system. This does not reflect the memory topology, especially in
multi-node systems. E.g. a Marvell ThunderX2 system shows:

 /sys/devices/system/edac/mc/mc0/dimm0
 /sys/devices/system/edac/mc/mc0/dimm1
 /sys/devices/system/edac/mc/mc0/dimm2
 /sys/devices/system/edac/mc/mc0/dimm3
 /sys/devices/system/edac/mc/mc0/dimm4
 /sys/devices/system/edac/mc/mc0/dimm5
 /sys/devices/system/edac/mc/mc0/dimm6
 /sys/devices/system/edac/mc/mc0/dimm7
 /sys/devices/system/edac/mc/mc0/dimm8
 /sys/devices/system/edac/mc/mc0/dimm9
 /sys/devices/system/edac/mc/mc0/dimm10
 /sys/devices/system/edac/mc/mc0/dimm11
 /sys/devices/system/edac/mc/mc0/dimm12
 /sys/devices/system/edac/mc/mc0/dimm13
 /sys/devices/system/edac/mc/mc0/dimm14
 /sys/devices/system/edac/mc/mc0/dimm15

DIMMs 8-15 are located on the 2nd node of the system. On
comparable x86 systems there is one memory controller per node. The
ghes driver should also group DIMMs depending on the topology and
create one MC per node.

There are several options to detect the topology. ARM64 systems
retrieve the (NUMA) node information from the ACPI SRAT table (see
acpi_table_parse_srat()). The node ID is later stored in the page of
the physical address. The pfn_to_nid() macro could be used for a DIMM
after determining its physical address. The drawback of this approach
is that it depends on too many subsystems. It could easily break and
makes the implementation complex. E.g. pfn_to_nid() can only be used
reliably on mapped address ranges, which is not always guaranteed,
there are various firmware instances involved which could be broken,
and results may vary depending on NUMA settings.

Another approach, suggested by James, is to use the DIMM's
physical memory array handle to group DIMMs [1]. The advantage is that
it only uses the information on memory devices from the SMBIOS table,
which contains a reference to the physical memory array each device
belongs to. This information is mandatory, as is the use of DIMM handle
references by GHES to provide the DIMM location of an error. There is
only a single table to parse, which eases implementation. This patch
uses this approach for DIMM grouping.
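
For illustration only, here is a user-space sketch of where the two
handles sit in a raw SMBIOS Memory Device (Type 17) record. It is not
part of the patch. The names memdev_ids, memdev_parse() and get_le16()
are made up for this example; in the kernel the same fields are
accessed through struct memdev_dmi_entry.

```c
#include <stddef.h>
#include <stdint.h>

#define SMBIOS_TYPE_MEM_DEVICE 17	/* SMBIOS "Memory Device" structure */

/* Hypothetical container for the two handles of interest. */
struct memdev_ids {
	uint16_t handle;		/* handle of the memory device itself */
	uint16_t phys_mem_array_handle;	/* handle of the array it belongs to */
};

/* SMBIOS fields are little-endian and not necessarily aligned. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Extract both handles from the formatted area of one SMBIOS structure.
 * Returns 0 on success, -1 if the record is not a usable Type 17 entry.
 */
static int memdev_parse(const uint8_t *rec, size_t len, struct memdev_ids *ids)
{
	/* Header: type (1 byte), length (1 byte), handle (2 bytes). */
	if (len < 6 || rec[0] != SMBIOS_TYPE_MEM_DEVICE || rec[1] < 6)
		return -1;

	ids->handle = get_le16(rec + 2);
	/* Type 17 stores the Physical Memory Array Handle at offset 0x04. */
	ids->phys_mem_array_handle = get_le16(rec + 4);
	return 0;
}
```

The device handle is what GHES error records refer to, while the
array handle is the key used for grouping.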

Modify the DMI decoder to also detect the physical memory array a DIMM
is linked to and create one memory controller per array to group
DIMMs. With this change DIMMs are grouped, e.g. a ThunderX2 system
now shows one MC per node:

 # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
 /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
 /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0
 /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0
 /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0
 /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0
 /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0
 /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0
 /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0
 /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0
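
The grouping itself is a repeated walk over all memory devices. The
following is a simplified user-space model of that idea, not the
driver code: a plain bool array stands in for the bitmap, and the
names claim_next_group() and group_by_array() are invented for this
sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One pass over all memory devices, modelled on the patch's
 * ghes_edac_dmidecode_mci(): claim every not-yet-assigned device that
 * shares the array handle of the first unassigned one.
 *
 * array_handle[i] is the physical memory array handle of device i and
 * assigned[i] plays the role of the bitmap. Returns the number of
 * devices claimed in this pass and stores their common handle in *group.
 */
static int claim_next_group(const uint16_t *array_handle, bool *assigned,
			    size_t count, uint16_t *group)
{
	int num_dimm = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (assigned[i])
			continue;
		if (!num_dimm)
			*group = array_handle[i];	/* first free device picks the group */
		else if (array_handle[i] != *group)
			continue;			/* belongs to a later group */
		assigned[i] = true;
		num_dimm++;
	}
	return num_dimm;
}

/*
 * Repeat the pass until nothing is left, as the loop in
 * ghes_edac_register() does. Writes the DIMM count of each group to
 * dimms_per_mc[] (at most max_mc entries) and returns the number of groups.
 */
static int group_by_array(const uint16_t *array_handle, bool *assigned,
			  size_t count, int *dimms_per_mc, int max_mc)
{
	uint16_t group;
	int num_mc = 0, n;

	while (num_mc < max_mc &&
	       (n = claim_next_group(array_handle, assigned, count, &group)) > 0)
		dimms_per_mc[num_mc++] = n;
	return num_mc;
}
```

Each pass yields one memory controller, so the number of walks is the
number of arrays plus one final walk that finds nothing left to
claim. For a handful of arrays this keeps the code simple without
needing a separate list of array handles.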

[1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com

Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++---------
 1 file changed, 107 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 64220397296e..35b38cccc6da 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -125,12 +125,44 @@ static void ghes_dimm_release(struct list_head *dimms)
 	list_splice(dimms, &ghes_dimm_pool);
 }
 
-static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
+struct ghes_mci_fill {
+	unsigned long *map;
+	int index;
+	int count;
+	int num_mc;
+	int num_dimm;
+	u16 handle;
+};
+
+static void ghes_edac_dmidecode_mci(const struct dmi_header *dh, void *arg)
 {
-	int *num_dimm = arg;
+	struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
+	struct ghes_mci_fill *mci_fill = arg;
+
+	if (dh->type != DMI_ENTRY_MEM_DEVICE)
+		return;
+
+	/* First run, no mapping, just count. */
+	if (!mci_fill->map) {
+		mci_fill->count++;
+		return;
+	}
+
+	if (mci_fill->index >= mci_fill->count)
+		goto out;
 
-	if (dh->type == DMI_ENTRY_MEM_DEVICE)
-		(*num_dimm)++;
+	if (test_bit(mci_fill->index, mci_fill->map))
+		goto out;
+
+	if (!mci_fill->num_dimm)
+		mci_fill->handle = entry->phys_mem_array_handle;
+	else if (mci_fill->handle != entry->phys_mem_array_handle)
+		goto out;
+
+	set_bit(mci_fill->index, mci_fill->map);
+	mci_fill->num_dimm++;
+out:
+	mci_fill->index++;
 }
 
 /*
@@ -181,17 +213,29 @@ struct ghes_dimm_fill {
 	struct list_head dimms;
 	struct mem_ctl_info *mci;
 	int index;
+	u16 link;
 };
 
-static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
+static void ghes_edac_dmidecode_dimm(const struct dmi_header *dh, void *arg)
 {
 	struct ghes_dimm_fill *dimm_fill = arg;
 	struct mem_ctl_info *mci = dimm_fill->mci;
+	struct memdev_dmi_entry *entry;
+	struct ghes_dimm *ghes_dimm;
+	struct dimm_info *dimm;
 
 	if (dh->type == DMI_ENTRY_MEM_DEVICE) {
-		struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
-		struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->index, 0, 0);
-		struct ghes_dimm *ghes_dimm;
+		entry = (struct memdev_dmi_entry *)dh;
+		if (entry->phys_mem_array_handle != dimm_fill->link)
+			return;
+
+		/*
+		 * Always returns non-zero as the mci should have
+		 * allocated the correct number of DIMMs.
+		 */
+		dimm = edac_get_dimm_by_index(mci, dimm_fill->index);
+		if (WARN_ON_ONCE(!dimm))
+			return;
 
 		ghes_dimm = ghes_dimm_alloc(dimm, entry->handle);
 		if (ghes_dimm)
@@ -605,29 +649,35 @@ static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
 static void ghes_mc_free(void)
 {
 	struct ghes_dimm *ghes_dimm, *tmp;
-	struct mem_ctl_info *mci = NULL;
+	struct mem_ctl_info *mci;
 	LIST_HEAD(dimm_list);
 	unsigned long flags;
 
-	spin_lock_irqsave(&ghes_lock, flags);
+	while (1) {
+		mci = NULL;
 
-	list_for_each_entry_safe(ghes_dimm, tmp, &ghes_dimm_list, entry) {
-		mci = mci ?: ghes_dimm->dimm->mci;
-		WARN_ON_ONCE(mci != ghes_dimm->dimm->mci);
-		list_move_tail(&ghes_dimm->entry, &dimm_list);
-	}
+		spin_lock_irqsave(&ghes_lock, flags);
 
-	WARN_ON_ONCE(!list_empty(&ghes_dimm_list));
+		list_for_each_entry_safe(ghes_dimm, tmp, &ghes_dimm_list, entry) {
+			mci = mci ?: ghes_dimm->dimm->mci;
+			if (mci != ghes_dimm->dimm->mci)
+				continue;
+			list_move_tail(&ghes_dimm->entry, &dimm_list);
+		}
 
-	spin_unlock_irqrestore(&ghes_lock, flags);
+		WARN_ON_ONCE(!mci && !list_empty(&ghes_dimm_list));
 
-	ghes_dimm_release(&dimm_list);
+		spin_unlock_irqrestore(&ghes_lock, flags);
 
-	if (!mci)
-		return;
+		ghes_dimm_release(&dimm_list);
+
+		if (!mci)
+			return;
+
+		mci = edac_mc_del_mc(mci->pdev);
+		if (!mci)
+			continue;
 
-	mci = edac_mc_del_mc(mci->pdev);
-	if (mci) {
 		platform_device_unregister(to_platform_device(mci->pdev));
 		edac_mc_free(mci);
 	}
@@ -661,7 +711,8 @@ static int ghes_edac_register_fake(struct device *dev)
 	return ghes_mc_add_or_free(mci, &dimm_list);
 }
 
-static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)
+static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm,
+				u16 handle)
 {
 	struct ghes_dimm_fill dimm_fill;
 	struct mem_ctl_info *mci;
@@ -672,16 +723,18 @@ static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)
 
 	dimm_fill.index = 0;
 	dimm_fill.mci = mci;
+	dimm_fill.link = handle;
 	INIT_LIST_HEAD(&dimm_fill.dimms);
 
-	dmi_walk(ghes_edac_dmidecode, &dimm_fill);
+	dmi_walk(ghes_edac_dmidecode_dimm, &dimm_fill);
 
 	return ghes_mc_add_or_free(mci, &dimm_fill.dimms);
 }
 
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
-	int rc = 0, num_dimm = 0;
+	struct ghes_mci_fill mci_fill = { };
+	int rc = 0;
 	int idx;
 
 	if (IS_ENABLED(CONFIG_X86)) {
@@ -703,13 +756,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		goto unlock;
 
 	/* Get the number of DIMMs */
-	dmi_walk(ghes_edac_count_dimms, &num_dimm);
+	dmi_walk(ghes_edac_dmidecode_mci, &mci_fill);
 
-	rc = ghes_dimm_init(num_dimm ?: 1);
+	rc = ghes_dimm_init(mci_fill.count ?: 1);
 	if (rc)
 		goto unlock;
 
-	if (!num_dimm) {
+	if (!mci_fill.count) {
 		/*
 		 * Bogus BIOS: Ignore DMI topology and use a single MC
 		 * with only one DIMM for the whole address range to
@@ -732,10 +785,34 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
 		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
 		pr_info("to correct its BIOS.\n");
-		pr_info("This system has %d DIMM sockets.\n", num_dimm);
+		pr_info("This system has %d DIMM sockets.\n", mci_fill.count);
 	}
 
-	rc = ghes_edac_register_one(dev, 0, num_dimm);
+	mci_fill.map = kcalloc(BITS_TO_LONGS(mci_fill.count),
+			sizeof(*mci_fill.map), GFP_KERNEL);
+
+	if (!mci_fill.map) {
+		rc = -ENOMEM;
+		goto unlock;
+	}
+
+	while (1) {
+		dmi_walk(ghes_edac_dmidecode_mci, &mci_fill);
+		if (!mci_fill.num_dimm)
+			break;
+
+		rc = ghes_edac_register_one(dev, mci_fill.num_mc,
+					mci_fill.num_dimm, mci_fill.handle);
+		if (rc)
+			break;
+
+		mci_fill.index    = 0;
+		mci_fill.num_dimm = 0;
+		mci_fill.num_mc++;
+	}
+
+	kfree(mci_fill.map);
+
 	if (rc)
 		goto unlock;
 
-- 
2.20.1

