All of lore.kernel.org
 help / color / mirror / Atom feed
From: Naveen Krishna Chatradhi <nchatrad@amd.com>
To: <linux-edac@vger.kernel.org>
Cc: <bp@alien8.de>, <mingo@redhat.com>, <mchehab@kernel.org>,
	<mchehab+huawei@kernel.org>, <yazen.ghannam@amd.com>,
	Muralidhara M K <muralimk@amd.com>,
	Naveen Krishna Chatradhi <nchatrad@amd.com>
Subject: [PATCH 3/4] rasdaemon: Enumerate memory on noncpu nodes
Date: Tue, 10 Aug 2021 22:52:13 +0530	[thread overview]
Message-ID: <20210810172214.134099-4-nchatrad@amd.com> (raw)
In-Reply-To: <20210810172214.134099-1-nchatrad@amd.com>

From: Muralidhara M K <muralimk@amd.com>

On newer heterogeneous systems from AMD with GPU nodes
(with HBM2 memory) connected via xGMI links to the CPUs.

The node id information is available in the
MCA_IPID[47:44](InstanceIdHI) register.

The UMC Phys on Aldeberan nodes are enumerated as csrow
The UMC channels connected to HBMs are enumerated as ranks.

Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
---
 mce-amd-smca.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/mce-amd-smca.c b/mce-amd-smca.c
index 3c346f4..9381aa1 100644
--- a/mce-amd-smca.c
+++ b/mce-amd-smca.c
@@ -78,6 +78,16 @@ enum smca_bank_types {
 /* Maximum number of MCA banks per CPU. */
 #define MAX_NR_BANKS	64
 
+/*
+ * On newer heterogeneous systems the data gabrics of the CPUs and GPUs
+ * are connected directly via a custom links, like is done with
+ * 2 socket CPU systems and also within a socket for Multi-chip Module
+ * (MCM) CPUs like Naples.
+ * The first GPU node(non cpu) is assumed to have an "AMD Node ID" value
+ * of 8 (the second GPU node has 9, etc.).
+ */
+#define NONCPU_NODE_INDEX	8
+
 /* SMCA Extended error strings */
 /* Load Store */
 static const char * const smca_ls_mce_desc[] = {
@@ -531,6 +541,26 @@ static int find_umc_channel(struct mce_event *e)
 {
 	return EXTRACT(e->ipid, 0, 31) >> 20;
 }
+
+/*
+ * The HBM memory managed by the UMCCH of the noncpu node
+ * can be calculated based on the [15:12]bits of IPID
+ */
+static int find_hbm_channel(struct mce_event *e)
+{
+	int umc, tmp;
+
+	umc = EXTRACT(e->ipid, 0, 31) >> 20;
+
+	/*
+	 * The HBM channel managed by the UMC of the noncpu node
+	 * can be calculated based on the [15:12]bits of IPID as follows
+	 */
+	tmp = ((e->ipid >> 12) & 0xf);
+
+	return (umc % 2) ? tmp + 4 : tmp;
+}
+
 /* Decode extended errors according to Scalable MCA specification */
 static void decode_smca_error(struct mce_event *e)
 {
@@ -539,6 +569,7 @@ static void decode_smca_error(struct mce_event *e)
 	unsigned short xec = (e->status >> 16) & 0x3f;
 	const struct smca_hwid *s_hwid;
 	uint32_t mcatype_hwid = EXTRACT(e->ipid, 32, 63);
+	uint8_t mcatype_instancehi = EXTRACT(e->ipid, 44, 47);
 	unsigned int csrow = -1, channel = -1;
 	unsigned int i;
 
@@ -548,14 +579,16 @@ static void decode_smca_error(struct mce_event *e)
 			bank_type = s_hwid->bank_type;
 			break;
 		}
+		if (mcatype_instancehi >= NONCPU_NODE_INDEX)
+			bank_type = SMCA_UMC_V2;
 	}
 
-	if (i >= ARRAY_SIZE(smca_hwid_mcatypes)) {
+	if (i >= MAX_NR_BANKS) {
 		strcpy(e->mcastatus_msg, "Couldn't find bank type with IPID");
 		return;
 	}
 
-	if (bank_type >= N_SMCA_BANK_TYPES) {
+	if (bank_type >= MAX_NR_BANKS) {
 		strcpy(e->mcastatus_msg, "Don't know how to decode this bank");
 		return;
 	}
@@ -580,6 +613,16 @@ static void decode_smca_error(struct mce_event *e)
 		mce_snprintf(e->mc_location, "memory_channel=%d,csrow=%d",
 			     channel, csrow);
 	}
+
+	if (bank_type == SMCA_UMC_V2 && xec == 0) {
+		/* The UMCPHY is reported as csrow in case of noncpu nodes */
+		csrow = find_umc_channel(e) / 2;
+		/* UMCCH is managing the HBM memory */
+		channel = find_hbm_channel(e);
+		mce_snprintf(e->mc_location, "memory_channel=%d,csrow=%d",
+			     channel, csrow);
+	}
+
 }
 
 int parse_amd_smca_event(struct ras_events *ras, struct mce_event *e)
-- 
2.17.1


  parent reply	other threads:[~2021-08-10 17:23 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-10 17:22 [PATCH 0/4] rasdaemon: Add support for AMD CPUs family 19h Naveen Krishna Chatradhi
2021-08-10 17:22 ` [PATCH 1/4] rasdaemon: Add new SMCA bank types with error decoding Naveen Krishna Chatradhi
2021-08-10 17:22 ` [PATCH 2/4] rasdaemon: set SMCA maximum number of banks to 64 Naveen Krishna Chatradhi
2021-08-10 17:22 ` Naveen Krishna Chatradhi [this message]
2021-08-10 17:22 ` [PATCH 4/4] rasdaemon: Support MCE for AMD CPU family 19h Naveen Krishna Chatradhi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210810172214.134099-4-nchatrad@amd.com \
    --to=nchatrad@amd.com \
    --cc=bp@alien8.de \
    --cc=linux-edac@vger.kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=mchehab@kernel.org \
    --cc=mingo@redhat.com \
    --cc=muralimk@amd.com \
    --cc=yazen.ghannam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.