All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yazen Ghannam <yazen.ghannam@amd.com>
To: <x86@kernel.org>
Cc: <linux-kernel@vger.kernel.org>,
	<platform-driver-x86@vger.kernel.org>, <markgross@kernel.org>,
	<hdegoede@redhat.com>, <Shyam-sundar.S-k@amd.com>,
	<linux-edac@vger.kernel.org>, <clemens@ladisch.de>,
	<jdelvare@suse.com>, <linux@roeck-us.net>,
	<linux-hwmon@vger.kernel.org>, <mario.limonciello@amd.com>,
	<babu.moger@amd.com>, Yazen Ghannam <yazen.ghannam@amd.com>
Subject: [PATCH v2 4/6] x86/amd_nb: Enhance SMN access error checking
Date: Thu, 15 Jun 2023 11:03:26 -0500	[thread overview]
Message-ID: <20230615160328.419610-5-yazen.ghannam@amd.com> (raw)
In-Reply-To: <20230615160328.419610-1-yazen.ghannam@amd.com>

AMD Zen-based systems use a System Management Network (SMN) that
provides access to implementation-specific registers.

SMN accesses are done indirectly through an index/data pair in PCI
config space. The PCI config access may fail and return an error code.
This would prevent the "read" value from being updated, and it would
give an indication to the caller that the read or write operation failed.

However for reads, the PCI config access may succeed, but the return
value may be invalid. This is in similar fashion to PCI bad reads, i.e.
return all bits set.

Most systems will return 0 for SMN addresses that are not accessible.
This is in line with AMD convention that unavailable registers are
Read-as-Zero/Writes-Ignored.

However, some systems will return a "PCI Error Response" instead. This
value, along with an error code of 0 from the PCI config access, will
confuse callers of the amd_smn_read() function.

Check for this condition and set a proper error code for SMN reads.

Furthermore, require error checking for callers of amd_smn_read() and
amd_smn_write(). This is needed because many error conditions cannot
be checked by these functions.

Also, drop the extern keyword as it's not needed. And remove a warning
that will not be trigger in many cases.

Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---
Link:
https://lore.kernel.org/r/20230516202430.4157216-5-yazen.ghannam@amd.com

v1->v2:
* No changes.

 arch/x86/include/asm/amd_nb.h |  4 +--
 arch/x86/kernel/amd_nb.c      | 46 ++++++++++++++++++++++++++++++-----
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index ed0eaf65c437..e6fe405aa567 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -21,8 +21,8 @@ extern int amd_numa_init(void);
 extern int amd_get_subcaches(int);
 extern int amd_set_subcaches(int, unsigned long);
 
-extern int amd_smn_read(u16 node, u32 address, u32 *value);
-extern int amd_smn_write(u16 node, u32 address, u32 value);
+int __must_check amd_smn_read(u16 node, u32 address, u32 *value);
+int __must_check amd_smn_write(u16 node, u32 address, u32 value);
 
 struct amd_l3_cache {
 	unsigned indices;
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 035a3db5330b..60bebf6d4a37 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -163,6 +163,38 @@ static struct pci_dev *next_northbridge(struct pci_dev *dev,
 	return dev;
 }
 
+/*
+ * SMN accesses may fail in ways that are difficult to detect here in the called
+ * functions smn_read() and smn_write(). Therefore, callers of these functions
+ * must do their own checking based on what behavior they expect.
+ *
+ * For SMN reads, the returned SMN value may be zero if the register is
+ * Read-as-Zero . Or it may be a "PCI Error Response", e.g. all 0xFFs. The "PCI
+ * Error Response" can be checked here, and a proper error code can be returned.
+ * But the Read-as-Zero response cannot be verified here. A value of 0 may be
+ * correct in some cases, so callers must check that this correct is for the
+ * register/fields they need.
+ *
+ * For SMN writes, success can be determined through a "write and read back"
+ * procedure. However, this is not robust when done here.
+ *
+ * Possible issues:
+ * 1) Bits that are "Write-1-to-Clear". In this case, the read value should
+ *    *not* match the write value.
+ * 2) Bits that are "Read-as-Zero"/"Writes-Ignored". This information cannot be
+ *    known here.
+ * 3) Bits that are "Reserved / Set to 1". Ditto above.
+ *
+ * Callers of amd_smn_write() should do the "write and read back" check themselves,
+ * if needed.
+ *
+ * For #1, they can see if their target bits got cleared.
+ *
+ * For #2 and #3, they can check if their target bits got set as intended.
+ *
+ * This matches what is done for rdmsr/wrmsr. As long as there's no #GP, then
+ * the operation is considered a success, and the caller does their own checking.
+ */
 static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
 {
 	struct pci_dev *root;
@@ -185,9 +217,6 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
 
 	err = (write ? pci_write_config_dword(root, 0x64, *value)
 		     : pci_read_config_dword(root, 0x64, value));
-	if (err)
-		pr_warn("Error %s SMN address 0x%x.\n",
-			(write ? "writing to" : "reading from"), address);
 
 out_unlock:
 	mutex_unlock(&smn_mutex);
@@ -196,13 +225,18 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
 	return err;
 }
 
-int amd_smn_read(u16 node, u32 address, u32 *value)
+int __must_check amd_smn_read(u16 node, u32 address, u32 *value)
 {
-	return __amd_smn_rw(node, address, value, false);
+	int err = __amd_smn_rw(node, address, value, false);
+
+	if (PCI_POSSIBLE_ERROR(*value))
+		err = -ENODEV;
+
+	return err;
 }
 EXPORT_SYMBOL_GPL(amd_smn_read);
 
-int amd_smn_write(u16 node, u32 address, u32 value)
+int __must_check amd_smn_write(u16 node, u32 address, u32 value)
 {
 	return __amd_smn_rw(node, address, &value, true);
 }
-- 
2.34.1


  parent reply	other threads:[~2023-06-15 16:03 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-15 16:03 [PATCH v2 0/6] Enhance AMD SMN Error Checking Yazen Ghannam
2023-06-15 16:03 ` [PATCH v2 1/6] EDAC/amd64: Remove unused register accesses Yazen Ghannam
2023-06-15 16:03 ` [PATCH v2 2/6] EDAC/amd64: Check return value of amd_smn_read() Yazen Ghannam
2023-06-15 16:03 ` [PATCH v2 3/6] hwmon: (k10temp) " Yazen Ghannam
2023-06-16  2:21   ` Guenter Roeck
2023-06-15 16:03 ` Yazen Ghannam [this message]
2023-06-15 16:03 ` [PATCH v2 5/6] hwmon: (k10temp) Define helper function to read CCD temp Yazen Ghannam
2023-06-16  2:22   ` Guenter Roeck
2023-06-15 16:03 ` [PATCH v2 6/6] hwmon: (k10temp) Reduce k10temp_get_ccd_support() parameters Yazen Ghannam
2023-06-19  2:42 ` [PATCH v2 0/6] Enhance AMD SMN Error Checking Mario Limonciello

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230615160328.419610-5-yazen.ghannam@amd.com \
    --to=yazen.ghannam@amd.com \
    --cc=Shyam-sundar.S-k@amd.com \
    --cc=babu.moger@amd.com \
    --cc=clemens@ladisch.de \
    --cc=hdegoede@redhat.com \
    --cc=jdelvare@suse.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-hwmon@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=mario.limonciello@amd.com \
    --cc=markgross@kernel.org \
    --cc=platform-driver-x86@vger.kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.