linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Terry Bowman <terry.bowman@amd.com>
To: <alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
	<ira.weiny@intel.com>, <bwidawsk@kernel.org>,
	<dan.j.williams@intel.com>, <dave.jiang@intel.com>,
	<Jonathan.Cameron@huawei.com>, <linux-cxl@vger.kernel.org>
Cc: <terry.bowman@amd.com>, <rrichter@amd.com>,
	<linux-kernel@vger.kernel.org>, <bhelgaas@google.com>
Subject: [PATCH v5 24/26] cxl/pci: Add RCH downstream port error logging
Date: Wed, 7 Jun 2023 17:16:49 -0500	[thread overview]
Message-ID: <20230607221651.2454764-25-terry.bowman@amd.com> (raw)
In-Reply-To: <20230607221651.2454764-1-terry.bowman@amd.com>

RCH downstream port error logging is missing in the current CXL driver. The
missing AER and RAS error logging is needed for communicating driver error
details to userspace. Update the driver to include PCIe AER and CXL RAS
error logging.

Add RCH downstream port error handling into the existing RCiEP handler.
The downstream port error handler is added to the RCiEP error handler
because the downstream port is implemented in a RCRB, is not PCI
enumerable, and as a result is not directly accessible to the PCI AER
root port driver. The AER root port driver calls the RCiEP handler for
handling RCD errors and RCH downstream port protocol errors.

Update existing RCiEP correctable and uncorrectable handlers to also call
the RCH handler. The RCH handler will read the RCH AER registers, check for
error severity, and if an error exists will log using an existing kernel
AER trace routine. The RCH handler will also log downstream port RAS errors
if they exist.

Co-developed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/cxl/core/pci.c | 98 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index def6ee5ab4f5..97886aacc64a 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -5,6 +5,7 @@
 #include <linux/delay.h>
 #include <linux/pci.h>
 #include <linux/pci-doe.h>
+#include <linux/aer.h>
 #include <cxlpci.h>
 #include <cxlmem.h>
 #include <cxl.h>
@@ -747,10 +748,105 @@ static bool cxl_report_and_clear(struct cxl_dev_state *cxlds)
 	return __cxl_report_and_clear(cxlds, cxlds->regs.ras);
 }
 
+#ifdef CONFIG_PCIEAER_CXL
+
+static void cxl_log_correctable_ras_dport(struct cxl_dev_state *cxlds,
+					  struct cxl_dport *dport)
+{
+	return __cxl_log_correctable_ras(cxlds, dport->regs.ras);
+}
+
+static bool cxl_report_and_clear_dport(struct cxl_dev_state *cxlds,
+				       struct cxl_dport *dport)
+{
+	return __cxl_report_and_clear(cxlds, dport->regs.ras);
+}
+
+/*
+ * Copy the AER capability registers using 32 bit read accesses.
+ * This is necessary because RCRB AER capability is MMIO mapped. Clear the
+ * status after copying.
+ *
+ * @aer_base: base address of AER capability block in RCRB
+ * @aer_regs: destination for copying AER capability
+ */
+static bool cxl_rch_get_aer_info(void __iomem *aer_base,
+				 struct aer_capability_regs *aer_regs)
+{
+	int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32);
+	u32 *aer_regs_buf = (u32 *)aer_regs;
+	int n;
+
+	if (!aer_base)
+		return false;
+
+	/* Use readl() to guarantee 32-bit accesses */
+	for (n = 0; n < read_cnt; n++)
+		aer_regs_buf[n] = readl(aer_base + n * sizeof(u32));
+
+	writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS);
+	writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS);
+
+	return true;
+}
+
+/* Get AER severity. Return false if there is no error. */
+static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
+				     int *severity)
+{
+	if (aer_regs->uncor_status & ~aer_regs->uncor_mask) {
+		if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV)
+			*severity = AER_FATAL;
+		else
+			*severity = AER_NONFATAL;
+		return true;
+	}
+
+	if (aer_regs->cor_status & ~aer_regs->cor_mask) {
+		*severity = AER_CORRECTABLE;
+		return true;
+	}
+
+	return false;
+}
+
+static void cxl_handle_rch_dport_errors(struct cxl_dev_state *cxlds)
+{
+	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+	struct aer_capability_regs aer_regs;
+	struct cxl_dport *dport;
+	int severity;
+
+	if (!cxlds->rcd)
+		return;
+
+	if (!cxl_pci_find_port(pdev, &dport) || !dport->rch)
+		return;
+
+	if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
+		return;
+
+	if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
+		return;
+
+	pci_print_aer(pdev, severity, &aer_regs);
+
+	if (severity == AER_CORRECTABLE)
+		cxl_log_correctable_ras_dport(cxlds, dport);
+	else
+		cxl_report_and_clear_dport(cxlds, dport);
+}
+
+#else
+static void cxl_handle_rch_dport_errors(struct cxl_dev_state *cxlds) { }
+#endif
+
 void cxl_cor_error_detected(struct pci_dev *pdev)
 {
 	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
 
+	cxl_handle_rch_dport_errors(cxlds);
+
 	cxl_log_correctable_ras_endpoint(cxlds);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, CXL);
@@ -763,6 +859,8 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
 	struct device *dev = &cxlmd->dev;
 	bool ue;
 
+	cxl_handle_rch_dport_errors(cxlds);
+
 	/*
 	 * A frozen channel indicates an impending reset which is fatal to
 	 * CXL.mem operation, and will likely crash the system. On the off
-- 
2.34.1


  parent reply	other threads:[~2023-06-07 22:24 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-07 22:16 [PATCH v5 00/26] cxl/pci: Add support for RCH RAS error handling Terry Bowman
2023-06-07 22:16 ` [PATCH v5 01/26] cxl/acpi: Probe RCRB later during RCH downstream port creation Terry Bowman
2023-06-08  1:03   ` Dan Williams
2023-06-08  1:11     ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 02/26] cxl/rch: Prepare for caching the MMIO mapped PCIe AER capability Terry Bowman
2023-06-08  4:53   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 03/26] cxl: Rename member @dport of struct cxl_dport to @dev Terry Bowman
2023-06-08  6:42   ` Dan Williams
2023-06-08 14:36     ` Terry Bowman
2023-06-08 19:08       ` Dan Williams
2023-06-08 19:22         ` Terry Bowman
2023-06-07 22:16 ` [PATCH v5 04/26] cxl/core/regs: Rename phys_addr in cxl_map_component_regs() Terry Bowman
2023-06-08  6:47   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 05/26] cxl/core/regs: Add @dev to cxl_register_map Terry Bowman
2023-06-08 19:29   ` Dan Williams
2023-06-08 21:50     ` Terry Bowman
2023-06-07 22:16 ` [PATCH v5 06/26] cxl/pci: Refactor component register discovery for reuse Terry Bowman
2023-06-08 19:57   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 07/26] cxl/acpi: Moving add_host_bridge_uport() around Terry Bowman
2023-06-08 20:02   ` Dan Williams
2023-06-08 21:50     ` Terry Bowman
2023-06-07 22:16 ` [PATCH v5 08/26] cxl/acpi: Directly bind the CEDT detected CHBCR to the Host Bridge's port Terry Bowman
2023-06-09  4:24   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 09/26] cxl/regs: Remove early capability checks in Component Register setup Terry Bowman
2023-06-10  0:18   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 10/26] cxl/mem: Prepare for early RCH dport component register setup Terry Bowman
2023-06-10  0:26   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 11/26] cxl/pci: Early setup RCH dport component registers from RCRB Terry Bowman
2023-06-10  1:36   ` Dan Williams
2023-06-10  1:44   ` Dan Williams
2023-06-12 20:39   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 12/26] cxl/port: Store the port's Component Register mappings in struct cxl_port Terry Bowman
2023-06-10  2:18   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 13/26] cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport Terry Bowman
2023-06-10  2:23   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 14/26] cxl/pci: Store the endpoint's Component Register mappings in struct cxl_dev_state Terry Bowman
2023-06-07 23:01   ` Concept of LD-ID in QEMU Shesha Bhushan Sreenivasamurthy
2023-06-08 10:31     ` Jonathan Cameron
2023-06-08 10:36       ` Jonathan Cameron
2023-06-08 23:38         ` [EXT] " Shesha Bhushan Sreenivasamurthy
2023-06-09 11:20           ` Jonathan Cameron
2023-06-10  2:29   ` [PATCH v5 14/26] cxl/pci: Store the endpoint's Component Register mappings in struct cxl_dev_state Dan Williams
2023-06-07 22:16 ` [PATCH v5 15/26] cxl/hdm: Use stored Component Register mappings to map HDM decoder capability Terry Bowman
2023-06-10  2:34   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 16/26] cxl/port: Remove Component Register base address from struct cxl_port Terry Bowman
2023-06-10  2:36   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 17/26] cxl/port: Remove Component Register base address from struct cxl_dport Terry Bowman
2023-06-10  2:37   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 18/26] cxl/pci: Remove Component Register base address from struct cxl_dev_state Terry Bowman
2023-06-10  2:38   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 19/26] cxl/pci: Add RCH downstream port AER register discovery Terry Bowman
2023-06-10  3:09   ` Dan Williams
2023-06-12 14:41     ` Terry Bowman
2023-06-07 22:16 ` [PATCH v5 20/26] PCI/AER: Refactor cper_print_aer() for use by CXL driver module Terry Bowman
2023-06-10  3:11   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 21/26] cxl/pci: Update CXL error logging to use RAS register address Terry Bowman
2023-06-10  3:12   ` Dan Williams
2023-06-12 21:12   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 22/26] cxl/pci: Map RCH downstream AER registers for logging protocol errors Terry Bowman
2023-06-10  3:23   ` Dan Williams
2023-06-12 18:19     ` Terry Bowman
2023-06-07 22:16 ` [PATCH v5 23/26] cxl/pci: Disable root port interrupts in RCH mode Terry Bowman
2023-06-12 20:29   ` Dan Williams
2023-06-13 15:28     ` Terry Bowman
2023-06-07 22:16 ` Terry Bowman [this message]
2023-06-12 21:38   ` [PATCH v5 24/26] cxl/pci: Add RCH downstream port error logging Dan Williams
2023-06-16 16:17     ` Terry Bowman
2023-06-16 16:28       ` Terry Bowman
2023-06-07 22:16 ` [PATCH v5 25/26] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler Terry Bowman
2023-06-08 22:54   ` kernel test robot
2023-06-12 22:49   ` Dan Williams
2023-06-07 22:16 ` [PATCH v5 26/26] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling Terry Bowman
2023-06-08 19:21   ` Bjorn Helgaas
2023-06-12 22:57   ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230607221651.2454764-25-terry.bowman@amd.com \
    --to=terry.bowman@amd.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=bwidawsk@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rrichter@amd.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).