linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information
@ 2022-10-21 18:56 Terry Bowman
  2022-10-21 18:56 ` [PATCH 1/5] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support Terry Bowman
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Terry Bowman @ 2022-10-21 18:56 UTC (permalink / raw)
  To: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams
  Cc: terry.bowman, linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

This patchset adds CXL downport PCI AER and CXL RAS logging to the CXL
error handling. This is necessary for communicating CXL HW issues to users.
The included patches find and cache pointers to the AER and CXL RAS PCIe
capability structures. The cached pointers are then used to display the
error information in a later patch. These changes follow the CXL
specification, Chapter 8 'Control and Status Registers'.[1]

The first patch enables CXL1.1 RCD support through the ACPI _OSC support
method.

The 2nd and 3rd patches find and map PCIe AER and CXL RAS capabilities.

The 4th patch enables AER error reporting.

The 5th patch adds functionality to log the PCIe AER and RAS capabilities. 

TODO work remains to consolidate the HDM and CXL RAS register mapping
(patch#3). The current CXL RAS register mapping will be replaced to reuse
cxl_probe_component_regs() function as David Jiang and Alison Schofield
upstreamed. Should the same be done for the AER registers (patch#2)? The
AER registers are not in the component register block but are instead in
the downport and upport (RCRB).

TODO work remains to add support for upports in some cases here where
downport is addressed. For instance, will need another aer_map to support
upport AER ?

TODO work to support CXL2.0. Should be trivial since aer_cap and aer_stats
is member of 'struct pci_dev'.

Base is from: https://patchwork.kernel.org/project/cxl/list/?series=686272

[1] - https://www.computeexpresslink.org/spec-landing

Terry Bowman (5):
  cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
  cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability
  cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
  cxl/pci: Enable RCD dport AER reporting
  cxl/pci: Log CXL device's PCIe AER and CXL RAS error information

 drivers/acpi/pci_root.c |   1 +
 drivers/cxl/acpi.c      |  56 +++++++
 drivers/cxl/core/regs.c |   1 +
 drivers/cxl/cxl.h       |  13 ++
 drivers/cxl/cxlmem.h    |   3 +
 drivers/cxl/mem.c       |   2 +
 drivers/cxl/pci.c       | 319 ++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pcie/aer.c  |  45 +++++-
 include/linux/pci.h     |   4 +
 9 files changed, 443 insertions(+), 1 deletion(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
  2022-10-21 18:56 [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information Terry Bowman
@ 2022-10-21 18:56 ` Terry Bowman
  2022-10-21 22:39   ` Dan Williams
  2022-10-21 18:56 ` [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability Terry Bowman
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Terry Bowman @ 2022-10-21 18:56 UTC (permalink / raw)
  To: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams
  Cc: terry.bowman, linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

ACPI includes a CXL _OSC support method to communicate the available
CXL support to FW. The CXL support _OSC includes a field to indicate
CXL1.1 RCH RCD support. The OS sets this bit to 1 if it supports access
to RCD and RCH Port registers.[1] FW can potentially change it's operation
depending on the _OSC support setting reported by the OS.

The ACPI driver does not currently set the ACPI _OSC support to indicate
CXL1.1 RCD RCH support. Change the capability reported to include CXL1.1.

[1] CXL3.0 Table 9-26 'Interpretation of CXL _OSC Support Field'

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 drivers/acpi/pci_root.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index c8385ef54c37..094a59b216ae 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -492,6 +492,7 @@ static u32 calculate_cxl_support(void)
 	u32 support;
 
 	support = OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT;
+	support |= OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT;
 	if (pci_aer_available())
 		support |= OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT;
 	if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability
  2022-10-21 18:56 [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information Terry Bowman
  2022-10-21 18:56 ` [PATCH 1/5] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support Terry Bowman
@ 2022-10-21 18:56 ` Terry Bowman
  2022-10-22 21:45   ` Dan Williams
  2022-10-27 14:52   ` Bjorn Helgaas
  2022-10-21 18:56 ` [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers Terry Bowman
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Terry Bowman @ 2022-10-21 18:56 UTC (permalink / raw)
  To: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams
  Cc: terry.bowman, linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

CXL downport PCIe AER information needs to be logged during error handling.
The RCD downport/upport does not have a BDF and is not PCI enumerable. As a
result the CXL PCIe driver is not aware of the AER in 'PCI Express'
capability located in the RCRB downport/upport. Logic must be introduced to
use the downport/upport AER information.

Update the CXL driver to find the downport's PCIe AER capability and cache
a pointer for using later. First, find the RCRB to provide the
downport/upport memory region. The downport/upport are mapped as MMIO not
PCI config space. Use readl/writel/readq/writeq as required by the CXL spec
to find and operate on the AER registers.[1]

Also, add function to detect if the device is a CXL1.1 RCD device.

[1] CXL3.0, 8.2 'Memory Mapped Registers'

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 drivers/cxl/acpi.c      |  56 ++++++++++++++
 drivers/cxl/core/regs.c |   1 +
 drivers/cxl/cxl.h       |   9 +++
 drivers/cxl/cxlmem.h    |   2 +
 drivers/cxl/mem.c       |   2 +
 drivers/cxl/pci.c       | 158 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 228 insertions(+)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index bf251a27e460..5d543c789e8d 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -232,6 +232,7 @@ struct cxl_chbs_context {
 	struct device *dev;
 	unsigned long long uid;
 	struct acpi_cedt_chbs chbs;
+	resource_size_t chbcr;
 };
 
 static int cxl_get_chbs(union acpi_subtable_headers *header, void *arg,
@@ -417,6 +418,61 @@ static void remove_cxl_resources(void *data)
 	}
 }
 
+static const struct acpi_device_id cxl_host_ids[] = {
+	{ "ACPI0016", 0 },
+	{ "PNP0A08", 0 },
+	{ },
+};
+
+static int __cxl_get_rcrb(union acpi_subtable_headers *header, void *arg,
+			  const unsigned long end)
+{
+	struct cxl_chbs_context *ctx = arg;
+	struct acpi_cedt_chbs *chbs;
+
+	if (ctx->chbcr)
+		return 0;
+
+	chbs = (struct acpi_cedt_chbs *)header;
+
+	if (ctx->uid != chbs->uid)
+		return 0;
+
+	if (chbs->cxl_version != ACPI_CEDT_CHBS_VERSION_CXL11)
+		return 0;
+
+	if (chbs->length != SZ_8K)
+		return 0;
+
+	ctx->chbcr = chbs->base;
+
+	return 0;
+}
+
+resource_size_t cxl_get_rcrb(struct cxl_memdev *cxlmd)
+{
+	struct pci_host_bridge *host = NULL;
+	struct cxl_chbs_context ctx = {0};
+	struct cxl_dport *dport;
+	struct cxl_port *port;
+
+	port = cxl_mem_find_port(cxlmd, NULL);
+	if (!port)
+		return 0;
+
+	dport = port->parent_dport;
+	ctx.uid = dport ? dport->port_id : 0;
+	if (!dport)
+		return 0;
+
+	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, __cxl_get_rcrb, &ctx);
+
+	dev_dbg(&host->dev, "RCRB found: 0x%08llx\n", (u64)ctx.chbcr);
+
+	return ctx.chbcr;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_rcrb, CXL);
+
 /**
  * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource
  * @cxl_res: A standalone resource tree where each CXL window is a sibling
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index ec178e69b18f..0d4f633e5c01 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -184,6 +184,7 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
 
 	return ret_val;
 }
+EXPORT_SYMBOL_NS_GPL(devm_cxl_iomap_block, CXL);
 
 int cxl_map_component_regs(struct pci_dev *pdev,
 			   struct cxl_component_regs *regs,
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index ac8998b627b5..7d507ab80a78 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -204,6 +204,14 @@ struct cxl_register_map {
 	};
 };
 
+struct cxl_memdev;
+int cxl_pci_probe_dport(struct cxl_memdev *cxlmd);
+
+void cxl_pci_aer_init(struct cxl_memdev *cxlmd);
+
+void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
+				   resource_size_t length);
+
 void cxl_probe_component_regs(struct device *dev, void __iomem *base,
 			      struct cxl_component_reg_map *map);
 void cxl_probe_device_regs(struct device *dev, void __iomem *base,
@@ -549,6 +557,7 @@ static inline bool is_cxl_root(struct cxl_port *port)
 	return port->uport == port->dev.parent;
 }
 
+resource_size_t cxl_get_rcrb(struct cxl_memdev *cxlmd);
 bool is_cxl_port(struct device *dev);
 struct cxl_port *to_cxl_port(struct device *dev);
 struct pci_bus;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 88e3a8e54b6a..079db2e15acc 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -242,6 +242,8 @@ struct cxl_dev_state {
 	u64 next_volatile_bytes;
 	u64 next_persistent_bytes;
 
+	struct cxl_register_map aer_map;
+
 	resource_size_t component_reg_phys;
 	u64 serial;
 
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 64ccf053d32c..d1e663be43c2 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -74,6 +74,8 @@ static int cxl_mem_probe(struct device *dev)
 	if (rc)
 		return rc;
 
+	cxl_pci_aer_init(cxlmd);
+
 	parent_port = cxl_mem_find_port(cxlmd, &dport);
 	if (!parent_port) {
 		dev_err(dev, "CXL port topology not found\n");
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faeb5d9d7a7a..2287b5225862 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -35,6 +35,15 @@
 	(readl((cxlds)->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET) &                  \
 	 CXLDEV_MBOX_CTRL_DOORBELL)
 
+/* PCI 5.0 - 7.8.4 Advanced Error Reporting Extended Capability */
+#define PCI_AER_CAP_SIZE 0x48
+
+/* CXL 3.0 - 8.2.1.3.3, Offset to DVSEC Port Status */
+#define CXL_DVSEC_PORT_STATUS_OFF 0xE
+
+/* CXL 3.0 - 8.2.1.3.3 */
+#define CXL_DVSEC_VH_SUPPORT 0x20
+
 /* CXL 2.0 - 8.2.8.4 */
 #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
 
@@ -428,6 +437,155 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
 	}
 }
 
+static resource_size_t cxl_get_dport_ext_cap(struct cxl_memdev *cxlmd, u32 cap_id)
+{
+	resource_size_t rcrb, offset;
+	void *rcrb_mapped;
+	u32 cap_hdr;
+
+	rcrb = cxl_get_rcrb(cxlmd);
+	if (!rcrb)
+		return 0;
+
+	rcrb_mapped = ioremap(rcrb, SZ_4K);
+	if (!rcrb_mapped)
+		return 0;
+
+	offset = PCI_CFG_SPACE_SIZE;
+	cap_hdr = readl(rcrb_mapped + offset);
+
+	while (PCI_EXT_CAP_ID(cap_hdr)) {
+		if (PCI_EXT_CAP_ID(cap_hdr) == cap_id)
+			break;
+
+		offset = PCI_EXT_CAP_NEXT(cap_hdr);
+		if (offset == 0)
+			break;
+
+		cap_hdr = readl(rcrb_mapped + offset);
+	}
+	iounmap((void *)rcrb_mapped);
+
+	if (PCI_EXT_CAP_ID(cap_hdr) != cap_id)
+		return 0;
+
+	pr_debug("Found capability %X @ %llX (%X)\n",
+		 cap_id, rcrb + offset, cap_hdr);
+
+	return rcrb + offset;
+}
+
+bool is_rcd(struct cxl_memdev *cxlmd)
+{
+	struct pci_dev *pdev;
+	resource_size_t dvsec;
+	void *dvsec_mapped;
+	u32 dvsec_data;
+
+	if (!dev_is_pci(cxlmd->cxlds->dev))
+		return false;
+
+	pdev = to_pci_dev(cxlmd->cxlds->dev);
+
+	/*
+	 * 'CXL devices operating in this mode always set the Device/Port
+	 * Type field in the PCI Express Capabilities register to RCiEP.'
+	 * - CXL3.0 9.11.1 'RCD Mode'
+	 */
+	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
+		return false;
+
+	/*
+	 * Check if VH is enabled
+	 * - CXL3.0 8.2.1.3.1 'DVSEC Flex Bus Port Capability'
+	 */
+	dvsec = cxl_get_dport_ext_cap(cxlmd, PCI_EXT_CAP_ID_DVSEC);
+	if (!dvsec)
+		return false;
+
+	dvsec_mapped = ioremap(dvsec, SZ_4K);
+	dvsec_data = readl(dvsec_mapped + CXL_DVSEC_PORT_STATUS_OFF);
+	iounmap(dvsec_mapped);
+	if (dvsec_data & CXL_DVSEC_VH_SUPPORT)
+		return false;
+
+	return true;
+}
+
+#define PCI_CAP_ID(header)	(header & 0x000000ff)
+#define PCI_CAP_NEXT(header)	((header >> 8) & 0xff)
+
+static resource_size_t cxl_get_dport_cap(struct cxl_memdev *cxlmd, int cap_id)
+{
+	resource_size_t offset, rcrb;
+	void *rcrb_mapped;
+	u32 cap_hdr;
+
+	rcrb = cxl_get_rcrb(cxlmd);
+	if (!rcrb)
+		return 0;
+
+	rcrb_mapped = ioremap(rcrb, SZ_4K);
+	if (!rcrb_mapped)
+		return 0;
+
+	offset = readl(rcrb_mapped + PCI_CAPABILITY_LIST);
+	cap_hdr = readl(rcrb_mapped + offset);
+
+	while (PCI_CAP_ID(cap_hdr)) {
+		if (PCI_CAP_ID(cap_hdr) == cap_id)
+			break;
+
+		offset = PCI_CAP_NEXT(cap_hdr);
+		if (offset == 0)
+			break;
+
+		cap_hdr = readl(rcrb_mapped + offset);
+	}
+	iounmap((void *)rcrb_mapped);
+
+	if (PCI_CAP_ID(cap_hdr) != cap_id)
+		return 0;
+
+	pr_debug("Found capability %X @ %llX (%X)\n",
+		 cap_id, rcrb + offset, cap_hdr);
+
+	return rcrb + offset;
+}
+
+static int cxl_setup_dport_aer(struct cxl_memdev *cxlmd, resource_size_t cap_base)
+{
+	struct cxl_register_map *map = &cxlmd->cxlds->aer_map;
+	struct pci_dev *pdev = to_pci_dev(&cxlmd->dev);
+
+	if (!cap_base)
+		return  -ENODEV;
+
+	map->base = devm_cxl_iomap_block(&pdev->dev, cap_base,
+					 PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1);
+	if (!map->base)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
+{
+	resource_size_t cap_base;
+
+	/* CXL2.0 is enumerable and will use AER attached to `struct pci_dev` */
+	if (!is_rcd(cxlmd))
+		return;
+
+	/*
+	 * Read base address of the PCI express cap. Cache the cap's
+	 * PCI_EXP_DEVCTL and PCI_EXP_DEVSTA for AER control and status.
+	 */
+	cap_base = cxl_get_dport_cap(cxlmd, PCI_CAP_ID_EXP);
+	cxl_setup_dport_aer(cxlmd, cap_base);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_pci_aer_init, CXL);
+
 static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct cxl_register_map map;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
  2022-10-21 18:56 [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information Terry Bowman
  2022-10-21 18:56 ` [PATCH 1/5] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support Terry Bowman
  2022-10-21 18:56 ` [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability Terry Bowman
@ 2022-10-21 18:56 ` Terry Bowman
  2022-10-22 22:44   ` Dan Williams
  2022-10-28 12:53   ` Ariel.Sibley
  2022-10-21 18:56 ` [PATCH 4/5] cxl/pci: Enable RCD dport AER reporting Terry Bowman
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Terry Bowman @ 2022-10-21 18:56 UTC (permalink / raw)
  To: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams
  Cc: terry.bowman, linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

CXL RAS information resides in a RAS capability structure located in
CXL.cache and CXL.mem registers.[1] The RAS capability provides CXL
specific error information that can be helpful in debugging. This
information is not currently logged but needs to be logged during PCIe AER
error handling.

Update the CXL driver to find and cache a pointer to the CXL RAS
capability. The RAS registers resides in the downport's component register
block. Note:RAS registers are not in the upport. The component registers
can be found by first using the RCRB to goto the downport. Next, the
downport's 64-bit BAR[0] will point to the component register block.

[1] CXL3.0 Spec, '8.2.5 CXL.cache and CXL.mem Registers'

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 drivers/cxl/cxl.h    |  4 +++
 drivers/cxl/cxlmem.h |  1 +
 drivers/cxl/pci.c    | 72 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 7d507ab80a78..69b50131ad86 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -36,6 +36,10 @@
 #define   CXL_CM_CAP_CAP_ID_HDM 0x5
 #define   CXL_CM_CAP_CAP_HDM_VERSION 1
 
+/* CXL 3.0 8.2.4.2 CXL RAS Capability Header */
+#define CXL_CM_CAP_ID_RAS 0x2
+#define CXL_CM_CAP_SIZE_RAS 0x5C
+
 /* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
 #define CXL_HDM_DECODER_CAP_OFFSET 0x0
 #define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 079db2e15acc..515273e224ea 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -243,6 +243,7 @@ struct cxl_dev_state {
 	u64 next_persistent_bytes;
 
 	struct cxl_register_map aer_map;
+	struct cxl_register_map ras_map;
 
 	resource_size_t component_reg_phys;
 	u64 serial;
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 2287b5225862..7f717fb47a36 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -586,6 +586,78 @@ void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_pci_aer_init, CXL);
 
+static resource_size_t cxl_get_dport_ras_base(struct cxl_memdev *cxlmd)
+{
+	resource_size_t component_reg_phys, offset = 0;
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	void *cap_hdr_addr, *comp_reg_mapped;
+	u32 cap_hdr, ras_cap_hdr;
+	int cap_ndx;
+
+	comp_reg_mapped = ioremap(cxlds->component_reg_phys +
+				  CXL_CM_OFFSET, CXL_COMPONENT_REG_BLOCK_SIZE);
+	if (!comp_reg_mapped)
+		return 0;
+
+	cap_hdr_addr = comp_reg_mapped;
+	cap_hdr = readl(cap_hdr_addr);
+	for (cap_ndx = 0;
+	     cap_ndx < FIELD_GET(CXL_CM_CAP_HDR_ARRAY_SIZE_MASK, cap_hdr);
+	     cap_ndx++) {
+		ras_cap_hdr = readl(cap_hdr_addr + cap_ndx*sizeof(u32));
+
+		if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, ras_cap_hdr) == CXL_CM_CAP_ID_RAS) {
+			pr_debug("RAS cap header = %X @ %pa, cap_ndx = %d\n",
+				 ras_cap_hdr, cap_hdr_addr, cap_ndx);
+			break;
+		}
+	}
+
+	offset = CXL_CM_OFFSET + PCI_EXT_CAP_NEXT(ras_cap_hdr);
+
+	iounmap(comp_reg_mapped);
+
+	if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, ras_cap_hdr) != CXL_CM_CAP_ID_RAS)
+		return 0;
+
+	pr_debug("Found RAS capability @ %llX (%X)\n",
+		 component_reg_phys + offset, *((u32 *)(comp_reg_mapped + offset)));
+
+	return component_reg_phys + offset;
+}
+
+static int cxl_setup_dport_ras(struct cxl_memdev *cxlmd, resource_size_t resource)
+{
+	struct cxl_register_map *map = &cxlmd->cxlds->ras_map;
+	struct pci_dev *pdev = to_pci_dev(&cxlmd->dev);
+
+	if (!resource) {
+		pr_err("%s():%d: RAS resource ptr is NULL\n", __func__, __LINE__);
+		return  -EINVAL;
+	}
+
+	map->base = devm_cxl_iomap_block(&pdev->dev, resource, CXL_CM_CAP_SIZE_RAS);
+	if (!map->base)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void cxl_pci_ras_init(struct cxl_memdev *cxlmd)
+{
+	resource_size_t cap;
+
+	/*
+	 * TODO - CXL2.0 will need change to support PCI config space.
+	 */
+	if (!is_rcd(cxlmd))
+		return;
+
+	cap = cxl_get_dport_ras_base(cxlmd);
+	cxl_setup_dport_ras(cxlmd, cap);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_pci_ras_init, CXL);
+
 static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct cxl_register_map map;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5] cxl/pci: Enable RCD dport AER reporting
  2022-10-21 18:56 [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information Terry Bowman
                   ` (2 preceding siblings ...)
  2022-10-21 18:56 ` [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers Terry Bowman
@ 2022-10-21 18:56 ` Terry Bowman
  2022-10-21 18:56 ` [PATCH 5/5] cxl/pci: Log CXL device's PCIe AER and CXL RAS error information Terry Bowman
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Terry Bowman @ 2022-10-21 18:56 UTC (permalink / raw)
  To: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams
  Cc: terry.bowman, linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

The RCD downport/upport include 'PCI express' capability with AER
registers. The PCI subsystem is not aware of RCD downport/upport AER
because the downport/upport are not enumerable devices. Since the
downport/upport are not enumerable the existing PCIe AER logic to enable
AER reporting does not apply.

Add logic to the CXL driver to enable AER reporting in the RCRB 'PCI
express' capability. These must be set for correctly reporting the PCIe
AER errors to the RCEC or root port.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 drivers/cxl/pci.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 7f717fb47a36..80a01b304efe 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -553,6 +553,17 @@ static resource_size_t cxl_get_dport_cap(struct cxl_memdev *cxlmd, int cap_id)
 	return rcrb + offset;
 }
 
+static void cxl_enable_dport_aer(struct cxl_memdev *cxlmd)
+{
+	struct cxl_register_map *map = &cxlmd->cxlds->aer_map;
+	u32 devctl_cap;
+
+	devctl_cap = readl(map->base + PCI_EXP_DEVCTL);
+	devctl_cap |= (PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
+		       PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE);
+	writel(devctl_cap, map->base + PCI_EXP_DEVCTL);
+}
+
 static int cxl_setup_dport_aer(struct cxl_memdev *cxlmd, resource_size_t cap_base)
 {
 	struct cxl_register_map *map = &cxlmd->cxlds->aer_map;
@@ -566,6 +577,8 @@ static int cxl_setup_dport_aer(struct cxl_memdev *cxlmd, resource_size_t cap_bas
 	if (!map->base)
 		return -ENOMEM;
 
+	cxl_enable_dport_aer(cxlmd);
+
 	return 0;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5] cxl/pci: Log CXL device's PCIe AER and CXL RAS error information
  2022-10-21 18:56 [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information Terry Bowman
                   ` (3 preceding siblings ...)
  2022-10-21 18:56 ` [PATCH 4/5] cxl/pci: Enable RCD dport AER reporting Terry Bowman
@ 2022-10-21 18:56 ` Terry Bowman
  2022-10-24 15:14   ` Jonathan Cameron
  2022-10-27 21:30   ` Bjorn Helgaas
  2022-10-21 19:02 ` [PATCH 0/5] cxl: Log downport " Terry Bowman
  2022-10-28 12:30 ` Ariel.Sibley
  6 siblings, 2 replies; 25+ messages in thread
From: Terry Bowman @ 2022-10-21 18:56 UTC (permalink / raw)
  To: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams
  Cc: terry.bowman, linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

The CXL downport PCIe AER and CXL RAS capability information needs to be
logged during PCIe AER error handling.

The existing PCIe AER error handler logs native AER errors but does not
log upport/downport AER capability residing in the RCRB. The CXL1.1
RCRB does not have a BDF and is not enunmerable. The existing error handler
logic does not display CXL RAS details either.

Add a CXL error handler to the existing PCI error handlers. Add a call
to the CXL error handler within the PCIe AER error handler. Implement the
driver's CXL callback to log downport PCIe AER and CXL RAS capability
information.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 drivers/cxl/pci.c      | 76 ++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pcie/aer.c | 45 ++++++++++++++++++++++++-
 include/linux/pci.h    |  4 +++
 3 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 80a01b304efe..dceda9f9fc60 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -7,6 +7,7 @@
 #include <linux/sizes.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
+#include <linux/aer.h>
 #include <linux/pci.h>
 #include <linux/pci-doe.h>
 #include <linux/io.h>
@@ -14,6 +15,9 @@
 #include "cxlpci.h"
 #include "cxl.h"
 
+extern void cxl_print_aer(struct pci_dev *dev, int aer_severity,
+			  struct aer_capability_regs *aer);
+
 /**
  * DOC: cxl pci
  *
@@ -744,9 +748,80 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
 
+	dev_set_drvdata(&pdev->dev, cxlmd);
+
 	return rc;
 }
 
+struct ras_cap {
+	u32 uc_error_status;
+	u32 uc_error_mask;
+	u32 uc_error_severity;
+	u32 c_error_status;
+	u32 c_error_mask;
+	u32 ctrl;
+	u32 log[];
+};
+
+/*
+ * Log the state of the CXL downport AER and RAS status registers.
+ */
+static void cxl_error_report(struct cxl_memdev *cxlmd)
+{
+	struct pci_dev *pdev = to_pci_dev(cxlmd->cxlds->dev);
+	struct aer_capability_regs *aer_cap;
+	struct ras_cap *ras_cap;
+
+	aer_cap = (struct aer_capability_regs *)cxlmd->cxlds->aer_map.base;
+	ras_cap = (struct ras_cap *)cxlmd->cxlds->ras_map.base;
+
+	pci_err(pdev, "CXL Error Report\n");
+	pci_err(pdev, "AER Errors:\n");
+	if (aer_cap) {
+		cxl_print_aer(pdev, AER_CORRECTABLE, aer_cap);
+		cxl_print_aer(pdev, AER_NONFATAL, aer_cap);
+		cxl_print_aer(pdev, AER_FATAL, aer_cap);
+	}
+
+	pci_err(pdev, "RAS Errors:\n");
+	if (ras_cap) {
+		pci_err(pdev, "RAS: uc_error_status = %X\n", readl(&ras_cap->uc_error_status));
+		pci_err(pdev, "RAS: uc_error_mask = %X\n", readl(&ras_cap->uc_error_mask));
+		pci_err(pdev, "RAS: uc_error_severity = %X\n", readl(&ras_cap->uc_error_severity));
+		pci_err(pdev, "RAS: c_error_status = %X\n", readl(&ras_cap->c_error_status));
+		pci_err(pdev, "RAS: c_error_mask = %X\n", readl(&ras_cap->c_error_mask));
+		pci_err(pdev, "RAS: ras_caps->ctrl = %X\n", readl(&ras_cap->ctrl));
+		pci_err(pdev, "RAS: log = %X\n", readl(&ras_cap->log));
+	}
+}
+
+static void cxl_error_detected(struct pci_dev *pdev)
+{
+	struct cxl_memdev *cxlmd;
+
+	if (!is_cxl_memdev(&pdev->dev)) {
+		pci_err(pdev, "CXL memory device is invalid\n");
+		return;
+	}
+
+	cxlmd = dev_get_drvdata(&pdev->dev);
+	if (!cxlmd) {
+		pci_err(pdev, "CXL memory device is NULL\n");
+		return;
+	}
+
+	if (!cxlmd->cxlds) {
+		pci_err(pdev, "CXL device state object is NULL\n");
+		return;
+	}
+
+	cxl_error_report(cxlmd);
+}
+
+static struct pci_error_handlers cxl_error_handlers = {
+	.cxl_error_detected = cxl_error_detected,
+};
+
 static const struct pci_device_id cxl_mem_pci_tbl[] = {
 	/* PCI class code for CXL.mem Type-3 Devices */
 	{ PCI_DEVICE_CLASS((PCI_CLASS_MEMORY_CXL << 8 | CXL_MEMORY_PROGIF), ~0)},
@@ -761,6 +836,7 @@ static struct pci_driver cxl_pci_driver = {
 	.driver	= {
 		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
 	},
+	.err_handler		= &cxl_error_handlers,
 };
 
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2d8a74f83c3..dea04d412406 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -811,6 +811,13 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
 }
 #endif
 
+void cxl_print_aer(struct pci_dev *dev, int aer_severity,
+		    struct aer_capability_regs *aer)
+{
+	cper_print_aer(dev, aer_severity, aer);
+}
+EXPORT_SYMBOL_GPL(cxl_print_aer);
+
 /**
  * add_error_device - list device to be handled
  * @e_info: pointer to error info
@@ -1169,6 +1176,40 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 	}
 }
 
+static int report_cxl_errors_iter(struct pci_dev *pdev, void *data)
+{
+	struct pci_driver *pdrv = pdev->driver;
+
+	if (pdrv &&
+	    pdrv->err_handler &&
+	    pdrv->err_handler->cxl_error_detected)
+		pdrv->err_handler->cxl_error_detected(pdev);
+
+	return 0;
+}
+
+static void report_cxl_errors(struct aer_rpc *rpc,
+			      struct aer_err_source *e_src)
+{
+	struct pci_dev *pdev = rpc->rpd;
+	struct aer_err_info e_info;
+	u32 uncor_status, cor_status;
+
+	pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &uncor_status);
+	pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_STATUS, &cor_status);
+
+	if (!uncor_status && !cor_status)
+		return;
+
+	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_EC)
+		pcie_walk_rcec(pdev, report_cxl_errors_iter, &e_info);
+	else
+		pci_walk_bus(pdev->subordinate, report_cxl_errors_iter, &e_info);
+
+	pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, uncor_status);
+	pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_STATUS, cor_status);
+}
+
 /**
  * aer_isr - consume errors detected by root port
  * @irq: IRQ assigned to Root Port
@@ -1185,8 +1226,10 @@ static irqreturn_t aer_isr(int irq, void *context)
 	if (kfifo_is_empty(&rpc->aer_fifo))
 		return IRQ_NONE;
 
-	while (kfifo_get(&rpc->aer_fifo, &e_src))
+	while (kfifo_get(&rpc->aer_fifo, &e_src)) {
+		report_cxl_errors(rpc, &e_src);
 		aer_isr_one_error(rpc, &e_src);
+	}
 	return IRQ_HANDLED;
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2bda4a4e47e8..4f4b3a8f5454 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -827,6 +827,10 @@ enum pci_ers_result {
 
 /* PCI bus error event callbacks */
 struct pci_error_handlers {
+
+	/* CXL error detected on this device */
+	void (*cxl_error_detected)(struct pci_dev *dev);
+
 	/* PCI bus error detected on this device */
 	pci_ers_result_t (*error_detected)(struct pci_dev *dev,
 					   pci_channel_state_t error);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information
  2022-10-21 18:56 [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information Terry Bowman
                   ` (4 preceding siblings ...)
  2022-10-21 18:56 ` [PATCH 5/5] cxl/pci: Log CXL device's PCIe AER and CXL RAS error information Terry Bowman
@ 2022-10-21 19:02 ` Terry Bowman
  2022-10-28 12:30 ` Ariel.Sibley
  6 siblings, 0 replies; 25+ messages in thread
From: Terry Bowman @ 2022-10-21 19:02 UTC (permalink / raw)
  To: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

s/PATCH/RFC/

On 10/21/22 13:56, Terry Bowman wrote:
> This patchset adds CXL downport PCI AER and CXL RAS logging to the CXL
> error handling. This is necessary for communicating CXL HW issues to users.
> The included patches find and cache pointers to the AER and CXL RAS PCIe
> capability structures. The cached pointers are then used to display the
> error information in a later patch. These changes follow the CXL
> specification, Chapter 8 'Control and Status Registers'.[1]
> 
> The first patch enables CXL1.1 RCD support through the ACPI _OSC support
> method.
> 
> The 2nd and 3rd patches find and map PCIe AER and CXL RAS capabilities.
> 
> The 4th patch enables AER error reporting.
> 
> The 5th patch adds functionality to log the PCIe AER and RAS capabilities. 
> 
> TODO work remains to consolidate the HDM and CXL RAS register mapping
> (patch#3). The current CXL RAS register mapping will be replaced to reuse
> cxl_probe_component_regs() function as David Jiang and Alison Schofield
> upstreamed. Should the same be done for the AER registers (patch#2)? The
> AER registers are not in the component register block but are instead in
> the downport and upport (RCRB).
> 
> TODO work remains to add support for upports in some cases here where
> downport is addressed. For instance, will need another aer_map to support
> upport AER ?
> 
> TODO work to support CXL2.0. Should be trivial since aer_cap and aer_stats
> is member of 'struct pci_dev'.
> 
> Base is from: https://patchwork.kernel.org/project/cxl/list/?series=686272
> 
> [1] - https://www.computeexpresslink.org/spec-landing
> 
> Terry Bowman (5):
>   cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
>   cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability
>   cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
>   cxl/pci: Enable RCD dport AER reporting
>   cxl/pci: Log CXL device's PCIe AER and CXL RAS error information
> 
>  drivers/acpi/pci_root.c |   1 +
>  drivers/cxl/acpi.c      |  56 +++++++
>  drivers/cxl/core/regs.c |   1 +
>  drivers/cxl/cxl.h       |  13 ++
>  drivers/cxl/cxlmem.h    |   3 +
>  drivers/cxl/mem.c       |   2 +
>  drivers/cxl/pci.c       | 319 ++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/pcie/aer.c  |  45 +++++-
>  include/linux/pci.h     |   4 +
>  9 files changed, 443 insertions(+), 1 deletion(-)
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 1/5] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
  2022-10-21 18:56 ` [PATCH 1/5] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support Terry Bowman
@ 2022-10-21 22:39   ` Dan Williams
  2022-10-25 16:23     ` Terry Bowman
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Williams @ 2022-10-21 22:39 UTC (permalink / raw)
  To: Terry Bowman, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk, dan.j.williams
  Cc: terry.bowman, linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter, linux-acpi

Terry Bowman wrote:
> ACPI includes a CXL _OSC support method to communicate the available
> CXL support to FW. The CXL support _OSC includes a field to indicate
> CXL1.1 RCH RCD support. The OS sets this bit to 1 if it supports access
> to RCD and RCH Port registers.[1] FW can potentially change it's operation
> depending on the _OSC support setting reported by the OS.
> 
> The ACPI driver does not currently set the ACPI _OSC support to indicate
> CXL1.1 RCD RCH support. Change the capability reported to include CXL1.1.
> 
> [1] CXL3.0 Table 9-26 'Interpretation of CXL _OSC Support Field'
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
>  drivers/acpi/pci_root.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index c8385ef54c37..094a59b216ae 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c

Be sure to copy linux-acpi@vger.kernel.org on patches that touch
drivers/acpi/

> @@ -492,6 +492,7 @@ static u32 calculate_cxl_support(void)
>  	u32 support;
>  
>  	support = OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT;
> +	support |= OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT;
>  	if (pci_aer_available())
>  		support |= OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT;
>  	if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))

This looks good to me though.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability
  2022-10-21 18:56 ` [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability Terry Bowman
@ 2022-10-22 21:45   ` Dan Williams
  2022-10-25 16:42     ` Terry Bowman
  2022-10-27 14:52   ` Bjorn Helgaas
  1 sibling, 1 reply; 25+ messages in thread
From: Dan Williams @ 2022-10-22 21:45 UTC (permalink / raw)
  To: Terry Bowman, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk, dan.j.williams
  Cc: terry.bowman, linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

Terry Bowman wrote:
> CXL downport PCIe AER information needs to be logged during error handling.
> The RCD downport/upport does not have a BDF and is not PCI enumerable. As a
> result the CXL PCIe driver is not aware of the AER in 'PCI Express'
> capability located in the RCRB downport/upport. Logic must be introduced to
> use the downport/upport AER information.

Nice, I am happy to see this work get started.

> 
> Update the CXL driver to find the downport's PCIe AER capability and cache
> a pointer for using later. First, find the RCRB to provide the
> downport/upport memory region. The downport/upport are mapped as MMIO not
> PCI config space. Use readl/writel/readq/writeq as required by the CXL spec
> to find and operate on the AER registers.[1]
> 
> Also, add function to detect if the device is a CXL1.1 RCD device.
> 
> [1] CXL3.0, 8.2 'Memory Mapped Registers'
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
>  drivers/cxl/acpi.c      |  56 ++++++++++++++
>  drivers/cxl/core/regs.c |   1 +
>  drivers/cxl/cxl.h       |   9 +++
>  drivers/cxl/cxlmem.h    |   2 +
>  drivers/cxl/mem.c       |   2 +
>  drivers/cxl/pci.c       | 158 ++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 228 insertions(+)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index bf251a27e460..5d543c789e8d 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -232,6 +232,7 @@ struct cxl_chbs_context {
>  	struct device *dev;
>  	unsigned long long uid;
>  	struct acpi_cedt_chbs chbs;
> +	resource_size_t chbcr;
>  };
>  
>  static int cxl_get_chbs(union acpi_subtable_headers *header, void *arg,
> @@ -417,6 +418,61 @@ static void remove_cxl_resources(void *data)
>  	}
>  }
>  
> +static const struct acpi_device_id cxl_host_ids[] = {
> +	{ "ACPI0016", 0 },
> +	{ "PNP0A08", 0 },
> +	{ },
> +};
> +
> +static int __cxl_get_rcrb(union acpi_subtable_headers *header, void *arg,
> +			  const unsigned long end)
> +{
> +	struct cxl_chbs_context *ctx = arg;
> +	struct acpi_cedt_chbs *chbs;
> +
> +	if (ctx->chbcr)
> +		return 0;
> +
> +	chbs = (struct acpi_cedt_chbs *)header;
> +
> +	if (ctx->uid != chbs->uid)
> +		return 0;
> +
> +	if (chbs->cxl_version != ACPI_CEDT_CHBS_VERSION_CXL11)
> +		return 0;
> +
> +	if (chbs->length != SZ_8K)
> +		return 0;
> +
> +	ctx->chbcr = chbs->base;
> +
> +	return 0;
> +}

This seems redundant with component register enumeration that was
already added in Robert's patches.

> +
> +resource_size_t cxl_get_rcrb(struct cxl_memdev *cxlmd)
> +{
> +	struct pci_host_bridge *host = NULL;
> +	struct cxl_chbs_context ctx = {0};
> +	struct cxl_dport *dport;
> +	struct cxl_port *port;
> +
> +	port = cxl_mem_find_port(cxlmd, NULL);
> +	if (!port)
> +		return 0;
> +
> +	dport = port->parent_dport;
> +	ctx.uid = dport ? dport->port_id : 0;
> +	if (!dport)
> +		return 0;
> +
> +	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, __cxl_get_rcrb, &ctx);
> +
> +	dev_dbg(&host->dev, "RCRB found: 0x%08llx\n", (u64)ctx.chbcr);
> +
> +	return ctx.chbcr;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_rcrb, CXL);

While CXL to date has been tied ACPI platforms there is no requirement
that CXL be ACPI based. Given that other coherent bus specifications
that were deployed on PowerPC have now joined the CXL consortium there
is an increasing chance of CXL appearing on an Open Firmware based
platforms. Even if that does not come to pass, the discipline of clear
separation between platform code and PCI/CXL mechanisms leads to cleaner
code organization.

All that to say, no, cxl_acpi cannot export functions for other cxl
modules to depend upon. Instead it needs to publish these details in the
CXL objects that it registers.

In this case my expectation is that cxl_acpi registers a dport decorated
with the extra RCH information. Something like:

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f680450f0b16..b42f4759743b 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -499,12 +499,19 @@ cxl_find_dport_by_dev(struct cxl_port *port, const struct device *dport_dev)
  * @port_id: unique hardware identifier for dport in decoder target list
  * @component_reg_phys: downstream port component registers
  * @port: reference to cxl_port that contains this downstream port
+ * @is_rch: enable RCH vs VH enumeration (see CXL 3.0 9.11.8)
  */
 struct cxl_dport {
        struct device *dport;
        int port_id;
        resource_size_t component_reg_phys;
        struct cxl_port *port;
+       bool is_rch;
+};
+
+struct cxl_rch_dport {
+       struct cxl_dport dport;
+       resource_size_t rcrb_phys;
 };
 
 /**

Then, when cxl_mem notices that the memdev is being produced by an
RCIEP, it can skip devm_cxl_enumerate_ports() and jump straight to
cxl_mem_find_port(). That will return this dport with the rcrb base
where cxl_mem can arrange the AER handling. Likely we will need some
notification mechanism to route Root Complex AER events to cxl_acpi,
cxl_pci, and/or cxl_mem to optionally add the CXL RAS data to the log.

> +
>  /**
>   * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource
>   * @cxl_res: A standalone resource tree where each CXL window is a sibling
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index ec178e69b18f..0d4f633e5c01 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -184,6 +184,7 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
>  
>  	return ret_val;
>  }
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_iomap_block, CXL);
>  
>  int cxl_map_component_regs(struct pci_dev *pdev,
>  			   struct cxl_component_regs *regs,
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index ac8998b627b5..7d507ab80a78 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -204,6 +204,14 @@ struct cxl_register_map {
>  	};
>  };
>  
> +struct cxl_memdev;
> +int cxl_pci_probe_dport(struct cxl_memdev *cxlmd);
> +
> +void cxl_pci_aer_init(struct cxl_memdev *cxlmd);
> +
> +void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> +				   resource_size_t length);
> +
>  void cxl_probe_component_regs(struct device *dev, void __iomem *base,
>  			      struct cxl_component_reg_map *map);
>  void cxl_probe_device_regs(struct device *dev, void __iomem *base,
> @@ -549,6 +557,7 @@ static inline bool is_cxl_root(struct cxl_port *port)
>  	return port->uport == port->dev.parent;
>  }
>  
> +resource_size_t cxl_get_rcrb(struct cxl_memdev *cxlmd);
>  bool is_cxl_port(struct device *dev);
>  struct cxl_port *to_cxl_port(struct device *dev);
>  struct pci_bus;
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 88e3a8e54b6a..079db2e15acc 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -242,6 +242,8 @@ struct cxl_dev_state {
>  	u64 next_volatile_bytes;
>  	u64 next_persistent_bytes;
>  
> +	struct cxl_register_map aer_map;
> +
>  	resource_size_t component_reg_phys;
>  	u64 serial;
>  
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 64ccf053d32c..d1e663be43c2 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -74,6 +74,8 @@ static int cxl_mem_probe(struct device *dev)
>  	if (rc)
>  		return rc;
>  
> +	cxl_pci_aer_init(cxlmd);
> +
>  	parent_port = cxl_mem_find_port(cxlmd, &dport);
>  	if (!parent_port) {
>  		dev_err(dev, "CXL port topology not found\n");
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..2287b5225862 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -35,6 +35,15 @@
>  	(readl((cxlds)->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET) &                  \
>  	 CXLDEV_MBOX_CTRL_DOORBELL)
>  
> +/* PCI 5.0 - 7.8.4 Advanced Error Reporting Extended Capability */
> +#define PCI_AER_CAP_SIZE 0x48
> +
> +/* CXL 3.0 - 8.2.1.3.3, Offset to DVSEC Port Status */
> +#define CXL_DVSEC_PORT_STATUS_OFF 0xE
> +
> +/* CXL 3.0 - 8.2.1.3.3 */
> +#define CXL_DVSEC_VH_SUPPORT 0x20
> +
>  /* CXL 2.0 - 8.2.8.4 */
>  #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
>  
> @@ -428,6 +437,155 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
>  	}
>  }
>  
> +static resource_size_t cxl_get_dport_ext_cap(struct cxl_memdev *cxlmd, u32 cap_id)
> +{
> +	resource_size_t rcrb, offset;
> +	void *rcrb_mapped;
> +	u32 cap_hdr;
> +
> +	rcrb = cxl_get_rcrb(cxlmd);
> +	if (!rcrb)
> +		return 0;
> +
> +	rcrb_mapped = ioremap(rcrb, SZ_4K);
> +	if (!rcrb_mapped)
> +		return 0;
> +
> +	offset = PCI_CFG_SPACE_SIZE;
> +	cap_hdr = readl(rcrb_mapped + offset);
> +
> +	while (PCI_EXT_CAP_ID(cap_hdr)) {
> +		if (PCI_EXT_CAP_ID(cap_hdr) == cap_id)
> +			break;
> +
> +		offset = PCI_EXT_CAP_NEXT(cap_hdr);
> +		if (offset == 0)
> +			break;
> +
> +		cap_hdr = readl(rcrb_mapped + offset);
> +	}
> +	iounmap((void *)rcrb_mapped);

The memdev owns the upstream port RAS capability, why is it mapping the
downstream port ras capability?

> +
> +	if (PCI_EXT_CAP_ID(cap_hdr) != cap_id)
> +		return 0;
> +
> +	pr_debug("Found capability %X @ %llX (%X)\n",
> +		 cap_id, rcrb + offset, cap_hdr);
> +
> +	return rcrb + offset;
> +}
> +
> +bool is_rcd(struct cxl_memdev *cxlmd)
> +{
> +	struct pci_dev *pdev;
> +	resource_size_t dvsec;
> +	void *dvsec_mapped;
> +	u32 dvsec_data;
> +
> +	if (!dev_is_pci(cxlmd->cxlds->dev))
> +		return false;

Just use cxlmd->dev.parent, no need to route through cxlds for this.

> +
> +	pdev = to_pci_dev(cxlmd->cxlds->dev);
> +
> +	/*
> +	 * 'CXL devices operating in this mode always set the Device/Port
> +	 * Type field in the PCI Express Capabilities register to RCiEP.'
> +	 * - CXL3.0 9.11.1 'RCD Mode'
> +	 */
> +	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> +		return false;
> +
> +	/*
> +	 * Check if VH is enabled
> +	 * - CXL3.0 8.2.1.3.1 'DVSEC Flex Bus Port Capability'
> +	 */
> +	dvsec = cxl_get_dport_ext_cap(cxlmd, PCI_EXT_CAP_ID_DVSEC);
> +	if (!dvsec)
> +		return false;
> +
> +	dvsec_mapped = ioremap(dvsec, SZ_4K);

No ioremap error handling?

> +	dvsec_data = readl(dvsec_mapped + CXL_DVSEC_PORT_STATUS_OFF);
> +	iounmap(dvsec_mapped);
> +	if (dvsec_data & CXL_DVSEC_VH_SUPPORT)
> +		return false;

There's no such thing as a root-complex-integrated endpoint in CXL VH
mode, right? Is this not redundant?

> +
> +	return true;
> +}
> +
> +#define PCI_CAP_ID(header)	(header & 0x000000ff)
> +#define PCI_CAP_NEXT(header)	((header >> 8) & 0xff)
> +
> +static resource_size_t cxl_get_dport_cap(struct cxl_memdev *cxlmd, int cap_id)
> +{
> +	resource_size_t offset, rcrb;
> +	void *rcrb_mapped;
> +	u32 cap_hdr;
> +
> +	rcrb = cxl_get_rcrb(cxlmd);
> +	if (!rcrb)
> +		return 0;
> +
> +	rcrb_mapped = ioremap(rcrb, SZ_4K);
> +	if (!rcrb_mapped)
> +		return 0;
> +
> +	offset = readl(rcrb_mapped + PCI_CAPABILITY_LIST);
> +	cap_hdr = readl(rcrb_mapped + offset);
> +
> +	while (PCI_CAP_ID(cap_hdr)) {
> +		if (PCI_CAP_ID(cap_hdr) == cap_id)
> +			break;
> +
> +		offset = PCI_CAP_NEXT(cap_hdr);
> +		if (offset == 0)
> +			break;
> +
> +		cap_hdr = readl(rcrb_mapped + offset);
> +	}
> +	iounmap((void *)rcrb_mapped);

All of this mapping and unmapping of the RCRB needs to be centralized in
one place and owned by one driver for the downstream portion and one
driver for the upstream portion. That *feels* like cxl_acpi for the
downstream side and cxl_port for the upstream side (when it drives the
endpoint port registered by cxl_mem). It should also be protected by
request_region() to make sure multiple agents are not stepping on each
other's toes.

> +
> +	if (PCI_CAP_ID(cap_hdr) != cap_id)
> +		return 0;
> +
> +	pr_debug("Found capability %X @ %llX (%X)\n",
> +		 cap_id, rcrb + offset, cap_hdr);
> +
> +	return rcrb + offset;
> +}
> +
> +static int cxl_setup_dport_aer(struct cxl_memdev *cxlmd, resource_size_t cap_base)
> +{
> +	struct cxl_register_map *map = &cxlmd->cxlds->aer_map;
> +	struct pci_dev *pdev = to_pci_dev(&cxlmd->dev);
> +
> +	if (!cap_base)
> +		return  -ENODEV;
> +
> +	map->base = devm_cxl_iomap_block(&pdev->dev, cap_base,
> +					 PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1);
> +	if (!map->base)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
> +{
> +	resource_size_t cap_base;
> +
> +	/* CXL2.0 is enumerable and will use AER attached to `struct pci_dev` */
> +	if (!is_rcd(cxlmd))
> +		return;
> +
> +	/*
> +	 * Read base address of the PCI express cap. Cache the cap's
> +	 * PCI_EXP_DEVCTL and PCI_EXP_DEVSTA for AER control and status.
> +	 */
> +	cap_base = cxl_get_dport_cap(cxlmd, PCI_CAP_ID_EXP);
> +	cxl_setup_dport_aer(cxlmd, cap_base);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_pci_aer_init, CXL);
> +
>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>  	struct cxl_register_map map;
> -- 
> 2.34.1
> 



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* RE: [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
  2022-10-21 18:56 ` [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers Terry Bowman
@ 2022-10-22 22:44   ` Dan Williams
  2022-10-26 19:01     ` Terry Bowman
  2022-10-28 12:53   ` Ariel.Sibley
  1 sibling, 1 reply; 25+ messages in thread
From: Dan Williams @ 2022-10-22 22:44 UTC (permalink / raw)
  To: Terry Bowman, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk, dan.j.williams
  Cc: terry.bowman, linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

Terry Bowman wrote:
> CXL RAS information resides in a RAS capability structure located in
> CXL.cache and CXL.mem registers.[1] The RAS capability provides CXL
> specific error information that can be helpful in debugging. This
> information is not currently logged but needs to be logged during PCIe AER
> error handling.
> 
> Update the CXL driver to find and cache a pointer to the CXL RAS
> capability. The RAS registers resides in the downport's component register
> block. Note:RAS registers are not in the upport. The component registers
> can be found by first using the RCRB to goto the downport. Next, the
> downport's 64-bit BAR[0] will point to the component register block.
> 
> [1] CXL3.0 Spec, '8.2.5 CXL.cache and CXL.mem Registers'
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
>  drivers/cxl/cxl.h    |  4 +++
>  drivers/cxl/cxlmem.h |  1 +
>  drivers/cxl/pci.c    | 72 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 77 insertions(+)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 7d507ab80a78..69b50131ad86 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -36,6 +36,10 @@
>  #define   CXL_CM_CAP_CAP_ID_HDM 0x5
>  #define   CXL_CM_CAP_CAP_HDM_VERSION 1
>  
> +/* CXL 3.0 8.2.4.2 CXL RAS Capability Header */
> +#define CXL_CM_CAP_ID_RAS 0x2
> +#define CXL_CM_CAP_SIZE_RAS 0x5C
> +
>  /* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
>  #define CXL_HDM_DECODER_CAP_OFFSET 0x0
>  #define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 079db2e15acc..515273e224ea 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -243,6 +243,7 @@ struct cxl_dev_state {
>  	u64 next_persistent_bytes;
>  
>  	struct cxl_register_map aer_map;
> +	struct cxl_register_map ras_map;
>  
>  	resource_size_t component_reg_phys;
>  	u64 serial;
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 2287b5225862..7f717fb47a36 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -586,6 +586,78 @@ void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_pci_aer_init, CXL);
>  
> +static resource_size_t cxl_get_dport_ras_base(struct cxl_memdev *cxlmd)
> +{
> +	resource_size_t component_reg_phys, offset = 0;
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	void *cap_hdr_addr, *comp_reg_mapped;
> +	u32 cap_hdr, ras_cap_hdr;
> +	int cap_ndx;
> +
> +	comp_reg_mapped = ioremap(cxlds->component_reg_phys +
> +				  CXL_CM_OFFSET, CXL_COMPONENT_REG_BLOCK_SIZE);
> +	if (!comp_reg_mapped)
> +		return 0;
> +
> +	cap_hdr_addr = comp_reg_mapped;
> +	cap_hdr = readl(cap_hdr_addr);
> +	for (cap_ndx = 0;
> +	     cap_ndx < FIELD_GET(CXL_CM_CAP_HDR_ARRAY_SIZE_MASK, cap_hdr);
> +	     cap_ndx++) {
> +		ras_cap_hdr = readl(cap_hdr_addr + cap_ndx*sizeof(u32));
> +
> +		if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, ras_cap_hdr) == CXL_CM_CAP_ID_RAS) {
> +			pr_debug("RAS cap header = %X @ %pa, cap_ndx = %d\n",
> +				 ras_cap_hdr, cap_hdr_addr, cap_ndx);
> +			break;
> +		}
> +	}
> +
> +	offset = CXL_CM_OFFSET + PCI_EXT_CAP_NEXT(ras_cap_hdr);
> +
> +	iounmap(comp_reg_mapped);
> +
> +	if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, ras_cap_hdr) != CXL_CM_CAP_ID_RAS)
> +		return 0;
> +
> +	pr_debug("Found RAS capability @ %llX (%X)\n",
> +		 component_reg_phys + offset, *((u32 *)(comp_reg_mapped + offset)));
> +
> +	return component_reg_phys + offset;

For the RAS capability in the cxl_pci device this patch needs to be
reconciled with this effort:

https://lore.kernel.org/linux-cxl/166336972295.3803215.1047199449525031921.stgit@djiang5-desk3.ch.intel.com/

I think we will want RCD and VH RAS capability reporting to happen in
the same place, and that can not be cxl_pci because cxl_pci has no way
to find the RAS registers on its own. It needs the help from cxl_mem to
do the upstream cxl_port associtation first.

Given CXL switches will have their own RAS capabilities to report it
feels like the cxl_port driver is where all of this should be
centralized.


> +}
> +
> +static int cxl_setup_dport_ras(struct cxl_memdev *cxlmd, resource_size_t resource)
> +{
> +	struct cxl_register_map *map = &cxlmd->cxlds->ras_map;
> +	struct pci_dev *pdev = to_pci_dev(&cxlmd->dev);
> +
> +	if (!resource) {
> +		pr_err("%s():%d: RAS resource ptr is NULL\n", __func__, __LINE__);
> +		return  -EINVAL;
> +	}
> +
> +	map->base = devm_cxl_iomap_block(&pdev->dev, resource, CXL_CM_CAP_SIZE_RAS);
> +	if (!map->base)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +void cxl_pci_ras_init(struct cxl_memdev *cxlmd)
> +{
> +	resource_size_t cap;
> +
> +	/*
> +	 * TODO - CXL2.0 will need change to support PCI config space.
> +	 */
> +	if (!is_rcd(cxlmd))
> +		return;
> +
> +	cap = cxl_get_dport_ras_base(cxlmd);
> +	cxl_setup_dport_ras(cxlmd, cap);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_pci_ras_init, CXL);
> +
>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>  	struct cxl_register_map map;
> -- 
> 2.34.1
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] cxl/pci: Log CXL device's PCIe AER and CXL RAS error information
  2022-10-21 18:56 ` [PATCH 5/5] cxl/pci: Log CXL device's PCIe AER and CXL RAS error information Terry Bowman
@ 2022-10-24 15:14   ` Jonathan Cameron
  2022-10-27 21:30   ` Bjorn Helgaas
  1 sibling, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2022-10-24 15:14 UTC (permalink / raw)
  To: Terry Bowman
  Cc: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams, linux-cxl, linux-kernel, bhelgaas,
	rafael, lenb, dave, rrichter

On Fri, 21 Oct 2022 13:56:15 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> The CXL downport PCIe AER and CXL RAS capability information needs to be
> logged during PCIe AER error handling.
> 
> The existing PCIe AER error handler logs native AER errors but does not
> log upport/downport AER capability residing in the RCRB. The CXL1.1
> RCRB does not have a BDF and is not enunmerable. The existing error handler
> logic does not display CXL RAS details either.

So this is doing two things? 
If so split it into dealing with the non enumerable part, and then a separate
patch to deal with the CXL RAS information. However, I'm only spotting the
CXL RAS handling part in here.

Also, there are numerous drivers that provide additional logging in response
to an AER error. With the exception of providing a way to do that for
corrected errors (as I mentioned in David Jiang's series on CXL 2.0+ RAS handling)
I don't think we should be doing anything down in the PCI layers for this.
We should see an internal AER error and then the CXL driver should handle that,
adding more logging as needed.

> 
> Add a CXL error handler to the existing PCI error handlers. Add a call
> to the CXL error handler within the PCIe AER error handler. Implement the
> driver's CXL callback to log downport PCIe AER and CXL RAS capability
> information.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
>  drivers/cxl/pci.c      | 76 ++++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/pcie/aer.c | 45 ++++++++++++++++++++++++-
>  include/linux/pci.h    |  4 +++
>  3 files changed, 124 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 80a01b304efe..dceda9f9fc60 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -7,6 +7,7 @@
>  #include <linux/sizes.h>
>  #include <linux/mutex.h>
>  #include <linux/list.h>
> +#include <linux/aer.h>
>  #include <linux/pci.h>
>  #include <linux/pci-doe.h>
>  #include <linux/io.h>
> @@ -14,6 +15,9 @@
>  #include "cxlpci.h"
>  #include "cxl.h"
>  
> +extern void cxl_print_aer(struct pci_dev *dev, int aer_severity,
> +			  struct aer_capability_regs *aer);
> +
>  /**
>   * DOC: cxl pci
>   *
> @@ -744,9 +748,80 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
>  		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
>  
> +	dev_set_drvdata(&pdev->dev, cxlmd);
> +
>  	return rc;
>  }
>  
> +struct ras_cap {
> +	u32 uc_error_status;
> +	u32 uc_error_mask;
> +	u32 uc_error_severity;
> +	u32 c_error_status;
> +	u32 c_error_mask;
> +	u32 ctrl;
> +	u32 log[];
> +};
> +
> +/*
> + * Log the state of the CXL downport AER and RAS status registers.
> + */
> +static void cxl_error_report(struct cxl_memdev *cxlmd)
> +{
> +	struct pci_dev *pdev = to_pci_dev(cxlmd->cxlds->dev);
> +	struct aer_capability_regs *aer_cap;
> +	struct ras_cap *ras_cap;
> +
> +	aer_cap = (struct aer_capability_regs *)cxlmd->cxlds->aer_map.base;
> +	ras_cap = (struct ras_cap *)cxlmd->cxlds->ras_map.base;
> +
> +	pci_err(pdev, "CXL Error Report\n");
> +	pci_err(pdev, "AER Errors:\n");
> +	if (aer_cap) {
> +		cxl_print_aer(pdev, AER_CORRECTABLE, aer_cap);
> +		cxl_print_aer(pdev, AER_NONFATAL, aer_cap);
> +		cxl_print_aer(pdev, AER_FATAL, aer_cap);
> +	}
> +
> +	pci_err(pdev, "RAS Errors:\n");
> +	if (ras_cap) {
> +		pci_err(pdev, "RAS: uc_error_status = %X\n", readl(&ras_cap->uc_error_status));
> +		pci_err(pdev, "RAS: uc_error_mask = %X\n", readl(&ras_cap->uc_error_mask));
> +		pci_err(pdev, "RAS: uc_error_severity = %X\n", readl(&ras_cap->uc_error_severity));
> +		pci_err(pdev, "RAS: c_error_status = %X\n", readl(&ras_cap->c_error_status));
> +		pci_err(pdev, "RAS: c_error_mask = %X\n", readl(&ras_cap->c_error_mask));
> +		pci_err(pdev, "RAS: ras_caps->ctrl = %X\n", readl(&ras_cap->ctrl));
> +		pci_err(pdev, "RAS: log = %X\n", readl(&ras_cap->log));
> +	}
> +}
> +
> +static void cxl_error_detected(struct pci_dev *pdev)
> +{
> +	struct cxl_memdev *cxlmd;
> +
> +	if (!is_cxl_memdev(&pdev->dev)) {
> +		pci_err(pdev, "CXL memory device is invalid\n");
> +		return;
> +	}
> +
> +	cxlmd = dev_get_drvdata(&pdev->dev);
> +	if (!cxlmd) {
> +		pci_err(pdev, "CXL memory device is NULL\n");
> +		return;
> +	}
> +
> +	if (!cxlmd->cxlds) {
> +		pci_err(pdev, "CXL device state object is NULL\n");
> +		return;
> +	}
> +
> +	cxl_error_report(cxlmd);
> +}
> +
> +static struct pci_error_handlers cxl_error_handlers = {
> +	.cxl_error_detected = cxl_error_detected,
> +};
> +
>  static const struct pci_device_id cxl_mem_pci_tbl[] = {
>  	/* PCI class code for CXL.mem Type-3 Devices */
>  	{ PCI_DEVICE_CLASS((PCI_CLASS_MEMORY_CXL << 8 | CXL_MEMORY_PROGIF), ~0)},
> @@ -761,6 +836,7 @@ static struct pci_driver cxl_pci_driver = {
>  	.driver	= {
>  		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
>  	},
> +	.err_handler		= &cxl_error_handlers,
>  };
>  
>  MODULE_LICENSE("GPL v2");
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e2d8a74f83c3..dea04d412406 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -811,6 +811,13 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
>  }
>  #endif
>  
> +void cxl_print_aer(struct pci_dev *dev, int aer_severity,
> +		    struct aer_capability_regs *aer)
> +{
> +	cper_print_aer(dev, aer_severity, aer);
> +}
> +EXPORT_SYMBOL_GPL(cxl_print_aer);
> +
>  /**
>   * add_error_device - list device to be handled
>   * @e_info: pointer to error info
> @@ -1169,6 +1176,40 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
>  	}
>  }
>  
> +static int report_cxl_errors_iter(struct pci_dev *pdev, void *data)
> +{
> +	struct pci_driver *pdrv = pdev->driver;
> +
> +	if (pdrv &&
> +	    pdrv->err_handler &&
> +	    pdrv->err_handler->cxl_error_detected)
> +		pdrv->err_handler->cxl_error_detected(pdev);
> +
> +	return 0;
> +}
> +
> +static void report_cxl_errors(struct aer_rpc *rpc,
> +			      struct aer_err_source *e_src)
> +{
> +	struct pci_dev *pdev = rpc->rpd;
> +	struct aer_err_info e_info;
> +	u32 uncor_status, cor_status;
> +
> +	pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &uncor_status);
> +	pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_STATUS, &cor_status);
> +
> +	if (!uncor_status && !cor_status)
> +		return;
> +
> +	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_EC)
> +		pcie_walk_rcec(pdev, report_cxl_errors_iter, &e_info);
> +	else
> +		pci_walk_bus(pdev->subordinate, report_cxl_errors_iter, &e_info);
> +
> +	pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, uncor_status);
> +	pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_STATUS, cor_status);
> +}
> +
>  /**
>   * aer_isr - consume errors detected by root port
>   * @irq: IRQ assigned to Root Port
> @@ -1185,8 +1226,10 @@ static irqreturn_t aer_isr(int irq, void *context)
>  	if (kfifo_is_empty(&rpc->aer_fifo))
>  		return IRQ_NONE;
>  
> -	while (kfifo_get(&rpc->aer_fifo, &e_src))
> +	while (kfifo_get(&rpc->aer_fifo, &e_src)) {
> +		report_cxl_errors(rpc, &e_src);
>  		aer_isr_one_error(rpc, &e_src);
> +	}
>  	return IRQ_HANDLED;
>  }
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 2bda4a4e47e8..4f4b3a8f5454 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -827,6 +827,10 @@ enum pci_ers_result {
>  
>  /* PCI bus error event callbacks */
>  struct pci_error_handlers {
> +
> +	/* CXL error detected on this device */
> +	void (*cxl_error_detected)(struct pci_dev *dev);
> +
>  	/* PCI bus error detected on this device */
>  	pci_ers_result_t (*error_detected)(struct pci_dev *dev,
>  					   pci_channel_state_t error);


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
  2022-10-21 22:39   ` Dan Williams
@ 2022-10-25 16:23     ` Terry Bowman
  0 siblings, 0 replies; 25+ messages in thread
From: Terry Bowman @ 2022-10-25 16:23 UTC (permalink / raw)
  To: Dan Williams, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter, linux-acpi



On 10/21/22 17:39, Dan Williams wrote:
> Terry Bowman wrote:
>> ACPI includes a CXL _OSC support method to communicate the available
>> CXL support to FW. The CXL support _OSC includes a field to indicate
>> CXL1.1 RCH RCD support. The OS sets this bit to 1 if it supports access
>> to RCD and RCH Port registers.[1] FW can potentially change it's operation
>> depending on the _OSC support setting reported by the OS.
>>
>> The ACPI driver does not currently set the ACPI _OSC support to indicate
>> CXL1.1 RCD RCH support. Change the capability reported to include CXL1.1.
>>
>> [1] CXL3.0 Table 9-26 'Interpretation of CXL _OSC Support Field'
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> ---
>>  drivers/acpi/pci_root.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index c8385ef54c37..094a59b216ae 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
> 
> Be sure to copy linux-acpi@vger.kernel.org on patches that touch
> drivers/acpi/
> 

Ok, I will.

Regards,
Terry

>> @@ -492,6 +492,7 @@ static u32 calculate_cxl_support(void)
>>  	u32 support;
>>  
>>  	support = OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT;
>> +	support |= OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT;
>>  	if (pci_aer_available())
>>  		support |= OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT;
>>  	if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
> 
> This looks good to me though.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability
  2022-10-22 21:45   ` Dan Williams
@ 2022-10-25 16:42     ` Terry Bowman
  2022-10-25 18:21       ` Dan Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Terry Bowman @ 2022-10-25 16:42 UTC (permalink / raw)
  To: Dan Williams, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter



On 10/22/22 16:45, Dan Williams wrote:
> Terry Bowman wrote:
>> CXL downport PCIe AER information needs to be logged during error handling.
>> The RCD downport/upport does not have a BDF and is not PCI enumerable. As a
>> result the CXL PCIe driver is not aware of the AER in 'PCI Express'
>> capability located in the RCRB downport/upport. Logic must be introduced to
>> use the downport/upport AER information.
> 
> Nice, I am happy to see this work get started.
> 
>>
>> Update the CXL driver to find the downport's PCIe AER capability and cache
>> a pointer for using later. First, find the RCRB to provide the
>> downport/upport memory region. The downport/upport are mapped as MMIO not
>> PCI config space. Use readl/writel/readq/writeq as required by the CXL spec
>> to find and operate on the AER registers.[1]
>>
>> Also, add function to detect if the device is a CXL1.1 RCD device.
>>
>> [1] CXL3.0, 8.2 'Memory Mapped Registers'
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> ---
>>  drivers/cxl/acpi.c      |  56 ++++++++++++++
>>  drivers/cxl/core/regs.c |   1 +
>>  drivers/cxl/cxl.h       |   9 +++
>>  drivers/cxl/cxlmem.h    |   2 +
>>  drivers/cxl/mem.c       |   2 +
>>  drivers/cxl/pci.c       | 158 ++++++++++++++++++++++++++++++++++++++++
>>  6 files changed, 228 insertions(+)
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index bf251a27e460..5d543c789e8d 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -232,6 +232,7 @@ struct cxl_chbs_context {
>>  	struct device *dev;
>>  	unsigned long long uid;
>>  	struct acpi_cedt_chbs chbs;
>> +	resource_size_t chbcr;
>>  };
>>  
>>  static int cxl_get_chbs(union acpi_subtable_headers *header, void *arg,
>> @@ -417,6 +418,61 @@ static void remove_cxl_resources(void *data)
>>  	}
>>  }
>>  
>> +static const struct acpi_device_id cxl_host_ids[] = {
>> +	{ "ACPI0016", 0 },
>> +	{ "PNP0A08", 0 },
>> +	{ },
>> +};
>> +
>> +static int __cxl_get_rcrb(union acpi_subtable_headers *header, void *arg,
>> +			  const unsigned long end)
>> +{
>> +	struct cxl_chbs_context *ctx = arg;
>> +	struct acpi_cedt_chbs *chbs;
>> +
>> +	if (ctx->chbcr)
>> +		return 0;
>> +
>> +	chbs = (struct acpi_cedt_chbs *)header;
>> +
>> +	if (ctx->uid != chbs->uid)
>> +		return 0;
>> +
>> +	if (chbs->cxl_version != ACPI_CEDT_CHBS_VERSION_CXL11)
>> +		return 0;
>> +
>> +	if (chbs->length != SZ_8K)
>> +		return 0;
>> +
>> +	ctx->chbcr = chbs->base;
>> +
>> +	return 0;
>> +}
> 
> This seems redundant with component register enumeration that was
> already added in Robert's patches.
> 

Noted. Plese see my next response below.

>> +
>> +resource_size_t cxl_get_rcrb(struct cxl_memdev *cxlmd)
>> +{
>> +	struct pci_host_bridge *host = NULL;
>> +	struct cxl_chbs_context ctx = {0};
>> +	struct cxl_dport *dport;
>> +	struct cxl_port *port;
>> +
>> +	port = cxl_mem_find_port(cxlmd, NULL);
>> +	if (!port)
>> +		return 0;
>> +
>> +	dport = port->parent_dport;
>> +	ctx.uid = dport ? dport->port_id : 0;
>> +	if (!dport)
>> +		return 0;
>> +
>> +	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, __cxl_get_rcrb, &ctx);
>> +
>> +	dev_dbg(&host->dev, "RCRB found: 0x%08llx\n", (u64)ctx.chbcr);
>> +
>> +	return ctx.chbcr;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_get_rcrb, CXL);
> 
> While CXL to date has been tied ACPI platforms there is no requirement
> that CXL be ACPI based. Given that other coherent bus specifications
> that were deployed on PowerPC have now joined the CXL consortium there
> is an increasing chance of CXL appearing on an Open Firmware based
> platforms. Even if that does not come to pass, the discipline of clear
> separation between platform code and PCI/CXL mechanisms leads to cleaner
> code organization.
> 
> All that to say, no, cxl_acpi cannot export functions for other cxl
> modules to depend upon. Instead it needs to publish these details in the
> CXL objects that it registers.
> 

Ok. Ill rework such that ACPI functions are not exported. Adding  the recommended 
caching 'rcrb_phys' will help recfactoring out the exported function. I'll
cache the RCRB to 'rcrb_phys' during enumeration instead of calling a helper 
function for every query.

> In this case my expectation is that cxl_acpi registers a dport decorated
> with the extra RCH information. Something like:
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f680450f0b16..b42f4759743b 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -499,12 +499,19 @@ cxl_find_dport_by_dev(struct cxl_port *port, const struct device *dport_dev)
>   * @port_id: unique hardware identifier for dport in decoder target list
>   * @component_reg_phys: downstream port component registers
>   * @port: reference to cxl_port that contains this downstream port
> + * @is_rch: enable RCH vs VH enumeration (see CXL 3.0 9.11.8)
>   */
>  struct cxl_dport {
>         struct device *dport;
>         int port_id;
>         resource_size_t component_reg_phys;
>         struct cxl_port *port;
> +       bool is_rch;
> +};
> +
> +struct cxl_rch_dport {
> +       struct cxl_dport dport;
> +       resource_size_t rcrb_phys;
>  };
>  

The same is needed for uport as well, correct ?

>  /**
> 
> Then, when cxl_mem notices that the memdev is being produced by an
> RCIEP, it can skip devm_cxl_enumerate_ports() and jump straight to
> cxl_mem_find_port(). That will return this dport with the rcrb base
> where cxl_mem can arrange the AER handling. Likely we will need some
> notification mechanism to route Root Complex AER events to cxl_acpi,
> cxl_pci, and/or cxl_mem to optionally add the CXL RAS data to the log.
> 
Isn't the notification mechanism through the AER interrupt processing? 
I will have more related comments in patch 3/5.

>> +
>>  /**
>>   * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource
>>   * @cxl_res: A standalone resource tree where each CXL window is a sibling
>> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
>> index ec178e69b18f..0d4f633e5c01 100644
>> --- a/drivers/cxl/core/regs.c
>> +++ b/drivers/cxl/core/regs.c
>> @@ -184,6 +184,7 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
>>  
>>  	return ret_val;
>>  }
>> +EXPORT_SYMBOL_NS_GPL(devm_cxl_iomap_block, CXL);
>>  
>>  int cxl_map_component_regs(struct pci_dev *pdev,
>>  			   struct cxl_component_regs *regs,
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index ac8998b627b5..7d507ab80a78 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -204,6 +204,14 @@ struct cxl_register_map {
>>  	};
>>  };
>>  
>> +struct cxl_memdev;
>> +int cxl_pci_probe_dport(struct cxl_memdev *cxlmd);
>> +
>> +void cxl_pci_aer_init(struct cxl_memdev *cxlmd);
>> +
>> +void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
>> +				   resource_size_t length);
>> +
>>  void cxl_probe_component_regs(struct device *dev, void __iomem *base,
>>  			      struct cxl_component_reg_map *map);
>>  void cxl_probe_device_regs(struct device *dev, void __iomem *base,
>> @@ -549,6 +557,7 @@ static inline bool is_cxl_root(struct cxl_port *port)
>>  	return port->uport == port->dev.parent;
>>  }
>>  
>> +resource_size_t cxl_get_rcrb(struct cxl_memdev *cxlmd);
>>  bool is_cxl_port(struct device *dev);
>>  struct cxl_port *to_cxl_port(struct device *dev);
>>  struct pci_bus;
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index 88e3a8e54b6a..079db2e15acc 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -242,6 +242,8 @@ struct cxl_dev_state {
>>  	u64 next_volatile_bytes;
>>  	u64 next_persistent_bytes;
>>  
>> +	struct cxl_register_map aer_map;
>> +
>>  	resource_size_t component_reg_phys;
>>  	u64 serial;
>>  
>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
>> index 64ccf053d32c..d1e663be43c2 100644
>> --- a/drivers/cxl/mem.c
>> +++ b/drivers/cxl/mem.c
>> @@ -74,6 +74,8 @@ static int cxl_mem_probe(struct device *dev)
>>  	if (rc)
>>  		return rc;
>>  
>> +	cxl_pci_aer_init(cxlmd);
>> +
>>  	parent_port = cxl_mem_find_port(cxlmd, &dport);
>>  	if (!parent_port) {
>>  		dev_err(dev, "CXL port topology not found\n");
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index faeb5d9d7a7a..2287b5225862 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -35,6 +35,15 @@
>>  	(readl((cxlds)->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET) &                  \
>>  	 CXLDEV_MBOX_CTRL_DOORBELL)
>>  
>> +/* PCI 5.0 - 7.8.4 Advanced Error Reporting Extended Capability */
>> +#define PCI_AER_CAP_SIZE 0x48
>> +
>> +/* CXL 3.0 - 8.2.1.3.3, Offset to DVSEC Port Status */
>> +#define CXL_DVSEC_PORT_STATUS_OFF 0xE
>> +
>> +/* CXL 3.0 - 8.2.1.3.3 */
>> +#define CXL_DVSEC_VH_SUPPORT 0x20
>> +
>>  /* CXL 2.0 - 8.2.8.4 */
>>  #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
>>  
>> @@ -428,6 +437,155 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
>>  	}
>>  }
>>  
>> +static resource_size_t cxl_get_dport_ext_cap(struct cxl_memdev *cxlmd, u32 cap_id)
>> +{
>> +	resource_size_t rcrb, offset;
>> +	void *rcrb_mapped;
>> +	u32 cap_hdr;
>> +
>> +	rcrb = cxl_get_rcrb(cxlmd);
>> +	if (!rcrb)
>> +		return 0;
>> +
>> +	rcrb_mapped = ioremap(rcrb, SZ_4K);
>> +	if (!rcrb_mapped)
>> +		return 0;
>> +
>> +	offset = PCI_CFG_SPACE_SIZE;
>> +	cap_hdr = readl(rcrb_mapped + offset);
>> +
>> +	while (PCI_EXT_CAP_ID(cap_hdr)) {
>> +		if (PCI_EXT_CAP_ID(cap_hdr) == cap_id)
>> +			break;
>> +
>> +		offset = PCI_EXT_CAP_NEXT(cap_hdr);
>> +		if (offset == 0)
>> +			break;
>> +
>> +		cap_hdr = readl(rcrb_mapped + offset);
>> +	}
>> +	iounmap((void *)rcrb_mapped);
> 
> The memdev owns the upstream port RAS capability, why is it mapping the
> downstream port ras capability?
> 

You're right, this needs to be changed to read the upport's extended 
capabilities. 

>> +
>> +	if (PCI_EXT_CAP_ID(cap_hdr) != cap_id)
>> +		return 0;
>> +
>> +	pr_debug("Found capability %X @ %llX (%X)\n",
>> +		 cap_id, rcrb + offset, cap_hdr);
>> +
>> +	return rcrb + offset;
>> +}
>> +
>> +bool is_rcd(struct cxl_memdev *cxlmd)
>> +{
>> +	struct pci_dev *pdev;
>> +	resource_size_t dvsec;
>> +	void *dvsec_mapped;
>> +	u32 dvsec_data;
>> +
>> +	if (!dev_is_pci(cxlmd->cxlds->dev))
>> +		return false;
> 
> Just use cxlmd->dev.parent, no need to route through cxlds for this.
> 

Ok

>> +
>> +	pdev = to_pci_dev(cxlmd->cxlds->dev);
>> +
>> +	/*
>> +	 * 'CXL devices operating in this mode always set the Device/Port
>> +	 * Type field in the PCI Express Capabilities register to RCiEP.'
>> +	 * - CXL3.0 9.11.1 'RCD Mode'
>> +	 */
>> +	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
>> +		return false;
>> +
>> +	/*
>> +	 * Check if VH is enabled
>> +	 * - CXL3.0 8.2.1.3.1 'DVSEC Flex Bus Port Capability'
>> +	 */
>> +	dvsec = cxl_get_dport_ext_cap(cxlmd, PCI_EXT_CAP_ID_DVSEC);
>> +	if (!dvsec)
>> +		return false;
>> +
>> +	dvsec_mapped = ioremap(dvsec, SZ_4K);
> 
> No ioremap error handling?
> 

I'll add.

>> +	dvsec_data = readl(dvsec_mapped + CXL_DVSEC_PORT_STATUS_OFF);
>> +	iounmap(dvsec_mapped);
>> +	if (dvsec_data & CXL_DVSEC_VH_SUPPORT)
>> +		return false;
> 
> There's no such thing as a root-complex-integrated endpoint in CXL VH
> mode, right? Is this not redundant?
> 

Youre correct.

'CXL Root Ports may be directly connected to a CXL device that is not an eRCD, or a CXL
Switch. These Root Ports spawn a CXL Virtual Hierarchy (VH). Enumeration within a
CXL VH is described below. These CXL devices appear as a standard PCIe Endpoints with a Type 0 Header.'
-cxl3.0 9.12.2 'CXL Virtual Hierarchy'

I will remove the DVSEC check.

>> +
>> +	return true;
>> +}
>> +
>> +#define PCI_CAP_ID(header)	(header & 0x000000ff)
>> +#define PCI_CAP_NEXT(header)	((header >> 8) & 0xff)
>> +
>> +static resource_size_t cxl_get_dport_cap(struct cxl_memdev *cxlmd, int cap_id)
>> +{
>> +	resource_size_t offset, rcrb;
>> +	void *rcrb_mapped;
>> +	u32 cap_hdr;
>> +
>> +	rcrb = cxl_get_rcrb(cxlmd);
>> +	if (!rcrb)
>> +		return 0;
>> +
>> +	rcrb_mapped = ioremap(rcrb, SZ_4K);
>> +	if (!rcrb_mapped)
>> +		return 0;
>> +
>> +	offset = readl(rcrb_mapped + PCI_CAPABILITY_LIST);
>> +	cap_hdr = readl(rcrb_mapped + offset);
>> +
>> +	while (PCI_CAP_ID(cap_hdr)) {
>> +		if (PCI_CAP_ID(cap_hdr) == cap_id)
>> +			break;
>> +
>> +		offset = PCI_CAP_NEXT(cap_hdr);
>> +		if (offset == 0)
>> +			break;
>> +
>> +		cap_hdr = readl(rcrb_mapped + offset);
>> +	}
>> +	iounmap((void *)rcrb_mapped);
> 
> All of this mapping and unmapping of the RCRB needs to be centralized in
> one place and owned by one driver for the downstream portion and one
> driver for the upstream portion. That *feels* like cxl_acpi for the
> downstream side and cxl_port for the upstream side (when it drives the
> endpoint port registered by cxl_mem). It should also be protected by
> request_region() to make sure multiple agents are not stepping on each
> other's toes.
> 

Ok. I'll look into this and make the change.

Thank you for this review.

Regards,
Terry Bowman

>> +
>> +	if (PCI_CAP_ID(cap_hdr) != cap_id)
>> +		return 0;
>> +
>> +	pr_debug("Found capability %X @ %llX (%X)\n",
>> +		 cap_id, rcrb + offset, cap_hdr);
>> +
>> +	return rcrb + offset;
>> +}
>> +
>> +static int cxl_setup_dport_aer(struct cxl_memdev *cxlmd, resource_size_t cap_base)
>> +{
>> +	struct cxl_register_map *map = &cxlmd->cxlds->aer_map;
>> +	struct pci_dev *pdev = to_pci_dev(&cxlmd->dev);
>> +
>> +	if (!cap_base)
>> +		return  -ENODEV;
>> +
>> +	map->base = devm_cxl_iomap_block(&pdev->dev, cap_base,
>> +					 PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1);
>> +	if (!map->base)
>> +		return -ENOMEM;
>> +
>> +	return 0;
>> +}
>> +
>> +void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
>> +{
>> +	resource_size_t cap_base;
>> +
>> +	/* CXL2.0 is enumerable and will use AER attached to `struct pci_dev` */
>> +	if (!is_rcd(cxlmd))
>> +		return;
>> +
>> +	/*
>> +	 * Read base address of the PCI express cap. Cache the cap's
>> +	 * PCI_EXP_DEVCTL and PCI_EXP_DEVSTA for AER control and status.
>> +	 */
>> +	cap_base = cxl_get_dport_cap(cxlmd, PCI_CAP_ID_EXP);
>> +	cxl_setup_dport_aer(cxlmd, cap_base);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_pci_aer_init, CXL);
>> +
>>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>  {
>>  	struct cxl_register_map map;
>> -- 
>> 2.34.1
>>
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability
  2022-10-25 16:42     ` Terry Bowman
@ 2022-10-25 18:21       ` Dan Williams
  0 siblings, 0 replies; 25+ messages in thread
From: Dan Williams @ 2022-10-25 18:21 UTC (permalink / raw)
  To: Terry Bowman, Dan Williams, alison.schofield, vishal.l.verma,
	dave.jiang, ira.weiny, bwidawsk
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

Terry Bowman wrote:
[..]
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index f680450f0b16..b42f4759743b 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -499,12 +499,19 @@ cxl_find_dport_by_dev(struct cxl_port *port, const struct device *dport_dev)
> >   * @port_id: unique hardware identifier for dport in decoder target list
> >   * @component_reg_phys: downstream port component registers
> >   * @port: reference to cxl_port that contains this downstream port
> > + * @is_rch: enable RCH vs VH enumeration (see CXL 3.0 9.11.8)
> >   */
> >  struct cxl_dport {
> >         struct device *dport;
> >         int port_id;
> >         resource_size_t component_reg_phys;
> >         struct cxl_port *port;
> > +       bool is_rch;
> > +};
> > +
> > +struct cxl_rch_dport {
> > +       struct cxl_dport dport;
> > +       resource_size_t rcrb_phys;
> >  };
> >  
> 
> The same is needed for uport as well, correct ?

Hmm, I don't think there are any 'struct cxl_port' instances that need
an RCH flag in the Linux CXL topology model.  That was the feedback /
realization that Dave and I came to while reviewing Robert's RCH series.
In the RCH case the ACPI0016 host-bridge houses a
root-complex-integrated endpoint that is a peer of the
downstream-PCI-root-ports in the bridge. The topology ends up looking
like this:

# cxl list -BEMPTu
{
  "bus":"root0",
  "provider":"ACPI.CXL",
  "nr_dports":1,
  "dports":[
    {
      "dport":"pci0000:38",
      "id":"0x31"
    }
  ],
  "endpoints:root0":[
    {
      "endpoint":"endpoint1",
      "host":"mem0",
      "depth":1,
      "memdev":{
        "memdev":"mem0",
        "pmem_size":0,
        "ram_size":"16.00 GiB (17.18 GB)",
        "serial":"0",
        "numa_node":0,
        "host":"0000:38:00.0"
      }
    }
  ]
}

Notice that the logical ACPI0017 (CXL root) device lists the ACPI0016
device (associated with pci0000:38) as a 'struct cxl_dport'. The
endpoint is then connected to that dport. There are no intervening
'struct cxl_port' instances between the root0 and endpoint1 like there
would be in the CXL VH case.

Now, the problem is that there is no viable object that can drive access
to the upstream port component registers. It may be the case that when
the cxl_port driver attaches to an endpoint port that the endpoint port
driver maps both the upstream and downstream hardware-port registers for
the purposes of conveying RAS information.

> >  /**
> > 
> > Then, when cxl_mem notices that the memdev is being produced by an
> > RCIEP, it can skip devm_cxl_enumerate_ports() and jump straight to
> > cxl_mem_find_port(). That will return this dport with the rcrb base
> > where cxl_mem can arrange the AER handling. Likely we will need some
> > notification mechanism to route Root Complex AER events to cxl_acpi,
> > cxl_pci, and/or cxl_mem to optionally add the CXL RAS data to the log.
> > 
> Isn't the notification mechanism through the AER interrupt processing? 
> I will have more related comments in patch 3/5.

In this case I was talking about notifying the cxl_mem driver and/or the
cxl_port driver that it needs to decorate an AER event with more
object-local information.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
  2022-10-22 22:44   ` Dan Williams
@ 2022-10-26 19:01     ` Terry Bowman
  2022-10-27 20:32       ` Dan Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Terry Bowman @ 2022-10-26 19:01 UTC (permalink / raw)
  To: Dan Williams, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter



On 10/22/22 17:44, Dan Williams wrote:
> Terry Bowman wrote:
>> CXL RAS information resides in a RAS capability structure located in
>> CXL.cache and CXL.mem registers.[1] The RAS capability provides CXL
>> specific error information that can be helpful in debugging. This
>> information is not currently logged but needs to be logged during PCIe AER
>> error handling.
>>
>> Update the CXL driver to find and cache a pointer to the CXL RAS
>> capability. The RAS registers resides in the downport's component register
>> block. Note:RAS registers are not in the upport. The component registers
>> can be found by first using the RCRB to goto the downport. Next, the
>> downport's 64-bit BAR[0] will point to the component register block.
>>
>> [1] CXL3.0 Spec, '8.2.5 CXL.cache and CXL.mem Registers'
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> ---
>>  drivers/cxl/cxl.h    |  4 +++
>>  drivers/cxl/cxlmem.h |  1 +
>>  drivers/cxl/pci.c    | 72 ++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 77 insertions(+)
>>
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 7d507ab80a78..69b50131ad86 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -36,6 +36,10 @@
>>  #define   CXL_CM_CAP_CAP_ID_HDM 0x5
>>  #define   CXL_CM_CAP_CAP_HDM_VERSION 1
>>  
>> +/* CXL 3.0 8.2.4.2 CXL RAS Capability Header */
>> +#define CXL_CM_CAP_ID_RAS 0x2
>> +#define CXL_CM_CAP_SIZE_RAS 0x5C
>> +
>>  /* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
>>  #define CXL_HDM_DECODER_CAP_OFFSET 0x0
>>  #define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index 079db2e15acc..515273e224ea 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -243,6 +243,7 @@ struct cxl_dev_state {
>>  	u64 next_persistent_bytes;
>>  
>>  	struct cxl_register_map aer_map;
>> +	struct cxl_register_map ras_map;
>>  
>>  	resource_size_t component_reg_phys;
>>  	u64 serial;
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 2287b5225862..7f717fb47a36 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -586,6 +586,78 @@ void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_pci_aer_init, CXL);
>>  
>> +static resource_size_t cxl_get_dport_ras_base(struct cxl_memdev *cxlmd)
>> +{
>> +	resource_size_t component_reg_phys, offset = 0;
>> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
>> +	void *cap_hdr_addr, *comp_reg_mapped;
>> +	u32 cap_hdr, ras_cap_hdr;
>> +	int cap_ndx;
>> +
>> +	comp_reg_mapped = ioremap(cxlds->component_reg_phys +
>> +				  CXL_CM_OFFSET, CXL_COMPONENT_REG_BLOCK_SIZE);
>> +	if (!comp_reg_mapped)
>> +		return 0;
>> +
>> +	cap_hdr_addr = comp_reg_mapped;
>> +	cap_hdr = readl(cap_hdr_addr);
>> +	for (cap_ndx = 0;
>> +	     cap_ndx < FIELD_GET(CXL_CM_CAP_HDR_ARRAY_SIZE_MASK, cap_hdr);
>> +	     cap_ndx++) {
>> +		ras_cap_hdr = readl(cap_hdr_addr + cap_ndx*sizeof(u32));
>> +
>> +		if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, ras_cap_hdr) == CXL_CM_CAP_ID_RAS) {
>> +			pr_debug("RAS cap header = %X @ %pa, cap_ndx = %d\n",
>> +				 ras_cap_hdr, cap_hdr_addr, cap_ndx);
>> +			break;
>> +		}
>> +	}
>> +
>> +	offset = CXL_CM_OFFSET + PCI_EXT_CAP_NEXT(ras_cap_hdr);
>> +
>> +	iounmap(comp_reg_mapped);
>> +
>> +	if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, ras_cap_hdr) != CXL_CM_CAP_ID_RAS)
>> +		return 0;
>> +
>> +	pr_debug("Found RAS capability @ %llX (%X)\n",
>> +		 component_reg_phys + offset, *((u32 *)(comp_reg_mapped + offset)));
>> +
>> +	return component_reg_phys + offset;
> 
> For the RAS capability in the cxl_pci device this patch needs to be
> reconciled with this effort:
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-cxl%2F166336972295.3803215.1047199449525031921.stgit%40djiang5-desk3.ch.intel.com%2F&amp;data=05%7C01%7Cterry.bowman%40amd.com%7C33092f731a854d7a9a5b08dab47f1075%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638020755070581692%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=84oOACCwh4XtndFKOUV%2FltpJjFnp4lsFQgw75UMphHQ%3D&amp;reserved=0
> 
> I think we will want RCD and VH RAS capability reporting to happen in
> the same place, and that can not be cxl_pci because cxl_pci has no way
> to find the RAS registers on its own. It needs the help from cxl_mem to
> do the upstream cxl_port associtation first.
> 
> Given CXL switches will have their own RAS capabilities to report it
> feels like the cxl_port driver is where all of this should be
> centralized.
> 
> 

I'm working on merging the patchsets now.

I'm merging the following:
  Dave Jiang's onto 6.1.0-rc1+, provides RAS mapping.
  Roberts series ontop of Dave's, provides RCD discovery.
  And this patchset ontop of Robert's, provides AER and RAS logging

Regards,
Terry

>> +}
>> +
>> +static int cxl_setup_dport_ras(struct cxl_memdev *cxlmd, resource_size_t resource)
>> +{
>> +	struct cxl_register_map *map = &cxlmd->cxlds->ras_map;
>> +	struct pci_dev *pdev = to_pci_dev(&cxlmd->dev);
>> +
>> +	if (!resource) {
>> +		pr_err("%s():%d: RAS resource ptr is NULL\n", __func__, __LINE__);
>> +		return  -EINVAL;
>> +	}
>> +
>> +	map->base = devm_cxl_iomap_block(&pdev->dev, resource, CXL_CM_CAP_SIZE_RAS);
>> +	if (!map->base)
>> +		return -ENOMEM;
>> +
>> +	return 0;
>> +}
>> +
>> +void cxl_pci_ras_init(struct cxl_memdev *cxlmd)
>> +{
>> +	resource_size_t cap;
>> +
>> +	/*
>> +	 * TODO - CXL2.0 will need change to support PCI config space.
>> +	 */
>> +	if (!is_rcd(cxlmd))
>> +		return;
>> +
>> +	cap = cxl_get_dport_ras_base(cxlmd);
>> +	cxl_setup_dport_ras(cxlmd, cap);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_pci_ras_init, CXL);
>> +
>>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>  {
>>  	struct cxl_register_map map;
>> -- 
>> 2.34.1
>>
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability
  2022-10-21 18:56 ` [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability Terry Bowman
  2022-10-22 21:45   ` Dan Williams
@ 2022-10-27 14:52   ` Bjorn Helgaas
  2022-10-28 14:38     ` Terry Bowman
  1 sibling, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2022-10-27 14:52 UTC (permalink / raw)
  To: Terry Bowman
  Cc: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams, linux-cxl, linux-kernel, bhelgaas,
	rafael, lenb, Jonathan.Cameron, dave, rrichter

On Fri, Oct 21, 2022 at 01:56:12PM -0500, Terry Bowman wrote:
> CXL downport PCIe AER information needs to be logged during error handling.
> The RCD downport/upport does not have a BDF and is not PCI enumerable. As a
> result the CXL PCIe driver is not aware of the AER in 'PCI Express'
> capability located in the RCRB downport/upport. Logic must be introduced to
> use the downport/upport AER information.

I assume "downport" is the same as "dport" in "cxl_dport" and means
"Downstream Port".  Might be nice to reduce the number of variations
if feasible.

> +static resource_size_t cxl_get_dport_cap(struct cxl_memdev *cxlmd, int cap_id)
> +{
> +	resource_size_t offset, rcrb;
> +	void *rcrb_mapped;
> +	u32 cap_hdr;
> +
> +	rcrb = cxl_get_rcrb(cxlmd);
> +	if (!rcrb)
> +		return 0;
> +
> +	rcrb_mapped = ioremap(rcrb, SZ_4K);
> +	if (!rcrb_mapped)
> +		return 0;
> +
> +	offset = readl(rcrb_mapped + PCI_CAPABILITY_LIST);
> +	cap_hdr = readl(rcrb_mapped + offset);
> +
> +	while (PCI_CAP_ID(cap_hdr)) {
> +		if (PCI_CAP_ID(cap_hdr) == cap_id)
> +			break;
> +
> +		offset = PCI_CAP_NEXT(cap_hdr);
> +		if (offset == 0)
> +			break;
> +
> +		cap_hdr = readl(rcrb_mapped + offset);
> +	}
> +	iounmap((void *)rcrb_mapped);
> +
> +	if (PCI_CAP_ID(cap_hdr) != cap_id)
> +		return 0;
> +
> +	pr_debug("Found capability %X @ %llX (%X)\n",
> +		 cap_id, rcrb + offset, cap_hdr);

Would be nice to use dev_dbg() if possible here.

Is "%X" (upper-case hex) the convention in CXL?  Most places in Linux
seem to use "%x".  Also consider "%#x" (or "%#X") so it's obvious
these are hex.

> +void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
> +{
> +	resource_size_t cap_base;
> +
> +	/* CXL2.0 is enumerable and will use AER attached to `struct pci_dev` */
> +	if (!is_rcd(cxlmd))
> +		return;
> +
> +	/*
> +	 * Read base address of the PCI express cap. Cache the cap's
> +	 * PCI_EXP_DEVCTL and PCI_EXP_DEVSTA for AER control and status.
> +	 */
> +	cap_base = cxl_get_dport_cap(cxlmd, PCI_CAP_ID_EXP);
> +	cxl_setup_dport_aer(cxlmd, cap_base);

I don't see anything about PCI_EXP_DEVCTL and PCI_EXP_DEVSTA in
cxl_get_dport_cap() or cxl_setup_dport_aer().  And I don't see any
caching, except for setting map->base in cxl_setup_dport_aer().

Caching those registers, especially PCI_EXP_DEVSTA, doesn't seem like
it would make much sense anyway since bits there are set by hardware
when things happen.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
  2022-10-26 19:01     ` Terry Bowman
@ 2022-10-27 20:32       ` Dan Williams
  2022-10-31 16:17         ` Terry Bowman
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Williams @ 2022-10-27 20:32 UTC (permalink / raw)
  To: Terry Bowman, Dan Williams, alison.schofield, vishal.l.verma,
	dave.jiang, ira.weiny, bwidawsk
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

Terry Bowman wrote:
> 
> 
> On 10/22/22 17:44, Dan Williams wrote:
> > Terry Bowman wrote:
> >> CXL RAS information resides in a RAS capability structure located in
> >> CXL.cache and CXL.mem registers.[1] The RAS capability provides CXL
> >> specific error information that can be helpful in debugging. This
> >> information is not currently logged but needs to be logged during PCIe AER
> >> error handling.
> >>
> >> Update the CXL driver to find and cache a pointer to the CXL RAS
> >> capability. The RAS registers resides in the downport's component register
> >> block. Note:RAS registers are not in the upport. The component registers
> >> can be found by first using the RCRB to goto the downport. Next, the
> >> downport's 64-bit BAR[0] will point to the component register block.
> >>
> >> [1] CXL3.0 Spec, '8.2.5 CXL.cache and CXL.mem Registers'
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> >> ---
> >>  drivers/cxl/cxl.h    |  4 +++
> >>  drivers/cxl/cxlmem.h |  1 +
> >>  drivers/cxl/pci.c    | 72 ++++++++++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 77 insertions(+)
> >>
> >> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> >> index 7d507ab80a78..69b50131ad86 100644
> >> --- a/drivers/cxl/cxl.h
> >> +++ b/drivers/cxl/cxl.h
> >> @@ -36,6 +36,10 @@
> >>  #define   CXL_CM_CAP_CAP_ID_HDM 0x5
> >>  #define   CXL_CM_CAP_CAP_HDM_VERSION 1
> >>  
> >> +/* CXL 3.0 8.2.4.2 CXL RAS Capability Header */
> >> +#define CXL_CM_CAP_ID_RAS 0x2
> >> +#define CXL_CM_CAP_SIZE_RAS 0x5C
> >> +
> >>  /* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
> >>  #define CXL_HDM_DECODER_CAP_OFFSET 0x0
> >>  #define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
> >> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> >> index 079db2e15acc..515273e224ea 100644
> >> --- a/drivers/cxl/cxlmem.h
> >> +++ b/drivers/cxl/cxlmem.h
> >> @@ -243,6 +243,7 @@ struct cxl_dev_state {
> >>  	u64 next_persistent_bytes;
> >>  
> >>  	struct cxl_register_map aer_map;
> >> +	struct cxl_register_map ras_map;
> >>  
> >>  	resource_size_t component_reg_phys;
> >>  	u64 serial;
> >> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> >> index 2287b5225862..7f717fb47a36 100644
> >> --- a/drivers/cxl/pci.c
> >> +++ b/drivers/cxl/pci.c
> >> @@ -586,6 +586,78 @@ void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
> >>  }
> >>  EXPORT_SYMBOL_NS_GPL(cxl_pci_aer_init, CXL);
> >>  
> >> +static resource_size_t cxl_get_dport_ras_base(struct cxl_memdev *cxlmd)
> >> +{
> >> +	resource_size_t component_reg_phys, offset = 0;
> >> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> >> +	void *cap_hdr_addr, *comp_reg_mapped;
> >> +	u32 cap_hdr, ras_cap_hdr;
> >> +	int cap_ndx;
> >> +
> >> +	comp_reg_mapped = ioremap(cxlds->component_reg_phys +
> >> +				  CXL_CM_OFFSET, CXL_COMPONENT_REG_BLOCK_SIZE);
> >> +	if (!comp_reg_mapped)
> >> +		return 0;
> >> +
> >> +	cap_hdr_addr = comp_reg_mapped;
> >> +	cap_hdr = readl(cap_hdr_addr);
> >> +	for (cap_ndx = 0;
> >> +	     cap_ndx < FIELD_GET(CXL_CM_CAP_HDR_ARRAY_SIZE_MASK, cap_hdr);
> >> +	     cap_ndx++) {
> >> +		ras_cap_hdr = readl(cap_hdr_addr + cap_ndx*sizeof(u32));
> >> +
> >> +		if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, ras_cap_hdr) == CXL_CM_CAP_ID_RAS) {
> >> +			pr_debug("RAS cap header = %X @ %pa, cap_ndx = %d\n",
> >> +				 ras_cap_hdr, cap_hdr_addr, cap_ndx);
> >> +			break;
> >> +		}
> >> +	}
> >> +
> >> +	offset = CXL_CM_OFFSET + PCI_EXT_CAP_NEXT(ras_cap_hdr);
> >> +
> >> +	iounmap(comp_reg_mapped);
> >> +
> >> +	if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, ras_cap_hdr) != CXL_CM_CAP_ID_RAS)
> >> +		return 0;
> >> +
> >> +	pr_debug("Found RAS capability @ %llX (%X)\n",
> >> +		 component_reg_phys + offset, *((u32 *)(comp_reg_mapped + offset)));
> >> +
> >> +	return component_reg_phys + offset;
> > 
> > For the RAS capability in the cxl_pci device this patch needs to be
> > reconciled with this effort:
> > 
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-cxl%2F166336972295.3803215.1047199449525031921.stgit%40djiang5-desk3.ch.intel.com%2F&amp;data=05%7C01%7Cterry.bowman%40amd.com%7C33092f731a854d7a9a5b08dab47f1075%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638020755070581692%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=84oOACCwh4XtndFKOUV%2FltpJjFnp4lsFQgw75UMphHQ%3D&amp;reserved=0
> > 
> > I think we will want RCD and VH RAS capability reporting to happen in
> > the same place, and that can not be cxl_pci because cxl_pci has no way
> > to find the RAS registers on its own. It needs the help from cxl_mem to
> > do the upstream cxl_port associtation first.
> > 
> > Given CXL switches will have their own RAS capabilities to report it
> > feels like the cxl_port driver is where all of this should be
> > centralized.
> > 
> > 
> 
> I'm working on merging the patchsets now.
> 
> I'm merging the following:
>   Dave Jiang's onto 6.1.0-rc1+, provides RAS mapping.

Sounds like I should add this to the RCH branch so you can build on it.

>   Roberts series ontop of Dave's, provides RCD discovery.

Robert's series is still pending the rework to drop the
devm_cxl_enumerate_ports() changes, not sure it's at a state where you
can reliably build on it.

>   And this patchset ontop of Robert's, provides AER and RAS logging

As long as you are expecting to do one more rebase on the final form of
Robert's series, sounds good.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] cxl/pci: Log CXL device's PCIe AER and CXL RAS error information
  2022-10-21 18:56 ` [PATCH 5/5] cxl/pci: Log CXL device's PCIe AER and CXL RAS error information Terry Bowman
  2022-10-24 15:14   ` Jonathan Cameron
@ 2022-10-27 21:30   ` Bjorn Helgaas
  1 sibling, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2022-10-27 21:30 UTC (permalink / raw)
  To: Terry Bowman
  Cc: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams, linux-cxl, linux-kernel, bhelgaas,
	rafael, lenb, Jonathan.Cameron, dave, rrichter

On Fri, Oct 21, 2022 at 01:56:15PM -0500, Terry Bowman wrote:
> The CXL downport PCIe AER and CXL RAS capability information needs to be
> logged during PCIe AER error handling.
> 
> The existing PCIe AER error handler logs native AER errors but does not
> log upport/downport AER capability residing in the RCRB. The CXL1.1
> RCRB does not have a BDF and is not enunmerable. The existing error handler
> logic does not display CXL RAS details either.

s/enunmerable/enumerable/

The patch itself doesn't seem to reference RCRB.  What's the
connection?

Is this specific to CXL?  The base PCIe spec also documents an RCRB,
though I don't think Linux does anything with it.

I guess at least the RCRB discovery must be CXL-specific, since I have
no idea how to find a generic PCIe RCRB.

> +static void cxl_error_report(struct cxl_memdev *cxlmd)
> +{
> +	struct pci_dev *pdev = to_pci_dev(cxlmd->cxlds->dev);
> +	struct aer_capability_regs *aer_cap;
> +	struct ras_cap *ras_cap;
> +
> +	aer_cap = (struct aer_capability_regs *)cxlmd->cxlds->aer_map.base;
> +	ras_cap = (struct ras_cap *)cxlmd->cxlds->ras_map.base;

I don't think you need casts since .base is void *.

> +	pci_err(pdev, "CXL Error Report\n");
> +	pci_err(pdev, "AER Errors:\n");
> +	if (aer_cap) {
> +		cxl_print_aer(pdev, AER_CORRECTABLE, aer_cap);
> +		cxl_print_aer(pdev, AER_NONFATAL, aer_cap);
> +		cxl_print_aer(pdev, AER_FATAL, aer_cap);
> +	}
> +
> +	pci_err(pdev, "RAS Errors:\n");
> +	if (ras_cap) {
> +		pci_err(pdev, "RAS: uc_error_status = %X\n", readl(&ras_cap->uc_error_status));

"%X" will look a lot different than what cper_print_aer() logged
above.  No "0x", upper-case vs lower-case, "=" vs ":", etc.  Maybe
there should be a hint to connect RAS with CXL (maybe there's already
a dev_fmt somewhere that I missed)?

> +static void cxl_error_detected(struct pci_dev *pdev)
> +{
> +	struct cxl_memdev *cxlmd;
> +
> +	if (!is_cxl_memdev(&pdev->dev)) {
> +		pci_err(pdev, "CXL memory device is invalid\n");
> +		return;
> +	}
> +
> +	cxlmd = dev_get_drvdata(&pdev->dev);
> +	if (!cxlmd) {
> +		pci_err(pdev, "CXL memory device is NULL\n");
> +		return;
> +	}
> +
> +	if (!cxlmd->cxlds) {
> +		pci_err(pdev, "CXL device state object is NULL\n");
> +		return;
> +	}

Would these NULL pointers indicate a programming error, or do they
indicate lack of an optional feature?  If the former, I generally
prefer to just take the NULL pointer dereference oops instead of just
printing an easily-missed message.  But maybe the CXL style is to be
more defensive.

> +void cxl_print_aer(struct pci_dev *dev, int aer_severity,
> +		    struct aer_capability_regs *aer)
> +{
> +	cper_print_aer(dev, aer_severity, aer);

What is the purpose of this wrapper?  I guess you need an exported
symbol for some reason?

> +}
> +EXPORT_SYMBOL_GPL(cxl_print_aer);

> +static void report_cxl_errors(struct aer_rpc *rpc,
> +			      struct aer_err_source *e_src)
> +{
> +	struct pci_dev *pdev = rpc->rpd;
> +	struct aer_err_info e_info;
> +	u32 uncor_status, cor_status;
> +
> +	pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &uncor_status);
> +	pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_STATUS, &cor_status);

I think it's kind of an existing defect that we don't have a single
place to read these registers.  I think they should be read either in
firmware (for firmware-first error handling, where Linux basically
gets a package of these register contents) or in Linux (for native
handling).  Ideally I think these paths would converge right after
Linux reads them.

Anyway, I don't think we should read these registers *again* for CXL.
And I assume firmware-first error handling should work for CXL as well
as for base PCIe?  That would imply that we wouldn't read them at all
here for the firmware-first case.

> +	if (!uncor_status && !cor_status)
> +		return;
> +
> +	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_EC)
> +		pcie_walk_rcec(pdev, report_cxl_errors_iter, &e_info);
> +	else
> +		pci_walk_bus(pdev->subordinate, report_cxl_errors_iter, &e_info);
> +
> +	pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, uncor_status);
> +	pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_STATUS, cor_status);

Shouldn't this clearing be somehow contingent on pcie_aer_is_native()?

> +++ b/include/linux/pci.h
> @@ -827,6 +827,10 @@ enum pci_ers_result {
>  
>  /* PCI bus error event callbacks */
>  struct pci_error_handlers {
> +
> +	/* CXL error detected on this device */

Nit on the comment: calling this function doesn't imply that a CXL
error was detected; we *always* call it.  Apparently it's just an
opportunity to log any CXL-specific errors that may have occurred?

I think we need a comment about why this couldn't be done in the
existing .error_detected() callback.  I gather it might be related to
AER_CORRECTABLE errors, for which we don't call .error_detected()?

If the purpose is only to learn about correctable errors, maybe the
callback doesn't need to be CXL-specific and could be called at the
point where we test for AER_CORRECTABLE?

> +	void (*cxl_error_detected)(struct pci_dev *dev);
> +
>  	/* PCI bus error detected on this device */
>  	pci_ers_result_t (*error_detected)(struct pci_dev *dev,
>  					   pci_channel_state_t error);
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information
  2022-10-21 18:56 [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information Terry Bowman
                   ` (5 preceding siblings ...)
  2022-10-21 19:02 ` [PATCH 0/5] cxl: Log downport " Terry Bowman
@ 2022-10-28 12:30 ` Ariel.Sibley
  2022-10-28 14:29   ` Terry Bowman
  6 siblings, 1 reply; 25+ messages in thread
From: Ariel.Sibley @ 2022-10-28 12:30 UTC (permalink / raw)
  To: terry.bowman, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk, dan.j.williams
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

> -----Original Message-----
> From: Terry Bowman <terry.bowman@amd.com>
> Sent: Friday, October 21, 2022 3:56 PM
> To: alison.schofield@intel.com; vishal.l.verma@intel.com; dave.jiang@intel.com; ira.weiny@intel.com;
> bwidawsk@kernel.org; dan.j.williams@intel.com
> Cc: terry.bowman@amd.com; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org; bhelgaas@google.com;
> rafael@kernel.org; lenb@kernel.org; Jonathan.Cameron@huawei.com; dave@stgolabs.net; rrichter@amd.com
> Subject: [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information
> 
> This patchset adds CXL downport PCI AER and CXL RAS logging to the CXL
> error handling. This is necessary for communicating CXL HW issues to users.
> The included patches find and cache pointers to the AER and CXL RAS PCIe
> capability structures. The cached pointers are then used to display the
> error information in a later patch. These changes follow the CXL
> specification, Chapter 8 'Control and Status Registers'.[1]
> 
> The first patch enables CXL1.1 RCD support through the ACPI _OSC support
> method.
> 
> The 2nd and 3rd patches find and map PCIe AER and CXL RAS capabilities.
> 
> The 4th patch enables AER error reporting.
> 
> The 5th patch adds functionality to log the PCIe AER and RAS capabilities.
> 
> TODO work remains to consolidate the HDM and CXL RAS register mapping
> (patch#3). The current CXL RAS register mapping will be replaced to reuse
> cxl_probe_component_regs() function as David Jiang and Alison Schofield
> upstreamed. Should the same be done for the AER registers (patch#2)? The
> AER registers are not in the component register block but are instead in
> the downport and upport (RCRB).

The RCD's AER registers are not in either the component register block or
RCRB. They are in the RCiEP config space.

Per CXL 3.0 Section 12.2.1.2 RCD Upstream Port-detected Errors:
"2. Upstream Port RCRB shall not implement the AER Extended Capability."
...
"4. CXL.io Functions log the received message in their respective AER Extended
Capability."

> 
> TODO work remains to add support for upports in some cases here where
> downport is addressed. For instance, will need another aer_map to support
> upport AER ?
> 
> TODO work to support CXL2.0. Should be trivial since aer_cap and aer_stats
> is member of 'struct pci_dev'.
> 
> Base is from: https://patchwork.kernel.org/project/cxl/list/?series=686272
> 
> [1] - https://www.computeexpresslink.org/spec-landing
> 
> Terry Bowman (5):
>   cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
>   cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability
>   cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
>   cxl/pci: Enable RCD dport AER reporting
>   cxl/pci: Log CXL device's PCIe AER and CXL RAS error information
> 
>  drivers/acpi/pci_root.c |   1 +
>  drivers/cxl/acpi.c      |  56 +++++++
>  drivers/cxl/core/regs.c |   1 +
>  drivers/cxl/cxl.h       |  13 ++
>  drivers/cxl/cxlmem.h    |   3 +
>  drivers/cxl/mem.c       |   2 +
>  drivers/cxl/pci.c       | 319 ++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/pcie/aer.c  |  45 +++++-
>  include/linux/pci.h     |   4 +
>  9 files changed, 443 insertions(+), 1 deletion(-)
> 
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
  2022-10-21 18:56 ` [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers Terry Bowman
  2022-10-22 22:44   ` Dan Williams
@ 2022-10-28 12:53   ` Ariel.Sibley
  2022-10-28 14:46     ` Terry Bowman
  1 sibling, 1 reply; 25+ messages in thread
From: Ariel.Sibley @ 2022-10-28 12:53 UTC (permalink / raw)
  To: terry.bowman, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk, dan.j.williams
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

> -----Original Message-----
> From: Terry Bowman <terry.bowman@amd.com>
> Sent: Friday, October 21, 2022 3:56 PM
> To: alison.schofield@intel.com; vishal.l.verma@intel.com; dave.jiang@intel.com; ira.weiny@intel.com;
> bwidawsk@kernel.org; dan.j.williams@intel.com
> Cc: terry.bowman@amd.com; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org; bhelgaas@google.com;
> rafael@kernel.org; lenb@kernel.org; Jonathan.Cameron@huawei.com; dave@stgolabs.net; rrichter@amd.com
> Subject: [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
> 
> CXL RAS information resides in a RAS capability structure located in
> CXL.cache and CXL.mem registers.[1] The RAS capability provides CXL
> specific error information that can be helpful in debugging. This
> information is not currently logged but needs to be logged during PCIe AER
> error handling.
> 
> Update the CXL driver to find and cache a pointer to the CXL RAS
> capability. The RAS registers resides in the downport's component register
> block. Note:RAS registers are not in the upport. The component registers
> can be found by first using the RCRB to goto the downport. Next, the
> downport's 64-bit BAR[0] will point to the component register block.

I realize this patch is for dport only, but regarding "Note:RAS registers
are not in the upport.", the upstream port also has RAS registers.

Per CXL 3.0 Section 12.2.1.2 RCD Upstream Port-detected Errors:
"1. If a CXL.cache or CXL.mem logic block in UPZ detects a protocol or link
error, the block shall log the error in the CXL RAS Capability (see Section
8.2.4.16)."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information
  2022-10-28 12:30 ` Ariel.Sibley
@ 2022-10-28 14:29   ` Terry Bowman
  2022-10-28 16:37     ` Ariel.Sibley
  0 siblings, 1 reply; 25+ messages in thread
From: Terry Bowman @ 2022-10-28 14:29 UTC (permalink / raw)
  To: Ariel.Sibley, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk, dan.j.williams
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

Hi Ariel,

On 10/28/22 07:30, Ariel.Sibley@microchip.com wrote:
>> -----Original Message-----
>> From: Terry Bowman <terry.bowman@amd.com>
>> Sent: Friday, October 21, 2022 3:56 PM
>> To: alison.schofield@intel.com; vishal.l.verma@intel.com; dave.jiang@intel.com; ira.weiny@intel.com;
>> bwidawsk@kernel.org; dan.j.williams@intel.com
>> Cc: terry.bowman@amd.com; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org; bhelgaas@google.com;
>> rafael@kernel.org; lenb@kernel.org; Jonathan.Cameron@huawei.com; dave@stgolabs.net; rrichter@amd.com
>> Subject: [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information
>>
>> This patchset adds CXL downport PCI AER and CXL RAS logging to the CXL
>> error handling. This is necessary for communicating CXL HW issues to users.
>> The included patches find and cache pointers to the AER and CXL RAS PCIe
>> capability structures. The cached pointers are then used to display the
>> error information in a later patch. These changes follow the CXL
>> specification, Chapter 8 'Control and Status Registers'.[1]
>>
>> The first patch enables CXL1.1 RCD support through the ACPI _OSC support
>> method.
>>
>> The 2nd and 3rd patches find and map PCIe AER and CXL RAS capabilities.
>>
>> The 4th patch enables AER error reporting.
>>
>> The 5th patch adds functionality to log the PCIe AER and RAS capabilities.
>>
>> TODO work remains to consolidate the HDM and CXL RAS register mapping
>> (patch#3). The current CXL RAS register mapping will be replaced to reuse
>> cxl_probe_component_regs() function as David Jiang and Alison Schofield
>> upstreamed. Should the same be done for the AER registers (patch#2)? The
>> AER registers are not in the component register block but are instead in
>> the downport and upport (RCRB).
> 
> The RCD's AER registers are not in either the component register block or
> RCRB. They are in the RCiEP config space.
> 
> Per CXL 3.0 Section 12.2.1.2 RCD Upstream Port-detected Errors:
> "2. Upstream Port RCRB shall not implement the AER Extended Capability."
> ...
> "4. CXL.io Functions log the received message in their respective AER Extended
> Capability."
> 

I based this comment on CXL3.0 8.2.1.1 "RCH Downstream Port RCRB":

"The RCH Downstream Port RCRB is a 4-KB memory region that contains
registers based upon the PCIe-defined registers for a root port... The
RCH Downstream Port supported PCIe capabilities and extended
capabilities are listed in Table 8-18"

And Table 8-18 includes 'Advanced Error Reporting
Extended Capability' with no exceptions.

The RCD upstream port needs to be removed from my comment. Thank you for
pointing that out. My understanding is the RCH downstream port does
include the AER registers.

Regards,
Terry

>>
>> TODO work remains to add support for upports in some cases here where
>> downport is addressed. For instance, will need another aer_map to support
>> upport AER ?
>>
>> TODO work to support CXL2.0. Should be trivial since aer_cap and aer_stats
>> is member of 'struct pci_dev'.
>>
>> Base is from: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Fcxl%2Flist%2F%3Fseries%3D686272&amp;data=05%7C01%7Cterry.bowman%40amd.com%7C121bfa9df0c44b311aef08dab8e03663%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638025570444835378%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=ckPk6RyL61lsX%2BNYKLQ%2FzRgA2424ccLj%2B6FLG9K6Sdc%3D&amp;reserved=0
>>
>> [1] - https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.computeexpresslink.org%2Fspec-landing&amp;data=05%7C01%7Cterry.bowman%40amd.com%7C121bfa9df0c44b311aef08dab8e03663%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638025570444835378%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=%2For6BQCHX616kZL%2BFbSqOqT7hQYntiJYD%2BnpWTKkDXE%3D&amp;reserved=0
>>
>> Terry Bowman (5):
>>   cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
>>   cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability
>>   cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
>>   cxl/pci: Enable RCD dport AER reporting
>>   cxl/pci: Log CXL device's PCIe AER and CXL RAS error information
>>
>>  drivers/acpi/pci_root.c |   1 +
>>  drivers/cxl/acpi.c      |  56 +++++++
>>  drivers/cxl/core/regs.c |   1 +
>>  drivers/cxl/cxl.h       |  13 ++
>>  drivers/cxl/cxlmem.h    |   3 +
>>  drivers/cxl/mem.c       |   2 +
>>  drivers/cxl/pci.c       | 319 ++++++++++++++++++++++++++++++++++++++++
>>  drivers/pci/pcie/aer.c  |  45 +++++-
>>  include/linux/pci.h     |   4 +
>>  9 files changed, 443 insertions(+), 1 deletion(-)
>>
>> --
>> 2.34.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability
  2022-10-27 14:52   ` Bjorn Helgaas
@ 2022-10-28 14:38     ` Terry Bowman
  0 siblings, 0 replies; 25+ messages in thread
From: Terry Bowman @ 2022-10-28 14:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: alison.schofield, vishal.l.verma, dave.jiang, ira.weiny,
	bwidawsk, dan.j.williams, linux-cxl, linux-kernel, bhelgaas,
	rafael, lenb, Jonathan.Cameron, dave, rrichter

Hi Bjorn,


On 10/27/22 09:52, Bjorn Helgaas wrote:
> On Fri, Oct 21, 2022 at 01:56:12PM -0500, Terry Bowman wrote:
>> CXL downport PCIe AER information needs to be logged during error handling.
>> The RCD downport/upport does not have a BDF and is not PCI enumerable. As a
>> result the CXL PCIe driver is not aware of the AER in 'PCI Express'
>> capability located in the RCRB downport/upport. Logic must be introduced to
>> use the downport/upport AER information.
> 
> I assume "downport" is the same as "dport" in "cxl_dport" and means
> "Downstream Port".  Might be nice to reduce the number of variations
> if feasible.
> 

Yes, I'll update the terminology throughout to use cxl_port, cxl_dport,
and cxl_uport.

>> +static resource_size_t cxl_get_dport_cap(struct cxl_memdev *cxlmd, int cap_id)
>> +{
>> +	resource_size_t offset, rcrb;
>> +	void *rcrb_mapped;
>> +	u32 cap_hdr;
>> +
>> +	rcrb = cxl_get_rcrb(cxlmd);
>> +	if (!rcrb)
>> +		return 0;
>> +
>> +	rcrb_mapped = ioremap(rcrb, SZ_4K);
>> +	if (!rcrb_mapped)
>> +		return 0;
>> +
>> +	offset = readl(rcrb_mapped + PCI_CAPABILITY_LIST);
>> +	cap_hdr = readl(rcrb_mapped + offset);
>> +
>> +	while (PCI_CAP_ID(cap_hdr)) {
>> +		if (PCI_CAP_ID(cap_hdr) == cap_id)
>> +			break;
>> +
>> +		offset = PCI_CAP_NEXT(cap_hdr);
>> +		if (offset == 0)
>> +			break;
>> +
>> +		cap_hdr = readl(rcrb_mapped + offset);
>> +	}
>> +	iounmap((void *)rcrb_mapped);
>> +
>> +	if (PCI_CAP_ID(cap_hdr) != cap_id)
>> +		return 0;
>> +
>> +	pr_debug("Found capability %X @ %llX (%X)\n",
>> +		 cap_id, rcrb + offset, cap_hdr);
> 
> Would be nice to use dev_dbg() if possible here.
> 
> Is "%X" (upper-case hex) the convention in CXL?  Most places in Linux
> seem to use "%x".  Also consider "%#x" (or "%#X") so it's obvious
> these are hex.
> 

Ok, I will make the change.

>> +void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
>> +{
>> +	resource_size_t cap_base;
>> +
>> +	/* CXL2.0 is enumerable and will use AER attached to `struct pci_dev` */
>> +	if (!is_rcd(cxlmd))
>> +		return;
>> +
>> +	/*
>> +	 * Read base address of the PCI express cap. Cache the cap's
>> +	 * PCI_EXP_DEVCTL and PCI_EXP_DEVSTA for AER control and status.
>> +	 */
>> +	cap_base = cxl_get_dport_cap(cxlmd, PCI_CAP_ID_EXP);
>> +	cxl_setup_dport_aer(cxlmd, cap_base);
> 
> I don't see anything about PCI_EXP_DEVCTL and PCI_EXP_DEVSTA in
> cxl_get_dport_cap() or cxl_setup_dport_aer().  And I don't see any
> caching, except for setting map->base in cxl_setup_dport_aer().
> 
> Caching those registers, especially PCI_EXP_DEVSTA, doesn't seem like
> it would make much sense anyway since bits there are set by hardware
> when things happen.

This comment needs to be moved to later patch enabling AER.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
  2022-10-28 12:53   ` Ariel.Sibley
@ 2022-10-28 14:46     ` Terry Bowman
  0 siblings, 0 replies; 25+ messages in thread
From: Terry Bowman @ 2022-10-28 14:46 UTC (permalink / raw)
  To: Ariel.Sibley, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk, dan.j.williams
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

Hi Ariel,


On 10/28/22 07:53, Ariel.Sibley@microchip.com wrote:
>> -----Original Message-----
>> From: Terry Bowman <terry.bowman@amd.com>
>> Sent: Friday, October 21, 2022 3:56 PM
>> To: alison.schofield@intel.com; vishal.l.verma@intel.com; dave.jiang@intel.com; ira.weiny@intel.com;
>> bwidawsk@kernel.org; dan.j.williams@intel.com
>> Cc: terry.bowman@amd.com; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org; bhelgaas@google.com;
>> rafael@kernel.org; lenb@kernel.org; Jonathan.Cameron@huawei.com; dave@stgolabs.net; rrichter@amd.com
>> Subject: [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
>>
>> CXL RAS information resides in a RAS capability structure located in
>> CXL.cache and CXL.mem registers.[1] The RAS capability provides CXL
>> specific error information that can be helpful in debugging. This
>> information is not currently logged but needs to be logged during PCIe AER
>> error handling.
>>
>> Update the CXL driver to find and cache a pointer to the CXL RAS
>> capability. The RAS registers resides in the downport's component register
>> block. Note:RAS registers are not in the upport. The component registers
>> can be found by first using the RCRB to goto the downport. Next, the
>> downport's 64-bit BAR[0] will point to the component register block.
> 
> I realize this patch is for dport only, but regarding "Note:RAS registers
> are not in the upport.", the upstream port also has RAS registers.
> 

Correct. Thanks for pointing this out.

> Per CXL 3.0 Section 12.2.1.2 RCD Upstream Port-detected Errors:
> "1. If a CXL.cache or CXL.mem logic block in UPZ detects a protocol or link
> error, the block shall log the error in the CXL RAS Capability (see Section
> 8.2.4.16)."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information
  2022-10-28 14:29   ` Terry Bowman
@ 2022-10-28 16:37     ` Ariel.Sibley
  0 siblings, 0 replies; 25+ messages in thread
From: Ariel.Sibley @ 2022-10-28 16:37 UTC (permalink / raw)
  To: Terry.Bowman, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk, dan.j.williams
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter

> -----Original Message-----
> From: Terry Bowman <Terry.Bowman@amd.com>
> Sent: Friday, October 28, 2022 11:30 AM
> To: Ariel Sibley - C33371 <Ariel.Sibley@microchip.com>; alison.schofield@intel.com;
> vishal.l.verma@intel.com; dave.jiang@intel.com; ira.weiny@intel.com; bwidawsk@kernel.org;
> dan.j.williams@intel.com
> Cc: linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org; bhelgaas@google.com; rafael@kernel.org;
> lenb@kernel.org; Jonathan.Cameron@huawei.com; dave@stgolabs.net; rrichter@amd.com
> Subject: Re: [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information
> 
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Hi Ariel,
> 
> On 10/28/22 07:30, Ariel.Sibley@microchip.com wrote:
> >> -----Original Message-----
> >> From: Terry Bowman <terry.bowman@amd.com>
> >> Sent: Friday, October 21, 2022 3:56 PM
> >> To: alison.schofield@intel.com; vishal.l.verma@intel.com; dave.jiang@intel.com; ira.weiny@intel.com;
> >> bwidawsk@kernel.org; dan.j.williams@intel.com
> >> Cc: terry.bowman@amd.com; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org; bhelgaas@google.com;
> >> rafael@kernel.org; lenb@kernel.org; Jonathan.Cameron@huawei.com; dave@stgolabs.net; rrichter@amd.com
> >> Subject: [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information
> >>
> >> This patchset adds CXL downport PCI AER and CXL RAS logging to the CXL
> >> error handling. This is necessary for communicating CXL HW issues to users.
> >> The included patches find and cache pointers to the AER and CXL RAS PCIe
> >> capability structures. The cached pointers are then used to display the
> >> error information in a later patch. These changes follow the CXL
> >> specification, Chapter 8 'Control and Status Registers'.[1]
> >>
> >> The first patch enables CXL1.1 RCD support through the ACPI _OSC support
> >> method.
> >>
> >> The 2nd and 3rd patches find and map PCIe AER and CXL RAS capabilities.
> >>
> >> The 4th patch enables AER error reporting.
> >>
> >> The 5th patch adds functionality to log the PCIe AER and RAS capabilities.
> >>
> >> TODO work remains to consolidate the HDM and CXL RAS register mapping
> >> (patch#3). The current CXL RAS register mapping will be replaced to reuse
> >> cxl_probe_component_regs() function as David Jiang and Alison Schofield
> >> upstreamed. Should the same be done for the AER registers (patch#2)? The
> >> AER registers are not in the component register block but are instead in
> >> the downport and upport (RCRB).
> >
> > The RCD's AER registers are not in either the component register block or
> > RCRB. They are in the RCiEP config space.
> >
> > Per CXL 3.0 Section 12.2.1.2 RCD Upstream Port-detected Errors:
> > "2. Upstream Port RCRB shall not implement the AER Extended Capability."
> > ...
> > "4. CXL.io Functions log the received message in their respective AER Extended
> > Capability."
> >
> 
> I based this comment on CXL3.0 8.2.1.1 "RCH Downstream Port RCRB":
> 
> "The RCH Downstream Port RCRB is a 4-KB memory region that contains
> registers based upon the PCIe-defined registers for a root port... The
> RCH Downstream Port supported PCIe capabilities and extended
> capabilities are listed in Table 8-18"
> 
> And Table 8-18 includes 'Advanced Error Reporting
> Extended Capability' with no exceptions.
> 
> The RCD upstream port needs to be removed from my comment. Thank you for
> pointing that out. My understanding is the RCH downstream port does
> include the AER registers.
> 
> Regards,
> Terry
> 

Thanks Terry. Yes, the RCH downstream port RCRB must contain AER registers
due to this text in CXL 3.0 Section 12.2.1.1 RCH Downstream Port-detected Errors:
"1. DPA CXL.io-detected errors are logged in the local AER Extended Capability
in DPA RCRB. Software must ensure that the Root Port Control register in the
DPA AER Extended Capability are not configured to generate interrupts."

However, the text in the Exceptions column in Table 8-18 for AER is "N/A. Software
should ignore.", which I find somewhat confusing given the text in Section
12.2.1.1. However, since the text in 12.2.1.1 is explicit, the AER registers
must be there.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers
  2022-10-27 20:32       ` Dan Williams
@ 2022-10-31 16:17         ` Terry Bowman
  0 siblings, 0 replies; 25+ messages in thread
From: Terry Bowman @ 2022-10-31 16:17 UTC (permalink / raw)
  To: Dan Williams, alison.schofield, vishal.l.verma, dave.jiang,
	ira.weiny, bwidawsk
  Cc: linux-cxl, linux-kernel, bhelgaas, rafael, lenb,
	Jonathan.Cameron, dave, rrichter



On 10/27/22 15:32, Dan Williams wrote:
> Terry Bowman wrote:
>>
>>
>> On 10/22/22 17:44, Dan Williams wrote:
>>> Terry Bowman wrote:
>>>> CXL RAS information resides in a RAS capability structure located in
>>>> CXL.cache and CXL.mem registers.[1] The RAS capability provides CXL
>>>> specific error information that can be helpful in debugging. This
>>>> information is not currently logged but needs to be logged during PCIe AER
>>>> error handling.
>>>>
>>>> Update the CXL driver to find and cache a pointer to the CXL RAS
>>>> capability. The RAS registers resides in the downport's component register
>>>> block. Note:RAS registers are not in the upport. The component registers
>>>> can be found by first using the RCRB to goto the downport. Next, the
>>>> downport's 64-bit BAR[0] will point to the component register block.
>>>>
>>>> [1] CXL3.0 Spec, '8.2.5 CXL.cache and CXL.mem Registers'
>>>>
>>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>>>> ---
>>>>  drivers/cxl/cxl.h    |  4 +++
>>>>  drivers/cxl/cxlmem.h |  1 +
>>>>  drivers/cxl/pci.c    | 72 ++++++++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 77 insertions(+)
>>>>
>>>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>>>> index 7d507ab80a78..69b50131ad86 100644
>>>> --- a/drivers/cxl/cxl.h
>>>> +++ b/drivers/cxl/cxl.h
>>>> @@ -36,6 +36,10 @@
>>>>  #define   CXL_CM_CAP_CAP_ID_HDM 0x5
>>>>  #define   CXL_CM_CAP_CAP_HDM_VERSION 1
>>>>  
>>>> +/* CXL 3.0 8.2.4.2 CXL RAS Capability Header */
>>>> +#define CXL_CM_CAP_ID_RAS 0x2
>>>> +#define CXL_CM_CAP_SIZE_RAS 0x5C
>>>> +
>>>>  /* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
>>>>  #define CXL_HDM_DECODER_CAP_OFFSET 0x0
>>>>  #define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
>>>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>>>> index 079db2e15acc..515273e224ea 100644
>>>> --- a/drivers/cxl/cxlmem.h
>>>> +++ b/drivers/cxl/cxlmem.h
>>>> @@ -243,6 +243,7 @@ struct cxl_dev_state {
>>>>  	u64 next_persistent_bytes;
>>>>  
>>>>  	struct cxl_register_map aer_map;
>>>> +	struct cxl_register_map ras_map;
>>>>  
>>>>  	resource_size_t component_reg_phys;
>>>>  	u64 serial;
>>>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>>>> index 2287b5225862..7f717fb47a36 100644
>>>> --- a/drivers/cxl/pci.c
>>>> +++ b/drivers/cxl/pci.c
>>>> @@ -586,6 +586,78 @@ void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
>>>>  }
>>>>  EXPORT_SYMBOL_NS_GPL(cxl_pci_aer_init, CXL);
>>>>  
>>>> +static resource_size_t cxl_get_dport_ras_base(struct cxl_memdev *cxlmd)
>>>> +{
>>>> +	resource_size_t component_reg_phys, offset = 0;
>>>> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
>>>> +	void *cap_hdr_addr, *comp_reg_mapped;
>>>> +	u32 cap_hdr, ras_cap_hdr;
>>>> +	int cap_ndx;
>>>> +
>>>> +	comp_reg_mapped = ioremap(cxlds->component_reg_phys +
>>>> +				  CXL_CM_OFFSET, CXL_COMPONENT_REG_BLOCK_SIZE);
>>>> +	if (!comp_reg_mapped)
>>>> +		return 0;
>>>> +
>>>> +	cap_hdr_addr = comp_reg_mapped;
>>>> +	cap_hdr = readl(cap_hdr_addr);
>>>> +	for (cap_ndx = 0;
>>>> +	     cap_ndx < FIELD_GET(CXL_CM_CAP_HDR_ARRAY_SIZE_MASK, cap_hdr);
>>>> +	     cap_ndx++) {
>>>> +		ras_cap_hdr = readl(cap_hdr_addr + cap_ndx*sizeof(u32));
>>>> +
>>>> +		if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, ras_cap_hdr) == CXL_CM_CAP_ID_RAS) {
>>>> +			pr_debug("RAS cap header = %X @ %pa, cap_ndx = %d\n",
>>>> +				 ras_cap_hdr, cap_hdr_addr, cap_ndx);
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	offset = CXL_CM_OFFSET + PCI_EXT_CAP_NEXT(ras_cap_hdr);
>>>> +
>>>> +	iounmap(comp_reg_mapped);
>>>> +
>>>> +	if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, ras_cap_hdr) != CXL_CM_CAP_ID_RAS)
>>>> +		return 0;
>>>> +
>>>> +	pr_debug("Found RAS capability @ %llX (%X)\n",
>>>> +		 component_reg_phys + offset, *((u32 *)(comp_reg_mapped + offset)));
>>>> +
>>>> +	return component_reg_phys + offset;
>>>
>>> For the RAS capability in the cxl_pci device this patch needs to be
>>> reconciled with this effort:
>>>
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-cxl%2F166336972295.3803215.1047199449525031921.stgit%40djiang5-desk3.ch.intel.com%2F&amp;data=05%7C01%7CTerry.Bowman%40amd.com%7C17f2392d8abc4fcc3d0608dab85a586e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638024995411582774%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=d64IsDCPTyPaBmsUYXlvn0rBMyaG8AxYHMuKHTSTOs8%3D&amp;reserved=0
>>>
>>> I think we will want RCD and VH RAS capability reporting to happen in
>>> the same place, and that can not be cxl_pci because cxl_pci has no way
>>> to find the RAS registers on its own. It needs the help from cxl_mem to
>>> do the upstream cxl_port associtation first.
>>>
>>> Given CXL switches will have their own RAS capabilities to report it
>>> feels like the cxl_port driver is where all of this should be
>>> centralized.
>>>
>>>
>>
>> I'm working on merging the patchsets now.
>>
>> I'm merging the following:
>>   Dave Jiang's onto 6.1.0-rc1+, provides RAS mapping.
> 
> Sounds like I should add this to the RCH branch so you can build on it.
> 
>>   Roberts series ontop of Dave's, provides RCD discovery.
> 
> Robert's series is still pending the rework to drop the
> devm_cxl_enumerate_ports() changes, not sure it's at a state where you
> can reliably build on it.
> 
>>   And this patchset ontop of Robert's, provides AER and RAS logging
> 
> As long as you are expecting to do one more rebase on the final form of
> Robert's series, sounds good.

I found there is some work to manually resolve merge conflicts between the 2 base 
patchsets. I will change directions and submit for review using Dave Jiang's 
patchset (using the URL you provided above). 

Regards,
Terry



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-10-31 16:17 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-21 18:56 [PATCH 0/5] cxl: Log downport PCIe AER and CXL RAS error information Terry Bowman
2022-10-21 18:56 ` [PATCH 1/5] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support Terry Bowman
2022-10-21 22:39   ` Dan Williams
2022-10-25 16:23     ` Terry Bowman
2022-10-21 18:56 ` [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability Terry Bowman
2022-10-22 21:45   ` Dan Williams
2022-10-25 16:42     ` Terry Bowman
2022-10-25 18:21       ` Dan Williams
2022-10-27 14:52   ` Bjorn Helgaas
2022-10-28 14:38     ` Terry Bowman
2022-10-21 18:56 ` [PATCH 3/5] cxl/pci: Discover and cache pointer to RCD dport's CXL RAS registers Terry Bowman
2022-10-22 22:44   ` Dan Williams
2022-10-26 19:01     ` Terry Bowman
2022-10-27 20:32       ` Dan Williams
2022-10-31 16:17         ` Terry Bowman
2022-10-28 12:53   ` Ariel.Sibley
2022-10-28 14:46     ` Terry Bowman
2022-10-21 18:56 ` [PATCH 4/5] cxl/pci: Enable RCD dport AER reporting Terry Bowman
2022-10-21 18:56 ` [PATCH 5/5] cxl/pci: Log CXL device's PCIe AER and CXL RAS error information Terry Bowman
2022-10-24 15:14   ` Jonathan Cameron
2022-10-27 21:30   ` Bjorn Helgaas
2022-10-21 19:02 ` [PATCH 0/5] cxl: Log downport " Terry Bowman
2022-10-28 12:30 ` Ariel.Sibley
2022-10-28 14:29   ` Terry Bowman
2022-10-28 16:37     ` Ariel.Sibley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).